September 2015


Outline


1. The past: robust statistics


2. The present: model selection


3. The future: protein data, meat science, joint modelling, data visualisation…

Robust statistics

The past:

PhD and postdoc at Sydney University


  • Inference in quantile regression models


  • Robust scale estimator
  • Robust covariance and autocovariance (short and long range dependence)
  • Robust precision matrix estimation (with regularisation for sparsity)

\(P_n\): our robust scale estimator

  • Given data \(\mathbf{X}=(X_1,\ldots,X_n)\), consider the \(U\)-statistic based on the pairwise mean kernel, \[U_n(\mathbf{X}) = {n\choose 2}^{-1}\sum_{i<j}\color{blue}{\frac{X_i + X_j}{2}}.\]
  • Let \(H(t) = P(\color{blue}{(X_i + X_j)/2}\leq t)\) be the cdf of the kernels with corresponding empirical distribution function, \[H_n(t)= {n\choose 2}^{-1}\sum_{i<j} \mathbb{1} \left\{\color{blue}{\frac{X_i + X_j}{2}}\leq t \right\},\quad \text{ for }t\in\mathbb{R}.\] For \(0<p<1\), let \(H_n^{-1}(p) := \inf\{t : H_n(t)\geq p\}\).
  • We define \(P_n\) as the interquartile range of the pairwise means: \[P_n = H_n^{-1}(3/4) - H_n^{-1}(1/4).\]
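A minimal sketch of this definition (illustrative Python rather than the R used elsewhere in the talk; the quartiles use the \(\inf\)-type quantile \(H_n^{-1}\) defined above):

```python
import math
from itertools import combinations

def P_n(x):
    """Robust scale estimate: interquartile range of the pairwise means.
    Illustrative sketch only, not the authors' reference implementation."""
    means = sorted((xi + xj) / 2 for xi, xj in combinations(x, 2))
    m = len(means)
    def H_inv(p):
        # H_n^{-1}(p) = inf{t : H_n(t) >= p} is the ceil(p*m)-th order statistic
        return means[math.ceil(p * m) - 1]
    return H_inv(3 / 4) - H_inv(1 / 4)
```

For example, `P_n([0, 1, 2])` has pairwise means 0.5, 1.0 and 1.5, so the estimate is 1.0; doubling the data doubles the estimate, as a scale estimator should.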

Why another scale estimator?

Location                   Scale                  Properties
Mean                       Standard deviation     Efficient at normal but not robust
Median                     Interquartile range    Robust but not efficient
Hodges-Lehmann estimator   \(P_n\)                Good robustness and efficiency properties


  • The Hodges-Lehmann estimator of location is the median of the pairwise means.
  • \(P_n\) is the interquartile range of the pairwise means.

Why pairwise means?

Consider 10 observations drawn from \(\mathcal{N}(0,1)\).

Bounded influence function

The influence curve for a functional \(T\) at distribution \(F\) is \[\operatorname{IF}(x;T,F) = \lim_{\epsilon\downarrow0}\frac{T((1-\epsilon)F+\epsilon\delta_{x}) - T(F)}{\epsilon}\] where \(\delta_{x}\) has all its mass at \(x\).

Influence curve for \(P_{n}\)

Assuming that \(F\) has derivative \(f>0\) on \([F^{-1}(\epsilon),F^{-1}(1-\epsilon)]\) for all \(\epsilon>0\),

\[ \begin{aligned} \operatorname{IF}(x; \color{blue}{P_{n}},F) & = \frac{ 0.75 - F(2H_{F}^{-1}(0.75)-x)}{\int f(2H_{F}^{-1}(0.75) - u)f(u)\,du} \\ & \qquad - \frac{0.25 - F(2H_{F}^{-1}(0.25)-x)}{\int f(2H_{F}^{-1}(0.25) - u)f(u)\,du}. \end{aligned} \]


Properties

  • Bounded influence function
  • When the underlying observations are independent, \(P_n\) is asymptotically normal with variance given by the expected square of the influence function.
  • When the underlying data are independent Gaussian, \(P_n\) has an asymptotic efficiency of 86%.
  • Breakdown value of 13%.
Tarr, Müller, and Weber (2012)
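The 86% figure can be checked numerically. A rough Monte Carlo sketch (mine, not from the talk), assuming the asymptotic consistency constant \(1/(\sqrt{2}\,\Phi^{-1}(3/4))\) to put \(P_n\) on the standard deviation scale:

```python
import math
import random
from itertools import combinations
from statistics import stdev, variance

def pn(x):
    # P_n: interquartile range of the pairwise means
    m = sorted((a + b) / 2 for a, b in combinations(x, 2))
    q = lambda p: m[math.ceil(p * len(m)) - 1]   # H_n^{-1}(p)
    return q(0.75) - q(0.25)

# Consistency constant at the normal: the pairwise means are N(0, sigma^2/2),
# so their IQR is sqrt(2) * qnorm(0.75) * sigma.
C = 1 / (math.sqrt(2) * 0.6744898)

random.seed(1)
sd_est, pn_est = [], []
for _ in range(500):
    x = [random.gauss(0, 1) for _ in range(40)]
    sd_est.append(stdev(x))
    pn_est.append(C * pn(x))

eff = variance(sd_est) / variance(pn_est)  # roughly 0.86 at the normal
```

With \(n=40\) and 500 replications the variance ratio lands near the asymptotic 86%, though finite-sample noise is considerable.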



  • Also looked at the distribution of the estimator under short and long range dependence
  • LRD turned out to be very complicated
  • Took a step back and looked at the interquartile range
Tarr, Weber, and Müller (2015)

Cellwise contamination

A key component of my PhD was estimating precision matrices for data contaminated in a cellwise manner, i.e. with outliers scattered through individual cells rather than confined to whole rows.

Important for:

  • high dimensional data
  • automated data collection and analysis methods
  • e.g. -omics type data

Often sparsity is assumed, i.e. the precision matrix will have many zero entries.




Financial example

Aim: to estimate the dependence structure among S&P 500 stocks over the period 01/01/2003 to 01/01/2008 (before the GFC).

  • We have \(n=1258\) observations (trading days) over \(p=452\) dimensions (stocks).
  • Observe \(S_{t,j}\) the closing price of stock \(j\) on day \(t\) for \(j=1,\ldots,p\) and \(t=1,\ldots,n\).
  • Look at the return series \(X_{t,j} = \log\left(\frac{S_{t,j}}{S_{t-1,j}}\right)\).
  • We want to estimate a sparse precision matrix where the zero entries correspond to (conditional) independence between the stocks.

How: using the graphical lasso with a robust covariance matrix as the input.

Tarr, Müller, and Weber (2015)

What's the graphical lasso?

The graphical lasso minimises the penalised negative Gaussian log-likelihood over positive definite matrices \(\boldsymbol{\Theta}\): \[f(\boldsymbol{\Theta})= \text{tr}(\hat{\boldsymbol{\Sigma}}\boldsymbol{\Theta})-\log |\boldsymbol{\Theta}| + \lambda||\boldsymbol{\Theta}||_{1}, \] where \(||\boldsymbol{\Theta}||_1\) is the elementwise \(L_1\) norm, \(\lambda\) is a tuning parameter controlling the amount of shrinkage and \(\hat{\boldsymbol{\Sigma}}\) is a sample covariance matrix.

Friedman, Hastie, and Tibshirani (2008)
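To make the objective concrete, here is a sketch (in illustrative Python rather than the R used in the talk) that simply evaluates \(f(\boldsymbol{\Theta})\) for a \(2\times 2\) case; actually minimising it requires a dedicated solver such as the one in the R package huge used below:

```python
import math

def glasso_objective(S, Theta, lam):
    """Penalised negative Gaussian log-likelihood f(Theta) for 2x2 matrices:
    tr(S Theta) - log|Theta| + lam * ||Theta||_1. Evaluation only."""
    tr = (S[0][0] * Theta[0][0] + S[0][1] * Theta[1][0]
          + S[1][0] * Theta[0][1] + S[1][1] * Theta[1][1])
    det = Theta[0][0] * Theta[1][1] - Theta[0][1] * Theta[1][0]
    l1 = sum(abs(t) for row in Theta for t in row)
    return tr - math.log(det) + lam * l1
```

Larger \(\lambda\) penalises every nonzero entry of \(\boldsymbol{\Theta}\), which is what drives off-diagonal entries of the minimiser to exactly zero.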

Why? Sparsity!

Financial example

require(huge)
data(stockdata)  # S&P 500 closing prices, 2003-2008
# daily log returns: X[t, j] = log(S[t, j] / S[t-1, j])
X = log(stockdata$data[2:1258, ] / stockdata$data[1:1257, ])
par(mfrow = c(3, 2), mar = c(2, 4, 1, 0.1))
for (i in 1:6) ts.plot(X[, i], main = stockdata$info[i, 3], ylab = "Return")

Classical approach

Robust approach

Classical approach (extra contamination)

Robust approach (extra contamination)

Take home messages

Robust methods

  • Robustness has always been quite niche, but it deserves more attention
  • Analysing real data means dealing with errant observations
  • Having reliable methods to deal with these observations is important

Cellwise contamination

  • With big data come big problems
  • Traditional robust methods can fail
  • Downweighting rows is no longer appropriate

Model selection

The present:

Model selection

  • Started working in this area last year during a postdoc at ANU.

Some notation

  • Say we have a full model with an \(n\times p\) design matrix \(\mathbf{X}\).
  • Let \(\alpha\) be any subset of \(p_\alpha\) distinct elements from \(\{1,\ldots,p\}\).
  • We can define an \(n\times p_\alpha\) submodel with design matrix \(\mathbf{X}_\alpha\), formed from the columns of \(\mathbf{X}\) indexed by the elements of \(\alpha\).
  • Denote the set of all candidate models by \(\mathcal{A}=\{\{1\},\ldots,\alpha_f\}\), where \(\alpha_f\) is the full model.

A smörgåsbord of tuning parameters…

Information Criterion

  • Generalised IC: \(\text{GIC}(\alpha;\lambda) = -2\times \text{LogLik}(\alpha)+\lambda p_\alpha\)

With important special cases:

  • AIC: \(\lambda=2\)
  • BIC: \(\lambda = \log(n)\)
  • HQIC: \(\lambda= 2\log(\log(n))\)
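The special cases are just different penalty multipliers plugged into the same formula; a one-line sketch (illustrative Python, with made-up log-likelihood values):

```python
import math

def gic(loglik, p_alpha, lam):
    """GIC(alpha; lam) = -2 * LogLik(alpha) + lam * p_alpha"""
    return -2 * loglik + lam * p_alpha

# hypothetical model: log-likelihood -10, p_alpha = 3 parameters, n = 100
aic  = gic(-10.0, 3, 2)                            # lam = 2
bic  = gic(-10.0, 3, math.log(100))                # lam = log(n)
hqic = gic(-10.0, 3, 2 * math.log(math.log(100)))  # lam = 2 log(log(n))
```

For \(n=100\) the penalties are ordered \(2 < 2\log(\log n) < \log n\), so BIC penalises dimension hardest and AIC least.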

Regularisation routines

  • Lasso: minimises \(-\text{LogLik}(\alpha) +\lambda\ ||\beta_\alpha||_1\)
  • Many variants of the Lasso, SCAD,…

A stability based approach

Aim

To provide scientists and researchers with tools that give them more information about the model selection choices that they are making.

Method

  • interactive graphical tools
  • exhaustive searches (where feasible)
  • bootstrapping to assess selection stability


Concept of model stability independently introduced by Meinshausen and Bühlmann (2010) and Müller and Welsh (2010) for different models.

Diabetes example

Variable   Description
age        Age
sex        Gender
bmi        Body mass index
map        Mean arterial pressure (average blood pressure)
tc         Total cholesterol (mg/dL)
ldl        Low-density lipoprotein ("bad" cholesterol)
hdl        High-density lipoprotein ("good" cholesterol)
tch        Blood serum measurement
ltg        Blood serum measurement
glu        Blood serum measurement (glucose?)
y          A quantitative measure of disease progression one year after baseline

Source: Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). "Least Angle Regression." The Annals of Statistics 32 (2): 407-499. doi:10.1214/009053604000000067.

Variable inclusion plots

Aim

To visualise inclusion probabilities as a function of the penalty multiplier \(\lambda\in [0,2\log(n)]\).

Procedure

  1. Calculate (weighted) bootstrap samples \(b=1,\ldots,B\).
  2. For each bootstrap sample, at each \(\lambda\) value, find \(\hat{\alpha}_\lambda^{(b)}\in\mathcal{A}\) as the model with smallest \(\text{GIC}(\alpha;\lambda) = -2\times \text{LogLik}(\alpha)+\lambda p_\alpha\).
  3. The inclusion probability for variable \(x_j\) is estimated as \(\frac{1}{B}\sum_{b=1}^B 1\{j\in \hat{\alpha}_\lambda^{(b)}\}\).
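Steps 2 and 3 can be sketched as follows (illustrative Python; the hypothetical `loglik` function stands in for an actual model fit on a bootstrap sample):

```python
from itertools import chain, combinations

def best_model(loglik, p, lam):
    """Step 2: exhaustive search for the model minimising
    GIC(alpha; lam) = -2*loglik(alpha) + lam*|alpha|."""
    subsets = chain.from_iterable(combinations(range(p), k)
                                  for k in range(1, p + 1))
    return min((frozenset(a) for a in subsets),
               key=lambda a: -2 * loglik(a) + lam * len(a))

def inclusion_prob(selected, j):
    """Step 3: fraction of bootstrap-selected models containing variable j."""
    return sum(j in a for a in selected) / len(selected)
```

Repeating `best_model` over a grid of \(\lambda\) values for each bootstrap sample and plotting `inclusion_prob` against \(\lambda\) gives the variable inclusion plot.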

References

  • Müller and Welsh (2010) for linear regression models
  • Murray, Heritier, and Müller (2013) for generalised linear models

Diabetes example – VIP

Model stability plots

Aim

To add value to the loss against size plots by choosing a symbol size proportional to a measure of stability.

Procedure

  1. Calculate (weighted) bootstrap samples \(b=1,\ldots,B\).
  2. For each bootstrap sample, identify the best model at each dimension.
  3. Overlay this information on the loss against size plot, with symbol sizes proportional to the frequency with which each model was identified as best at its size.
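A sketch of the bookkeeping behind steps 2 and 3 (illustrative Python; the names are mine, not from the mplot package):

```python
from collections import Counter

def stability_counts(bootstrap_best_by_size):
    """For each model size, tally how often each model was chosen as best
    across bootstrap samples; the frequencies scale the plot symbols.
    Input: one {size: model} dict per bootstrap sample."""
    counts = {}
    for size_to_model in bootstrap_best_by_size:
        for size, model in size_to_model.items():
            counts.setdefault(size, Counter())[model] += 1
    return counts
```

A model repeatedly selected as best at its size gets a large symbol, flagging it as a stable choice; sizes where the counts are spread thinly across many models indicate instability.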

References

  • Murray, Heritier, and Müller (2013) for generalised linear models

Artificial example – Model stability plot

Adaptive fence

Get it on Github

install.packages("devtools")
require(devtools)
install_github("garthtarr/mplot",quick=TRUE)
require(mplot)

Main functions

  • af() for the adaptive fence
  • vis() for VIP and model stability plots
  • bglmnet() for bootstrapping glmnet
  • mplot() for an interactive shiny interface

Diabetes example

Tarr, Müller, and Welsh (2015)

Take home messages

Concept of "model stability"

  • Relatively new
  • Should be used more often

Still to do:

  • Approximating linear mixed models by linear models (with Alan Welsh, ANU)
  • Approximating generalised linear models by linear models (with Samuel Müller, USYD)
  • Implement other models, e.g. Cox type models
  • The role of robust analysis in model selection

The future

Projects underway (or soon to be)


  • Melanoma prognosis prediction using protein data (with Jean Yang, USYD)


  • Predicting the eating quality of beef and lamb (with Meat and Livestock Australia + international collaborators)


  • Model selection in joint models (with Irene Hudson, UON)


  • R packages for interactive data visualisation - bringing the power of D3 to R (edgebundleR, pairsD3)

References

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41. doi:10.1093/biostatistics/kxm045.

Meinshausen, N, and P Bühlmann. 2010. “Stability Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (4): 417–73. doi:10.1111/j.1467-9868.2010.00740.x.

Murray, K, S Heritier, and S Müller. 2013. “Graphical Tools for Model Selection in Generalized Linear Models.” Statistics in Medicine 32 (25): 4438–51. doi:10.1002/sim.5855.

Müller, S, and AH Welsh. 2010. “On Model Selection Curves.” International Statistical Review 78 (2): 240–56. doi:10.1111/j.1751-5823.2010.00108.x.

Tarr, G, S Müller, and NC Weber. 2012. “A Robust Scale Estimator Based on Pairwise Means.” Journal of Nonparametric Statistics 24 (1): 187–99. doi:10.1080/10485252.2011.621424.

———. 2015. “Robust Estimation of Precision Matrices Under Cellwise Contamination.” Computational Statistics & Data Analysis, to appear. doi:10.1016/j.csda.2015.02.005.

Tarr, G, S Müller, and AH Welsh. 2015. mplot: Graphical Model Stability and Model Selection Procedures. https://github.com/garthtarr/mplot.

Tarr, G, NC Weber, and S Müller. 2015. “The Difference of Symmetric Quantiles Under Long Range Dependence.” Statistics & Probability Letters 98: 144–50. doi:10.1016/j.spl.2014.12.022.