January 2015


Overview



  1. Variable inclusion plots
  2. Model stability plots
  3. Adaptive fence method
  4. Bootstrapping glmnet
  5. Robustness considerations
  6. The mplot package
  7. Diabetes data example

A stability-based approach

Aim

To provide scientists/researchers with tools that give them more information about the model selection choices that they are making.

Method

  • interactive graphical tools
  • exhaustive searches (where feasible)
  • bootstrapping to assess selection stability

The concept of model stability was independently introduced by Meinshausen and Bühlmann (2010) and Müller and Welsh (2010) for different linear regression settings.

Some notation

  • Say we have a full model with an \(n\times p\) design matrix \(\mathbf{X}\).
  • Let \(\alpha\) be any subset of \(p_\alpha\) distinct elements from \(\{1,\ldots,p\}\).
  • We can define an \(n\times p_\alpha\) submodel with design matrix \(\mathbf{X}_\alpha\), obtained by selecting the columns of \(\mathbf{X}\) indexed by the elements of \(\alpha\).
  • Denote the set of all possible models as \(\mathcal{A}=\{\{1\},\ldots,\alpha_f\}\).
  • Coefficient vector \(\beta\).
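
To make the notation concrete, here is a small sketch using the artificialeg data from the mplot package (assuming, as in the example below, that the predictors x1 to x9 occupy the first nine columns):

require(mplot)
X = as.matrix(artificialeg[, 1:9])  # n x p full design matrix (no intercept column)
alpha = c(1, 3)                     # a candidate subset of {1, ..., p}
X.alpha = X[, alpha, drop = FALSE]  # n x p_alpha submodel design matrix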

Artificial example

require(mplot)
lm.art = lm(y ~ ., data = artificialeg)
round(summary(lm.art)$coefficients, 2)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -0.10       0.33   -0.31     0.76
## x1              0.64       0.69    0.92     0.36
## x2              0.26       0.62    0.42     0.68
## x3             -0.51       1.24   -0.41     0.68
## x4             -0.30       0.25   -1.18     0.24
## x5              0.36       0.60    0.59     0.56
## x6             -0.54       0.96   -0.56     0.58
## x7             -0.43       0.63   -0.68     0.50
## x8              0.15       0.62    0.24     0.81
## x9              0.40       0.64    0.63     0.53

Artificial example – Stepwise

The true data generating process is:

\[y = 0.6x_8 + \varepsilon.\]

art.true = lm(y ~ x8, data = artificialeg)
round(summary(art.true)$coefficients, 2)

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)     0.03       0.29    0.11     0.91
## x8              0.55       0.05   10.43     0.00

Tuning parameters show up frequently in model selection!

Information Criterion

  • Generalised IC: \(\text{GIC}(\alpha;\lambda) = -2\times \text{LogLik}(\alpha)+\lambda p_\alpha\)

With important special cases:

  • AIC: \(\lambda=2\)
  • BIC: \(\lambda = \log(n)\)
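
As a quick check of these special cases, a minimal sketch using the artificialeg data (note that R's logLik() degrees of freedom also count the error variance, a constant offset at fixed \(\lambda\) that does not change model rankings):

gic = function(fit, lambda) {
  ll = logLik(fit)
  -2 * as.numeric(ll) + lambda * attr(ll, "df")
}
fit = lm(y ~ x8, data = artificialeg)
c(gic(fit, 2), AIC(fit))               # lambda = 2 recovers AIC
c(gic(fit, log(nobs(fit))), BIC(fit))  # lambda = log(n) recovers BIC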

Regularisation routines

  • Lasso: minimises \(-\text{LogLik}(\alpha) + \lambda\|\beta_\alpha\|_1\)
  • Many variants of the Lasso, SCAD, …
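
For instance, the lasso path for the artificial example can be traced with glmnet (an illustrative sketch, assuming y is the response column of artificialeg):

library(glmnet)
x = as.matrix(artificialeg[, 1:9])
y = artificialeg$y
lasso.art = glmnet(x, y)
plot(lasso.art, xvar = "lambda", label = TRUE)  # coefficient paths against log(lambda)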

Variable inclusion plots

Variable inclusion plots

Aim

To visualise inclusion probabilities as a function of the penalty multiplier \(\lambda\in [0,2\log(n)]\).

Procedure

  1. Calculate (weighted) bootstrap samples \(b=1,\ldots,B\).
  2. For each bootstrap sample, at each \(\lambda\) value, find \(\hat{\alpha}_\lambda^{(b)}\in\mathcal{A}\) as the model with smallest \(\text{GIC}(\alpha;\lambda) = -2\times \text{LogLik}(\alpha)+\lambda p_\alpha\).
  3. The inclusion probability for variable \(x_j\) is estimated as \(\frac{1}{B}\sum_{b=1}^B 1\{j\in \hat{\alpha}_\lambda^{(b)}\}\).
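
A minimal, illustrative implementation of this procedure for a small linear model; this is a sketch under simplifying assumptions (exhaustive search over all subsets, exponential bootstrap weights), not the mplot internals:

set.seed(1)
n = 50; p = 4
X = matrix(rnorm(n * p), n, p)
dat = data.frame(y = 0.6 * X[, 4] + rnorm(n), X)  # true model uses X4 only
vars = paste0("X", 1:p)
# all 2^p - 1 non-empty candidate models
alphas = unlist(lapply(1:p, function(k) combn(vars, k, simplify = FALSE)),
                recursive = FALSE)
psize = sapply(alphas, length)
B = 50
lambdas = seq(0, 2 * log(n), length.out = 20)
incl = matrix(0, length(lambdas), p, dimnames = list(NULL, vars))
for (b in 1:B) {
  w = rexp(n)  # weighted (exponential) bootstrap
  m2ll = sapply(alphas, function(a)
    -2 * as.numeric(logLik(lm(reformulate(a, "y"), data = dat, weights = w))))
  for (i in seq_along(lambdas)) {
    best = alphas[[which.min(m2ll + lambdas[i] * psize)]]  # GIC-best model
    incl[i, best] = incl[i, best] + 1 / B  # update inclusion probabilities
  }
}
matplot(lambdas, incl, type = "l", lty = 1, xlab = expression(lambda),
        ylab = "Inclusion probability")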

References

  • Müller and Welsh (2010) for linear regression models
  • Murray, Heritier, and Müller (2013) for generalised linear models

Artificial example – VIP

require(mplot)
vis.art = vis(lm.art)
plot(vis.art, which = "vip")

Model stability plots

Artificial example – Loss against size

plot(vis.art, which = "lvk")

Artificial example – Loss against size

plot(vis.art, which = "lvk", highlight = "x6")

Model stability plots

Aim

To add value to the loss against size plots by choosing a symbol size proportional to a measure of stability.

Procedure

  1. Calculate (weighted) bootstrap samples \(b=1,\ldots,B\).
  2. For each bootstrap sample, identify the best model at each dimension.
  3. Add this information to the loss against size plot using model identifiers whose size is proportional to the frequency with which each model was identified as best at its size.
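
Reusing dat, vars, alphas, psize and B from the variable inclusion sketch above, the bootstrap frequencies that drive the symbol sizes can be tabulated as follows (again an illustrative sketch, with the weighted residual sum of squares as the loss):

picks = character(0)
for (b in 1:B) {
  w = rexp(n)
  rss = sapply(alphas, function(a)
    sum(w * residuals(lm(reformulate(a, "y"), data = dat, weights = w))^2))
  for (k in 1:p) {  # best model at each dimension
    idx = which(psize == k)
    picks = c(picks, paste(alphas[[idx[which.min(rss[idx])]]], collapse = "+"))
  }
}
sort(table(picks), decreasing = TRUE)  # frequency of being best at its size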

References

  • Murray, Heritier, and Müller (2013) for generalised linear models

Artificial example – Model stability plot

plot(vis.art, which = "boot")

The adaptive fence

The fence

Notation

  • Let \(Q(\alpha)\) be a measure of lack of fit
  • Specifically we consider \(Q(\alpha)=-2\text{LogLik}(\alpha)\)

Main idea

The fence (Jiang et al. 2008) is based around the inequality: \[ Q(\alpha) \leq Q(\alpha_f) + c \]

  • A model \(\alpha\) passes the fence if the inequality holds.
  • For any \(c\geq 0\), the full model always passes the fence.
  • Among the set of models that pass the fence, model(s) with smallest dimension are preferred.

Illustration

Problem: how to choose \(c\)?

Solution: Bootstrap over a range of values of \(c\).

Procedure

  1. For each value of \(c\):
    • Perform parametric bootstrap under \(\alpha_f\).
    • For each bootstrap sample, identify the smallest model that passes the fence, \(\hat{\alpha}(c)\). Jiang, Nguyen, and Rao (2009) suggest that if more than one model passes at the smallest dimension, choose the one with the smallest \(Q(\alpha)\).
    • Let \(p^*(\alpha) = P^*\{\hat{\alpha}(c) = \alpha\}\) be the empirical probability of selecting model \(\alpha\) at a given value of \(c\).
    • Calculate \(p^* = \max_{\alpha\in\mathcal{A}} p^*(\alpha)\).
  2. Plot values of \(p^*\) against \(c\) and find first peak.
  3. Use this value of \(c\) with the original data.
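
An illustrative sketch of this bootstrap, reusing dat, alphas, psize and B from the earlier sketches, with parametric bootstrap samples drawn via simulate() under the fitted full model:

full = lm(y ~ ., data = dat)
Q = function(fit) -2 * as.numeric(logLik(fit))  # lack-of-fit measure
cs = seq(0, 20, length.out = 25)
pstar = numeric(length(cs))
for (i in seq_along(cs)) {
  picks = character(B)
  for (b in 1:B) {
    db = dat
    db$y = simulate(full)[[1]]  # parametric bootstrap under alpha_f
    qs = sapply(alphas, function(a) Q(lm(reformulate(a, "y"), data = db)))
    qf = Q(lm(y ~ ., data = db))
    pass = which(qs <= qf + cs[i])                # models that pass the fence
    cand = pass[psize[pass] == min(psize[pass])]  # smallest dimension
    best = cand[which.min(qs[cand])]              # break ties by smallest Q
    picks[b] = paste(alphas[[best]], collapse = "+")
  }
  pstar[i] = max(table(picks)) / B  # empirical p* at this value of c
}
plot(cs, pstar, type = "l", xlab = "c", ylab = "p*")  # look for the first peak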

What does this look like?

Source: Jiang, Nguyen, and Rao (2009)

Our implementation

Core innovations

  • Enhanced the plot of \(p^*\) against \(c\) so that you can see which model is being selected most often
  • Allows for consideration of all models that pass the fence at the smallest dimension
  • Integration with the leaps and bestglm packages
  • Implemented multicore technology to speed up the embarrassingly parallel calculations

Optional innovations

  • Initial stepwise procedure to restrict the values of \(c\)
  • Can force variables into the model
  • Specify the size of the largest model to be considered
  • Specify the maximum value for \(c\)

Artificial example – adaptive fence

af.art = af(lm.art, B = 100, n.c = 50, n.cores = 2)
plot(af.art)

Speed (linear models; B=50; n.c=25)

Bootstrapping glmnet

Artificial example – Lasso

## Call:
## lars(x = x, y = y)
## R-squared: 0.751
## Sequence of LASSO moves:
##      x8 x6 x1 x4 x3 x7 x5 x2 x5 x6 x5 x9 x6 x2 x2 x3 x3
## Var   8  6  1  4  3  7  5  2 -5 -6  5  9  6 -2  2 -3  3
## Step  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17

Stepping through the lasso path, the active set at each step (negative moves drop a variable) is:

## [1] "x8"
## [1] "x6+x8"
## [1] "x1+x6+x8"
## [1] "x1+x4+x6+x8"
## [1] "x1+x3+x4+x6+x8"
## [1] "x1+x3+x4+x6+x7+x8"
## [1] "x1+x3+x4+x5+x6+x7+x8"
## [1] "x1+x2+x3+x4+x5+x6+x7+x8"
## [1] "x1+x2+x3+x4+x6+x7+x8"
## [1] "x1+x2+x3+x4+x7+x8"
## [1] "x1+x2+x3+x4+x5+x7+x8"
## [1] "x1+x2+x3+x4+x5+x7+x8+x9"
## [1] "x1+x2+x3+x4+x5+x6+x7+x8+x9"
## [1] "x1+x3+x4+x5+x6+x7+x8+x9"
## [1] "x1+x2+x3+x4+x5+x6+x7+x8+x9"
## [1] "x1+x2+x4+x5+x6+x7+x8+x9"
## [1] "x1+x2+x3+x4+x5+x6+x7+x8+x9"

Bootstrapping glmnet

bgn.art = bglmnet(lm.art, lambda = seq(0.05, 2.75, 0.1))
plot(bgn.art, which = "models")

Bootstrapping glmnet

bgn.art = bglmnet(lm.art)
plot(bgn.art, which = "variables")

Robustness considerations

Everything presented so far is inherently non-robust

It would be great to have a complete suite of robust alternatives.

  • robust alternatives to GIC
  • robust loss measures and model fits for model stability plots
  • adaptive fence with robust measures
  • using weights

We are used to comparing and contrasting robust with non-robust plots to check for any differences.

In the absence of a full suite of robust alternatives, I have implemented a very simple approach based on a robust initial screening method (Filzmoser, Maronna, and Werner 2008).
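
A sketch of that screening step, assuming the mvoutlier package's pcout() function (an implementation of the Filzmoser, Maronna, and Werner (2008) method) and an all-numeric data frame:

require(mvoutlier)
scr = pcout(as.matrix(artificialeg))  # flag multivariate outliers
keep = scr$wfinal01 == 1              # 1 = observation retained after screening
lm.scr = lm(y ~ ., data = artificialeg[keep, ])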

The mplot package

Get it on Github

install.packages("devtools")
require(devtools)
install_github("garthtarr/mplot", quick = TRUE)
require(mplot)

Vignettes

vignette("mplot-guide",package="mplot")
vignette("mplot-stepwise",package="mplot")

Main functions

  • af() for the adaptive fence
  • vis() for VIP and model stability plots
  • bglmnet() for bootstrapping glmnet
  • mplot() for an interactive shiny interface

Diabetes example

Variables

Variable   Description
--------   -----------------------------------------------------------------------
age        Age
sex        Gender
bmi        Body mass index
map        Mean arterial pressure (average blood pressure)
tc         Total cholesterol (mg/dL)
ldl        Low-density lipoprotein ("bad" cholesterol)
hdl        High-density lipoprotein ("good" cholesterol)
tch        Blood serum measurement
ltg        Blood serum measurement
glu        Blood serum measurement (glucose?)
y          A quantitative measure of disease progression one year after baseline

Source: Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). "Least Angle Regression." The Annals of Statistics 32 (2): 407–499. doi:10.1214/009053604000000067.

Diabetes data set

Diabetes data set (with contamination)

Variable importance (clean, no screen)

Variable importance (clean, screen)

Variable importance (contaminated, no screen)

Variable importance (contaminated, screen)

Model stability (clean, no screen)

Model stability (clean, screen)

Model stability (contaminated, no screen)

Model stability (contaminated, screen)

Adaptive fence (clean, no screen)

Adaptive fence (clean, screen)

Adaptive fence (contaminated, no screen)

Adaptive fence (contaminated, screen)

glmnet VIP (clean, no screen)

glmnet VIP (clean, screen)

glmnet VIP (contaminated, no screen)

glmnet VIP (contaminated, screen)

Underlying technology

R packages

  • leaps
  • bestglm
  • googleVis
  • shiny
  • parallel
  • mvoutlier

Package development

  • RStudio makes setting up package structure easy
  • Version control with Github
  • Documentation with roxygen2
  • Vignettes (and these slides) with markdown
  • devtools is an essential package for building and loading R packages
  • R packages by Hadley Wickham

Session Info

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] xtable_1.7-4     mplot_0.4.9      glmnet_1.9-8     Matrix_1.1-4    
##  [5] mvoutlier_2.0.5  sgeostat_1.0-25  shiny_0.10.2.2   googleVis_0.5.7 
##  [9] doParallel_1.0.8 iterators_1.0.7  foreach_1.4.2    bestglm_0.34    
## [13] leaps_2.9        knitr_1.8       
## 
## loaded via a namespace (and not attached):
##  [1] cluster_1.15.3        codetools_0.2-9       colorspace_1.2-4     
##  [4] DEoptimR_1.0-2        digest_0.6.8          evaluate_0.5.5       
##  [7] formatR_1.0           GGally_0.5.0          ggplot2_1.0.0        
## [10] grid_3.1.2            gtable_0.1.2          htmltools_0.2.6      
## [13] httpuv_1.3.2          lattice_0.20-29       MASS_7.3-35          
## [16] mime_0.2              munsell_0.4.2         mvtnorm_1.0-2        
## [19] pcaPP_1.9-60          pls_2.4-3             plyr_1.8.1           
## [22] proto_0.3-10          R6_2.0.1              Rcpp_0.11.3          
## [25] reshape_0.8.5         reshape2_1.4.1        RJSONIO_1.3-0        
## [28] rmarkdown_0.3.10      robCompositions_1.9.0 robustbase_0.92-2    
## [31] rrcov_1.3-8           scales_0.2.4          stats4_3.1.2         
## [34] stringr_0.6.2         tools_3.1.2           yaml_2.1.13

References

Filzmoser, Peter, Ricardo A Maronna, and Mark Werner. 2008. “Outlier Identification in High Dimensions.” Computational Statistics & Data Analysis 52 (3): 1694–1711. doi:10.1016/j.csda.2007.05.018.

Jiang, Jiming, Thuan Nguyen, and J. Sunil Rao. 2009. “A Simplified Adaptive Fence Procedure.” Statistics & Probability Letters 79 (5): 625–29. doi:10.1016/j.spl.2008.10.014.

Jiang, Jiming, J. Sunil Rao, Zhonghua Gu, and Thuan Nguyen. 2008. “Fence Methods for Mixed Model Selection.” The Annals of Statistics 36 (4): 1669–92. doi:10.1214/07-AOS517.

Meinshausen, Nicolai, and Peter Bühlmann. 2010. “Stability Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (4): 417–73. doi:10.1111/j.1467-9868.2010.00740.x.

Murray, K, S Heritier, and Samuel Müller. 2013. “Graphical Tools for Model Selection in Generalized Linear Models.” Statistics in Medicine 32 (25): 4438–51. doi:10.1002/sim.5855.

Müller, Samuel, and Alan H. Welsh. 2010. “On Model Selection Curves.” International Statistical Review 78 (2): 240–56. doi:10.1111/j.1751-5823.2010.00108.x.