- What's wrong with existing variable selection techniques?
- Variable inclusion plots
- Model stability plots
- Adaptive fence method
- The `mplot` package
- Examples with real data

require(mplot)
lm.art = lm(y ~ ., data = artificialeg)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
(Intercept) | -0.10 | 0.33 | -0.31 | 0.76 |
x1 | 0.64 | 0.69 | 0.92 | 0.36 |
x2 | 0.26 | 0.62 | 0.42 | 0.68 |
x3 | -0.51 | 1.24 | -0.41 | 0.68 |
x4 | -0.30 | 0.25 | -1.18 | 0.24 |
x5 | 0.36 | 0.60 | 0.59 | 0.56 |
x6 | -0.54 | 0.96 | -0.56 | 0.58 |
x7 | -0.43 | 0.63 | -0.68 | 0.50 |
x8 | 0.15 | 0.62 | 0.24 | 0.81 |
x9 | 0.40 | 0.64 | 0.63 | 0.53 |
step.art = step(lm.art)
## Start:  AIC=79.3
## y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
##
##        Df Sum of Sq    RSS    AIC
## - x8    1    0.2423 163.94 77.374
## - x3    1    0.6946 164.39 77.512
## - x2    1    0.7107 164.41 77.517
## - x6    1    1.3051 165.00 77.698
## - x5    1    1.4425 165.14 77.739
## - x9    1    1.6065 165.31 77.789
## - x7    1    1.8835 165.58 77.873
## - x1    1    3.4999 167.20 78.358
## - x4    1    5.7367 169.44 79.023
## <none>              163.70 79.301
##
## Step:  AIC=77.37
## y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x9
## ...
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x9, data = artificialeg)
##
## Coefficients:
## (Intercept)          x1          x2          x3          x4
##     -0.1143      0.8019      0.4011     -0.8083     -0.3514
##          x5          x6          x7          x9
##      0.4927     -0.7738     -0.5772      0.5478
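As a hedged aside (not part of the original slides), the same backward search can be run with the BIC penalty by setting `k = log(n)` in place of the AIC default `k = 2`; BIC penalises model size more heavily and typically returns a smaller model.

```r
# Backward stepwise with the BIC penalty instead of AIC (a sketch).
# step() defaults to k = 2 (AIC); k = log(n) gives BIC.
n = nrow(artificialeg)
lm.art = lm(y ~ ., data = artificialeg)
step.bic = step(lm.art, k = log(n), trace = 0)
```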
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
(Intercept) | -0.11 | 0.32 | -0.36 | 0.72 |
x1 | 0.80 | 0.19 | 4.13 | 0.00 |
x2 | 0.40 | 0.18 | 2.26 | 0.03 |
x3 | -0.81 | 0.19 | -4.22 | 0.00 |
x4 | -0.35 | 0.12 | -2.94 | 0.01 |
x5 | 0.49 | 0.19 | 2.55 | 0.01 |
x6 | -0.77 | 0.15 | -5.19 | 0.00 |
x7 | -0.58 | 0.15 | -3.94 | 0.00 |
x9 | 0.55 | 0.19 | 2.90 | 0.01 |
Note that the stepwise procedure has dropped x8 from the selected model.
The true data generating process is:
\[y = 0.6x_8 + \varepsilon.\]
art.true = lm(y ~ x8, data = artificialeg)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
(Intercept) | 0.03 | 0.29 | 0.11 | 0.91 |
x8 | 0.55 | 0.05 | 10.43 | 0.00 |
anova(art.true,step.art)
| | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|---|
| 1 | 48 | 200.97 | | | | |
| 2 | 41 | 163.94 | 7 | 37.03 | 1.32 | 0.2643 |
Tibshirani (1996) suggested doing regression with an \(L_1\) norm penalty and called it the lasso (least absolute shrinkage and selection operator).
The lasso parameter estimates are obtained by minimising the residual sum of squares subject to the constraint that \[\sum_j |\beta_j| \leq t.\]
"The encouraging results reported here suggest that absolute value constrants might prove to be useful in a wide variety of statistical estimation problems. Further study is needed to investigate these possibilities." - Tibshirani (1996)
require(lars)
x = as.matrix(subset(artificialeg, select = -y))
y = as.matrix(subset(artificialeg, select = y))
art.lars = lars(x, y)
##
## Call:
## lars(x = x, y = y)
## R-squared: 0.751
## Sequence of LASSO moves:
##      x8 x6 x1 x4 x3 x7 x5 x2 x5 x6 x5 x9 x6 x2 x2 x3 x3
## Var   8  6  1  4  3  7  5  2 -5 -6  5  9  6 -2  2 -3  3
## Step  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17

Note that x8 is the first variable to enter the lasso path. A negative entry in the Var row indicates a variable leaving the active set, so variables can be dropped and later re-enter along the path.
The concept of model stability was independently introduced by Meinshausen and Bühlmann (2010) and Müller and Welsh (2010) for different linear regression settings.
To provide scientists/researchers with tools that give them more information about the model selection choices that they are making.
To visualise inclusion probabilities as a function of the penalty multiplier \(\lambda\in [0,2\log(n)]\).
require(mplot)
vis.art = vis(lm.art)
plot(vis.art, which = "vip")
plot(vis.art, which = "lvk")
plot(vis.art, which = "lvk", highlight = "x6")
plot(vis.art, which = "lvk", highlight = "x7")
To add value to the loss against size plots by choosing a symbol size proportional to a measure of stability.
plot(vis.art, which = "boot")
The fence (Jiang et al. 2008) is based around the inequality: \[ Q(\alpha) \leq Q(\alpha_f) + c, \] where \(Q\) is a measure of lack of fit, \(\alpha_f\) denotes the full model and \(c\) is a cutoff.
Source: Jiang, Nguyen, and Rao (2009)
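The inequality can be illustrated with a minimal sketch, assuming \(Q(\alpha) = -2\log L(\alpha)\) as the lack-of-fit measure (one common choice; the adaptive fence chooses the cutoff \(c\) via the bootstrap, which this sketch does not attempt).

```r
# Check whether a candidate model falls "inside the fence" relative to
# the full model, for a fixed illustrative cutoff c.
full = lm(y ~ ., data = artificialeg)    # alpha_f, the full model
cand = lm(y ~ x8, data = artificialeg)   # candidate model alpha
Q.full = -2 * as.numeric(logLik(full))
Q.cand = -2 * as.numeric(logLik(cand))
c.cut = 10  # illustrative cutoff only; af() selects c adaptively
Q.cand <= Q.full + c.cut  # TRUE means the candidate is inside the fence
```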
Uses the leaps and bestglm packages.

af.art = af(lm.art, B = 100, n.c = 50, n.cores = 2)
plot(af.art)
install.packages("devtools") require(devtools) install_github("garthtarr/mplot",quick=TRUE) require(mplot)
vignette("mplot-guide",package="mplot") vignette("mplot-stepwise",package="mplot")
- af() for the adaptive fence
- vis() for VIP and model stability plots
- mplot() for an interactive shiny interface

roxygen2 helps generate package documentation; devtools is an essential package for building and loading R packages.

Cores are easy to find:
require(parallel)
ncores = 1
n = 10
expt = function(j) rep(j, 4)
result = mclapply(1:n, FUN = expt, mc.cores = ncores)
simplify2array(result)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]    1    2    3    4    5    6    7    8    9    10
## [3,]    1    2    3    4    5    6    7    8    9    10
## [4,]    1    2    3    4    5    6    7    8    9    10
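When choosing `mc.cores` for `mclapply()` above, the parallel package can report how many cores the machine actually has, which gives a sensible upper bound:

```r
# detectCores() reports the number of available cores.
require(parallel)
detectCores()
```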
require(doMC, quietly = TRUE)
require(foreach)
registerDoMC(cores = ncores)
n = 10
result = foreach(j = 1:n, .combine = rbind) %dopar% {
  # EXPERIMENT
  # last line is returned as a row in the result matrix
  rep(j, 4)
}
head(result)
##          [,1] [,2] [,3] [,4]
## result.1    1    1    1    1
## result.2    2    2    2    2
## result.3    3    3    3    3
## result.4    4    4    4    4
## result.5    5    5    5    5
## result.6    6    6    6    6
sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-apple-darwin13.1.0 (64-bit)
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
## [1] doMC_1.3.3      iterators_1.0.7 foreach_1.4.2   lars_1.2
## [5] xtable_1.7-4    mplot_0.4.7     shiny_0.10.2.1  googleVis_0.5.6
## [9] bestglm_0.34    leaps_2.9       knitr_1.7
##
## loaded via a namespace (and not attached):
## [1] codetools_0.2-9  compiler_3.1.1   digest_0.6.4     evaluate_0.5.5
## [5] formatR_1.0      htmltools_0.2.6  httpuv_1.3.2     mime_0.2
## [9] R6_2.0.1         Rcpp_0.11.3      RJSONIO_1.3-0    rmarkdown_0.3.10
## [13] stringr_0.6.2   tools_3.1.1      yaml_2.1.13
Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. “Least Angle Regression.” The Annals of Statistics 32 (2): 407–51. doi:10.1214/009053604000000067.
Jiang, Jiming, Thuan Nguyen, and J. Sunil Rao. 2009. “A Simplified Adaptive Fence Procedure.” Statistics & Probability Letters 79 (5): 625–29. doi:10.1016/j.spl.2008.10.014.
Jiang, Jiming, J. Sunil Rao, Zhonghua Gu, and Thuan Nguyen. 2008. “Fence Methods for Mixed Model Selection.” The Annals of Statistics 36 (4): 1669–92. doi:10.1214/07-AOS517.
Meinshausen, Nicolai, and Peter Bühlmann. 2010. “Stability Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (4): 417–73. doi:10.1111/j.1467-9868.2010.00740.x.
Murray, K, S Heritier, and Samuel Müller. 2013. “Graphical Tools for Model Selection in Generalized Linear Models.” Statistics in Medicine 32 (25): 4438–51. doi:10.1002/sim.5855.
Müller, Samuel, and Alan H. Welsh. 2010. “On Model Selection Curves.” International Statistical Review 78 (2): 240–56. doi:10.1111/j.1751-5823.2010.00108.x.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological), 267–88. http://www.jstor.org/stable/10.2307/2346178.