- Variable inclusion plots
- Model stability plots
- Adaptive fence method
- Bootstrapping glmnet
- Robustness considerations
- The mplot package
- Diabetes data example

The mplot package

Aim: to provide scientists and researchers with tools that give them more information about the model selection choices that they are making.
The concept of model stability was independently introduced by Meinshausen and Bühlmann (2010) and Müller and Welsh (2010) for different linear regression settings.
```r
require(mplot)
lm.art = lm(y ~ ., data = artificialeg)
```
 | Estimate | Std. Error | t value | Pr(>\|t\|)
---|---|---|---|---
(Intercept) | -0.10 | 0.33 | -0.31 | 0.76 |
x1 | 0.64 | 0.69 | 0.92 | 0.36 |
x2 | 0.26 | 0.62 | 0.42 | 0.68 |
x3 | -0.51 | 1.24 | -0.41 | 0.68 |
x4 | -0.30 | 0.25 | -1.18 | 0.24 |
x5 | 0.36 | 0.60 | 0.59 | 0.56 |
x6 | -0.54 | 0.96 | -0.56 | 0.58 |
x7 | -0.43 | 0.63 | -0.68 | 0.50 |
x8 | 0.15 | 0.62 | 0.24 | 0.81 |
x9 | 0.40 | 0.64 | 0.63 | 0.53 |
The true data generating process is:
\[y = 0.6x_8 + \varepsilon.\]
```r
art.true = lm(y ~ x8, data = artificialeg)
```
 | Estimate | Std. Error | t value | Pr(>\|t\|)
---|---|---|---|---
(Intercept) | 0.03 | 0.29 | 0.11 | 0.91 |
x8 | 0.55 | 0.05 | 10.43 | 0.00 |
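Since the true process is known, the contrast between the full fit and the true fit can be checked with an information criterion. A minimal sketch on simulated data (assumptions: independently generated predictors and a made-up seed; the real artificialeg design uses correlated predictors, which is what makes its selection problem hard):

```r
# Simulate from the stated process y = 0.6*x8 + e with nine candidate
# predictors, then compare the full model with the true model by BIC.
set.seed(1)
n <- 100
dat <- as.data.frame(matrix(rnorm(n * 9), n, 9))
names(dat) <- paste0("x", 1:9)
dat$y <- 0.6 * dat$x8 + rnorm(n)
full <- lm(y ~ ., data = dat)
true <- lm(y ~ x8, data = dat)
BIC(true) < BIC(full)  # the parsimonious true model typically wins on BIC
```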
Variable inclusion plots visualise inclusion probabilities as a function of the penalty multiplier \(\lambda\in [0,2\log(n)]\), with important special cases \(\lambda = 2\) (AIC) and \(\lambda = \log(n)\) (BIC).
```r
require(mplot)
vis.art = vis(lm.art)
plot(vis.art, which = "vip")
```
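The variable inclusion idea can be sketched by hand (an illustrative reimplementation on simulated data, not how vis() works internally): bootstrap the data, run stepwise selection with the penalty multiplier supplied as k, and tally which variables survive.

```r
# Inclusion probabilities via bootstrap + stepwise with penalty k = lambda.
set.seed(1)
n <- 100; B <- 20
dat <- as.data.frame(matrix(rnorm(n * 3), n, 3))
names(dat) <- c("x1", "x2", "x3")
dat$y <- 0.6 * dat$x1 + rnorm(n)          # x1 is the only true signal
lambdas <- seq(0.1, 2 * log(n), length.out = 5)
incl <- matrix(0, length(lambdas), 3,
               dimnames = list(round(lambdas, 1), c("x1", "x2", "x3")))
for (b in 1:B) {
  boot <- dat[sample(n, replace = TRUE), ]
  for (j in seq_along(lambdas)) {
    fit <- step(lm(y ~ ., data = boot), k = lambdas[j], trace = 0)
    kept <- attr(terms(fit), "term.labels")
    incl[j, kept] <- incl[j, kept] + 1
  }
}
round(incl / B, 2)  # x1 stays near 1; x2 and x3 fall away as lambda grows
```

Plotting each column of incl/B against lambdas gives a hand-rolled variable inclusion plot.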
plot(vis.art, which = "lvk")
plot(vis.art, which = "lvk", highlight = "x6")
Model stability plots add value to the loss against size plots by choosing a symbol size proportional to a bootstrap measure of model stability.
plot(vis.art, which = "boot")
The fence (Jiang et al. 2008) is based around the inequality: \[ Q(\alpha) \leq Q(\alpha_f) + c, \] where \(Q(\alpha)\) measures the lack of fit of a candidate model \(\alpha\), \(\alpha_f\) is the full model and \(c\) is a cut-off; among the models inside the fence, the most parsimonious one is selected.
Source: Jiang, Nguyen, and Rao (2009)
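A toy version makes the inequality concrete. Here Q is taken to be the residual sum of squares, \(\alpha_f\) is the full model, and c is fixed by hand; the adaptive fence instead chooses c by bootstrap, which this sketch omits:

```r
# All candidate models for two predictors, screened by the fence inequality.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.6 * dat$x1 + rnorm(n)
rss <- function(f) sum(resid(lm(f, data = dat))^2)
Q <- c(null = rss(y ~ 1), x1 = rss(y ~ x1),
       x2 = rss(y ~ x2), full = rss(y ~ x1 + x2))
cc <- 5                                  # hand-picked cut-off for illustration
inside <- names(Q)[Q <= Q["full"] + cc]  # models with Q(a) <= Q(a_f) + c
inside
```

Among the models inside the fence, the most parsimonious one is chosen; as c grows, more models fall inside, which is the behaviour the adaptive fence exploits.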
The adaptive fence implementation relies on the leaps and bestglm packages.

```r
af.art = af(lm.art, B = 100, n.c = 50, n.cores = 2)
plot(af.art)
```
```
## Call: lars(x = x, y = y)
## R-squared: 0.751
## Sequence of LASSO moves:
##      x8 x6 x1 x4 x3 x7 x5 x2 x5 x6 x5 x9 x6 x2 x2 x3 x3
## Var   8  6  1  4  3  7  5  2 -5 -6  5  9  6 -2  2 -3  3
## Step  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
```

The lasso path enters x8 first; negative entries in the Var row mark variables that are dropped from the active set and later re-entered.
```r
bgn.art = bglmnet(lm.art, lambda = seq(0.05, 2.75, 0.1))
plot(bgn.art, which = "models")
```
```r
bgn.art = bglmnet(lm.art)
plot(bgn.art, which = "variables")
```
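The mechanics can be sketched directly (assumes the glmnet package is installed; the seed and lambda value are illustrative, and this is not bglmnet's actual code): refit the lasso on bootstrap resamples and tabulate how often each coefficient is nonzero.

```r
# Bootstrap inclusion frequencies for the lasso at a fixed lambda.
library(glmnet)
set.seed(1)
n <- 100; p <- 5; B <- 50
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 0.6 * X[, "x5"] + rnorm(n)          # x5 is the only true signal
nonzero <- matrix(0, B, p, dimnames = list(NULL, colnames(X)))
for (b in 1:B) {
  i <- sample(n, replace = TRUE)
  fit <- glmnet(X[i, ], y[i], lambda = 0.2)
  nonzero[b, ] <- as.numeric(coef(fit)[-1] != 0)
}
colMeans(nonzero)  # x5 should be selected in (almost) every resample
```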
It would be great to have a complete suite of robust alternatives.
Comparing and contrasting robust with non-robust plots can then be used to check for any differences.
In the absence of implementing a suite of robust alternatives, I have implemented a very simple alternative, based on a robust initial screening method (Filzmoser, Maronna, and Werner 2008).
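As a stand-in illustration of the screening idea (this uses MASS::cov.rob rather than the Filzmoser, Maronna, and Werner method that mplot actually uses, and the data and cut-off are made up): flag observations with large robust Mahalanobis distances, then rerun the stability plots on the cleaned data.

```r
# Robust pre-screen: drop rows with large robust Mahalanobis distance.
library(MASS)
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), n, 3)
X[1:5, ] <- X[1:5, ] + 6                 # plant five gross outliers
rob <- cov.rob(X, method = "mcd")        # robust centre and covariance
d2 <- mahalanobis(X, rob$center, rob$cov)
keep <- d2 <= qchisq(0.975, df = ncol(X))
sum(!keep)  # the five planted outliers are among the flagged rows
```

The model-selection plots can then be run on the kept rows and compared against the plots from the full data.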
install.packages("devtools") require(devtools) install_github("garthtarr/mplot",quick=TRUE) require(mplot)
vignette("mplot-guide",package="mplot") vignette("mplot-stepwise",package="mplot")
- af() for the adaptive fence
- vis() for VIP and model stability plots
- bglmnet() for bootstrapping glmnet
- mplot() for an interactive shiny interface

Variable | Description
---|---
age | Age |
sex | Gender |
bmi | Body mass index |
map | Mean arterial pressure (average blood pressure) |
tc | Total cholesterol (mg/dL) |
ldl | Low-density lipoprotein ("bad" cholesterol) |
hdl | High-density lipoprotein ("good" cholesterol) |
tch | Blood serum measurement |
ltg | Blood serum measurement |
glu | Blood serum measurement (glucose?) |
y | A quantitative measure of disease progression one year after baseline |
Source: Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. “Least Angle Regression.” The Annals of Statistics 32 (2): 407–499. doi:10.1214/009053604000000067.
- roxygen2
- devtools is an essential package for building and loading R packages

```r
sessionInfo()
```
```
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] parallel  stats  graphics  grDevices  utils  datasets  methods
## [8] base
##
## other attached packages:
##  [1] xtable_1.7-4     mplot_0.4.9      glmnet_1.9-8     Matrix_1.1-4
##  [5] mvoutlier_2.0.5  sgeostat_1.0-25  shiny_0.10.2.2   googleVis_0.5.7
##  [9] doParallel_1.0.8 iterators_1.0.7  foreach_1.4.2    bestglm_0.34
## [13] leaps_2.9        knitr_1.8
##
## loaded via a namespace (and not attached):
##  [1] cluster_1.15.3        codetools_0.2-9       colorspace_1.2-4
##  [4] DEoptimR_1.0-2        digest_0.6.8          evaluate_0.5.5
##  [7] formatR_1.0           GGally_0.5.0          ggplot2_1.0.0
## [10] grid_3.1.2            gtable_0.1.2          htmltools_0.2.6
## [13] httpuv_1.3.2          lattice_0.20-29       MASS_7.3-35
## [16] mime_0.2              munsell_0.4.2         mvtnorm_1.0-2
## [19] pcaPP_1.9-60          pls_2.4-3             plyr_1.8.1
## [22] proto_0.3-10          R6_2.0.1              Rcpp_0.11.3
## [25] reshape_0.8.5         reshape2_1.4.1        RJSONIO_1.3-0
## [28] rmarkdown_0.3.10      robCompositions_1.9.0 robustbase_0.92-2
## [31] rrcov_1.3-8           scales_0.2.4          stats4_3.1.2
## [34] stringr_0.6.2         tools_3.1.2           yaml_2.1.13
```
Filzmoser, Peter, Ricardo A Maronna, and Mark Werner. 2008. “Outlier Identification in High Dimensions.” Computational Statistics & Data Analysis 52 (3): 1694–1711. doi:10.1016/j.csda.2007.05.018.
Jiang, Jiming, Thuan Nguyen, and J. Sunil Rao. 2009. “A Simplified Adaptive Fence Procedure.” Statistics & Probability Letters 79 (5): 625–29. doi:10.1016/j.spl.2008.10.014.
Jiang, Jiming, J. Sunil Rao, Zhonghua Gu, and Thuan Nguyen. 2008. “Fence Methods for Mixed Model Selection.” The Annals of Statistics 36 (4): 1669–92. doi:10.1214/07-AOS517.
Meinshausen, Nicolai, and Peter Bühlmann. 2010. “Stability Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (4): 417–73. doi:10.1111/j.1467-9868.2010.00740.x.
Murray, K, S Heritier, and Samuel Müller. 2013. “Graphical Tools for Model Selection in Generalized Linear Models.” Statistics in Medicine 32 (25): 4438–51. doi:10.1002/sim.5855.
Müller, Samuel, and Alan H. Welsh. 2010. “On Model Selection Curves.” International Statistical Review 78 (2): 240–56. doi:10.1111/j.1751-5823.2010.00108.x.