Location | Scale | Properties |
---|---|---|
Mean | Standard deviation | Efficient at normal but not robust |
Median | Interquartile Range | Robust but not efficient |
Hodges-Lehmann estimator | \(P_n\) | Good robustness and efficiency properties |
Consider 10 observations drawn from \(\mathcal{N}(0,1)\).
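As a rough illustration of the estimators in the table, the sketch below computes each of them on 10 simulated standard normal observations. It assumes the Hodges-Lehmann estimator is taken as the median of the pairwise means over \(i<j\), and that \(P_n\) is the interquartile range of those same pairwise means, rescaled by a constant (named `c_pn` here, a hypothetical name) so that it estimates the standard deviation at the normal. The seed and variable names are illustrative only.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(10)                   # 10 observations from N(0, 1)

# Pairwise means (x_i + x_j) / 2 for i < j: 45 values when n = 10
idx <- combn(length(x), 2)
pair_means <- (x[idx[1, ]] + x[idx[2, ]]) / 2

# Pairwise means of N(0, 1) data are N(0, 1/2), so their IQR is
# sqrt(2) * qnorm(0.75); dividing by it gives consistency at the normal
c_pn <- 1 / (sqrt(2) * qnorm(0.75))
pn <- c_pn * unname(diff(quantile(pair_means, c(0.25, 0.75))))

location <- c(mean = mean(x),
              median = median(x),
              hodges_lehmann = median(pair_means))
scale_est <- c(sd = sd(x),
               iqr = IQR(x) / (2 * qnorm(0.75)),   # IQR rescaled for the normal
               pn = pn)

print(round(location, 3))
print(round(scale_est, 3))
```

With only 10 observations the estimates vary a lot from sample to sample, so a single run says little about efficiency; repeating the simulation many times would be needed for that comparison.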
The influence curve for a functional \(T\) at distribution \(F\) is \[\operatorname{IF}(x;T,F) = \lim_{\epsilon\downarrow0}\frac{T((1-\epsilon)F+\epsilon\delta_{x}) - T(F)}{\epsilon}\] where \(\delta_{x}\) has all its mass at \(x\).
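A finite-sample analogue of this limit is the sensitivity curve, which adds one point \(x\) to a fixed sample of size \(n-1\) and rescales the change in the estimate by \(n\) (that is, \(\epsilon = 1/n\)). The sketch below is only an illustration on simulated data; the helper name `sens_curve` and the sample size are assumptions made here.

```r
set.seed(2)
base <- rnorm(99)                       # fixed sample of size n - 1
n <- length(base) + 1

# Sensitivity curve: n * (T(sample plus x) - T(sample)), i.e. epsilon = 1/n
sens_curve <- function(estimator, x_grid) {
  sapply(x_grid, function(x) n * (estimator(c(base, x)) - estimator(base)))
}

x_grid <- seq(-10, 10, by = 0.5)
sc_mean <- sens_curve(mean, x_grid)      # grows linearly in x (unbounded)
sc_median <- sens_curve(median, x_grid)  # flattens out for extreme x (bounded)

print(range(sc_mean))
print(range(sc_median))
```

For the mean the curve is exactly \(x\) minus the mean of the base sample, whereas for the median it stops changing once \(x\) moves beyond the bulk of the data.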
Assuming that \(F\) has derivative \(f>0\) on \([F^{-1}(\epsilon),F^{-1}(1-\epsilon)]\) for all \(\epsilon>0\),
\[ \begin{aligned} \operatorname{IF}(x; \color{blue}{P_{n}},F) & = \left[ \frac{0.75 - F(2H_{F}^{-1}(0.75)-x)}{\int f(2H_{F}^{-1}(0.75) - x)f(x)\,dx} \right. \\ & \qquad\qquad \left. - \frac{0.25 - F(2H_{F}^{-1}(0.25)-x)}{\int f(2H_{F}^{-1}(0.25) - x)f(x)\,dx} \right]. \end{aligned} \]
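To see the shape of this influence function, the sketch below evaluates the expression term by term for the special case \(F = \mathcal{N}(0,1)\). It assumes \(H_F\) denotes the distribution of a pairwise mean \((X_1+X_2)/2\), which is \(\mathcal{N}(0,1/2)\) here, and it uses the fact that \(\int f(2q-x)f(x)\,dx\) is the \(\mathcal{N}(0,2)\) density at \(2q\). No scaling constant is applied beyond what the display shows; the function name `if_pn` is hypothetical.

```r
# Influence function of P_n at F = N(0, 1), coded term by term from the display
if_pn <- function(x) {
  term <- function(p) {
    q <- qnorm(p) / sqrt(2)               # H_F^{-1}(p): pairwise means are N(0, 1/2)
    denom <- dnorm(2 * q, sd = sqrt(2))   # integral of f(2q - x) f(x) dx
    (p - pnorm(2 * q - x)) / denom
  }
  term(0.75) - term(0.25)
}

x_grid <- seq(-6, 6, by = 0.1)
if_vals <- if_pn(x_grid)
print(range(if_vals))                      # finite range: the curve is bounded
```

Under these assumptions the curve is symmetric about zero, negative near the centre, and levels off for large \(|x|\), which is the bounded-influence behaviour expected of a robust scale estimator.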
A key component of my PhD looked at estimating precision matrices for data contaminated in a cellwise manner.
Important for:
Often sparsity is assumed, i.e. the precision matrix will have many zero entries.
Aim: to estimate the dependence structure with S&P 500 stocks over the period 01/01/2003 to 01/01/2008 (before the GFC).
How: using the graphical lasso with a robust covariance matrix as the input.
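One common way to build a robust covariance matrix entry by entry is the Gnanadesikan-Kettenring identity, which replaces the standard deviation in \(\operatorname{cov}(X,Y) = \tfrac14[\sigma^2(X+Y) - \sigma^2(X-Y)]\) with a robust scale estimate. The sketch below uses `mad` as the robust scale and simulated data; both choices, and the function name `robust_cov`, are assumptions for illustration and not necessarily the construction used in this work.

```r
# Gnanadesikan-Kettenring identity with a robust scale s():
# cov(X, Y) = ( s(X + Y)^2 - s(X - Y)^2 ) / 4
robust_cov <- function(X, s = mad) {
  p <- ncol(X)
  S <- matrix(0, p, p)
  for (j in seq_len(p)) {
    for (k in j:p) {
      S[j, k] <- S[k, j] <- (s(X[, j] + X[, k])^2 - s(X[, j] - X[, k])^2) / 4
    }
  }
  S
}

set.seed(3)
n <- 500
z <- rnorm(n)
X <- cbind(z + rnorm(n), z + rnorm(n), rnorm(n))   # first two columns correlated
X[sample(n, 10), 1] <- 50                          # a few contaminated cells

S_rob <- robust_cov(X)
print(round(cov(X), 2))                            # classical estimate, inflated
print(round(S_rob, 2))                             # pairwise robust estimate
```

A matrix assembled pairwise like this is symmetric but not guaranteed to be positive semidefinite, so some correction step may be needed before it is passed to the graphical lasso.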
The graphical lasso minimises the penalised negative Gaussian log-likelihood over non-negative definite matrices \(\boldsymbol{\Theta}\): \[f(\boldsymbol{\Theta})= \text{tr}(\hat{\boldsymbol{\Sigma}}\boldsymbol{\Theta})-\log |\boldsymbol{\Theta}| + \lambda||\boldsymbol{\Theta}||_{1}, \] where \(||\boldsymbol{\Theta}||_1\) is the \(L_1\) norm, \(\lambda\) is a tuning parameter for the amount of shrinkage and \(\hat{\boldsymbol{\Sigma}}\) is a sample covariance matrix.
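The objective can be evaluated directly in base R, which may help when checking candidate solutions. This sketch assumes \(||\boldsymbol{\Theta}||_1\) is the sum of absolute values of all entries (some implementations penalise only the off-diagonal entries); the function name `glasso_objective` and the simulated data are illustrative.

```r
# f(Theta) = tr(Sigma_hat Theta) - log|Theta| + lambda * ||Theta||_1
glasso_objective <- function(Theta, Sigma_hat, lambda) {
  log_det <- as.numeric(determinant(Theta, logarithm = TRUE)$modulus)
  sum(diag(Sigma_hat %*% Theta)) - log_det + lambda * sum(abs(Theta))
}

set.seed(4)
X <- matrix(rnorm(200 * 3), ncol = 3)
Sigma_hat <- cov(X)
Theta_mle <- solve(Sigma_hat)            # unpenalised minimiser when lambda = 0

print(glasso_objective(Theta_mle, Sigma_hat, lambda = 0))
print(glasso_objective(diag(3), Sigma_hat, lambda = 0))
print(glasso_objective(diag(3), Sigma_hat, lambda = 0.1))
```

With \(\lambda = 0\) the inverse of \(\hat{\boldsymbol{\Sigma}}\) gives the smallest value; increasing \(\lambda\) adds a cost for every non-zero entry, which is what pushes the solution towards sparsity.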
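require(huge)
data(stockdata)
# Log returns from consecutive rows of the price matrix
X = log(stockdata$data[2:1258,]/stockdata$data[1:1257,])
par(mfrow=c(3,2),mar=c(2,4,1,0.1))
# Plot the return series for the first six stocks
for(i in 1:6) ts.plot(X[,i],main=stockdata$info[i,3],ylab="Return")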
With important special cases:
To provide scientists and researchers with tools that give them more information about the model selection choices that they are making.
Concept of model stability independently introduced by Meinshausen and Bühlmann (2010) and Müller and Welsh (2010) for different models.
Variable | Description |
---|---|
age | Age |
sex | Gender |
bmi | Body mass index |
map | Mean arterial pressure (average blood pressure) |
tc | Total cholesterol (mg/dL) |
ldl | Low-density lipoprotein ("bad" cholesterol) |
hdl | High-density lipoprotein ("good" cholesterol) |
tch | Blood serum measurement |
ltg | Blood serum measurement |
glu | Blood serum measurement (glucose?) |
y | A quantitative measure of disease progression one year after baseline |
To visualise inclusion probabilities as a function of the penalty multiplier \(\lambda\in [0,2\log(n)]\).
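The idea can be sketched in base R: resample the data, and for each \(\lambda\) on a grid pick the model minimising \(-2\log L + \lambda \times\) (number of coefficients), then record how often each variable appears in the chosen model. Here \(\lambda = 2\) and \(\lambda = \log(n)\) correspond to AIC and BIC. The toy data, the plain nonparametric bootstrap, the exhaustive search over all subsets and every name below are assumptions made for illustration, not the implementation in any package.

```r
set.seed(5)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)        # only x1 matters in this toy model

vars <- c("x1", "x2", "x3")
# All 2^3 candidate models as a logical inclusion matrix (rows = models)
models <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(vars))))
colnames(models) <- vars
size <- rowSums(models) + 1               # coefficients, intercept included
lambdas <- seq(0, 2 * log(n), length.out = 20)
B <- 100                                  # number of bootstrap resamples

# -2 log-likelihood of every candidate linear model on a data set d
neg2ll <- function(d) {
  apply(models, 1, function(inc) {
    rhs <- if (any(inc)) paste(vars[inc], collapse = " + ") else "1"
    as.numeric(-2 * logLik(lm(as.formula(paste("y ~", rhs)), data = d)))
  })
}

vip <- matrix(0, length(lambdas), length(vars), dimnames = list(NULL, vars))
for (b in seq_len(B)) {
  loss <- neg2ll(dat[sample(n, replace = TRUE), ])
  for (l in seq_along(lambdas)) {
    best <- which.min(loss + lambdas[l] * size)   # model chosen at this penalty
    vip[l, ] <- vip[l, ] + models[best, ]
  }
}
vip <- vip / B                            # bootstrap inclusion probabilities

print(round(cbind(lambda = lambdas, vip), 2))
```

Plotting each column of `vip` against `lambdas` gives one curve per variable; a variable whose curve stays near 1 across the whole range is selected regardless of the penalty.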
To add value to the loss against size plots by choosing a symbol size proportional to a measure of stability.
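One possible stability measure, sketched below, is the proportion of bootstrap resamples in which a model has the smallest loss among all models of the same size. The toy data, the use of \(-2\log L\) as the loss, the plain nonparametric bootstrap and all names are assumptions made for illustration.

```r
set.seed(6)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)        # only x1 matters in this toy model

vars <- c("x1", "x2", "x3")
models <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(vars))))
# Right-hand side of each candidate model formula, e.g. "x1+x3" or "1"
labels <- apply(models, 1, function(inc) {
  if (any(inc)) paste(vars[inc], collapse = "+") else "1"
})
size <- rowSums(models)                   # number of explanatory variables

# -2 log-likelihood of every candidate model on a data set d
neg2ll <- function(d) {
  sapply(labels, function(rhs) {
    as.numeric(-2 * logLik(lm(as.formula(paste("y ~", rhs)), data = d)))
  })
}

B <- 100
wins <- setNames(numeric(length(labels)), labels)
for (b in seq_len(B)) {
  loss <- neg2ll(dat[sample(n, replace = TRUE), ])
  for (k in unique(size)) {
    in_k <- which(size == k)
    best <- in_k[which.min(loss[in_k])]   # best model of size k in this resample
    wins[best] <- wins[best] + 1
  }
}
stability <- wins / B                     # within each size these sum to 1

result <- data.frame(model = labels, size = size,
                     loss = neg2ll(dat), stability = stability)
print(result[order(result$size), ], row.names = FALSE)
```

Passing something like `cex = 1 + 4 * result$stability` to `plot(result$size, result$loss)` would then draw the loss against size plot with larger symbols for models that dominate their size class more often.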
install.packages("devtools")
require(devtools)
install_github("garthtarr/mplot",quick=TRUE)
require(mplot)
af() for the adaptive fence
vis() for VIP and model stability plots
bglmnet() for bootstrapping glmnet
mplot() for an interactive shiny interface

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2008. “Sparse Inverse Covariance Estimation with the Graphical Lasso.” Biostatistics 9 (3): 432–41. doi:10.1093/biostatistics/kxm045.
Meinshausen, N, and P Bühlmann. 2010. “Stability Selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 (4): 417–73. doi:10.1111/j.1467-9868.2010.00740.x.
Murray, K, S Heritier, and S Müller. 2013. “Graphical Tools for Model Selection in Generalized Linear Models.” Statistics in Medicine 32 (25): 4438–51. doi:10.1002/sim.5855.
Müller, S, and AH Welsh. 2010. “On Model Selection Curves.” International Statistical Review 78 (2): 240–56. doi:10.1111/j.1751-5823.2010.00108.x.
Tarr, G, S Müller, and NC Weber. 2012. “A Robust Scale Estimator Based on Pairwise Means.” Journal of Nonparametric Statistics 24 (1): 187–99. doi:10.1080/10485252.2011.621424.
———. 2015. “Robust Estimation of Precision Matrices Under Cellwise Contamination.” Computational Statistics & Data Analysis to appear. doi:10.1016/j.csda.2015.02.005.
Tarr, G, S Müller, and AH Welsh. 2015. mplot: Graphical Model Stability and Model Selection Procedures. https://github.com/garthtarr/mplot.
Tarr, G, NC Weber, and S Müller. 2015. “The Difference of Symmetric Quantiles Under Long Range Dependence.” Statistics & Probability Letters 98: 144–50. doi:10.1016/j.spl.2014.12.022.