Summer scholar, honours and PhD projects

The projects below could serve as a starting point for summer vacation, honours or PhD projects. I’m also open to ideas that prospective students have in mind, especially (but not limited to) those relating to model selection, robust statistics, functional data analysis, regular data analysis, statistical computing, data visualisation (e.g. building web-based interactive dashboards) and joint modelling (survival and longitudinal data analysis).

Project 1: Outlier identification in functional data

Functional data is where we observe a curve for each sample. Examples of functional data include growth curves, brain electrical activity and colour spectra measurements. It is important to be able to identify any unusual sample curves that do not align closely with the other observations so that they can be dealt with appropriately in any subsequent analysis of the data. This project will look at existing (and perhaps new) approaches to outlier identification in functional data. The research will compare colour measurements from beef carcases obtained using a Hunter colour meter to the colour classification given by a trained human assessor.
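To give a feel for the problem, here is a minimal sketch of one very simple baseline for flagging unusual curves. It is an illustration only, not one of the established methods the project would review. It assumes every curve is recorded on the same grid, measures each curve's root mean squared distance from the pointwise median curve, and flags curves whose distance exceeds a cutoff based on the median absolute deviation (MAD). The function names and the cutoff multiplier `k` are hypothetical choices made for this example.

```python
import math
import statistics


def pointwise_median(curves):
    """Median across curves at each grid point (all curves share one grid)."""
    return [statistics.median(values) for values in zip(*curves)]


def rms_distance(curve, reference):
    """Root mean squared difference between a curve and a reference curve."""
    squared = [(c - r) ** 2 for c, r in zip(curve, reference)]
    return math.sqrt(sum(squared) / len(squared))


def flag_outlying_curves(curves, k=3.0):
    """Return indices of curves that sit unusually far from the median curve.

    A curve is flagged when its distance exceeds median + k * scaled MAD
    of all distances. The multiplier k = 3 is an assumed, arbitrary default.
    """
    centre = pointwise_median(curves)
    dist = [rms_distance(c, centre) for c in curves]
    med = statistics.median(dist)
    # 1.4826 rescales the MAD so it estimates the standard deviation
    # when the distances are roughly normal.
    mad = 1.4826 * statistics.median([abs(d - med) for d in dist])
    cutoff = med + k * mad
    return [i for i, d in enumerate(dist) if d > cutoff]
```

A rule like this only detects curves that are far away in overall level or shape on average. Curves that are unusual in more subtle ways (for example in shape only) are exactly where more refined approaches become interesting.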

Project 2: Classification with functional data

Functional data is where we observe a curve for each sample. Examples of functional data include growth curves, brain electrical activity and colour spectra. A key problem in functional data analysis is the classification of curves into multiple categories. An example is assessing the colour of beef carcases for meat grading purposes. This project will investigate appropriate methods for classifying colour spectra readings into multiple categories and assess the accuracy of such automated approaches to colour measurement.

Project 3: Improved model averaging through better model weights

Model averaging seeks to address the issue of post-model-selection inference by incorporating model uncertainty into the estimation process. This project will investigate different weighting approaches used to obtain model averaged estimates. Existing approaches will be compared to a new method where model weights are obtained through bootstrapping.
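As a rough illustration of the idea, the sketch below shows one possible way of obtaining bootstrap model weights. It is not necessarily the method the project would develop. It assumes just two candidate models (intercept only, and a simple linear regression), selects between them by AIC on each bootstrap resample, and uses the selection frequencies as weights. All function names and defaults are hypothetical.

```python
import math
import random


def fit_mean(x, y):
    """Intercept-only model: returns (intercept, slope = 0, RSS)."""
    a = sum(y) / len(y)
    return a, 0.0, sum((yi - a) ** 2 for yi in y)


def fit_line(x, y):
    """Simple linear regression by least squares: (intercept, slope, RSS)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:  # degenerate resample: the slope is not identifiable
        return fit_mean(x, y)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return a, b, sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts mean parameters."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


# Candidate models: name -> (fitting function, number of mean parameters).
MODELS = {"mean": (fit_mean, 1), "line": (fit_line, 2)}


def bootstrap_weights(x, y, n_boot=500, seed=1):
    """Weight of each model = share of resamples in which it has lowest AIC."""
    rng = random.Random(seed)
    n = len(x)
    wins = {name: 0 for name in MODELS}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        scores = {name: aic(fit(xb, yb)[2], n, k)
                  for name, (fit, k) in MODELS.items()}
        wins[min(scores, key=scores.get)] += 1
    return {name: count / n_boot for name, count in wins.items()}


def model_averaged_slope(x, y, weights):
    """Weighted average of the full-data slope estimates from each model."""
    return sum(weights[name] * fit(x, y)[1]
               for name, (fit, _) in MODELS.items())
```

With weights defined this way, a model that is rarely selected across resamples contributes little to the averaged estimate. An information-criterion weighting would instead compute the weights directly from the AIC values on the original data.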

Project 4: The use of approximations in bootstrap model selection

Exhaustive model searches for generalised linear models are computationally burdensome. Hosmer et al. (1989, Best Subsets Logistic Regression, Biometrics) consider approximating logistic regression models using a form of weighted least squares. This project will explore using linear models to approximate generalised linear models for the purposes of computationally efficient bootstrap model selection. The results of this research will be added to the mplot R package.

Project 5: Finite sample performance of robust location estimators

Consumer data often exhibits numerous outliers. This project will consider various existing robust location estimators (e.g. median, trimmed mean, Hodges-Lehmann estimator) and assess their small sample performance, especially in samples of size n=10, which corresponds to the number of consumers who eat each piece of meat in standard meat tasting consumer trials. There is an extensive consumer database (40,000+ observations) on which to apply various approaches.
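The kind of comparison involved can be sketched with a small simulation. The code below is a minimal example under assumed conditions, not the project's actual design: scores are drawn from a normal distribution centred at 50 (standard deviation 10), a proportion of them are replaced by uniform scores on 0 to 100 to mimic outliers, and everything is clipped to the 0 to 100 range. The contamination model, the 10% trimming proportion and all function names are assumptions made for illustration.

```python
import random
import statistics
from itertools import combinations_with_replacement


def trimmed_mean(x, prop=0.1):
    """Mean after dropping floor(prop * n) observations from each tail."""
    xs = sorted(x)
    k = int(prop * len(xs))
    return statistics.fmean(xs[k:len(xs) - k])


def hodges_lehmann(x):
    """Median of all pairwise (Walsh) averages, including self-pairs."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]
    return statistics.median(walsh)


ESTIMATORS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "trimmed_mean": trimmed_mean,
    "hodges_lehmann": hodges_lehmann,
}


def simulate_rmse(n=10, reps=2000, contamination=0.1, seed=1):
    """Root mean squared error of each estimator around the true centre, 50."""
    rng = random.Random(seed)
    sq_err = {name: 0.0 for name in ESTIMATORS}
    for _ in range(reps):
        sample = []
        for _ in range(n):
            if rng.random() < contamination:
                value = rng.uniform(0, 100)  # assumed outlier mechanism
            else:
                value = rng.gauss(50, 10)    # assumed "typical" consumer score
            sample.append(min(100.0, max(0.0, value)))  # keep within 0-100
        for name, estimator in ESTIMATORS.items():
            sq_err[name] += (estimator(sample) - 50) ** 2
    return {name: (total / reps) ** 0.5 for name, total in sq_err.items()}
```

Calling `simulate_rmse(n=10)` returns a dictionary of root mean squared errors, one per estimator. Varying `n` and `contamination` shows how the ranking of the estimators changes in small samples.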

Project 5.1: Optimal robust location estimation in a bounded interval

Consumer testing measurements typically fall within a bounded interval, for example when consumers are asked to give a score out of 100. Standard measures of robustness focus on the breakdown value of an estimator, though this is less important for bounded data. An extension to Project 5 would be to find an “optimal” robust location estimator for bounded data, with a focus on small sample performance.

Potential PhD project: Statistical issues in evaluating the eating quality of meat

The beef industry in Australia is worth $13 billion annually and the sheep meat industry is worth another $4 billion. A key question for the red meat industry is how well the eating quality of cuts of meat can be predicted. Doing this well has major financial implications for the industry. This project would focus on the statistical issues associated with predicting meat eating quality. Examples of subprojects include: the analysis of consumer data, which often contains many outliers; determining the relative importance of eating quality factors such as flavour, tenderness and juiciness; modelling issues in the presence of missing or limited amounts of data, which may include investigations into imputation or borrowing strength from similar data sets; the inclusion of genetic marker information in the modelling process; and comparing new automated techniques with existing manual grading techniques, e.g. the analysis of colour spectral readings or the use of chemical fat meters. There would also be scope to compare data from Australia with data from other countries, including Ireland, France, Poland, Japan, China and the United States, as the global meat industry moves towards an international data sharing model.