My research focuses on predictive modeling, feature selection, and robust statistical methodology for big data in agriculture and bioinformatics. I develop statistical methods and open-source software that help researchers make better-informed decisions about the statistical models they build and make efficient use of complex data. I collaborate widely across disciplines and engage with industry, particularly in agriculture and health. I contribute to the field of statistics by developing fast, flexible, and robust methodology that is practical, reliable, and easy to use.
My work directly addresses reproducibility issues in science by investigating the stability of the features (variables) selected by a model. I develop statistical methods that explore how sensitive the selected features are to the choice of tuning parameters, and that use resampling techniques to determine how stable those selections are. A stable and repeatable model is often just as important as an accurate one, particularly when communicating with stakeholders.
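The general idea of resampling-based stability can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the specific methodology described above: the function name `selection_frequencies`, the simple correlation-based selector, the choice of `k`, and the simulated data are all hypothetical and stand in for whichever selection procedure is under study. The sketch refits the selector on bootstrap resamples and records how often each feature is chosen.

```python
import numpy as np

def selection_frequencies(X, y, k=3, n_resamples=200, seed=0):
    """Proportion of bootstrap resamples in which each feature is selected.

    Illustrative selector (an assumption): keep the k features with the
    largest absolute sample correlation with the response.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)      # bootstrap: sample rows with replacement
        Xb, yb = X[idx], y[idx]
        Xc = Xb - Xb.mean(axis=0)             # centre each feature
        yc = yb - yb.mean()                   # centre the response
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        # Guard against constant columns: their correlation is set to 0.
        corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
        top = np.argsort(corr)[-k:]           # indices of the k strongest features
        counts[top] += 1
    return counts / n_resamples

# Simulated example (hypothetical data): only features 0 and 1 drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)
freq = selection_frequencies(X, y, k=2)
```

Features with a selection frequency near 1 are chosen in almost every resample and can be regarded as stable; features that appear only occasionally are sensitive to the particular sample. Repeating the calculation over a range of `k` values gives a simple view of sensitivity to a tuning parameter.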
In many industrial and health applications, the goal is a predictive model that can be used in production to direct resources in an optimal manner. I develop robust multilevel and multiclass methods that enable practitioners to extract as much value as possible from their data sets. Examples include methods that allow different collections of data to be sensibly analysed together (learning from similar subsets in a data-driven way) and methods for building predictive models on one technology that can be transported to another technology or lab (cross-platform prediction).