Project overview

Many statistical procedures aggregate information from summaries of a data set, such as subsamples, bootstrap samples or random projections. The bootstrap itself is probably the best-known such method, but there are many other examples, including bagging [1,2,3] for regression or classification (with random forests [4] as a special case), Stability Selection for variable selection [5,6] and random projection ensemble classification [7]. Intuitively, such procedures allow the statistician to assess the stability of observed effects under perturbations of the original data, and they appear to be particularly valuable for complex, high-dimensional data. Although these methods are typically embarrassingly parallelisable, they may nevertheless be computationally intensive.
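As a rough illustration of aggregation over resamples, the sketch below bags a 1-nearest-neighbour classifier on one-dimensional toy data by taking a majority vote over classifiers fitted to random resamples. It is a minimal sketch, not an implementation from any of the cited papers: the function names, the toy data and the resample size `m` are all assumptions made for the example.

```python
import random
from collections import Counter


def nearest_neighbour_label(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def bagged_predict(train, x, m, n_resamples=1000, seed=0):
    """Majority vote of 1-NN classifiers fitted to resamples of size m.

    Each resample is drawn with replacement from `train`. The resamples are
    independent of one another, so the loop could be run in parallel.
    Returns the winning label and its share of the votes.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_resamples):
        resample = rng.choices(train, k=m)  # sampling with replacement
        votes[nearest_neighbour_label(resample, x)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_resamples


# Hypothetical toy data as (feature, label) pairs: class 0 lies near 0,
# class 1 lies near 3, and one point at 0.1 carries the "wrong" label 1.
train = [(-0.5, 0), (-0.2, 0), (0.0, 0), (0.3, 0), (0.1, 1),
         (2.6, 1), (2.9, 1), (3.1, 1), (3.4, 1)]

x = 0.12
print("plain 1-NN label:", nearest_neighbour_label(train, x))
print("bagged label and vote share:", bagged_predict(train, x, m=3))
```

On this toy data the plain classifier follows the single odd point next to `x`, whereas the vote over small resamples is driven by the surrounding cluster. How the aggregate behaves depends on the resample size; `m=3` is chosen here purely for illustration.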

This project will explore when and why methods such as these can be expected to succeed. The analysis will combine statistical perspectives with the inherent computational trade-offs. It is hoped that the analysis will suggest other statistical challenges where the aggregation of data summaries can prove an effective tool.

- [1] Breiman, L. (1996) Bagging predictors. Mach. Learn., 24, 123–140.
- [2] Hall, P. and Samworth, R. J. (2005) Properties of bagged nearest neighbour classifiers. J. Roy. Statist. Soc. Ser. B, 67, 363–379.
- [3] Samworth, R. J. (2012) Optimal weighted nearest neighbour classifiers. Ann. Statist., 40, 2733–2763.
- [4] Breiman, L. (2001) Random forests. Mach. Learn., 45, 5–32.
- [5] Meinshausen, N. and Buehlmann, P. (2010) Stability selection (with discussion). J. Roy. Statist. Soc. Ser. B, 72, 417–473.
- [6] Shah, R. D. and Samworth, R. J. (2013) Variable selection with error control: another look at stability selection. J. Roy. Statist. Soc. Ser. B, 75, 55–80.
- [7] Cannings, T. I. and Samworth, R. J. (2015) Random projection ensemble classification. Available at http://arxiv.org/abs/1504.04595.
