Prediction

Consider a scenario with

  • N observations
  • P predictors (explanatory variables)

possibly with some grouping structure that we would typically associate with random effects.

Now, suppose the scientific question is to build the best predictive model.

Options

There are many options to choose from including

  • Stepwise model selection (AIC, BIC, pvalue, etc)
  • Model averaging (AIC, BIC, etc)
  • LASSO
  • Random forest

Stepwise model selection has severe problems and is not suggested.

Amongst these, the random forest is the preferred methodology. In R, the randomForest is on CRAN.