Introduction

Boosted Regression Tree (BRT) models combine two techniques: decision tree algorithms and boosting methods. Like Random Forest models, BRTs repeatedly fit many decision trees to improve the accuracy of the model. One of the differences between the two methods is the way in which the data used to build the trees are selected. Both techniques draw a random subset of the data for each new tree that is built; all subsets contain the same number of data points and are drawn from the complete dataset, so data used for one tree are returned to the pool and can be selected again for subsequent trees. Random Forest models use the bagging method, in which every occurrence has an equal probability of being selected in each sample. BRTs instead use the boosting method, in which the input data are weighted in subsequent trees: the weights are applied so that data poorly modelled by the previous trees have a higher probability of being selected for the next tree. This means that after the first tree is fitted, the model takes the prediction error of that tree into account when fitting the next tree, and so on. By accounting for the fit of the trees already built, the model continuously tries to improve its accuracy. This sequential approach is unique to boosting.
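
To make this sequential approach concrete, the sketch below hand-rolls a very small boosted regression model in R: each new tree is fitted to the residuals of the current model and its prediction is added with a small learning rate. This is a simplified, squared-error illustration on made-up data, not the implementation used by the 'gbm' or 'dismo' packages, and all object names (dat, lr, n.trees, pred) are purely illustrative.

    # Hand-rolled boosting sketch (illustrative only, not the gbm implementation)
    library(rpart)

    set.seed(1)
    dat <- data.frame(x = runif(200, 0, 10))          # toy predictor
    dat$y <- sin(dat$x) + rnorm(200, sd = 0.3)        # toy noisy response

    lr      <- 0.1                                    # learning rate: contribution of each tree
    n.trees <- 100                                    # number of boosting iterations
    pred    <- rep(mean(dat$y), nrow(dat))            # start from the overall mean

    for (i in seq_len(n.trees)) {
      dat$resid <- dat$y - pred                       # what the previous trees failed to explain
      tree <- rpart(resid ~ x, data = dat,            # small tree fitted to those residuals
                    control = rpart.control(maxdepth = 1, cp = 0))
      pred <- pred + lr * predict(tree, dat)          # each tree adds a small, scaled contribution
    }

    plot(dat$x, dat$y)                                # observed data
    points(dat$x, pred, col = "red", pch = 16)        # boosted fit after 100 trees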

Boosted Regression Trees have two important parameters that need to be specified by the user.

  1. Tree complexity (tc): this controls the number of splits in each tree. A tc value of 1 results in trees with only 1 split, and means that the model does not take into account interactions between environmental variables. A tc value of 2 results in two splits and so on.
  2. Learning rate (lr): this determines the contribution of each tree to the growing model. A small value of lr means that each tree contributes less, so more trees are needed to build the model.

These two parameters together determine the number of trees that is required for optimal prediction. The aim is to find the combination of parameters that results in the minimum error for predictions. As a rule of thumb, it is advised to use a combination of tree complexity and learning rate values that result in a model with at least 1000 trees. The optimal ‘tc’ and ‘lr’ values depend on the size of your dataset. For datasets with <500 occurrence points, it is best to model simple trees (‘tc’ = 2 or 3) with small enough learning rates to allow the model to grow at least 1000 trees.
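
As an illustration of how these settings are used in practice, the call below fits a BRT with the 'gbm.step' function from the 'dismo' package, using the Anguilla_train example dataset that ships with 'dismo'. The column indices and parameter values are examples only and should be adapted to your own data; the fitted object reports the selected number of trees in gbm.call$best.trees.

    # Illustrative gbm.step call; dataset, columns and settings are examples only
    library(dismo)
    data(Anguilla_train)                         # example presence/absence dataset in dismo

    brt_model <- gbm.step(data = Anguilla_train,
                          gbm.x = 3:13,          # columns holding the predictor variables
                          gbm.y = 2,             # column holding the presence/absence response
                          family = "bernoulli",  # binomial response
                          tree.complexity = 3,   # tc: number of splits in each tree
                          learning.rate = 0.005) # lr: contribution of each tree

    # Rule of thumb: aim for at least ~1000 trees; if fewer are selected,
    # lower the learning rate (and/or raise tree complexity) and refit
    brt_model$gbm.call$best.trees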

Boosted Regression Trees are a powerful technique: they work very well with large datasets and with datasets that contain a large number of environmental variables relative to the number of observations, and they are very robust to missing values and outliers.

Advantages

  • Can be used with a variety of response types (binomial, Gaussian, Poisson)
  • Stochastic, which improves predictive performance
  • The optimal number of trees, and hence the best fit, is determined automatically by the algorithm
  • The model represents the effect of each predictor after accounting for the effects of the other predictors (see the sketch after this list)
  • Robust to missing values and outliers
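
As a sketch of how these fitted effects can be inspected, the commands below use the illustrative brt_model object fitted with gbm.step above: summary() reports the relative influence of each predictor, and dismo's gbm.plot() draws partial dependence plots showing the fitted effect of each predictor after accounting for the average effect of the others.

    # Relative influence of each predictor in the (illustrative) fitted model
    summary(brt_model)

    # Partial dependence plots: the fitted effect of each predictor,
    # after accounting for the average effects of the other predictors
    gbm.plot(brt_model, n.plots = 6, write.title = FALSE)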


Limitations

  • Needs at least 2 predictor variables to run

Assumptions

There are no formal distributional assumptions; boosted regression trees are non-parametric and can therefore handle skewed and multi-modal data, as well as categorical data that are ordinal or non-ordinal.

Requires absence data

Yes.

Configuration options

The BCCVL uses the ‘gbm.step’ function in the ‘dismo’ package. The user can set the following configuration options: 
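
These options broadly correspond to tuning arguments of gbm.step itself. As an indication of what they control, the sketch below extends the earlier illustrative call with the main gbm.step arguments; the values shown are examples, not recommendations.

    # Main tuning arguments of dismo::gbm.step (values are illustrative only)
    brt_model <- gbm.step(data = Anguilla_train, gbm.x = 3:13, gbm.y = 2,
                          family = "bernoulli",   # response type ("bernoulli" = binomial; "gaussian" and "poisson" also supported)
                          tree.complexity = 3,    # splits per tree, i.e. the interaction depth
                          learning.rate = 0.005,  # shrinkage applied to each tree
                          bag.fraction = 0.75,    # fraction of data drawn at random for each tree
                          n.folds = 10,           # folds used in the internal cross-validation
                          max.trees = 10000)      # upper limit on the number of trees fitted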
