Regularized objective
A tree ensemble predicts with $K$ additive functions,

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F} = \{\, f(x) = w_{q(x)} \,\},$$

where $q : \mathbb{R}^m \to \{1, \dots, T\}$ maps each example to its corresponding leaf index and $w_j$ is the weight of leaf $j$. Unlike decision trees, a regression tree contains a continuous score on each leaf. We sum up the scores in the corresponding leaves across all the weak learners.
To learn these functions we minimize the regularized objective:

$$\mathcal{L}(\phi) = \sum_i \ell(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2,$$

where $\ell$ is a differentiable convex loss function and $T$ the number of leaves. The second term $\Omega$ penalizes model complexity.
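As a concrete sketch (my own illustrative helper, not from the paper), a NumPy version of this objective for a squared-error loss, with each tree represented only by its leaf-weight vector:

```python
import numpy as np

def regularized_objective(y, y_pred, leaf_weights_per_tree, gamma=0.0, lam=1.0):
    """L(phi) = sum_i l(y_hat_i, y_i) + sum_k [gamma * T_k + 0.5 * lambda * ||w_k||^2]."""
    loss = np.sum((y_pred - y) ** 2)               # l = squared error, one choice of convex loss
    penalty = sum(gamma * w.size + 0.5 * lam * np.dot(w, w)
                  for w in leaf_weights_per_tree)  # T_k leaves and weight vector w_k per tree
    return loss + penalty
```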
Gradient Tree Boosting
Since this ensemble model includes functions as parameters, it cannot be optimized with traditional methods in Euclidean space. The model is trained in an additive manner instead (greedily add the weak learner $f_t$ that most improves the model):

$$\mathcal{L}^{(t)} = \sum_i \ell\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$$
See notes on [[gradient_boosting]]
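A minimal, runnable sketch of this additive loop, assuming a squared-error loss and using a single-leaf `LeafTree` as a stand-in for a real regression tree (both names are mine for illustration):

```python
import numpy as np

class LeafTree:
    """Depth-0 'tree': a single leaf with the optimal weight w* = -G / (H + lambda)."""
    def __init__(self, g, h, lam=1.0):
        self.w = -g.sum() / (h.sum() + lam)

    def predict(self, X):
        return np.full(X.shape[0], self.w)

def fit_boosted_ensemble(X, y, n_rounds=100, eta=0.3, lam=1.0):
    y_pred = np.zeros(len(y))                    # hat{y}^{(0)} = 0
    trees = []
    for _ in range(n_rounds):
        g = 2.0 * (y_pred - y)                   # first-order gradients of squared error
        h = np.full(len(y), 2.0)                 # second-order gradients (constant here)
        tree = LeafTree(g, h, lam)               # stand-in for growing a real tree
        y_pred = y_pred + eta * tree.predict(X)  # hat{y}^{(t)} = hat{y}^{(t-1)} + eta * f_t(x)
        trees.append(tree)
    return trees
```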
Taking a second-order approximation of the loss term with respect to $f_t$, we can quickly optimize the (convex) objective by setting its gradient to zero and find the optimal weights for a given tree structure $q$.
With $g_i = \partial_{\hat{y}^{(t-1)}} \ell(y_i, \hat{y}^{(t-1)})$, $h_i = \partial^2_{\hat{y}^{(t-1)}} \ell(y_i, \hat{y}^{(t-1)})$ and $I_j = \{\, i \mid q(x_i) = j \,\}$ the instance set of leaf $j$, the approximated objective (constants dropped) is

$$\tilde{\mathcal{L}}^{(t)} = \sum_i \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t),$$

which gives the optimal leaf weights and objective value for a fixed structure $q$:

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad \tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
We use this score $\tilde{\mathcal{L}}^{(t)}(q)$ as a quality measure of the tree structure $q$.
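A small sketch of these two quantities, assuming each leaf is given as a `(g, h)` pair of gradient/hessian arrays (an illustrative representation, not the paper's data layout):

```python
import numpy as np

def optimal_leaf_weight(g, h, lam=1.0):
    """w_j* = -sum(g_i) / (sum(h_i) + lambda) over the instances falling in leaf j."""
    return -g.sum() / (h.sum() + lam)

def structure_score(leaves, lam=1.0, gamma=0.0):
    """Optimal objective value for a fixed tree structure q (lower is better)."""
    total = sum(g.sum() ** 2 / (h.sum() + lam) for g, h in leaves)
    return -0.5 * total + gamma * len(leaves)
```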
Since we can't enumerate all possible tree structures, we start from a single leaf and add branches in a greedy fashion.
Loss reduction after a split is given by:
$$\mathcal{L}_{\text{split}} = \frac{1}{2} \left[ \frac{\big(\sum_{i \in I_L} g_i\big)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\big(\sum_{i \in I_R} g_i\big)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$

where $I_L$ and $I_R$ are the instance sets of the left and right children and $I = I_L \cup I_R$.
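A direct translation of the gain, assuming NumPy arrays of per-child gradients/hessians (the helper name is mine):

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction of splitting a node into left/right children; split only if positive."""
    def score(G, H):
        return G ** 2 / (H + lam)
    G_L, H_L = g_left.sum(), h_left.sum()
    G_R, H_R = g_right.sum(), h_right.sum()
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma
```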
Shrinkage
Scales newly added weights by a factor $\eta$ after each step of tree boosting. Similar to a learning rate, it reduces the influence of each individual tree and leaves room for future trees to improve the model.
Column (feature) subsampling prevents overfitting even more than row subsampling.
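A sketch of one boosting round combining both ideas; `grow_tree` is a hypothetical tree-builder interface, and the hyperparameter names (`eta`, `colsample`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def boosting_round(X, y, y_pred, grow_tree, eta=0.3, colsample=0.8):
    """One round with shrinkage (eta) and column subsampling (colsample)."""
    g = 2.0 * (y_pred - y)                   # squared-error gradients
    h = np.full(len(y), 2.0)                 # squared-error hessians
    n_features = X.shape[1]
    cols = rng.choice(n_features, size=max(1, int(colsample * n_features)), replace=False)
    tree = grow_tree(X[:, cols], g, h)       # the tree only sees a random feature subset
    y_pred = y_pred + eta * tree.predict(X[:, cols])   # shrinkage scales the new tree's weights
    return y_pred, tree, cols
```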
(Basic) Exact Greedy Algorithm
Enumerates all possible splits over all the features. Computationally demanding.
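A single-node sketch of the exact greedy split search, assuming NumPy arrays and ignoring missing values (the function name is mine):

```python
import numpy as np

def exact_greedy_best_split(X, g, h, lam=1.0, gamma=0.0):
    """Return (gain, feature, threshold) of the best split, scanning every split point."""
    G, H = g.sum(), h.sum()
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        g_s, h_s, x_s = g[order], h[order], X[order, j]
        G_L = H_L = 0.0
        for i in range(len(x_s) - 1):
            G_L += g_s[i]
            H_L += h_s[i]
            if x_s[i] == x_s[i + 1]:         # cannot split between identical values
                continue
            G_R, H_R = G - G_L, H - H_L
            gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                          - G**2 / (H + lam)) - gamma
            if gain > best[0]:
                best = (gain, j, (x_s[i] + x_s[i + 1]) / 2.0)
    return best
```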
Approximate Algorithm
The exact greedy algorithm is not possible when the data does not fit in memory.
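The approximate variant instead proposes candidate split points according to percentiles of each feature's distribution (a weighted quantile sketch in the actual algorithm) and evaluates the gain only at those candidates. A simplified, unweighted sketch for a single feature:

```python
import numpy as np

def candidate_splits(x, n_candidates=32):
    """Propose candidate thresholds at (unweighted) percentiles of one feature."""
    qs = np.linspace(0, 100, n_candidates + 2)[1:-1]     # interior percentiles
    return np.unique(np.percentile(x, qs))

def approx_best_split_for_feature(x, g, h, lam=1.0, gamma=0.0):
    """Evaluate the split gain only at the proposed candidate thresholds."""
    G, H = g.sum(), h.sum()
    best_gain, best_thr = 0.0, None
    for thr in candidate_splits(x):
        left = x <= thr
        G_L, H_L = g[left].sum(), h[left].sum()
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr
```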