
Boosting vs Bagging: The Battle Against Overfitting in the Forest of Trees


Overfitting is generally more of a concern with boosting algorithms than with bagging algorithms as the number of trees increases.

Boosting algorithms like Gradient Boosting and AdaBoost train models sequentially, where each new model is trained to correct the mistakes made by the previous ones. This process can produce a model that fits the training data very closely. However, if too many trees are added, the ensemble can become excessively complex and begin to fit the noise in the data, performing poorly on unseen data.
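A minimal sketch of this behavior, assuming scikit-learn and a small synthetic dataset (the specific hyperparameters and dataset settings here are illustrative, not prescriptive): training accuracy keeps climbing as trees are added, while test accuracy can stall or decline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small, noisy dataset so the effect of extra trees shows up quickly.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gb = GradientBoostingClassifier(n_estimators=500, learning_rate=0.2,
                                max_depth=3, random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after each additional tree,
# so we can track train vs. test accuracy as the ensemble grows.
for i, (pred_tr, pred_te) in enumerate(zip(gb.staged_predict(X_tr),
                                           gb.staged_predict(X_te)), start=1):
    if i % 100 == 0:
        print(f"trees={i:3d}  "
              f"train_acc={accuracy_score(y_tr, pred_tr):.3f}  "
              f"test_acc={accuracy_score(y_te, pred_te):.3f}")
```

In practice, early stopping or a validation-based choice of the number of trees is the usual guard against this.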

Bagging algorithms like Random Forests, on the other hand, train each tree independently on a different bootstrap sample of the original data (drawn with replacement). The final prediction is made by averaging the predictions of all trees (or by majority voting for classification). This approach generally reduces variance without increasing bias, making bagging less prone to overfitting. Each individual decision tree may overfit its own bootstrapped sample, but when the trees are averaged together, their individual overfitting tendencies largely cancel out, resulting in better generalization on unseen data.
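A companion sketch under the same assumptions (scikit-learn, the same synthetic data as above): growing the forest incrementally with warm_start typically shows test accuracy plateauing rather than degrading as trees are added.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# warm_start=True lets us grow the same forest by raising n_estimators
# instead of refitting from scratch each time.
rf = RandomForestClassifier(warm_start=True, random_state=0)
for n_trees in (10, 50, 100, 200, 500):
    rf.set_params(n_estimators=n_trees)
    rf.fit(X_tr, y_tr)
    print(f"trees={n_trees:3d}  "
          f"train_acc={accuracy_score(y_tr, rf.predict(X_tr)):.3f}  "
          f"test_acc={accuracy_score(y_te, rf.predict(X_te)):.3f}")
```

The individual trees still overfit their bootstrap samples (training accuracy stays near 1.0), but the averaged prediction holds steady on the test set as more trees are added.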
