Bagging and Boosting are both Ensemble Learning techniques. Ensemble Methods are an important addition to a Data Scientist's toolbox: they combine weak learners into a strong learner that typically performs better than any individual model.
What is an Ensemble Method?
The main idea behind Ensemble Methods is that weak learners are combined to form a strong learner, which increases the accuracy of the model.
A single tree-based algorithm might not provide sufficient accuracy on its own. Ensemble Methods therefore combine several tree-based models to achieve better predictive performance. They do this by reducing variance and bias, which are major sources of the difference between actual values and predicted values.
To understand Ensemble Methods, think of the idea that “a group of people often makes better decisions as a team than any one individual does.” The same holds in Machine Learning: by combining simpler learners, an Ensemble Method can achieve better performance.
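The team analogy can be quantified under a strong simplifying assumption: suppose N classifiers make their errors independently and each is correct with the same probability p > 0.5. A majority vote is then correct whenever more than half of the classifiers are correct, which is a binomial sum. The sketch below computes it; the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of `n_models` classifiers is correct.

    Assumes (unrealistically) that the classifiers err independently and that
    each one is correct with probability `p`. Use an odd `n_models` so that
    ties cannot occur.
    """
    needed = n_models // 2 + 1  # smallest strict majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(f"{n:>2} models at p=0.7 -> majority vote accuracy {majority_vote_accuracy(n, 0.7):.4f}")
```

With p = 0.7, one classifier is right 70% of the time, while a majority vote of 5 such classifiers is right with probability 0.3087 + 0.36015 + 0.16807 = 0.83692. Real models trained on the same data make correlated errors, so the actual gain is smaller than this idealized figure.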
Bagging, or Bootstrap Aggregating, improves the accuracy of a model by reducing its variance, which helps to avoid overfitting. From the training dataset, we draw multiple random subsets. Each subset is used to train a separate model with the same learning algorithm, such as a decision tree, and each model predicts the output for the test data. The final result is obtained by averaging these predictions (for regression) or by taking a majority vote (for classification). Random Forest, which combines many decision trees, is a well-known example.
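Why averaging reduces variance can be made concrete with a standard identity. As a simplifying assumption, treat the N models' predictions at a test point x, written here as \hat{f}_1(x), ..., \hat{f}_N(x) (notation introduced only for this sketch), as random variables that each have variance σ² and pairwise correlation ρ. The variance of their average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N}\hat{f}_i(x)\right)
  = \frac{1}{N^2}\left(N\sigma^2 + N(N-1)\rho\sigma^2\right)
  = \rho\sigma^2 + \frac{1-\rho}{N}\,\sigma^2
```

If the models were independent (ρ = 0), averaging would cut the variance by a factor of N. Because bootstrap samples overlap, the models are correlated in practice (ρ > 0), so the first term sets a floor; Random Forest additionally considers a random subset of features at each split to lower ρ.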
Working of Bagging
Assume we have N models and a dataset D, where m and n denote the number of records and features, respectively. First, we split the dataset D into a training dataset and a testing dataset.
For the first model, m1, we take a random sample of records from the training set. For the second model, m2, we resample the training data to obtain another sample. We repeat this for all N models.
Resampling the training dataset in this way and giving each sample to a model is called Row Sampling with Replacement. After training the models, we look at their predictions on the test data. For binary classification, each model outputs either 0 or 1. Suppose that more than N/2 of the N models predict 1. Aggregating the models by majority vote (equivalently, checking whether the average of their 0/1 predictions exceeds 0.5), the final output for the test data is 1.
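Drawing one sample per model can be sketched with Python's standard `random` module. This is a minimal illustration, assuming each bootstrap sample has the same size as the training set (the usual convention); the helper name `bootstrap_samples` and the integer stand-ins for records are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_samples(rows: Sequence[T], n_models: int, seed: int = 0) -> List[List[T]]:
    """Return one bootstrap sample per model (m1, ..., mN).

    Each sample has the same size as `rows` and is drawn uniformly at random
    with replacement, so some records repeat and others are left out.
    """
    rng = random.Random(seed)
    return [rng.choices(rows, k=len(rows)) for _ in range(n_models)]


if __name__ == "__main__":
    training_rows = list(range(10_000))  # hypothetical stand-ins for training records
    for i, sample in enumerate(bootstrap_samples(training_rows, n_models=3), start=1):
        coverage = len(set(sample)) / len(training_rows)
        print(f"m{i}: {len(sample)} rows drawn, covering {coverage:.1%} of the training records")
```

Because a given record is missed by all m draws with probability (1 - 1/m)^m, which approaches 1/e ≈ 0.368, each sample covers only about 63.2% of the distinct training records; the remaining "out-of-bag" records differ from model to model, which is what makes the models diverse.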
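Putting these steps together, here is a minimal from-scratch sketch of bagging in Python. It assumes a hypothetical one-feature binary dataset and uses decision stumps (one-split decision trees) as the base models, chosen only to keep the example short; the names `train_stump`, `bagging_fit` and `bagging_predict` are illustrative, not a library API. Each of the N models is trained on its own bootstrap sample, and the final label is decided by majority vote.

```python
import random
from typing import List, Tuple

Record = Tuple[float, int]  # (single feature value, 0/1 label)
Stump = Tuple[float, int]   # (threshold, label predicted when x > threshold)


def train_stump(sample: List[Record]) -> Stump:
    """Fit a one-feature decision stump by trying every sample value as the
    threshold and both label orientations, keeping the one with fewest errors."""
    best: Stump = (sample[0][0], 1)
    best_errors = len(sample) + 1
    for threshold, _ in sample:
        for label_above in (0, 1):
            errors = sum(
                (label_above if x > threshold else 1 - label_above) != y
                for x, y in sample
            )
            if errors < best_errors:
                best, best_errors = (threshold, label_above), errors
    return best


def stump_predict(stump: Stump, x: float) -> int:
    threshold, label_above = stump
    return label_above if x > threshold else 1 - label_above


def bagging_fit(train: List[Record], n_models: int, seed: int = 0) -> List[Stump]:
    """Train models m1..mN, each on its own bootstrap sample of `train`."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Row sampling with replacement, same size as the training set.
        sample = rng.choices(train, k=len(train))
        models.append(train_stump(sample))
    return models


def bagging_predict(models: List[Stump], x: float) -> int:
    """Majority vote: output 1 when more than N/2 models predict 1."""
    votes_for_one = sum(stump_predict(m, x) for m in models)
    return 1 if votes_for_one > len(models) / 2 else 0


if __name__ == "__main__":
    # Hypothetical toy data: label is 1 when x > 0.5, with about 10% of labels flipped.
    rng = random.Random(42)
    data = []
    for _ in range(200):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    train, test = data[:160], data[160:]  # split D into training and test sets
    models = bagging_fit(train, n_models=25)
    accuracy = sum(bagging_predict(models, x) == y for x, y in test) / len(test)
    print(f"test accuracy with {len(models)} stumps: {accuracy:.2f}")
```

For real projects, a library implementation such as scikit-learn's `BaggingClassifier` or `RandomForestClassifier` is usually a better choice than hand-written code like this.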
Boosting is another Ensemble Learning technique; some of its best-known variants are AdaBoost, Gradient Boosting and XGBoost. It is used in Supervised Machine Learning mainly to reduce bias (and, to some extent, variance) by converting weak learners into a strong learner. A weak learner is only slightly better than random guessing, whereas a strong learner's predictions are well correlated with the actual classification.
Working of Boosting
We take records from the dataset and pass them to base learners (which can be any model). Let us assume there are m records in the dataset. We pass some of these m records to the first base learner, bl1, train it, and check its accuracy.
The records that bl1 classifies incorrectly are passed on to the next base learner, bl2, and the records that bl2 gets wrong are in turn passed to bl3. The base learners are therefore trained sequentially, and this continues until we reach the number of base learners we specified. Finally, we combine the outputs of all these base learners into a strong learner, which improves the predictive power of the model.
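AdaBoost, the first technique named above, is one concrete way to implement this idea. Rather than literally passing only the misclassified records to the next learner, it keeps a weight for every record and increases the weights of misclassified records, so the next learner concentrates on them; each learner also receives a vote weight (alpha) based on its weighted error. The minimal sketch below assumes a one-feature dataset with labels -1 and +1 and uses decision stumps as base learners; all function names are hypothetical.

```python
import math
from typing import List, Tuple

Record = Tuple[float, int]  # (feature value, label in {-1, +1})
Stump = Tuple[float, int]   # (threshold, label predicted when x > threshold)


def stump_predict(stump: Stump, x: float) -> int:
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def fit_weighted_stump(data: List[Record], weights: List[float]) -> Tuple[Stump, float]:
    """Pick the threshold/polarity with the lowest total weight of errors."""
    best: Stump = (data[0][0], 1)
    best_err = float("inf")
    for threshold, _ in data:
        for polarity in (1, -1):
            err = sum(
                w for (x, y), w in zip(data, weights)
                if stump_predict((threshold, polarity), x) != y
            )
            if err < best_err:
                best, best_err = (threshold, polarity), err
    return best, best_err


def adaboost_fit(data: List[Record], n_learners: int) -> List[Tuple[float, Stump]]:
    m = len(data)
    weights = [1.0 / m] * m  # bl1 starts with equal weight on every record
    ensemble = []
    for _ in range(n_learners):
        stump, err = fit_weighted_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate learners get a larger vote
        ensemble.append((alpha, stump))
        # Misclassified records (y * prediction = -1) get heavier weights,
        # so the next learner focuses on them.
        weights = [
            w * math.exp(-alpha * y * stump_predict(stump, x))
            for (x, y), w in zip(data, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Weighted vote of all base learners."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: positive only in the middle, so no single stump fits it.
    data = [(0, -1), (1, -1), (2, 1), (3, 1), (4, -1), (5, -1)]
    for n in (1, 3):
        ensemble = adaboost_fit(data, n_learners=n)
        errors = sum(adaboost_predict(ensemble, x) != y for x, y in data)
        print(f"{n} learner(s): {errors} training error(s)")
```

On this toy dataset a single stump (bl1) misclassifies two of the six records, while three boosted stumps classify all six correctly, which illustrates how sequentially trained weak learners combine into a stronger one.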
Differences between Bagging and Boosting
| Bagging | Boosting |
|---|---|
| Each model receives equal weight. | Models are weighted by their performance. |
| Tries to solve the overfitting problem. | Tries to reduce bias. |
| Aims to decrease variance. | Aims to decrease bias. |
| A simple way of combining predictions from models of the same type. | A way of combining predictions from models of different types. |
| Every model is built independently. | New models are influenced by the performance of previously built models. |
| Apply bagging to unstable (high-variance) classifiers. | Apply boosting to stable but simple (high-bias) classifiers. |