Bagging in Machine Learning
Bagging is an ensemble technique. In plain English, "ensemble" means a group or combination.
In machine learning, an ensemble is a group of models that work together so that the accuracy or other performance metrics improve beyond what any single model achieves.
The individual models that work together in this way are called base models.
There are four types of Ensemble techniques. They are:
- Bagging (Bootstrap Aggregation)
- Boosting
- Stacking
- Cascading
In this article we will discuss the Bagging technique.
Bagging is also referred to as Bootstrap Aggregation. Bootstrapping is a statistical term: in statistics, it means sampling from a population with replacement.
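As a rough illustration of sampling with replacement, here is a minimal NumPy sketch; the array and sample size below are made-up placeholders:

```python
# Minimal sketch of drawing one bootstrap sample (illustrative values only).
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(10)  # pretend this is the whole training data, N = 10

# Sampling WITH replacement: the same point can appear more than once,
# and some points may not appear at all.
bootstrap_sample = rng.choice(data, size=10, replace=True)
print(bootstrap_sample)
```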
The core idea of Bagging is as follows:
Process involved in training with Bagging:
Step 1: Take the entire training data; let its size be N. Draw a random sample of the training data with replacement, and let the size of each sample be n, where n < N.
Step 2: Train a machine learning model (a base model) on this sample.
Step 3: Repeat Step 1 and Step 2 k times, where k can be a large integer, giving k base models.
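A minimal sketch of this training loop, assuming scikit-learn decision trees as the base models and NumPy arrays X, y as the training data (the function name and default values are just placeholders):

```python
# Sketch of the Bagging training loop (Steps 1-3 above).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_bagging(X, y, k=50, sample_frac=0.8, seed=0):
    """Train k base models, each on a bootstrap sample of size n = sample_frac * N."""
    rng = np.random.default_rng(seed)
    N = len(X)
    n = int(sample_frac * N)                       # Step 1: sample size n < N
    models = []
    for _ in range(k):                             # Step 3: repeat k times
        idx = rng.choice(N, size=n, replace=True)  # bootstrap sample, with replacement
        model = DecisionTreeClassifier()           # Step 2: train a base model on the sample
        model.fit(X[idx], y[idx])
        models.append(model)
    return models
```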
Predicting the target value / class label using Bagging:
When we want to predict for a new data point using Bagging, we give the data point to each base model. If the problem is a classification problem, we take a majority vote over the outputs produced by the base models.
If the problem is regression, we take the mean or median of the values predicted by the base models.
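Continuing the sketch above, prediction could look like this (majority vote for classification, mean for regression); `models` is the list returned by `train_bagging`:

```python
# Sketch of prediction with a bagged ensemble.
from collections import Counter
import numpy as np

def predict_classification(models, X_new):
    # Each row of `preds` holds one base model's predictions for all points in X_new.
    preds = np.array([m.predict(X_new) for m in models])
    # Majority vote over the k base-model outputs for each point.
    return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])

def predict_regression(models, X_new):
    preds = np.array([m.predict(X_new) for m in models])
    # Mean of the base-model predictions; np.median(preds, axis=0) also works.
    return preds.mean(axis=0)
```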
In machine learning, the generalization error, that is, the error on future unseen data, can be decomposed into three terms.
What should be the nature of base models?
For the Bagging technique we need high-variance, low-bias base models.
Proof:
We know that: generalization error = Bias² + Variance + Irreducible error
Irreducible error: the error that cannot be reduced for a given model, no matter how well it is trained.
Variance: how much a model changes when there is a slight change in the training data. If the model changes a lot for a slight change in the training data, we say it has high variance.
Bias: errors that occur due to over-simplification of the model, for example always predicting that a point belongs to the majority class. High bias corresponds to under-fitting the data.
Now coming back to bagging: say the training data contains 50,000 points and we have 50 base models. Even if we delete 500 data points from the training data, only some of the bootstrap samples, and hence some of the base models, will change; the rest will not change at all. So the overall model's performance will not be significantly affected.
So if we use base models having low bias and high variance and combine them using the bagging technique, the overall model keeps the low bias while the variance is reduced.
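As a rough sanity check of why aggregation reduces variance, suppose (idealistically) that the k base models make independent predictions, each with variance σ²; then the variance of their average is:

```latex
% Variance of the average of k independent predictions, each with variance \sigma^2
\operatorname{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k}\hat{f}_i(x)\right)
  = \frac{1}{k^{2}}\sum_{i=1}^{k}\operatorname{Var}\!\left(\hat{f}_i(x)\right)
  = \frac{\sigma^{2}}{k}
```

In practice the bootstrap samples overlap, so the base models are correlated and the reduction is smaller than this ideal σ²/k, but the variance still goes down while the bias stays roughly the same.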
Examples of high-variance, low-bias models include:
- A KNN model with a small value of K
- A decision tree with a large depth
- An SVM model with a large value of the hyperparameter C
If we use decision trees with a large depth value as the base models in the bagging technique (together with random feature selection at each split), we get Random Forests.
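A minimal scikit-learn sketch comparing plain bagging of deep trees with a Random Forest; the synthetic dataset and hyperparameter values below are illustrative placeholders:

```python
# Bagged deep decision trees vs. a Random Forest (illustrative values only).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain bagging: each deep (high-variance) tree is trained on a bootstrap sample of rows.
bagging = BaggingClassifier(
    DecisionTreeClassifier(max_depth=None),
    n_estimators=50,
    random_state=0,
).fit(X_train, y_train)

# Random Forest: bagging of deep trees plus random feature selection at each split.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

print("bagged deep trees accuracy:", bagging.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```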