Stacking algorithm in machine learning

Samhith Vasikarla
Nov 9, 2021


Stacking is one of the ensemble techniques. "Ensemble" means that several machine learning models work together to predict or classify the data points.

The training of a stacking algorithm proceeds as follows:

i) First, we divide the entire dataset into a training set and a testing set.

ii) We give the whole training dataset to ‘M’ different machine learning models. These ‘M’ models need not be of the same type; ideally, we pick them from different families. For example, one model can be a KNN classifier, another can be Naive Bayes, and so on.

iii) These models are trained in parallel and independently, as in the bagging method.

iv) Let model 1 be h1(x), model 2 be h2(x), …, and model M be hM(x).

v) Once these models are trained, we construct another meta-classifier model on the dataset D′ = {X′, y}, where y is the class label and X′ = {h1(x), h2(x), …, hM(x)}.

Whenever a query point comes in, it is first evaluated by the base learners. The outputs of the base learners become the features for the meta-classifier, which then predicts the class label, as sketched below.
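To make these steps concrete, here is a minimal from-scratch sketch. The variable names and the choice of iris data are my own illustration, and note that practical implementations usually build D′ from out-of-fold predictions rather than training-set predictions, to reduce overfitting:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# step i: split the data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# steps ii-iv: M different base learners, trained independently on the training set
base_models = [KNeighborsClassifier(), GaussianNB()]
for h in base_models:
    h.fit(X_train, y_train)

# step v: the base learners' outputs become the features X' of the new dataset D'
meta_X = np.column_stack([h.predict(X_train) for h in base_models])
meta_clf = LogisticRegression().fit(meta_X, y_train)

# query time: evaluate the base learners first, then feed their outputs to the meta-classifier
query_X = np.column_stack([h.predict(X_test) for h in base_models])
print(meta_clf.predict(query_X)[:5])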

This method is interesting. scikit-learn has actually shipped its own StackingClassifier since version 0.22, but in this post we will use the mlxtend package instead.
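For reference, the scikit-learn version would look roughly like this (a sketch using scikit-learn's own API, separate from the mlxtend code below):

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# scikit-learn (0.22+) takes named base estimators and a final_estimator
sk_stack = StackingClassifier(
    estimators=[('knn', KNeighborsClassifier()), ('nb', GaussianNB())],
    final_estimator=LogisticRegression())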

The installation of mlxtend is described at http://rasbt.github.io/mlxtend/installation/.

Sample example:

This example is taken from http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/:

from sklearn import model_selection
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import StackingClassifier
import warnings
warnings.simplefilter('ignore')

# Load the iris data used in the mlxtend documentation example
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

# Three base learners and a logistic-regression meta-classifier
clf1 = KNeighborsClassifier(n_neighbors=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
lr = LogisticRegression()
sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], meta_classifier=lr)

# Compare each base learner with the stacked ensemble via 3-fold CV
for clf, label in zip([clf1, clf2, clf3, sclf],
                      ['KNN', 'Random Forest', 'Naive Bayes', 'StackingClassifier']):
    scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

Here the base models are KNN, Random Forest, and Naive Bayes, and the meta-classifier is logistic regression.
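Since mlxtend's StackingClassifier follows the scikit-learn estimator API, fitting and predicting work as you would expect (a quick usage sketch with the X, y loaded above):

sclf.fit(X, y)              # trains the base models, then the meta-classifier
print(sclf.predict(X[:5]))  # base outputs are computed first, then combined by lr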

You might think that bagging and stacking are almost the same, but they are not.

In bagging, the base learners are high-variance, low-bias models, whereas in stacking the base learners typically have a balanced bias and variance. In bagging, we combine the base learners' outputs by simple aggregation (for example, a majority vote, mean, or median), but in stacking we train another machine learning model to produce the final prediction.
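To make the contrast concrete, here is a sketch (the model choices are my own illustration): bagging combines many copies of one high-variance model by a fixed rule, while stacking learns how to combine diverse models.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: many copies of one high-variance, low-bias model,
# combined by simple aggregation (majority vote for classifiers)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)

# Stacking (sclf above): diverse base models whose outputs are
# combined by a learned meta-classifier instead of a fixed rule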

Ref: http://rasbt.github.io/mlxtend/user_guide/classifier/StackingClassifier/
