Feature orthogonality
Before discussing feature orthogonality, let us first discuss correlation.
Correlation: the correlation of two variables (features) refers to the statistical association between them.
If the second variable increases when the first increases, or decreases when the first decreases, the two variables are positively correlated. If one variable increases while the other decreases, they are negatively correlated.
To quantify how strongly two variables are correlated, we use the Pearson correlation coefficient (see https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).
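As a minimal sketch of this, the Pearson coefficient can be computed with NumPy; the two toy arrays below are assumptions purely for illustration:

```python
import numpy as np

# Two toy features; y roughly increases with x, so we expect
# a strong positive Pearson correlation (close to +1).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```

A coefficient near +1 means a strong positive linear association, near -1 a strong negative one, and near 0 little linear association.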
Coming to feature orthogonality, the core idea is that the more diverse (mutually uncorrelated) the features in the model, the better the model.
Suppose our dataset has features f1, f2, f3, f4, ..., fn, and our task is either regression or classification. After computing correlations we conclude that features f1, f2, and f3 are highly correlated with the class label. We then look at the correlations between the features themselves.
Let's take two cases:
- Case 1: all the features are highly correlated with each other (f1 is highly correlated with f2, f2 with f3, and f1 with f3). Then the overall benefit of combining these features is small; it is almost like predicting the class label from just one of the features rather than all three.
- Case 2: no features are highly correlated with each other (f1 is not highly correlated with f2, f2 is not highly correlated with f3, and f1 is not highly correlated with f3). Then the overall benefit of combining the features is greater than in Case 1.
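The two cases can be made concrete with a correlation matrix. The synthetic features below are assumptions for illustration: in the first group, f2 and f3 are noisy copies of f1; in the second, the features are drawn independently:

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 1: f2 and f3 are noisy copies of f1, so all pairwise
# correlations are high — the features are largely redundant.
f1 = rng.normal(size=500)
f2 = f1 + 0.05 * rng.normal(size=500)
f3 = f1 + 0.05 * rng.normal(size=500)
redundant = np.corrcoef(np.vstack([f1, f2, f3]))

# Case 2: three independently drawn features — near-zero
# pairwise correlations ("orthogonal" features).
g1, g2, g3 = rng.normal(size=(3, 500))
orthogonal = np.corrcoef(np.vstack([g1, g2, g3]))

print(redundant.round(2))   # off-diagonal entries close to 1
print(orthogonal.round(2))  # off-diagonal entries close to 0
```

In Case 1 the off-diagonal entries are near 1, so a model gains little from keeping all three features; in Case 2 each feature can contribute independent information.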
This raises a question: how do we create a new feature that is highly correlated with the class label but weakly correlated with the other important features?
Answer: Initially we have three features that are important for predicting the class label. Train a machine-learning model of your choice on them, with proper tuning of hyperparameters. Then compute the error (the difference between the actual and predicted labels). This error can serve as our new feature: it is correlated with the label, but since the first model has already extracted the information in the original features, it is only weakly correlated with them. Training a second model along with this new feature can give better performance than the first model alone.
This is essentially how boosting algorithms work.
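The residual idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real boosting implementation: the data, the deliberately weak first model (predicting the mean), and the linear second stage are all assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data (assumed for illustration).
x = rng.uniform(-3, 3, size=300)
y = np.sin(x) + 0.1 * rng.normal(size=300)

# Stage 1: a deliberately weak model — always predict the mean of y.
pred1 = np.full_like(y, y.mean())
residual = y - pred1  # the "error" described above

# Stage 2: fit a second (linear) model to the residuals; the residual
# acts as the new target, carrying information the first model missed.
slope, intercept = np.polyfit(x, residual, deg=1)
pred2 = pred1 + (slope * x + intercept)

# The combined two-stage model fits the data better than stage 1 alone.
err1 = np.mean((y - pred1) ** 2)
err2 = np.mean((y - pred2) ** 2)
print(err2 < err1)
```

Real boosting libraries repeat this residual-fitting loop many times with decision trees as the stage models, but the mechanism is the same: each new model targets the errors of the ensemble so far.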