Predictive Model Structure :
- Root or Internal Node : Represents a feature on which the data is split.
- Leaf Node : Represents the predicted target value.
- Branch : Represents a decision rule.
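The structure above can be sketched in a few lines of Python. This is a minimal illustration, not a real library API: the nested-dict representation, the feature names ("height", "weight"), and the thresholds are all made up for the example.

```python
def predict(node, sample):
    """Walk the tree from the root until a leaf is reached."""
    while "leaf" not in node:
        # Internal node: tests one feature; each branch is a decision rule.
        feature, threshold = node["feature"], node["threshold"]
        node = node["left"] if sample[feature] <= threshold else node["right"]
    # Leaf node: holds the target value.
    return node["leaf"]

# Hand-built classification tree with a root node, one internal node,
# and three leaves (all values are illustrative).
tree = {
    "feature": "height", "threshold": 170,
    "left": {"leaf": "female"},
    "right": {
        "feature": "weight", "threshold": 70,
        "left": {"leaf": "female"},
        "right": {"leaf": "male"},
    },
}

print(predict(tree, {"height": 180, "weight": 80}))  # male
```

A regression tree has exactly the same structure; only the leaves change, holding numbers (e.g., temperatures) instead of class labels.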
Types of Decision Trees :
- Classification Trees : The target variable takes categorical values (e.g., male/female).
- Regression Trees : The target variable takes continuous values (e.g., temperature).
How to construct a decision tree ?
Given a dataset, the algorithm :
- Searches for the best splitting rule (using a criterion such as the Gini Index or Information Gain) to separate dissimilar samples.
- Repeats the process on each partition until the leaves (partitions) reach a certain degree of homogeneity.
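The search for the best split can be sketched for a single numeric feature. This is a simplified, pure-Python illustration of the Gini Index criterion (a real implementation would also handle multiple features and recursion); the function names are ours.

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum(p_k^2). It is 0 for a perfectly homogeneous partition."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(xs, ys):
    """Greedily try every threshold and keep the one that minimizes
    the weighted Gini index of the two resulting partitions."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # skip degenerate splits
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best  # (threshold, weighted Gini)

xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (3, 0.0): splitting at 3 yields two pure partitions
```

Growing a full tree simply applies this search recursively to each resulting partition until the leaves are homogeneous enough.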
Limitations :
- Unstable : A small change in the data can lead to a large change in the structure of the obtained decision tree.
- Greedy search : Tree construction has no backtracking, as searching in the space of ALL possible trees is computationally infeasible.
- Relatively Inaccurate : Many other models, such as Support Vector Machines and Neural Networks, often perform better on similar data.
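The instability limitation can be demonstrated on a toy dataset: swapping the labels of just two samples changes which threshold the greedy Gini-based search selects. The code below is a self-contained illustration; the helper names are ours.

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum(p_k^2); 0 for a pure partition."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(xs, ys):
    """Return the split threshold with the lowest weighted Gini index."""
    def score(t):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
    # The last value is excluded: splitting there leaves an empty partition.
    return min(sorted(set(xs))[:-1], key=score)

xs = [1, 2, 3, 4, 5, 6]
ys1 = ["a", "a", "a", "b", "b", "b"]  # original labels
ys2 = ["a", "a", "b", "a", "b", "b"]  # labels of samples 3 and 4 swapped
print(best_threshold(xs, ys1), best_threshold(xs, ys2))  # 3 2
```

A small perturbation moved the root split, and in a full tree every subtree below that split would be rebuilt differently.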
Solutions : Decision Tree Ensembles
- Random Forest : Combines multiple decision trees to reduce overfitting and improve accuracy.
- Gradient Boosting (e.g., XGBoost) : Builds an ensemble of trees sequentially, where each tree corrects errors made by previous trees.
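To illustrate the Random Forest idea, here is a minimal bagging sketch: each tree is trained on a bootstrap sample (drawn with replacement) and predictions are combined by majority vote. For brevity the "trees" are one-level stumps on a single feature; all function names and data are illustrative, and a real Random Forest would also subsample features at each split.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """One-level tree: pick the threshold with the lowest weighted Gini index,
    then store the majority label of each side."""
    def gini(labels):
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
    def score(t):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
    t = min(sorted(set(xs))[:-1], key=score)
    left = Counter(y for x, y in zip(xs, ys) if x <= t).most_common(1)[0][0]
    right = Counter(y for x, y in zip(xs, ys) if x > t).most_common(1)[0][0]
    return t, left, right

def forest_predict(forest, x):
    """Majority vote over all stumps, as in bagging."""
    votes = [left if x <= t else right for t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

random.seed(0)
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = ["a"] * 4 + ["b"] * 4

# Bagging: each stump sees a different bootstrap sample of the data.
forest = []
for _ in range(25):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    forest.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

print(forest_predict(forest, 1), forest_predict(forest, 13))  # a b
```

Averaging many trees trained on perturbed versions of the data is precisely what counteracts the instability of a single tree. Gradient boosting differs in that the trees are not independent: each new tree is fit to the residual errors of the ensemble built so far.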