Decision Trees
A decision tree recursively splits the dataset into smaller regions such that the prediction error within each region is minimized.
STEP 1: Objective of Decision Tree
For regression trees, the objective is to minimize 01.00.00 Mean Squared Error:
At each step, the algorithm searches for a split that minimizes the MSE of the resulting partitions.
The goal is to create regions where the target values are as similar as possible.
Code
import pandas as pd
# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
columns = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
df = pd.read_csv(url, names=columns)
# Separate features and target
X = df.drop("class", axis=1)
y = df["class"]
df.head(10)
STEP 2: Prediction in Each Region
In a regression tree, the prediction assigned to each region is the mean of the target values within that region.
For a region
The mean is chosen because it minimizes the squared error within that region.
Code
def predict_region(y_values):
return y_values.mean()
Explanation of Code
y_valuesrepresents all target values in a region.- The function returns the mean, which is the optimal prediction for minimizing MSE.
STEP 3: Finding the Best Split
At each node, the decision tree evaluates all possible splits across all features and selects the split that minimizes the total Mean Squared Error after the split.
For a feature
For each split
-
Left region:
-
Right region:
Prediction in each region
Total MSE after split
Objective
Key Idea
The best split is the one that creates two regions where the target values are as homogeneous as possible.
Code
import numpy as np
def find_best_split(X, y):
best_feature = None
best_split = None
best_mse = float("inf")
n, p = X.shape
for feature in range(p):
sorted_idx = np.argsort(X[:, feature])
X_sorted = X[sorted_idx, feature]
y_sorted = y[sorted_idx]
for i in range(1, n):
s = (X_sorted[i-1] + X_sorted[i]) / 2
left_y = y_sorted[:i]
right_y = y_sorted[i:]
y_left_mean = left_y.mean()
y_right_mean = right_y.mean()
mse_left = np.mean((left_y - y_left_mean) ** 2)
mse_right = np.mean((right_y - y_right_mean) ** 2)
mse_split = (len(left_y)/n) * mse_left + (len(right_y)/n) * mse_right
if mse_split < best_mse:
best_mse = mse_split
best_feature = feature
best_split = s
return best_feature, best_split
Explanation of Code
- Loop over each feature
- Sort data along that feature
- Try all midpoint splits
- Compute:
- mean of left and right
- MSE of both sides
- Compute weighted MSE
- Track the split with lowest MSE
STEP 4: Split the Tree
After finding the optimal feature
Prediction in Each Region
For each region, the prediction is the mean of the target values:
The split creates two child nodes, each with its own prediction and its own data subset.
Code
def split_data(X, y, feature, threshold):
left_mask = X[:, feature] <= threshold
right_mask = X[:, feature] > threshold
X_left = X[left_mask]
y_left = y[left_mask]
X_right = X[right_mask]
y_right = y[right_mask]
return X_left, y_left, X_right, y_right
STEP 5: Recursive Splitting (Growing the Tree)
After splitting the data into child nodes, the decision tree applies the same process recursively on each node until a stopping condition is met.
Mathematical View
At each node, we again solve:
but now using only the data in that node.
Each node represents a region
This creates a hierarchical partitioning of the feature space.
Key Idea
Each split reduces the MSE locally, and the tree grows by repeatedly applying:
- Find best split
- Partition data
- Repeat on each child
Code (Recursive Structure)
def build_tree(X, y, depth=0, max_depth=3):
# stopping condition
if len(y) <= 1 or depth == max_depth:
return {"prediction": y.mean()}
feature, threshold = find_best_split(X, y)
X_left, y_left, X_right, y_right = split_data(X, y, feature, threshold)
return {
"feature": feature,
"threshold": threshold,
"left": build_tree(X_left, y_left, depth + 1, max_depth),
"right": build_tree(X_right, y_right, depth + 1, max_depth)
}
Explanation of Code
- At each node:
- Find best split
- Split data
- Call the same function on left and right
- This is recursion
- Tree grows top-down
- Max depth reached
- Number of samples is too small
- MSE reduction is very small
- All target values are the same
STEP 6: Leaf Node (Final Prediction)
A leaf node is a terminal node where no further splitting occurs, and it outputs a prediction.
For a leaf node region
Each leaf stores a constant value which is the mean of the target values in that region.
Code
if len(y) <= 1 or depth == max_depth:
return {"prediction": y.mean()}
Explanation
- When stopping condition is met
- We compute mean of
- That becomes the prediction