Decision Trees

A decision tree recursively splits the dataset into smaller regions such that the prediction error within each region is minimized.

STEP 1: Objective of Decision Tree

For regression trees, the objective is to minimize 01.00.00 Mean Squared Error:

MSE=1ni=1n(yiy^)2

At each step, the algorithm searches for a split that minimizes the MSE of the resulting partitions.

The goal is to create regions where the target values are as similar as possible.

Code

import pandas as pd

# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"

columns = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
df = pd.read_csv(url, names=columns)

# Separate features and target
X = df.drop("class", axis=1)
y = df["class"]
df.head(10)

STEP 2: Prediction in Each Region

In a regression tree, the prediction assigned to each region is the mean of the target values within that region.

For a region R:

y^R=1|R|iRyi

The mean is chosen because it minimizes the squared error within that region.

Code

def predict_region(y_values):
    return y_values.mean()

Explanation of Code


STEP 3: Finding the Best Split

At each node, the decision tree evaluates all possible splits across all features and selects the split that minimizes the total Mean Squared Error after the split.

For a feature Xj, we first sort the values and consider candidate splits:

s=xj(k)+xj(k+1)2

For each split s, we divide the data into:

Prediction in each region

y^L=1nLiRLyiy^R=1nRiRRyi

Total MSE after split

MSEsplit=nLniRL(yiy^L)2+nRniRR(yiy^R)2

Objective

minj,s MSEsplit

Key Idea

The best split is the one that creates two regions where the target values are as homogeneous as possible.

Code

import numpy as np

def find_best_split(X, y):
    best_feature = None
    best_split = None
    best_mse = float("inf")

    n, p = X.shape

    for feature in range(p):
        sorted_idx = np.argsort(X[:, feature])
        X_sorted = X[sorted_idx, feature]
        y_sorted = y[sorted_idx]

        for i in range(1, n):
            s = (X_sorted[i-1] + X_sorted[i]) / 2

            left_y = y_sorted[:i]
            right_y = y_sorted[i:]

            y_left_mean = left_y.mean()
            y_right_mean = right_y.mean()

            mse_left = np.mean((left_y - y_left_mean) ** 2)
            mse_right = np.mean((right_y - y_right_mean) ** 2)

            mse_split = (len(left_y)/n) * mse_left + (len(right_y)/n) * mse_right

            if mse_split < best_mse:
                best_mse = mse_split
                best_feature = feature
                best_split = s

    return best_feature, best_split

Explanation of Code

This is a greedy algorithm. It chooses the best split at the current node without considering future splits.

STEP 4: Split the Tree

After finding the optimal feature Xj and split value s from Step 3, the dataset is partitioned into two regions.

RL=xXjsRR=xXj>s

Prediction in Each Region

For each region, the prediction is the mean of the target values:

y^L=1nLiRLyiy^R=1nRiRRyi

The split creates two child nodes, each with its own prediction and its own data subset.

Code

def split_data(X, y, feature, threshold):
    left_mask = X[:, feature] <= threshold
    right_mask = X[:, feature] > threshold

    X_left = X[left_mask]
    y_left = y[left_mask]

    X_right = X[right_mask]
    y_right = y[right_mask]

    return X_left, y_left, X_right, y_right

STEP 5: Recursive Splitting (Growing the Tree)

After splitting the data into child nodes, the decision tree applies the same process recursively on each node until a stopping condition is met.

Mathematical View

At each node, we again solve:

minj,s MSEsplit

but now using only the data in that node.

Each node represents a region R, and we further partition it into:

RL,RR

This creates a hierarchical partitioning of the feature space.

Key Idea

Each split reduces the MSE locally, and the tree grows by repeatedly applying:

Code (Recursive Structure)

def build_tree(X, y, depth=0, max_depth=3):
    # stopping condition
    if len(y) <= 1 or depth == max_depth:
        return {"prediction": y.mean()}

    feature, threshold = find_best_split(X, y)

    X_left, y_left, X_right, y_right = split_data(X, y, feature, threshold)

    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(X_left, y_left, depth + 1, max_depth),
        "right": build_tree(X_right, y_right, depth + 1, max_depth)
    }

Explanation of Code

Recursion stops when:

  1. Max depth reached
  2. Number of samples is too small
  3. MSE reduction is very small
  4. All target values are the same


STEP 6: Leaf Node (Final Prediction)

A leaf node is a terminal node where no further splitting occurs, and it outputs a prediction.

For a leaf node region R:

y^=1|R|iRyi

Each leaf stores a constant value which is the mean of the target values in that region.

Code

if len(y) <= 1 or depth == max_depth:
    return {"prediction": y.mean()}

Explanation


🎯 Interview Questions

02.01.01 Decision Trees - Interview