Random Forest Regression

  • This module explains Random Forest Regression, covering ensemble learning, bagging technique, feature importance, and the advantages of Random Forest models.
  • Ensemble Learning

    What is Ensemble Learning?

    Ensemble Learning means combining the predictions of several models to build a single, stronger model.

    Basic idea:

    “Many weak learners together form a strong learner.”

    In Random Forest:

    • Base model = Decision Tree

    • Final prediction = Average of all tree predictions

    Example

    Suppose 3 trees predict house price:

    Tree 1 → 300,000
    Tree 2 → 320,000
    Tree 3 → 310,000

    Final Prediction:

    (300,000 + 320,000 + 310,000) / 3 = 310,000

    This reduces error and improves stability.
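    The averaging above can be checked in a couple of lines (the three tree predictions are the illustrative values from this example):

```python
# Predictions from three hypothetical trees (values from the example above)
tree_predictions = [300_000, 320_000, 310_000]

# The forest's final prediction is the simple average of the tree predictions
final_prediction = sum(tree_predictions) / len(tree_predictions)

print("Final Prediction:", final_prediction)  # 310000.0
```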


    Bagging Technique (Bootstrap Aggregating)

    Random Forest uses Bagging.

    What is Bagging?

    1. Create multiple random samples from dataset (with replacement).

    2. Train a separate decision tree on each sample.

    3. Average all predictions.

    Why “with replacement”?

    Because:

    • Some rows may repeat

    • Some rows may be skipped

    • Each tree sees slightly different data

    This increases diversity between trees.

    Example

    Original dataset: 100 rows

    Tree 1 → Random 100 rows (some repeated)
    Tree 2 → Different random 100 rows
    Tree 3 → Another random sample

    Each tree learns slightly differently.
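    A minimal NumPy sketch of one bootstrap sample (the 100-row dataset size is taken from the example above; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible
n_rows = 100

# Draw 100 row indices WITH replacement -- this is one bootstrap sample
bootstrap_indices = rng.choice(n_rows, size=n_rows, replace=True)

unique_rows = np.unique(bootstrap_indices)
print("Unique rows in sample:", len(unique_rows))      # some rows repeat...
print("Rows never drawn:", n_rows - len(unique_rows))  # ...so others are skipped
```

    On average a bootstrap sample contains about 63% of the original rows; the rows never drawn are the "skipped" ones that give each tree a slightly different view of the data.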


    Feature Importance

    Random Forest automatically calculates which features are most important.

    How?

    It measures:

    • How much each feature reduces error across all trees

    More error reduction → Higher importance

    Example (House Price)

    Features:

    • Area

    • Bedrooms

    • Age

    Output:

    Area        → 0.65

    Bedrooms    → 0.25

    Age         → 0.10

    This means Area is the most important feature.
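    A sketch of how such numbers can be read off a fitted model in scikit-learn (the dataset below is made up for illustration, so the importances will not match 0.65 / 0.25 / 0.10 exactly):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative data: columns are [Area, Bedrooms, Age]
X = np.array([
    [1000, 2, 30],
    [1500, 3, 20],
    [2000, 4, 15],
    [2500, 4, 10],
    [3000, 5,  5],
])
y = np.array([200_000, 300_000, 400_000, 500_000, 600_000])

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# feature_importances_ sums to 1; a higher score means more error
# reduction attributed to that feature across all trees
for name, score in zip(["Area", "Bedrooms", "Age"], model.feature_importances_):
    print(f"{name:<8} -> {score:.2f}")
```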


    Advantages of Random Forest

    1. Reduces Overfitting

    Because it averages multiple trees.

    2. High Accuracy

    Often performs better than a single Decision Tree.

    3. Handles Non-Linear Data

    Works well with complex relationships.

    4. Works with Large Datasets

    Can handle many features.

    5. Provides Feature Importance

    Helps in understanding model behavior.
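    The overfitting claim can be illustrated with a quick comparison on noisy, non-linear data (synthetic dataset and arbitrary seed, used only for this sketch):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=200)  # noisy target
X_test = rng.uniform(0, 10, size=(200, 1))
y_test = np.sin(X_test[:, 0]) + rng.normal(0, 0.3, size=200)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# A fully grown single tree memorises the training noise (training R^2 = 1.0);
# averaging many trees typically generalises better on unseen data
print("Tree   train/test R^2:", tree.score(X_train, y_train), tree.score(X_test, y_test))
print("Forest train/test R^2:", forest.score(X_train, y_train), forest.score(X_test, y_test))
```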

    Example: House Price Prediction

Random Forest Regression for House Price Prediction

This code demonstrates how to use a Random Forest Regressor in Python to predict house prices based on features like area and number of bedrooms. The dataset is split into training and testing sets, the model is trained, and a price prediction is made for a new house. It also displays the importance of each feature used in the model.

# Step 1: Import Libraries
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Step 2: Create Dataset
X = np.array([
    [1000, 2],
    [1500, 3],
    [2000, 4],
    [2500, 4],
    [3000, 5]
])

y = np.array([200000, 300000, 400000, 500000, 600000])

# Step 3: Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Step 4: Create Model
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Step 5: Train Model
model.fit(X_train, y_train)

# Step 6: Predict (input must be 2D: one row per sample)
prediction = model.predict([[1800, 3]])

print("Predicted Price:", prediction[0])

# Step 7: Feature Importance
print("Feature Importance:", model.feature_importances_)
  • Sample output (values vary between runs because the train/test split is random):

    Predicted Price: 323000.0

    Feature Importance: [0.59953618 0.40046382]

    Random Forest vs Decision Tree

    Feature          Decision Tree    Random Forest
    Overfitting      High             Low
    Accuracy         Moderate         High
    Stability        Less stable      More stable
    Training Speed   Fast             Slower