Random Forest Regression
- This module explains Random Forest Regression, covering ensemble learning, the bagging technique, feature importance, and the advantages of Random Forest models.
Ensemble Learning
What is Ensemble Learning?
Ensemble Learning means combining multiple models to create a stronger model.
Basic idea:
“Many weak learners together form a strong learner.”
In Random Forest:
Base model = Decision Tree
Final prediction = Average of all tree predictions
Example
Suppose 3 trees predict house price:
Tree 1 → 300,000
Tree 2 → 320,000
Tree 3 → 310,000
Final Prediction:
(300,000 + 320,000 + 310,000) / 3 = 310,000
This reduces error and improves stability.
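The averaging step above can be sketched directly in Python (the numbers are the three tree predictions from the example; `tree_predictions` is just an illustrative list):

```python
# Predictions of the three example trees
tree_predictions = [300_000, 320_000, 310_000]

# The forest's final prediction is the simple average
final_prediction = sum(tree_predictions) / len(tree_predictions)
print(final_prediction)  # prints 310000.0
```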
Bagging Technique (Bootstrap Aggregating)
Random Forest uses Bagging.
What is Bagging?
Create multiple random samples from dataset (with replacement).
Train a separate decision tree on each sample.
Average all predictions.
Why “with replacement”?
Because:
Some rows may repeat
Some rows may be skipped
Each tree sees slightly different data
This increases diversity between trees.
Example
Original dataset: 100 rows
Tree 1 → Random 100 rows (some repeated)
Tree 2 → Different random 100 rows
Tree 3 → Another random sample
Each tree learns slightly differently.
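Bootstrap sampling can be sketched with NumPy (a minimal sketch: the 100-row dataset is simulated as row indices, and the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the sketch is reproducible
n_rows = 100
data = np.arange(n_rows)  # stand-in row indices for a 100-row dataset

# One bootstrap sample: same size as the original, drawn with replacement
sample = rng.choice(data, size=n_rows, replace=True)

# Because of replacement, some rows repeat and some never appear
unique_rows = np.unique(sample)
print(f"{len(unique_rows)} distinct rows out of {n_rows}")
```

Each tree in the forest is trained on its own sample drawn this way, which is what creates the diversity between trees.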
Feature Importance
Random Forest automatically calculates which features are most important.
How?
It measures:
How much each split on a feature reduces prediction error (impurity), averaged across all trees
More error reduction → Higher importance
Example (House Price)
Features:
Area
Bedrooms
Age
Output:
Area → 0.65
Bedrooms → 0.25
Age → 0.10
This means Area is the most important feature.
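Pairing each score with its feature name makes the ranking easy to read (the scores are the illustrative values from the example above, not output from a trained model):

```python
# Illustrative importance scores from the example (not model output)
features = ["Area", "Bedrooms", "Age"]
importances = [0.65, 0.25, 0.10]

# Sort features from most to least important
ranked = sorted(zip(features, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```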
Advantages of Random Forest
1. Reduces Overfitting
Because it averages multiple trees.
2. High Accuracy
Often performs better than a single Decision Tree.
3. Handles Non-Linear Data
Works well with complex relationships.
4. Works with Large Datasets
Can handle many features.
5. Provides Feature Importance
Helps in understanding model behavior.
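Advantages 1 and 2 can be seen in a small experiment: fit a single unpruned tree and a forest on the same noisy data and compare their errors on clean test points (a synthetic sketch; the sine-curve dataset and seeds are made up purely for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Noisy non-linear training data
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=200)

# Noise-free test data to measure generalization
X_test = rng.uniform(0, 10, size=(100, 1))
y_test = np.sin(X_test[:, 0])

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_mse = mean_squared_error(y_test, tree.predict(X_test))
forest_mse = mean_squared_error(y_test, forest.predict(X_test))
print(f"Single tree MSE:   {tree_mse:.3f}")
print(f"Random forest MSE: {forest_mse:.3f}")
```

The single tree memorizes the noise in the training data, while the forest averages it away, which typically gives the forest the lower test error.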
Example: House Price Prediction
Random Forest Regression for House Price Prediction
This code demonstrates how to use a Random Forest Regressor in Python to predict house prices from features such as area and number of bedrooms. The dataset is split into training and testing sets, the model is trained, a price is predicted for a new house, and the importance of each feature is displayed.
# Step 1: Import Libraries
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
# Step 2: Create Dataset
X = np.array([
[1000, 2],
[1500, 3],
[2000, 4],
[2500, 4],
[3000, 5]
])
y = np.array([200000, 300000, 400000, 500000, 600000])
# Step 3: Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Step 4: Create Model
model = RandomForestRegressor(n_estimators=100, random_state=42)
# Step 5: Train Model
model.fit(X_train, y_train)
# Step 6: Predict
prediction = model.predict([[1800, 3]])
print("Predicted Price:", prediction[0])
# Step 7: Feature Importance
print("Feature Importance:", model.feature_importances_)
Output (example run; exact values vary between runs because the train/test split is not seeded):
Predicted Price: 323000.0
Feature Importance: [0.59953618 0.40046382]
Random Forest vs Decision Tree