Random Forest Classifier
- Random Forest is an ensemble machine learning algorithm that combines multiple decision trees to produce more accurate and stable predictions.
Voting Mechanism
Random Forest uses majority voting to decide the final class.
How it Works:
Multiple trees predict the class of a sample.
Count votes for each class.
Class with most votes is chosen.
Example
Suppose 5 trees predict a sample:
Votes: Class A = 3, Class B = 2
Final Prediction → Class A
This is called majority voting.
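The voting steps above can be sketched in a few lines of plain Python (a minimal illustration; `majority_vote` is a helper written here for demonstration, not a library function):

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class that received the most votes among the trees."""
    votes = Counter(tree_predictions)
    return votes.most_common(1)[0][0]

# Five trees vote on one sample: 3 votes for "A", 2 for "B"
print(majority_vote(["A", "B", "A", "A", "B"]))  # -> A
```

This mirrors the example: Class A wins with 3 votes out of 5.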
Bagging (Bootstrap Aggregating)
Random Forest uses Bagging to create multiple datasets for each tree.
Steps:
Create multiple random samples with replacement from the training data.
Train a decision tree on each sample.
Combine predictions of all trees (voting for classification).
Why Does Bagging Help?
Reduces variance
Prevents overfitting
Each tree sees slightly different data → more robust model
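The bootstrap sampling step itself is easy to demonstrate with NumPy (a sketch of the sampling step only, using toy sample indices rather than real data):

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # toy training set: 10 sample indices

# Each bootstrap sample is the same size as the original data but is drawn
# WITH replacement, so some samples repeat and others are left out entirely.
# Each tree is then trained on its own bootstrap sample.
for i in range(3):
    sample = rng.choice(data, size=len(data), replace=True)
    print(f"Tree {i} trains on: {sorted(sample)}")
```

Notice that each printed sample contains duplicates and omissions: this is exactly why every tree sees slightly different data.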
Feature Selection (Random Subspace Method)
Random Forest introduces random feature selection:
Each tree considers a random subset of features when splitting nodes
Not all features are used at every split
Benefits:
Reduces correlation between trees
Increases model diversity
Improves generalization
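In scikit-learn, the size of this random feature subset is exposed as the `max_features` parameter of `RandomForestClassifier` (for classification the default, `"sqrt"`, considers roughly the square root of the total number of features at each split):

```python
from sklearn.ensemble import RandomForestClassifier

# max_features controls how many candidate features each split considers.
# "sqrt" (the classification default) keeps trees decorrelated; max_features=1.0
# would let every split see all features, making the trees more similar.
model = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
```

Lowering `max_features` increases tree diversity at the cost of weaker individual trees; the ensemble usually benefits from the trade-off.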
Model Stability
Random Forest is more stable than a single Decision Tree:
Less sensitive to noise in data
Less overfitting
Performance is more consistent across datasets
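One way to observe this stability is to compare cross-validation scores of a single decision tree against a forest on a synthetic dataset (a sketch; the exact scores depend on the generated data and seeds):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

tree_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
forest_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)

# The forest's fold-to-fold scores typically vary less than the single tree's.
print("Tree   mean/std:", tree_scores.mean(), tree_scores.std())
print("Forest mean/std:", forest_scores.mean(), forest_scores.std())
```

A smaller standard deviation across folds is the "more consistent performance" described above.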
Example: Pass/Fail Classification
This Python example uses a Random Forest Classifier to predict whether a student will pass or fail based on study hours and attendance. The code builds a small dataset, splits it into training and testing sets, trains a Random Forest with 100 decision trees, evaluates its accuracy, and prints the feature importances to show which factor contributes more to the prediction.
# Step 1: Import Libraries
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Step 2: Create Dataset
X = np.array([
[2, 50],
[3, 60],
[5, 80],
[6, 90],
[1, 40]
]) # Features: [Study Hours, Attendance]
y = np.array([0, 0, 1, 1, 0]) # Labels: 0=Fail, 1=Pass
# Step 3: Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)  # 2 of the 5 samples are held out for testing
# Step 4: Create Random Forest Model
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Step 5: Train Model
model.fit(X_train, y_train)
# Step 6: Predict
prediction = model.predict(X_test)
print("Prediction:", prediction)
print("Accuracy:", model.score(X_test, y_test))
# Step 7: Feature Importance
print("Feature Importance:", model.feature_importances_)
Sample Output (exact values vary with the random train/test split):
Prediction: [0 1]
Accuracy: 0.5
Key Concepts