Goodness of Fit Metrics

  • Goodness of fit metrics evaluate how well a regression model explains the variance in the dataset.
  • R² Score (Coefficient of Determination)

    R² measures the proportion of variance in the dependent variable that is predictable from the independent variables.

    $$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

    Where:

    • $y_i$ = Actual value

    • $\hat{y}_i$ = Predicted value

    • $\bar{y}$ = Mean of actual values

    Characteristics

    • Value typically ranges from 0 to 1; it can be negative when the model fits worse than always predicting the mean

    • R² = 1 → Perfect prediction

    • R² = 0 → Model does no better than always predicting the mean
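The three cases above can be illustrated with a minimal sketch in plain Python. The helper name `r_squared` and the data values are assumptions made for this illustration; they do not come from a library.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Illustrative values (hypothetical); their mean is 2.5
actual = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(actual, actual)                  # predictions equal the actual values
baseline = r_squared(actual, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
worse = r_squared(actual, [4.0, 3.0, 2.0, 1.0])      # predictions in reversed order

print(perfect, baseline, worse)  # 1.0 0.0 -3.0
```

The last case shows why R² is not bounded below by 0: when the residual sum of squares exceeds the total sum of squares, the ratio is greater than 1 and the score becomes negative.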

    Example

    • Actual = [3, 5, 2]

    • Predicted = [2, 4, 3]

    $$\bar{y} = \frac{3+5+2}{3} \approx 3.33$$

    $$R^2 = 1 - \frac{(3-2)^2 + (5-4)^2 + (2-3)^2}{(3-3.33)^2 + (5-3.33)^2 + (2-3.33)^2} = 1 - \frac{3}{4.67} \approx 0.36$$
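The arithmetic in this example can be checked step by step with a short plain-Python sketch. The variable names `ss_res` and `ss_tot` (residual and total sum of squares) are labels chosen here for the numerator and denominator of the formula.

```python
actual = [3, 5, 2]
predicted = [2, 4, 3]

# Mean of the actual values
mean_actual = sum(actual) / len(actual)

# Numerator: squared differences between actual and predicted values
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Denominator: squared differences between actual values and their mean
ss_tot = sum((a - mean_actual) ** 2 for a in actual)

r2 = 1 - ss_res / ss_tot

print(f"mean   = {mean_actual:.2f}")
print(f"SS_res = {ss_res}")
print(f"SS_tot = {ss_tot:.2f}")
print(f"R^2    = {r2:.3f}")
```

Printing the intermediate sums makes it easy to see which part of the fraction drives the final score.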


    Adjusted R²

    Adjusted R² adjusts the R² value based on the number of predictors in the model.
    It penalizes adding irrelevant features that don’t improve the model.

    $$\text{Adjusted } R^2 = 1 - \frac{(1-R^2)(n-1)}{n-p-1}$$

    Where:

    • $n$ = Number of observations

    • $p$ = Number of independent variables

    Characteristics

    • Prevents overestimation of model performance when adding unnecessary features

    • Can decrease if added features do not improve the model
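To see how the penalty works, here is a minimal sketch that applies the formula to two hypothetical models. The R² values (0.800 and 0.802), the sample size, and the helper name `adjusted_r2` are assumptions chosen for illustration, not results from a real dataset.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("the formula requires n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 20  # hypothetical number of observations

# Hypothetical scenario: a fourth feature raises R² only from 0.800 to 0.802
before = adjusted_r2(0.800, n, p=3)
after = adjusted_r2(0.802, n, p=4)

print(f"Adjusted R² with 3 predictors: {before:.4f}")  # 0.7625
print(f"Adjusted R² with 4 predictors: {after:.4f}")   # 0.7492
```

Although plain R² went up slightly, the adjusted score went down, because the small gain did not offset the loss of one degree of freedom in the denominator.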

    Example: R² and Adjusted R²

R² Score and Adjusted R² Calculation in Python using Linear Regression

This Python example demonstrates how to evaluate a Linear Regression model using the R² Score and Adjusted R². The code creates a small dataset, trains a Linear Regression model, predicts values, and calculates the R² score to measure how well the model explains the variance in the data. It also computes Adjusted R², which adjusts the score based on the number of predictors and observations.

# Step 1: Import Libraries
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Step 2: Dataset
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Step 3: Create Model
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Step 4: R² Score
r2 = r2_score(y, y_pred)
print("R² Score:", r2)

# Step 5: Adjusted R²
n = X.shape[0]  # number of observations
p = X.shape[1]  # number of predictors
adj_r2 = 1 - (1-r2)*(n-1)/(n-p-1)
print("Adjusted R²:", adj_r2)
  • Output:

    R² Score: 0.6000000000000001

    Adjusted R²: 0.4666666666666668