Naive Bayes

  • Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem, used for classification tasks such as spam filtering and text analysis.
  • Bayes Theorem

    Bayes Theorem calculates the probability of a class given features.

    P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}

    Where:

    • P(C|X) → Probability of class C given features X (posterior)

    • P(X|C) → Probability of features X given class C (likelihood)

    • P(C) → Probability of class C (prior)

    • P(X) → Probability of features X (evidence)

    Example

    Predict if an email is Spam given a word “offer”:

    P(Spam|offer) = \frac{P(offer|Spam) \cdot P(Spam)}{P(offer)}

    • Spam emails containing “offer” → Likelihood

    • Overall Spam proportion → Prior
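
    The calculation above can be sketched with made-up numbers (all three probabilities below are assumptions for illustration, not values from the text):

    ```python
    # Assumed (illustrative) probabilities:
    p_offer_given_spam = 0.70   # likelihood: 70% of spam emails contain "offer"
    p_spam = 0.20               # prior: 20% of all emails are spam
    p_offer = 0.25              # evidence: 25% of all emails contain "offer"

    # Bayes' Theorem: posterior = likelihood * prior / evidence
    p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
    print(p_spam_given_offer)  # 0.56
    ```

    With these numbers, seeing the word "offer" raises the spam probability from the 20% prior to 56%.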


    Conditional Probability

    Naive Bayes computes the likelihood of the full feature vector as a product of per-feature conditional probabilities:

    P(X|C) = P(x_1|C) \cdot P(x_2|C) \cdot \ldots \cdot P(x_n|C)

    • Assumes features are independent

    • Multiply probabilities of individual features
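
    A minimal sketch of this product, using assumed per-feature likelihoods (the three values below are invented for illustration):

    ```python
    import numpy as np

    # Assumed per-feature likelihoods for the Spam class:
    # P(x1|Spam), P(x2|Spam), P(x3|Spam)
    likelihoods = np.array([0.70, 0.30, 0.10])

    # Naive independence assumption: multiply the individual probabilities
    p_x_given_spam = np.prod(likelihoods)
    print(p_x_given_spam)  # ≈ 0.021

    # Real implementations sum log-probabilities instead,
    # because products of many small numbers underflow to zero
    log_p_x_given_spam = np.sum(np.log(likelihoods))
    ```

    Summing logs is why library implementations report `predict_log_proba` alongside `predict_proba`.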


    Types of Naive Bayes

    1. Gaussian Naive Bayes

    • For continuous numerical data

    • Assumes features follow Gaussian (normal) distribution

    P(x_i|C) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)
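
    The density formula above translates directly into code (a sketch; `mu` and `sigma` would come from the per-class mean and standard deviation of the training data):

    ```python
    import math

    def gaussian_pdf(x, mu, sigma):
        """Gaussian density of x under N(mu, sigma^2)."""
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

    # Density at the mean equals 1 / (sigma * sqrt(2*pi)), the curve's peak
    print(gaussian_pdf(5, 5, 2))  # ≈ 0.1995
    ```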

    2. Multinomial Naive Bayes

    • For discrete data / count data

    • Often used in text classification

    Example: Count of words in documents
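
    A minimal word-count sketch with scikit-learn (the four toy documents and labels below are made up for illustration):

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy corpus with assumed labels: 1 = spam, 0 = not spam
    docs = ["free offer now", "meeting at noon", "claim free prize", "project meeting notes"]
    labels = [1, 0, 1, 0]

    # Turn each document into a vector of word counts
    X = CountVectorizer().fit_transform(docs)

    # Multinomial NB models these counts per class
    model = MultinomialNB().fit(X, labels)
    print(model.predict(X))
    ```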

    3. Bernoulli Naive Bayes

    • Binary features (0/1)

    • Example: Word present or absent in document
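
    A sketch of the binary-feature case with scikit-learn (the two presence/absence columns and labels below are assumptions for illustration):

    ```python
    import numpy as np
    from sklearn.naive_bayes import BernoulliNB

    # Columns: "free" present?, "meeting" present? (1 = yes, 0 = no)
    X = np.array([[1, 0],
                  [0, 1],
                  [1, 0],
                  [0, 1]])
    y = np.array([1, 0, 1, 0])  # 1 = spam, 0 = not spam

    model = BernoulliNB().fit(X, y)

    # A document containing "free" but not "meeting"
    print(model.predict([[1, 0]]))  # [1]
    ```

    Unlike Multinomial NB, Bernoulli NB also penalizes the *absence* of a word, which can matter for short documents.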


    Text Classification Example

    Naive Bayes is widely used for text classification tasks such as spam detection and sentiment analysis.

    Gaussian Naive Bayes Example in Python: Student Pass/Fail Prediction

This Python example demonstrates how to use the Gaussian Naive Bayes algorithm to predict whether a student will pass or fail based on study hours and attendance. The code creates a dataset, splits it into training and testing sets, trains the Gaussian Naive Bayes model, and evaluates its performance by making predictions and calculating accuracy.

# Step 1: Import Libraries
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Step 2: Dataset (Study Hours vs Result)
X = np.array([[2, 60],
              [5, 85],
              [1, 40],
              [6, 90],
              [3, 70]])
y = np.array([0, 1, 0, 1, 0])  # 0=Fail, 1=Pass

# Step 3: Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Create Model
model = GaussianNB()

# Step 5: Train Model
model.fit(X_train, y_train)

# Step 6: Predict
prediction = model.predict(X_test)
print("Prediction:", prediction)
print("Accuracy:", model.score(X_test, y_test))
Output:

    Prediction: [0]
    Accuracy: 0.0

With only five samples and `test_size=0.2`, the test set contains a single example, so accuracy can only be 0.0 or 1.0; here the lone prediction was wrong. A meaningful evaluation requires a larger dataset.
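
Once trained, the same model can score an unseen student. The sketch below retrains on all five samples and predicts for a hypothetical new student (6 study hours, 88% attendance; values chosen for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Same toy dataset as above: [study hours, attendance %]
X = np.array([[2, 60], [5, 85], [1, 40], [6, 90], [3, 70]])
y = np.array([0, 1, 0, 1, 0])  # 0 = Fail, 1 = Pass

model = GaussianNB().fit(X, y)  # train on all five samples

# Hypothetical new student: 6 study hours, 88% attendance
new_student = np.array([[6, 88]])
print("Predicted class:", model.predict(new_student)[0])        # 1 (Pass)
print("Class probabilities:", model.predict_proba(new_student))
```

`predict_proba` exposes the posterior P(C|X) for each class, which is often more useful than the bare label.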