Naive Bayes
- Naive Bayes is a probabilistic machine learning algorithm based on Bayes' theorem, used for classification tasks such as spam filtering and text analysis.
Bayes Theorem
Bayes Theorem calculates the probability of a class given features.
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
Where:
P(C|X) → Probability of class C given features X (posterior)
P(X|C) → Probability of features X given class C (likelihood)
P(C) → Probability of class C (prior)
P(X) → Probability of features X (evidence)
Example
Predict whether an email is Spam given the word “offer”:
P(Spam|offer) = \frac{P(offer|Spam) \cdot P(Spam)}{P(offer)}
Proportion of spam emails containing “offer” → likelihood
Overall proportion of spam emails → prior
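The calculation above can be carried out directly. The probabilities below are illustrative, made-up values, not estimates from real data:

```python
# Illustrative (made-up) probabilities:
# 80% of spam emails contain "offer", 30% of all emails are spam,
# and "offer" appears in 34% of all emails.
p_offer_given_spam = 0.80  # likelihood P(offer|Spam)
p_spam = 0.30              # prior P(Spam)
p_offer = 0.34             # evidence P(offer)

# Bayes' theorem: posterior = likelihood * prior / evidence
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print(round(p_spam_given_offer, 3))  # → 0.706
```

So under these assumed numbers, seeing “offer” raises the spam probability from the 30% prior to roughly 71%.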
Conditional Probability
Naive Bayes uses conditional probability to calculate likelihood for each feature:
P(X|C) = P(x_1|C) \cdot P(x_2|C) \cdot \ldots \cdot P(x_n|C)
Assumes features are conditionally independent given the class
Multiply the probabilities of the individual features
Types of Naive Bayes
1. Gaussian Naive Bayes
For continuous numerical data
Assumes features follow Gaussian (normal) distribution
P(x_i|C) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(x_i-\mu)^2}{2\sigma^2}\Big)
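The density formula translates directly into code. The mean, standard deviation, and feature value below are illustrative, not taken from any dataset:

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """P(x | C) under a normal distribution with class mean mu and std sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Illustrative values: feature value 5 for a class with mean 4, std 1.
print(round(gaussian_likelihood(5.0, 4.0, 1.0), 4))  # → 0.242
```

Gaussian Naive Bayes estimates a separate mean and variance per feature per class from the training data, then plugs them into this density at prediction time.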
2. Multinomial Naive Bayes
For discrete data / count data
Often used in text classification
Example: Count of words in documents
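A minimal sketch of Multinomial Naive Bayes on word counts, using a tiny made-up corpus (the texts and labels are illustrative only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["free offer now", "limited offer click",
         "meeting at noon", "project status update"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham (illustrative)

vec = CountVectorizer()
counts = vec.fit_transform(texts)  # word-count feature matrix

clf = MultinomialNB()
clf.fit(counts, labels)

print(clf.predict(vec.transform(["free offer"])))  # → [1]
```

Note that the same fitted `CountVectorizer` must be reused at prediction time so new text maps onto the training vocabulary.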
3. Bernoulli Naive Bayes
Binary features (0/1)
Example: Word present or absent in document
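The same idea with presence/absence features can be sketched with `BernoulliNB`; the documents and column meanings below are made up for illustration:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Binary features: 1 if the word occurs in the document, 0 otherwise.
# Columns (illustrative): ["offer", "free", "meeting"]
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

clf = BernoulliNB()
clf.fit(X, y)

print(clf.predict([[1, 0, 0]]))  # document containing only "offer" → [1]
```

Unlike the multinomial model, Bernoulli Naive Bayes also penalizes a class for words that are *absent* from the document, which can matter for short texts.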
Text Classification Example
Naive Bayes is widely used for spam detection and sentiment analysis.
Gaussian Naive Bayes Example: Student Pass/Fail Prediction in Python
This Python example demonstrates how to use the Gaussian Naive Bayes algorithm to predict whether a student will pass or fail based on study hours and attendance. The code creates a dataset, splits it into training and testing sets, trains the Gaussian Naive Bayes model, and evaluates its performance by making predictions and calculating accuracy.
# Step 1: Import Libraries
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
# Step 2: Dataset (study hours, attendance %) vs result
X = np.array([[2, 60],
              [5, 85],
              [1, 40],
              [6, 90],
              [3, 70]])
y = np.array([0, 1, 0, 1, 0])  # 0 = Fail, 1 = Pass
# Step 3: Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Create Model
model = GaussianNB()
# Step 5: Train Model
model.fit(X_train, y_train)
# Step 6: Predict
prediction = model.predict(X_test)
print("Prediction:", prediction)
print("Accuracy:", model.score(X_test, y_test))
Output:
Prediction: [0]
Accuracy: 0.0
With only five samples, the 20% test split holds a single example, so the accuracy can only ever be 0.0 or 1.0; a meaningful evaluation requires a larger dataset.