Summary & Patterns

  • Learn to summarize datasets and detect meaningful patterns.
  • Summary Statistics

    What are Summary Statistics?

    Summary statistics are numerical values that describe key features of a dataset.

    They help answer questions such as:

    • What is the average?

    • How spread out is the data?

    • What is the minimum and maximum value?


    Important Summary Measures

    Central Tendency

    • Mean → Average

    • Median → Middle value

    • Mode → Most frequent value


    Variability

    • Range → Max − Min

    • Variance → Average squared deviation from the mean

    • Standard Deviation → Square root of the variance


    Distribution Shape

    • Skewness → Asymmetry of the distribution

    • Kurtosis → Heaviness of the tails
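
    As a rough sketch, each of the measures above can be computed with pandas on a small Series. The values and the name marks are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
marks = pd.Series([2, 4, 4, 5, 7, 8, 12])

# Central tendency
mean = marks.mean()              # arithmetic average
median = marks.median()          # middle value of the sorted data
mode = marks.mode().tolist()     # most frequent value(s); ties are possible

# Variability
value_range = marks.max() - marks.min()
variance = marks.var()           # sample variance (pandas defaults to ddof=1)
std_dev = marks.std()            # sample standard deviation

# Distribution shape
skewness = marks.skew()          # above 0 suggests a longer right tail
kurtosis = marks.kurt()          # excess kurtosis; about 0 for a normal distribution

print(mean, median, mode, value_range, variance)
```

    Note that var() and std() in pandas use the sample formulas (dividing by n − 1) unless ddof=0 is passed.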

    Example in Python

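import pandas as pd

# Load the dataset (assumes data.csv is in the working directory)
df = pd.read_csv("data.csv")

# Print summary statistics for every numeric column
print(df.describe())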
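  • describe() reports, for each numeric column:

    • count → number of non-missing values

    • mean → average

    • std → standard deviation

    • min → smallest value

    • 25%, 50%, 75% → quartiles (50% is the median)

    • max → largest value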
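
    Since data.csv is not included here, the following minimal sketch uses a small made-up DataFrame to show what describe() returns. The column names age and income and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data standing in for data.csv; names and values are made up.
df = pd.DataFrame({
    "age": [22, 25, 31, 38, 44],
    "income": [28000, 32000, 41000, 52000, 67000],
})

# One column of statistics per numeric column, one row per statistic.
summary = df.describe()
print(summary)

# Individual statistics can be looked up by row label and column name.
median_age = summary.loc["50%", "age"]
print(median_age)
```

    The result is itself a DataFrame, so single values such as the median can be pulled out with .loc and reused in later steps.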


    🎯 Why Summary Statistics Matter

    ✔ Quickly understand a dataset
    ✔ Detect unusual values
    ✔ Compare different groups
    ✔ Prepare for modeling
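
    Two of these uses, comparing groups and detecting unusual values, can be sketched as follows. The data, the column names group and score, and the choice of the 1.5 × IQR rule are assumptions for illustration; other outlier rules are equally valid.

```python
import pandas as pd

# Hypothetical data: column names and values are made up for illustration.
df = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "score": [50, 52, 55, 53, 70, 72, 71, 150],
})

# Compare groups: one row of summary statistics per group.
group_summary = df.groupby("group")["score"].agg(["mean", "median", "std"])
print(group_summary)

# Detect unusual values with the 1.5 * IQR rule (one common convention).
q1 = df["score"].quantile(0.25)
q3 = df["score"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["score"] < q1 - 1.5 * iqr) | (df["score"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```

    In this made-up data the single large score pulls the mean of group B well above its median, which is why it helps to look at both.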



    Correlation Analysis

    What is Correlation?

    Correlation measures the strength and direction of the linear relationship between two variables.

    It tells us:

    • Do they increase together?

    • Does one increase while the other decreases?

    • Or is there no clear relationship?


    Correlation Value Range

    $-1 \leq r \leq 1$

    Value    Meaning
    +1       Perfect positive correlation
    0        No linear correlation
    -1       Perfect negative correlation



    Example

    Study Hours vs Marks

    • If marks increase as study hours increase → Positive correlation

    • If one increases while the other decreases → Negative correlation
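
    The study-hours example can be sketched with a few made-up numbers. The columns hours, marks, and free_time and their values are hypothetical; Series.corr computes the Pearson coefficient by default.

```python
import pandas as pd

# Hypothetical data for five students; all values are made up.
study = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "marks": [52, 58, 61, 70, 74],
    "free_time": [9, 8, 7, 6, 5],   # falls by one for every extra study hour
})

# Marks rise with study hours, so r should be close to +1.
r_positive = study["hours"].corr(study["marks"])

# Free time falls in a straight line as hours rise, so r should be -1.
r_negative = study["hours"].corr(study["free_time"])

print(round(r_positive, 2), round(r_negative, 2))
```

    A value near +1 or −1 only describes how closely the points follow a straight line; it does not show that one variable causes the other.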


    Calculate Correlation in Python

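# Pairwise correlation of numeric columns only
# (numeric_only requires pandas 1.5 or newer)
df.corr(numeric_only=True)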
Visualizing Correlation with Heatmap

The code below uses Seaborn to draw a heatmap of the correlation matrix, showing the relationships between the numerical variables in the dataset at a glance.

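import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of the numeric columns (requires pandas 1.5 or newer)
corr_matrix = df.corr(numeric_only=True)

# annot=True prints each coefficient; vmin/vmax fix the color scale to [-1, 1]
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()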
  • Real-World Examples

    Banking

    Income vs Loan Amount → Positive correlation

    E-commerce

    Discount vs Profit → Possibly negative correlation

    Education

    Attendance vs Marks → Positive correlation