Data Summary & Patterns | Identifying Trends in Data

Summary & Patterns

Learn to summarize datasets and detect meaningful patterns.

Summary Statistics
What are Summary Statistics?
Summary statistics are numerical values that describe key features of a dataset.
They help answer:
- What is the average?
- How spread out is the data?
- What are the minimum and maximum values?
Important Summary Measures
Central Tendency
- Mean → Average
- Median → Middle value
- Mode → Most frequent value
Variability
- Range → Max − Min
- Variance
- Standard Deviation
Distribution Shape
- Skewness
- Kurtosis
Example in Python

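import pandas as pd

# Load the dataset ("data.csv" is a placeholder file name)
df = pd.read_csv("data.csv")

# Print summary statistics for the numeric columns
print(df.describe())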

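For numeric columns, describe() returns:
- count (number of non-missing values)
- mean
- std (standard deviation)
- min
- 25%, 50%, 75% (quartiles; 50% is the median)
- max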
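describe() does not report the mode, range, variance, skewness, or kurtosis listed earlier. The sketch below computes each of them with pandas Series methods. The values in scores are invented for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical sample values, used only for illustration
scores = pd.Series([2, 4, 4, 5, 7, 9, 11])

summary = {
    "mean": scores.mean(),
    "median": scores.median(),
    "mode": scores.mode().tolist(),        # a dataset can have more than one mode
    "range": scores.max() - scores.min(),
    "variance": scores.var(),              # sample variance (divides by n - 1)
    "std": scores.std(),                   # sample standard deviation
    "skewness": scores.skew(),
    "kurtosis": scores.kurt(),             # excess kurtosis (0 for a normal distribution)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that pandas uses the sample formulas (dividing by n − 1) for variance and standard deviation by default.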
🎯 Why Do Summary Statistics Matter?
✔ Quickly understand a dataset
✔ Detect unusual values
✔ Compare different groups
✔ Prepare for modeling
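
As one way to compare different groups, the sketch below computes the same summary statistics separately for each group with groupby(). The column names class and marks and all values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: marks for students in two classes
df_groups = pd.DataFrame({
    "class": ["A", "A", "A", "B", "B", "B"],
    "marks": [70, 75, 80, 50, 60, 100],
})

# One row of summary statistics per class
group_summary = df_groups.groupby("class")["marks"].agg(
    ["mean", "median", "std", "min", "max"]
)
print(group_summary)
```

In this made-up data the two classes have similar means, but class B has a much larger standard deviation and a maximum far from its median, which is the kind of unusual value a summary table can surface.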

Correlation Analysis
What is Correlation?
Correlation measures the strength and direction of the linear relationship between two variables.
It tells you:
- Do they increase together?
- Does one increase while the other decreases?
- Or is there no clear relationship?
Correlation Value Range
−1 ≤ r ≤ 1
Value → Meaning
- +1 → Perfect positive correlation
- 0 → No linear correlation
- −1 → Perfect negative correlation
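
For reference, the coefficient r described here is usually the Pearson correlation coefficient, which is also the default method of pandas' corr(). For n paired observations (x_i, y_i) with means x̄ and ȳ, it can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when the two variables tend to be above or below their means together, and negative when one tends to be above its mean while the other is below. The denominator scales the result into the range from −1 to 1.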

Example
Study Hours vs Marks
- If study hours increase & marks increase → Positive correlation
- If one increases & the other decreases → Negative correlation
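
A minimal sketch of this example, assuming made-up values: the columns study_hours, marks, and hours_of_tv are hypothetical and exist only to show one positive and one negative correlation.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
students = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5],
    "marks": [52, 58, 65, 71, 80],
    "hours_of_tv": [5, 4, 3, 2, 1],
})

# Series.corr computes the Pearson correlation by default
r_positive = students["study_hours"].corr(students["marks"])
r_negative = students["study_hours"].corr(students["hours_of_tv"])

print(f"study_hours vs marks: {r_positive:.3f}")
print(f"study_hours vs hours_of_tv: {r_negative:.3f}")
```

Here marks rise with study hours, so the first coefficient is close to +1, while hours_of_tv falls by exactly one for every extra study hour, so the second is −1.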
Calculate Correlation in Python

# numeric_only=True restricts the calculation to numeric columns
print(df.corr(numeric_only=True))

Visualizing Correlation (Heatmap)

This code uses Seaborn to create a heatmap of the correlation matrix, visually showing relationships between numerical variables in the dataset.

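import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of the numeric columns only
corr_matrix = df.corr(numeric_only=True)

# annot=True writes each coefficient in its cell; vmin/vmax pin the color scale to [-1, 1]
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()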

Real-World Examples
- Banking: Income vs Loan Amount → Typically a positive correlation
- E-commerce: Discount vs Profit → Possibly a negative correlation
- Education: Attendance vs Marks → Typically a positive correlation
