Descriptive Statistics

  • Learn how to summarize datasets using descriptive statistics.

    Descriptive statistics helps us summarize and understand data.
    Before using visualization tools like Seaborn or Matplotlib, we first calculate basic statistical measures.


    Mean (Average)

    The Mean is the sum of all values divided by the total number of values.

    Formula:

    $$\text{Mean} = \frac{\sum X}{N}$$

    where ∑X is the sum of all values and N is the number of values.

    Example:

    Student marks:
    50, 60, 70, 80, 90

    $$\text{Mean} = \frac{50 + 60 + 70 + 80 + 90}{5} = 70$$

    The average mark is 70.
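The formula translates directly into plain Python. This is a minimal sketch that computes ∑X / N with the built-in sum() and len() functions; the function name mean_from_formula is our own illustration, not a library API.

```python
def mean_from_formula(values):
    """Return the arithmetic mean: the sum of the values (∑X) divided by their count (N)."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

marks = [50, 60, 70, 80, 90]
print("Mean:", mean_from_formula(marks))  # Mean: 70.0
```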

    Example:

Calculating Mean using NumPy

This code calculates and prints the average (mean) of a list of marks using NumPy’s mean() function.

import numpy as np

marks = [50, 60, 70, 80, 90]
print("Mean:", np.mean(marks))
  • Median

    The Median is the middle value when data is arranged in ascending order.

    Example 1 (odd number of values):

    Data:
    10, 20, 30, 40, 50

    Middle value = 30


    Example 2 (even number of values):

    Data:
    10, 20, 30, 40

    Middle values = 20 and 30

    $$\text{Median} = \frac{20 + 30}{2} = 25$$
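Both cases can be handled by one small function: sort the data, then take the single middle value when the count is odd, or average the two middle values when it is even. This is a from-scratch sketch for illustration; median_from_scratch is a name we made up, and in practice NumPy's median() does the same job.

```python
def median_from_scratch(values):
    """Return the median: the middle value of the sorted data, or the
    average of the two middle values when the count is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd count: exactly one middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print("Median (odd):", median_from_scratch([10, 20, 30, 40, 50]))  # Median (odd): 30
print("Median (even):", median_from_scratch([10, 20, 30, 40]))     # Median (even): 25.0
```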

    Example:

Calculating Median using NumPy

This code calculates and prints the median (middle value) of the given dataset using NumPy’s median() function.

import numpy as np

data = [10, 20, 30, 40]
print("Median:", np.median(data))
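  • 👉 Median is useful when data has outliers: a single extreme value can pull the mean far away from most of the data, but it has little or no effect on the median.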
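To see the effect of an outlier, compare both measures on a small made-up dataset: the values from Example 1, except that 50 is replaced by 500. The data is purely illustrative.

```python
import numpy as np

# Illustrative data only: 10, 20, 30, 40, 50 with the last value replaced by an outlier
data = [10, 20, 30, 40, 500]

print("Mean:", np.mean(data))      # Mean: 120.0  (pulled up by the outlier)
print("Median:", np.median(data))  # Median: 30.0 (still the middle value)
```

The mean jumps from 30 to 120, while the median stays at 30.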



    Mode

    The Mode is the value that appears most frequently.

    Example:

    Data:
    5, 10, 10, 20, 20, 20, 30

    Mode = 20 (appears 3 times)
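Finding the mode amounts to counting how often each value occurs and picking the largest count. A minimal sketch with the standard library's collections.Counter:

```python
from collections import Counter

data = [5, 10, 10, 20, 20, 20, 30]

counts = Counter(data)  # maps each value to how many times it appears
value, frequency = counts.most_common(1)[0]
print("Mode:", value, "appears", frequency, "times")  # Mode: 20 appears 3 times
```

If several values tie for the highest count, most_common(1) returns only one of them; statistics.multimode() returns all of them.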

    Example:

Calculating Mode using SciPy

This code uses scipy.stats.mode() to calculate and display the mode (most frequently occurring value) from the given dataset.

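from scipy import stats

data = [5, 10, 10, 20, 20, 20, 30]
# keepdims=False (SciPy 1.9+) returns plain values instead of 1-element arrays
result = stats.mode(data, keepdims=False)
print("Mode:", result.mode)    # most frequent value
print("Count:", result.count)  # how many times it appears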
  • 👉 Mode is useful for categorical data, such as colours or product names, which have no numeric mean or median.
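As an illustration, a mean of colour names is meaningless, but the most common colour is still well defined. This sketch uses Python's built-in statistics module, which accepts strings; the colour list is made-up data.

```python
import statistics

# Hypothetical categorical data
colours = ["Red", "Blue", "Red", "Green", "Blue", "Red"]

print("Mode:", statistics.mode(colours))  # Mode: Red

# multimode() returns every value that shares the highest count
print("All modes:", statistics.multimode(["Red", "Blue", "Red", "Blue"]))  # All modes: ['Red', 'Blue']
```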


    Practical Example with Real Dataset

    Example: Monthly Sales (₹)

    Sales data:
    20000, 22000, 25000, 22000, 27000, 30000, 22000

    Measure     Value (₹)
    Mean        24000
    Median      22000
    Mode        22000

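    Interpretation:

    • Average (mean) sales = ₹24000

    • Middle sales value (median) = ₹22000

    • Most frequent sales value (mode) = ₹22000

    • The mean is higher than the median because the strongest months (₹27000 and ₹30000) pull the average up, while the median depends only on the middle of the sorted data.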
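The values in the table can be reproduced in a few lines. This sketch uses NumPy for the mean and median and, as a choice of ours, the standard-library statistics.mode() instead of SciPy so that the mode comes back as a plain number regardless of SciPy version.

```python
import statistics

import numpy as np

# Monthly sales (₹) from the example above
sales = [20000, 22000, 25000, 22000, 27000, 30000, 22000]

mean_sales = np.mean(sales)
median_sales = np.median(sales)
mode_sales = statistics.mode(sales)  # most frequent value

print("Mean:", mean_sales)      # Mean: 24000.0
print("Median:", median_sales)  # Median: 22000.0
print("Mode:", mode_sales)      # Mode: 22000
```

np.mean() and np.median() return floats, while statistics.mode() returns one of the original values, here the integer 22000.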