
Histogram Plot

  • This module explains how to create histograms in Seaborn to analyze how data is distributed. You will learn the concept of bins, the difference between frequency and density, and how to use hue to compare multiple categories in Python.
  • What is a Histogram?

    A Histogram is used to visualize the distribution of numerical data.

    It shows:

    ✔ How data is distributed
    ✔ Frequency of values
    ✔ Shape of data
    ✔ Skewness
    ✔ Spread

    Unlike bar charts, which compare values across categories, histograms are used for continuous numerical variables.


    Understanding Data Distribution

    Theory

    A histogram:

    • Divides numerical data into intervals (bins)

    • Counts how many values fall in each bin

    It helps identify:

    • Normal distribution

    • Skewed distribution

    • Uniform distribution

    • Bimodal distribution

    Example Code

Distribution of Total Bill (Histogram)

This visualization uses a histogram to show the distribution of total bill amounts in the dataset.

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

sns.histplot(x="total_bill", data=tips)
plt.title("Distribution of Total Bill")
plt.show()
  • Output Explanation

    • X-axis → Total Bill values

    • Y-axis → Frequency

    • Bars → Count of values in each interval

    If the tallest bars are in the middle and both sides taper off evenly → Data is roughly symmetric.
    If the tail of bars stretches further to the right → Right-skewed.
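
A quick numeric cross-check for the visual reading above is to compare the mean and the median. The sketch below uses only the standard library; the helper name `skew_direction` and the sample values are made up for illustration, and the mean-versus-median rule is a rough heuristic rather than a formal skewness test.

```python
import statistics


def skew_direction(values):
    """Rough heuristic: compare mean and median to guess the skew direction."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical bills: most are small, a few are large (long right tail).
sample_bills = [8, 10, 11, 12, 13, 14, 15, 16, 30, 48]
print(skew_direction(sample_bills))
```

A few large values pull the mean above the median, which matches a histogram whose tail stretches to the right.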


    Bins Concept

    What are Bins?

    Bins are intervals that group numeric values.

    Example:

    If total_bill ranges from 0 to 50
    Bins = 5

    Then intervals may be:

    0–10
    10–20
    20–30
    30–40
    40–50
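
The interval arithmetic above can be sketched in plain Python. This is only an illustration of equal-width binning, not Seaborn's internal code; the helper name `equal_width_bins` and the sample values are assumptions made for the example.

```python
def equal_width_bins(values, low, high, n_bins):
    """Return (edges, counts) for n_bins equal-width bins spanning [low, high]."""
    width = (high - low) / n_bins
    edges = [low + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        if v < low or v > high:
            continue  # ignore values outside the range
        # The last bin is closed on the right so that v == high is counted.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return edges, counts


# Hypothetical bill amounts, chosen only to illustrate the binning.
sample_bills = [3.5, 9.9, 10.0, 14.2, 18.7, 21.0, 27.3, 35.6, 44.1, 50.0]
edges, counts = equal_width_bins(sample_bills, low=0, high=50, n_bins=5)

print(edges)   # bin boundaries
print(counts)  # number of values per bin
```

In this sketch a value that sits exactly on an interior boundary, such as 10.0, is counted in the bin to its right (10–20), and only the last bin includes its upper edge.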


    Control Number of Bins

Histogram with 5 Bins

This visualization displays the distribution of total bill amounts using a histogram with a controlled number of bins (bins=5).

sns.histplot(x="total_bill", data=tips, bins=5)
plt.title("Histogram with 5 Bins")
plt.show()
  • Effect of Bins

    Fewer Bins          More Bins
    Less detail         More detail
    Smoother view       More granular
    May hide patterns   May show noise

    Choosing an appropriate number of bins is important.
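
The trade-off between few and many bins can be seen by counting the same values twice. The sketch below is plain Python with a hypothetical helper `bin_counts` and invented data containing two clusters; it assumes every value lies inside the given range.

```python
def bin_counts(values, low, high, n_bins):
    """Count values in n_bins equal-width bins spanning [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that a value equal to `high` lands in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical data with two clusters, one near 10 and one near 40.
values = [8, 9, 10, 10, 11, 12, 38, 39, 40, 40, 41, 42]

coarse = bin_counts(values, 0, 50, 2)   # 2 wide bins
fine = bin_counts(values, 0, 50, 10)    # 10 narrow bins

print(coarse)
print(fine)
```

With 2 bins the two clusters look like an even spread, while 10 bins reveal the empty gap between them.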


    Frequency vs Density

    Frequency (Default, stat="count")

    Shows count of values in each bin.

    sns.histplot(x="total_bill", data=tips)

    Y-axis → Count


    Density

    Shows probability density instead of count.

    sns.histplot(x="total_bill", data=tips, stat="density")

    Y-axis → Density
    Total area of all bars = 1 (bar height × bin width, summed over bins)
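
The density scaling can be checked by hand: each bar height is the bin count divided by the total number of values times the bin width. The sketch below is plain Python with invented counts and a hypothetical helper `density_heights`; it illustrates the arithmetic and is not Seaborn's own code.

```python
def density_heights(counts, edges):
    """Convert bin counts to density heights: count / (total * bin width)."""
    total = sum(counts)
    heights = []
    for i, count in enumerate(counts):
        width = edges[i + 1] - edges[i]
        heights.append(count / (total * width))
    return heights


# Hypothetical counts for five bins of width 10 (not taken from the tips data).
edges = [0, 10, 20, 30, 40, 50]
counts = [2, 3, 2, 1, 2]

heights = density_heights(counts, edges)

# Area of each bar is height * width; the areas should add up to 1.
area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))

print(heights)
print(area)
```

Because the heights depend on bin width, a density value can be larger than 1 when bins are narrow; it is the area, not the height, that is bounded.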

  • Add KDE Curve

sns.histplot(x="total_bill", data=tips, kde=True)
  • KDE (Kernel Density Estimate) = Smooth curve that estimates the distribution, drawn over the histogram.
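
To see where the smooth curve comes from, a Gaussian KDE can be written in a few lines: one small bell curve is placed on every observation and the curves are averaged. The sketch below is plain Python; the helper name `make_gaussian_kde`, the sample values, and the hand-picked bandwidth are all assumptions for illustration, and Seaborn chooses its bandwidth by its own rule.

```python
import math


def make_gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `data` at a point x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical bill amounts and a hand-picked bandwidth (both are assumptions).
sample_bills = [8, 10, 11, 12, 13, 14, 15, 16, 30, 48]
kde = make_gaussian_kde(sample_bills, bandwidth=4.0)

# Approximate the area under the curve with a simple Riemann sum.
step = 0.1
grid = [-40 + i * step for i in range(1400)]  # covers -40 up to about 100
area = sum(kde(x) * step for x in grid)

print(round(kde(12), 4))  # density near the main cluster
print(round(area, 3))     # should be close to 1
```

A smaller bandwidth makes the curve follow individual points more closely, while a larger one smooths it out, much like the choice of bin count for the bars.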
  • Difference Table

    Feature      Frequency             Density
    Shows        Count per bin         Probability density
    Y-axis       Number of values      Count ÷ (total × bin width)
    Sums to      Total observations    1 (total bar area)


    Using Hue (Multiple Distributions)

    Why Use Hue?

    To compare the distributions of two or more categories.

    Example:

    • Male vs Female spending

    • Smoker vs Non-Smoker

    Example

Total Bill Distribution by Gender (Histogram with KDE)

This visualization shows the distribution of total bill amounts separated by gender using a histogram with a Kernel Density Estimate (KDE) curve.

sns.histplot(x="total_bill",
            hue="sex",
            data=tips,
            kde=True)
plt.title("Total Bill Distribution by Gender")
plt.show()
  • Output Explanation

    • Different colors → Male & Female

    • Compare:

      • Distribution shape

      • Skewness

      • Spread

      • Which group spends more

    Stacked Histogram

sns.histplot(x="total_bill",
            hue="sex",
            data=tips,
            multiple="stack")
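  • Options for the multiple parameter:

    • layer (default): bars overlap, drawn with transparency

    • stack: bars are stacked on top of each other

    • dodge: bars are placed side by side

    • fill: stacked bars are rescaled so that each bin fills the full height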
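
The stack and fill options differ only in how the per-group counts are combined within each bin. The sketch below shows that arithmetic in plain Python with invented per-bin counts; the variable names are hypothetical and the numbers are not taken from the tips dataset.

```python
# Hypothetical per-bin counts for two groups over the same five bins.
male_counts = [4, 10, 8, 3, 1]
female_counts = [6, 5, 2, 1, 0]

# stack: each bar's total height is the sum of the group counts.
stacked_heights = [m + f for m, f in zip(male_counts, female_counts)]

# fill: each bin is rescaled so that the group shares add up to 1.
male_share = [m / (m + f) for m, f in zip(male_counts, female_counts)]
female_share = [f / (m + f) for m, f in zip(male_counts, female_counts)]

print(stacked_heights)
print([round(s, 2) for s in male_share])
print([round(s, 2) for s in female_share])
```

Stacking keeps the overall distribution visible, while filling discards it and shows only each group's share per bin.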
