Data Cleaning & Transformation (Power Query)

  • Clean and transform datasets using Power Query in Power BI.
  • Importance of Data Cleaning

    Raw data is usually:

    • Inconsistent

    • Contains duplicates

    • Has missing values

    • Wrong data types

    • Extra spaces / formatting issues

    If data is not cleaned:
    Wrong totals
    Incorrect KPIs
    Confusing dashboards

    Golden Rule:

    Garbage In → Garbage Out

    Power Query helps convert messy raw data into structured, reliable data.


    Removing Duplicates

    Why Remove Duplicates?

    Duplicate rows can:

    • Double-count sales

    • Inflate revenue

    • Distort analysis

    How to Remove:

    1. Select column(s)

    2. Go to Home → Remove Rows

    3. Click “Remove Duplicates”

    Always decide carefully:
    Remove duplicates based on ID? Email? Order number?


    Handling Null / Missing Values

    Missing values are common in real datasets.

    Problems Caused by Nulls:

    • Blank visuals

    • Calculation errors

    • Incorrect averages

    Solutions:

    Replace with default value (0, “Unknown”)
    Remove rows
    Fill Down / Fill Up
    Use conditional column

    Example:
    Replace null in Sales → 0


    Correcting Data Types

    Very important step ⚠

    If data type is wrong:

    • Sum won’t work

    • Dates won’t group

    • Sorting fails

    Common Data Types:

    • Text

    • Whole Number

    • Decimal Number

    • Date

    • Date/Time

    • Boolean

    How to Fix:

    Select column → Change Data Type

    Example:
    If “Sales” is Text → Change to Decimal Number.


    Split Columns

    Used when multiple values exist in one column.

    Example:

    Full Name → “Daksha Rajput”

    Split by space →
    First Name | Last Name

    How to Split:

    Select column → Transform → Split Column →

    • By Delimiter

    • By Number of Characters

    Useful for:

    • Dates in text format

    • Codes

    • Product categories


    Merge Queries

    Used to combine data from multiple tables.

    Similar to SQL JOIN.

    Example:

    Sales Table + Customer Table

    Merge using Customer ID

    Types of Join:

    • Left Outer (most common)

    • Inner

    • Full Outer

    How to Merge:

    Home → Merge Queries →
    Select common column

    Essential for building star schema.


    Basic Data Transformations

    Power Query provides many transformations:

    Remove Columns

    Delete unnecessary columns to improve performance.

    Rename Columns

    Use clean naming conventions.
    Example:
    cust_nm ❌
    Customer_Name ✅

    Filter Rows

    Remove unwanted records.
    Example:
    Remove sales before 2022

    Group By

    Summarize data.
    Example:
    Total Sales by Region

    Add Custom Column

    Create new logic.
    Example:
    If Sales > 10000 → “High” else “Low”

    Trim & Clean Text

    Remove extra spaces.
    Transform → Format → Trim


    Typical Data Cleaning Workflow

Import Data
 ⬇
 Remove unnecessary columns
 ⬇
 Fix data types
 ⬇
 Handle nulls
 ⬇
 Remove duplicates
 ⬇
 Split / Merge if needed
 ⬇
 Close & Apply