Data Preparation

  • Prepare and clean data to ensure accurate and meaningful analysis results.
  • 1. Remove Duplicates

    Go to: Data → Remove Duplicates

    Example:
    If the same Employee ID appears twice → remove the duplicate entry.


    2. Handling Missing Values

    Step 1: Identify Blank Cells

    Use filters or conditional formatting.

    Step 2: Fill Missing Values With:

    • 0 (for numeric data)

    • "Unknown" (for text data)

    • Average value (for numerical columns)

    Example Formula:

=IF(A2="", "Unknown", A2)
  • 3. Trim Extra Spaces

    Remove unwanted spaces:

=TRIM(A2)
  • 4. Change Data Type

    • Convert Text to Number

    • Correct Date format

    • Apply Percentage formatting

    Incorrect data types can affect calculations and dashboards.


    5. Text Cleaning

    Uppercase:

=UPPER(A2)
  • Lowercase:

=LOWER(A2)
  • Data Cleaning in SQL

    SQL is used for cleaning large datasets stored in databases.

    Example Table: employees

    id

    name

    salary

    city

    1

    Rahul

    30000

    Delhi

    2

    Rahul

    30000

    Delhi


    1. Remove Duplicates

SELECT DISTINCT * FROM employees;
  • 2. Handle NULL Values

SELECT 
  COALESCE(salary, 0) AS salary
FROM employees;
  • 3. Delete NULL Rows

DELETE FROM employees
WHERE salary IS NULL;
  • 4. Standardize Text

UPDATE employees
SET city = UPPER(city);
  • Data Cleaning in Python (Pandas)

    Python is powerful for advanced data transformation and automation.


    Import Pandas

import pandas as pd
  • Load Data

df = pd.read_csv("data.csv")
  • 1. Check Missing Values

df.isnull().sum()
  • 2. Fill Missing Values

df["salary"].fillna(0, inplace=True)
  • 3. Remove Duplicates

df.drop_duplicates(inplace=True)
  • 4. Change Data Type

df["salary"] = df["salary"].astype(int)
  • 5. Strip Extra Spaces

df["name"] = df["name"].str.strip()
  • Excel vs SQL vs Python vs Power BI

    Tool

    Best For

    Excel

    Small datasets

    SQL

    Database cleaning

    Python

    Advanced transformation

    Power BI

    Visualization & dashboards