Data Preparation
-
Prepare and clean data to ensure accurate and meaningful analysis results.
1. Remove Duplicates
Go to: Data → Remove Duplicates
Example:
If the same Employee ID appears twice → remove the duplicate entry.2. Handling Missing Values
Step 1: Identify Blank Cells
Use filters or conditional formatting.
Step 2: Fill Missing Values With:
0 (for numeric data)
"Unknown" (for text data)
Average value (for numerical columns)
Example Formula:
=IF(A2="", "Unknown", A2)
3. Trim Extra Spaces
Remove unwanted spaces:
=TRIM(A2)
4. Change Data Type
Convert Text to Number
Correct Date format
Apply Percentage formatting
Incorrect data types can affect calculations and dashboards.
5. Text Cleaning
Uppercase:
=UPPER(A2)
- Lowercase:
=LOWER(A2)
Data Cleaning in SQL
SQL is used for cleaning large datasets stored in databases.
Example Table: employees
1. Remove Duplicates
SELECT DISTINCT * FROM employees;
- 2. Handle NULL Values
SELECT
COALESCE(salary, 0) AS salary
FROM employees;
- 3. Delete NULL Rows
DELETE FROM employees
WHERE salary IS NULL;
- 4. Standardize Text
UPDATE employees
SET city = UPPER(city);
Data Cleaning in Python (Pandas)
Python is powerful for advanced data transformation and automation.
Import Pandas
import pandas as pd
- Load Data
df = pd.read_csv("data.csv")
- 1. Check Missing Values
df.isnull().sum()
- 2. Fill Missing Values
df["salary"].fillna(0, inplace=True)
- 3. Remove Duplicates
df.drop_duplicates(inplace=True)
- 4. Change Data Type
df["salary"] = df["salary"].astype(int)
- 5. Strip Extra Spaces
df["name"] = df["name"].str.strip()
Excel vs SQL vs Python vs Power BI