DataFrame Operations & Manipulation
Learn how to filter, sort, select, and transform Pandas DataFrames efficiently.
- Indexing & Selecting Data
Why Indexing Matters
Indexing allows you to access specific rows and columns from a DataFrame quickly and accurately.
Sample DataFrame
import pandas as pd
data = {
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92],
}
df = pd.DataFrame(data)
print(df)
- Selecting Columns
df["Name"]  # one column label: returns a Series
df[["Name", "Marks"]]  # list of column labels: returns a DataFrame
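The two selection forms above return different types, which matters when later code expects one or the other. A minimal sketch using the sample data from this section (the variable names `names`, `name_frame`, and `subset` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92],
})

names = df["Name"]              # single label -> Series
name_frame = df[["Name"]]       # list with one label -> DataFrame
subset = df[["Name", "Marks"]]  # list of labels -> DataFrame

print(type(names).__name__)       # Series
print(type(name_frame).__name__)  # DataFrame
print(subset.shape)               # (4, 2)
```

Using the double-bracket form keeps the result two-dimensional even when only one column is requested.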
Selecting Rows by Label (loc)
df.loc[0]  # row whose index label is 0
df.loc[1:3]  # rows labelled 1 through 3; the end label is included
- Selecting Rows by Position (iloc)
df.iloc[0]  # first row
df.iloc[0:2]  # first two rows; the end position is excluded
loc vs iloc
loc → label-based; slices include the end label
iloc → position-based; slices exclude the end position
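With the default integer index, labels and positions look the same, so the difference between loc and iloc is easy to miss. The sketch below shows the slicing difference first, then re-indexes by name (an assumption made here purely for illustration) so that labels and positions no longer coincide:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Marks": [85, 90, 88, 92],
})

# Default index: labels 0..3 happen to match positions 0..3
print(list(df.loc[1:3]["Name"]))   # ['Riya', 'Neha', 'Karan'] (end label included)
print(list(df.iloc[1:3]["Name"]))  # ['Riya', 'Neha'] (end position excluded)

# Hypothetical re-indexing by name, so labels and positions differ
by_name = df.set_index("Name")
print(by_name.loc["Neha", "Marks"])  # 88, looked up by row and column label
print(by_name.iloc[2, 0])            # 88, looked up by row and column position
```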
Filtering Rows & Columns
What is Filtering?
Filtering means keeping only the rows or columns that satisfy a condition.
Filtering Rows
df[df["Age"] > 23]
- Multiple Conditions
df[(df["Age"] > 23) & (df["City"] == "Delhi")]  # wrap each condition in parentheses; use & for AND, | for OR
- Filtering Columns
df.loc[:, ["Name", "Marks"]]
Using isin()
df[df["City"].isin(["Delhi", "Pune"])]
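Each filter above works by building a boolean Series (a mask) and passing it to the DataFrame. The following sketch names the masks explicitly and combines them with `&`, `|`, and `~`; the mask variable names are illustrative, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92],
})

older_than_23 = df["Age"] > 23    # boolean Series, one value per row
in_delhi = df["City"] == "Delhi"

both = df[older_than_23 & in_delhi]    # AND: both conditions hold
either = df[older_than_23 | in_delhi]  # OR: at least one holds
not_delhi = df[~in_delhi]              # NOT: invert the mask
selected = df[df["City"].isin(["Delhi", "Pune"])]

print(list(both["Name"]))       # ['Karan']
print(list(either["Name"]))     # ['Aman', 'Riya', 'Karan']
print(list(not_delhi["Name"]))  # ['Riya', 'Neha']
print(list(selected["Name"]))   # ['Aman', 'Neha', 'Karan']
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operator forms are used here.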
Adding & Removing Columns
Adding a New Column
df["Passed"] = df["Marks"] >= 40
print(df)
- Adding Column Using Calculation
df["Bonus"] = df["Marks"] + 5
- Renaming Columns
df.rename(columns={"Marks": "Score"}, inplace=True)
- Removing a Column
df.drop("City", axis=1, inplace=True)
- Removing Multiple Columns
df.drop(["Age", "Passed"], axis=1, inplace=True)
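The calls above change `df` in place. As an alternative sketch, the same steps can be chained so that each returns a new DataFrame and the original stays untouched. This is one possible style rather than the only correct one, and `result` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92],
})

result = (
    df.assign(Passed=df["Marks"] >= 40, Bonus=df["Marks"] + 5)  # add two columns
      .rename(columns={"Marks": "Score"})                       # rename one column
      .drop(columns=["City", "Age"])                            # remove two columns
)

print(list(result.columns))  # ['Name', 'Score', 'Passed', 'Bonus']
print(list(df.columns))      # ['Name', 'Age', 'City', 'Marks'] (unchanged)
```

Keeping the original intact makes it easier to rerun a step without rebuilding the data first.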
Basic Data Cleaning
Why Data Cleaning is Important
Real-world data is often incomplete, inconsistent, or incorrect. Cleaning it before analysis makes the results more reliable.
Handling Missing Values
df.isnull()  # True wherever a value is missing
df.isnull().sum()  # number of missing values per column
- Filling Missing Values
# Assign the result back; inplace=True on a selected column may not update df
df["Score"] = df["Score"].fillna(df["Score"].mean())
- Dropping Missing Values
df.dropna(inplace=True)
- Removing Duplicate Rows
df.drop_duplicates(inplace=True)
- Changing Data Types
df["Score"] = df["Score"].astype(int)  # raises an error if the column still contains missing values
- Replacing Values
# Needs the "City" column, so run this before the drop shown earlier
df["City"] = df["City"].replace("Delhi", "New Delhi")
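The sample DataFrame in this section has no missing or duplicate values, so the cleaning calls above have nothing to act on. The sketch below uses a small hypothetical dataset (`raw`, invented here for illustration) with one missing score, one missing city, and one duplicate row, and applies the same steps in a reasonable order:

```python
import numpy as np
import pandas as pd

# Hypothetical messy data, not taken from the tutorial's sample
raw = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan", "Karan"],
    "City": ["Delhi", "Mumbai", None, "Delhi", "Delhi"],
    "Score": [85.0, np.nan, 88.0, 92.0, 92.0],
})

print(raw.isnull().sum())  # missing values per column

# Remove duplicates first so the repeated row does not skew the mean
clean = raw.drop_duplicates()
# Fill the missing score with the column mean
clean = clean.assign(Score=clean["Score"].fillna(clean["Score"].mean()))
# Drop rows that still have no city
clean = clean.dropna(subset=["City"])
# Convert scores to integers and standardize the city name
clean = clean.assign(
    Score=clean["Score"].round().astype(int),
    City=clean["City"].replace("Delhi", "New Delhi"),
)

print(clean)
```

Whether to fill or drop a missing value depends on the dataset; here the score is filled and the row without a city is dropped only to show both techniques.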
Real-World Use Cases
Cleaning CSV files
Preparing datasets for ML
Business reports
Data validation
Analytics dashboards