DataFrame Operations & Manipulation

  • Learn how to filter, sort, select, and transform Pandas DataFrames efficiently.
  • Indexing & Selecting Data

    Why Indexing Matters

    Indexing lets you pick out specific rows and columns of a DataFrame, either by label (loc) or by integer position (iloc).

    Sample DataFrame

import pandas as pd

data = {
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92]
}

df = pd.DataFrame(data)
print(df)
  • Selecting Columns

df["Name"]             # one column name -> Series
df[["Name", "Marks"]]  # list of column names -> DataFrame
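    The two selections above return different types: a single column name gives a Series, while a list of names (even a one-item list) gives a DataFrame. This sketch rebuilds the sample data so it runs on its own.

```python
import pandas as pd

# Same sample data as the DataFrame built above
df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92],
})

names = df["Name"]              # one label -> pandas Series
subset = df[["Name", "Marks"]]  # list of labels -> pandas DataFrame
one_col = df[["Name"]]          # one-item list -> still a DataFrame

print(type(names).__name__)   # Series
print(type(subset).__name__)  # DataFrame
print(one_col.shape)          # (4, 1)
```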
  • Selecting Rows by Label (loc)

df.loc[0]    # the row whose index label is 0
df.loc[1:3]  # labels 1 through 3; the end label is included (3 rows)
  • Selecting Rows by Position (iloc)

df.iloc[0]    # the first row by position
df.iloc[0:2]  # positions 0 and 1; the end position is excluded (2 rows)
  • loc vs iloc

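    • loc → label-based: uses index labels and column names; slices include the end label

    • iloc → position-based: uses integer positions; slices exclude the end position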
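    With the default 0, 1, 2, … index, labels and positions coincide, which hides the difference. The sketch below gives the rows hypothetical labels 10, 20, 30, 40 so that loc (labels, end included) and iloc (positions, end excluded) visibly return different rows.

```python
import pandas as pd

# Hypothetical index labels chosen so they differ from the positions 0-3
df = pd.DataFrame(
    {"Name": ["Aman", "Riya", "Neha", "Karan"], "Marks": [85, 90, 88, 92]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[20:30]    # labels 20 through 30, end label included
by_position = df.iloc[0:2]  # positions 0 and 1, end position excluded
row_label_10 = df.loc[10]   # the row labelled 10 (df.loc[0] would raise KeyError)
first_row = df.iloc[0]      # the first row, whatever its label

print(by_label.index.tolist())     # [20, 30]
print(by_position.index.tolist())  # [10, 20]

# On the default index, the same slice numbers still return different lengths
default = df.reset_index(drop=True)
print(len(default.loc[1:3]), len(default.iloc[1:3]))  # 3 2
```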


    Filtering Rows & Columns

    What is Filtering?

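    Filtering means keeping only the rows (or columns) that meet a condition. A comparison such as df["Age"] > 23 returns a boolean Series (a mask), and indexing the DataFrame with that mask keeps the rows where the mask is True.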

    Filtering Rows

df[df["Age"] > 23]
  • Multiple Conditions

df[(df["Age"] > 23) & (df["City"] == "Delhi")]  # & means "and"; each condition needs its own parentheses
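    The & in the snippet above is pandas' element-wise "and"; | (or) and ~ (not) work the same way. Each comparison needs its own parentheses because & and | bind more tightly than > or ==. Python's own and/or keywords do not work on a whole Series, as this sketch shows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92],
})

# | means "or": rows in Delhi OR younger than 23
delhi_or_young = df[(df["City"] == "Delhi") | (df["Age"] < 23)]

# ~ means "not": every row except the Delhi ones
not_delhi = df[~(df["City"] == "Delhi")]

# Python's `and` tries to reduce a whole Series to one True/False and fails
try:
    df[(df["Age"] > 23) and (df["City"] == "Delhi")]
    keyword_and_failed = False
except ValueError:
    keyword_and_failed = True

print(delhi_or_young["Name"].tolist())  # ['Aman', 'Karan']
print(not_delhi["Name"].tolist())       # ['Riya', 'Neha']
print(keyword_and_failed)               # True
```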
  • Filtering Columns

df.loc[:, ["Name", "Marks"]]
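    The row mask and the column list can also go into a single loc call, filtering rows and columns in one step. The same form works for assignment, which avoids chained indexing. The threshold 88 and the corrected value 89 below are only example values.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Age": [22, 25, 23, 30],
    "City": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "Marks": [85, 90, 88, 92],
})

# Rows where Marks >= 88, and only the Name and Marks columns
high_scorers = df.loc[df["Marks"] >= 88, ["Name", "Marks"]]
print(high_scorers)

# Hypothetical correction: set Marks for the Pune row directly on df
df.loc[df["City"] == "Pune", "Marks"] = 89
```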
  • Using isin()

df[df["City"].isin(["Delhi", "Pune"])]
  • Adding & Removing Columns

    Adding a New Column

df["Passed"] = df["Marks"] >= 40
print(df)
  • Adding Column Using Calculation

df["Bonus"] = df["Marks"] + 5
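    Column arithmetic like the Bonus line is vectorized: pandas applies it to every row at once, with no explicit loop. If you would rather not modify df, DataFrame.assign builds the same columns on a new DataFrame; a lambda passed to assign receives that new DataFrame, so it can use columns created earlier in the same call. The Distinction column and its threshold of 95 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Marks": [85, 90, 88, 92],
})

# assign returns a new DataFrame; df keeps its original columns
result = df.assign(
    Bonus=df["Marks"] + 5,                   # vectorized: one value per row
    Distinction=lambda d: d["Bonus"] >= 95,  # hypothetical column using Bonus
)

print(result)
print("Bonus" in df.columns)  # False
```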
  • Renaming Columns

df.rename(columns={"Marks": "Score"}, inplace=True)
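    Two details worth knowing about rename: without inplace=True it returns a renamed copy that you can assign to a new name (or back to df), and by default a key that matches no column is silently ignored, so a typo leaves the column unchanged. Passing errors="raise" turns that typo into a KeyError. The "Student" name below is just an example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aman", "Riya"], "Marks": [85, 90]})

# Returns a new DataFrame; df itself is not changed
renamed = df.rename(columns={"Marks": "Score", "Name": "Student"})

# "marks" (lowercase) matches no column, so nothing is renamed
typo = df.rename(columns={"marks": "Score"})

# errors="raise" reports the missing label instead of ignoring it
try:
    df.rename(columns={"marks": "Score"}, errors="raise")
    typo_caught = False
except KeyError:
    typo_caught = True

print(list(renamed.columns))  # ['Student', 'Score']
print(list(typo.columns))     # ['Name', 'Marks']
print(typo_caught)            # True
```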
  • Removing a Column

df.drop("City", axis=1, inplace=True)  # axis=1 targets columns; axis=0 (the default) targets rows
  • Removing Multiple Columns

df.drop(["Age", "Passed"], axis=1, inplace=True)  # df now keeps Name, Score and Bonus
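    drop also accepts a columns= keyword, which reads more clearly than axis=1. Asking for a column that does not exist raises a KeyError by default; errors="ignore" skips missing labels instead, which helps when the same cleanup code runs on files whose columns vary. "Notes" below is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Aman", "Riya"],
    "Age": [22, 25],
    "City": ["Delhi", "Mumbai"],
    "Marks": [85, 90],
})

# Same as df.drop(["Age", "City"], axis=1)
slim = df.drop(columns=["Age", "City"])

# "Notes" is not a column: errors="ignore" skips it and drops the rest
safe = df.drop(columns=["Age", "Notes"], errors="ignore")

# Without errors="ignore", the missing label raises KeyError
try:
    df.drop(columns=["Notes"])
    missing_raised = False
except KeyError:
    missing_raised = True

print(list(slim.columns))  # ['Name', 'Marks']
print(list(safe.columns))  # ['Name', 'City', 'Marks']
```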
  • Basic Data Cleaning

    Why Data Cleaning is Important

    Real-world data is often incomplete (missing values), inconsistent (duplicate rows, mixed spellings or types), or simply wrong. Cleaning it first keeps these problems from skewing the analysis.

    Handling Missing Values

df.isnull()        # True wherever a value is missing (NaN or None)
df.isnull().sum()  # number of missing values in each column
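    The sample DataFrame has no gaps, so both calls above would report nothing missing. This sketch uses a small hypothetical dataset with NaN and None values to show what isnull() reports and how to list the affected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps: NaN in a numeric column, None in a text column
df = pd.DataFrame({
    "Name": ["Aman", "Riya", "Neha", "Karan"],
    "Score": [85, np.nan, 88, np.nan],
    "City": ["Delhi", None, "Pune", "Delhi"],
})

missing_per_column = df.isnull().sum()
print(missing_per_column)  # counts: Name 0, Score 2, City 1

# any(axis=1) is True for a row if any of its values is missing
rows_with_gaps = df[df.isnull().any(axis=1)]
print(rows_with_gaps["Name"].tolist())  # ['Riya', 'Karan']
```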
  • Filling Missing Values

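# Assign back: inplace=True on df["Score"] is chained assignment, which does not update df under Copy-on-Write
df["Score"] = df["Score"].fillna(df["Score"].mean())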
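    Filling with the mean, shown on a hypothetical column with two gaps. fillna also accepts a dictionary that maps column names to fill values, so different columns can get different defaults in one call; the "Unknown" placeholder is just an example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Score": [85, np.nan, 88, np.nan],
    "City": ["Delhi", None, "Pune", "Delhi"],
})

# mean() skips NaN, so this is (85 + 88) / 2 = 86.5
mean_score = df["Score"].mean()

# fillna returns a new DataFrame; each column gets its own fill value
filled = df.fillna({"Score": mean_score, "City": "Unknown"})

print(filled["Score"].tolist())  # [85.0, 86.5, 88.0, 86.5]
print(filled["City"].tolist())   # ['Delhi', 'Unknown', 'Pune', 'Delhi']
```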
  • Dropping Missing Values

df.dropna(inplace=True)
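    A bare dropna() removes every row with at least one missing value, which can discard more data than intended. The subset= and how= parameters narrow that down; the rows below are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name":  ["Aman", "Riya", None, "Karan"],
    "Score": [85, np.nan, np.nan, 92],
    "City":  ["Delhi", "Mumbai", None, None],
})

any_gap = df.dropna()                      # keep only complete rows
needs_score = df.dropna(subset=["Score"])  # only a missing Score drops a row
all_empty = df.dropna(how="all")           # drop a row only if every value is missing

print(any_gap.index.tolist())      # [0]
print(needs_score.index.tolist())  # [0, 3]
print(all_empty.index.tolist())    # [0, 1, 3]
```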
  • Removing Duplicate Rows

df.drop_duplicates(inplace=True)
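    By default drop_duplicates removes only rows that match on every column and keeps the first copy. subset= limits the comparison to chosen columns, and keep= picks which copy survives; the data here is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name":  ["Aman", "Riya", "Aman", "Neha"],
    "City":  ["Delhi", "Mumbai", "Delhi", "Delhi"],
    "Score": [85, 90, 85, 88],
})

print(df.duplicated().sum())  # 1 -> row 2 repeats row 0 exactly

exact = df.drop_duplicates()                                      # rows 0, 1, 3
first_per_city = df.drop_duplicates(subset=["City"])              # rows 0, 1
last_per_city = df.drop_duplicates(subset=["City"], keep="last")  # rows 1, 3
```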
  • Changing Data Types

df["Score"] = df["Score"].astype(int)  # raises if NaN remains; fractional values are truncated
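    astype(int) fails if the column still contains NaN, and it cannot parse text such as "n/a". A common pattern, sketched here on a hypothetical text column, is pd.to_numeric with errors="coerce" (unparseable entries become NaN) followed by pandas' nullable "Int64" dtype, which stores integers alongside missing values.

```python
import pandas as pd

# Hypothetical scores read from a file as text, with one bad entry
raw = pd.Series(["85", "90", "n/a", "92"])

numeric = pd.to_numeric(raw, errors="coerce")  # "n/a" -> NaN, dtype float64

try:
    numeric.astype(int)  # plain int cannot hold NaN
    int_cast_failed = False
except ValueError:
    int_cast_failed = True

scores = numeric.astype("Int64")  # nullable integers; the gap becomes <NA>

print(int_cast_failed)           # True
print(scores.isna().sum())       # 1
print(scores.dropna().tolist())  # [85, 90, 92]
```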
  • Replacing Values

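# Assign back rather than inplace=True; needs a "City" column, which the sample df had before it was dropped above
df["City"] = df["City"].replace("Delhi", "New Delhi")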
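    replace matches whole cell values exactly, so spelling or spacing variants such as "delhi " are left alone. Normalizing the text first with the .str accessor catches them; the messy values below are hypothetical.

```python
import pandas as pd

cities = pd.Series(["Delhi", "Mumbai", "delhi ", "Delhi"])

# Exact, whole-value match only: "delhi " is not changed
direct = cities.replace("Delhi", "New Delhi")

# Trim spaces and fix capitalization first, then replace
normalized = cities.str.strip().str.title().replace("Delhi", "New Delhi")

print(direct.tolist())      # ['New Delhi', 'Mumbai', 'delhi ', 'New Delhi']
print(normalized.tolist())  # ['New Delhi', 'Mumbai', 'New Delhi', 'New Delhi']
```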
  • Real-World Use Cases

    • Cleaning CSV files

    • Preparing datasets for ML

    • Business reports

    • Data validation

    • Analytics dashboards