Pandas DataFrame

  • Learn how to use the Pandas DataFrame to store, analyze, and manipulate tabular data in Python.
  • What is a DataFrame?

    A Pandas DataFrame is a two-dimensional data structure that stores data in rows and columns, where:

    • Each column has a name

    • Each row has an index

    • Columns can have different data types

    Key Characteristics

    • Tabular (row × column) format

    • Labeled columns and indexes

    • Mutable (add/remove columns easily)

    • Built on top of NumPy arrays (the default storage for most column types)

    Simple Example

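import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values.
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"]
}
df = pd.DataFrame(data)
print(df)   # rows get a default index of 0, 1, 2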
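
    To make the points listed under Key Characteristics concrete, the sketch below rebuilds the same table and checks each one in turn. The Senior column and the ages variable are illustrative names chosen here, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Columns can have different data types (text vs. integers here).
print(df.dtypes)

# Every row carries an index label; the default is 0, 1, 2, ...
print(list(df.index))

# Mutable: a column can be added after the DataFrame is created.
# "Senior" is a hypothetical column used only for illustration.
df["Senior"] = df["Age"] > 26

# A numeric column can be handed over as a NumPy array.
ages = df["Age"].to_numpy()
print(type(ages).__name__, ages.sum())
```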
  • Creating DataFrames

    Creating from List of Lists

# Each inner list is one row; column names are supplied separately.
data = [
    [1, "Laptop", 50000],
    [2, "Mobile", 20000],
    [3, "Tablet", 30000]
]
df = pd.DataFrame(data, columns=["ID", "Product", "Price"])
print(df)

    Creating from Dictionary

# Keys are column names; all lists must have the same length.
data = {
    "Student": ["Aman", "Riya", "Neha"],
    "Marks": [85, 90, 88]
}

df = pd.DataFrame(data)
print(df)
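
    The list-of-lists and dictionary forms describe the same table from two directions: row by row, or column by column. As a small sketch, the snippet below builds the Student/Marks table both ways and compares the results. The variable names from_rows and from_columns are illustrative.

```python
import pandas as pd

# Row-oriented input: one inner list per row, column names given separately.
rows = [["Aman", 85], ["Riya", 90], ["Neha", 88]]
from_rows = pd.DataFrame(rows, columns=["Student", "Marks"])

# Column-oriented input: one list per column, keyed by column name.
columns = {"Student": ["Aman", "Riya", "Neha"], "Marks": [85, 90, 88]}
from_columns = pd.DataFrame(columns)

# equals() compares shape, values, and column types.
same = from_rows.equals(from_columns)
print(same)
```

    Which form to use mostly depends on how the data arrives: records read one at a time suit the row form, while pre-grouped values suit the dictionary form.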
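
    Creating from CSV File

# Assumes a file named students.csv exists in the current working directory.
df = pd.read_csv("students.csv")
print(df.head())   # preview the first 5 rows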
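
    The contents of students.csv are not shown, so the sketch below makes the read step reproducible by parsing CSV text held in memory instead of a file on disk. The Student and Marks columns are an assumption borrowed from the dictionary example; a real file may look different.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for students.csv.
csv_text = """Student,Marks
Aman,85
Riya,90
Neha,88
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

    The first line of the text is treated as the header row, and the numeric column is parsed as integers rather than strings.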
  • DataFrame Structure (Rows & Columns)

    Viewing Structure

df.shape      # (rows, columns)
df.columns    # column names
df.index      # row indexes
df.dtypes     # data types of columns
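
    Example

print(df.shape)     # tuple: (number of rows, number of columns)
print(df.columns)   # Index object holding the column names
print(df.dtypes)    # one data type per column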

    Concept

    • Rows → Observations / records

    • Columns → Features / attributes
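
    As a rough illustration of the rows-as-records and columns-as-features view, the sketch below pulls one row and one column out of the Name/Age/City table from the Simple Example. The names first_record and ages are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# shape unpacks into (number of rows, number of columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# One row = one record: a value for every column.
first_record = df.iloc[0].to_dict()
print(first_record)

# One column = one feature: the same attribute across all records.
ages = df["Age"].tolist()
print(ages)
```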


  • Basic DataFrame Operations

    The snippets in this section assume df is the Name/Age/City DataFrame from the Simple Example.

    Viewing Data

df.head()     # first 5 rows by default; df.head(10) returns the first 10
df.tail()     # last 5 rows by default

    Selecting Columns

df["Name"]            # one column, returned as a Series
df[["Name", "Age"]]   # a list of columns, returned as a DataFrame

    Selecting Rows

df.loc[0]       # by label: the row whose index label is 0
df.iloc[1]      # by position: the second row (positions start at 0)
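
    With the default index, labels and positions happen to coincide, which hides the difference between loc and iloc. The sketch below uses a custom index (10, 20, 30, chosen purely for illustration) so the two can be told apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 28]},
    index=[10, 20, 30],  # hypothetical labels, not positions
)

by_label = df.loc[20]      # the row whose index label is 20
by_position = df.iloc[1]   # the second row, whatever its label is

print(by_label["Name"], by_position["Name"])

# Label 0 does not exist here, so df.loc[0] would raise a KeyError,
# while df.iloc[0] still returns the first row.
first_row = df.iloc[0]
print(first_row["Name"])
```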

    Filtering Data

df[df["Age"] > 25]   # returns only the rows where Age is greater than 25; df is unchanged
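
    The expression inside the brackets is a boolean Series with one True/False value per row. A minimal sketch of combining two such conditions follows; the specific conditions and the names mask and result are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Use & for "and", | for "or"; each condition needs its own parentheses
# because & and | bind more tightly than comparison operators.
mask = (df["Age"] > 25) & (df["City"] != "Mumbai")
result = df[mask]

print(mask.tolist())
print(result)
```

    Python's plain and / or keywords do not work element-wise on a Series, which is why the symbol operators are used here.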

    Adding a New Column

df["Country"] = "India"   # the single value is repeated for every row
print(df)
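
    Besides a constant, a new column can be computed from existing ones. The sketch below shows one possible case; the column name AgeIn5Years is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# Arithmetic on a column is applied to every row at once.
df["AgeIn5Years"] = df["Age"] + 5

print(df)
```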

    Updating Values

df.loc[0, "Age"] = 26   # set Age to 26 in the row whose index label is 0
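
    The row selector in loc can also be a condition, which helps when the row label is not known in advance. A small sketch, assuming the goal is to change Bob's age (the new value 31 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# Rows are chosen by the boolean condition, the column by its name.
df.loc[df["Name"] == "Bob", "Age"] = 31

print(df)
```

    Doing the row and column selection in a single loc call matters: assigning through a chained selection such as df[condition]["Age"] = 31 is not reliable in pandas and may leave df unchanged.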

    Deleting a Column

df.drop("City", axis=1, inplace=True)   # axis=1 means columns; inplace=True modifies df directly
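
    The call above changes df itself. As an alternative sketch, drop can also return a new DataFrame and leave the original untouched, which is often easier to reason about. The variable name trimmed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# columns=... is equivalent to passing the label with axis=1.
trimmed = df.drop(columns=["City"])

print(list(trimmed.columns))
print(list(df.columns))  # the original still has City
```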

    Sorting Data

df.sort_values(by="Age")                    # ascending order; returns a new DataFrame
df.sort_values(by="Age", ascending=False)   # descending order; df itself is not reordered
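
    Two details of sort_values are easy to miss: the result has to be assigned to keep it, and each row keeps its original index label after sorting. The sketch below shows both, along with reset_index for renumbering. The names by_age and renumbered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# Keep the sorted result by assigning it to a variable.
by_age = df.sort_values(by="Age")

# Rows move, but their index labels travel with them.
print(list(by_age.index))

# drop=True discards the old labels instead of keeping them as a column.
renumbered = by_age.reset_index(drop=True)
print(renumbered)
```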

    Basic Aggregations

df["Age"].mean()   # average of the column
df["Age"].max()    # largest value
df["Age"].min()    # smallest value
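
    When several statistics are needed at once, they can be requested in one call. This is a minimal sketch using agg on the Age column from the Simple Example; the variable name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# agg with a list of function names returns one value per name.
summary = df["Age"].agg(["mean", "max", "min"])

print(summary)
```

    The result is itself a Series indexed by the statistic names, so a single value can be read with, for example, summary["max"].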
  • Real-World Use Cases

    • Reading CSV / Excel data

    • Cleaning datasets

    • Business reports

    • Data analysis & visualization

    • Machine learning preprocessing