Pandas DataFrame
Learn how to use a Pandas DataFrame to store, analyze, and manipulate tabular data in Python.
What is a DataFrame?
A Pandas DataFrame is a two-dimensional data structure that stores data in rows and columns, where:
Each column has a name
Each row has an index
Columns can have different data types
Key Characteristics
Tabular (row × column) format
Labeled columns and indexes
Mutable (add/remove columns easily)
Built on top of NumPy arrays
Simple Example
import pandas as pd
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"]
}
df = pd.DataFrame(data)
print(df)
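The characteristics listed above can be checked on a small frame. This is a minimal sketch: the `Senior` column is a hypothetical name used only for illustration and is not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# Labeled columns and a default integer index.
print(list(df.columns))  # ['Name', 'Age']
print(list(df.index))    # [0, 1, 2]

# Mutable: add a column (hypothetical "Senior"), then remove it again.
df["Senior"] = df["Age"] >= 28
df = df.drop(columns="Senior")

# NumPy-backed: a numeric column converts to a NumPy array.
ages = df["Age"].to_numpy()
print(isinstance(ages, np.ndarray))  # True
```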
Creating DataFrames
Creating from List of Lists
data = [
    [1, "Laptop", 50000],
    [2, "Mobile", 20000],
    [3, "Tablet", 30000]
]
df = pd.DataFrame(data, columns=["ID", "Product", "Price"])
print(df)
Creating from Dictionary
data = {
    "Student": ["Aman", "Riya", "Neha"],
    "Marks": [85, 90, 88]
}
df = pd.DataFrame(data)
print(df)
Creating from CSV File
df = pd.read_csv("students.csv")
print(df.head())
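The call above needs a `students.csv` file on disk. To try `read_csv` without one, the sketch below feeds it an in-memory text buffer instead. The CSV contents are an assumption made up for illustration and simply mirror the Student/Marks data used earlier.

```python
import io

import pandas as pd

# Stand-in for the contents of students.csv (hypothetical data).
csv_text = """Student,Marks
Aman,85
Riya,90
Neha,88
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.shape)  # (3, 2)
```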
DataFrame Structure (Rows & Columns)
Viewing Structure
df.shape # (rows, columns)
df.columns # column names
df.index # row indexes
df.dtypes # data types of columns
Example
print(df.shape)
print(df.columns)
print(df.dtypes)
Concept
Rows → Observations / records
Columns → Features / attributes
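Applied to the Name/Age/City frame from the first example, the structure attributes look roughly like this. It is a sketch for orientation: the exact dtype names printed for text columns can vary between pandas versions, so only the numeric column is checked.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# shape is a (rows, columns) tuple: 3 records, 3 attributes.
n_rows, n_cols = df.shape
print(n_rows, n_cols)       # 3 3
print(df.columns.tolist())  # ['Name', 'Age', 'City']
print(df.index.tolist())    # [0, 1, 2]

# Each column carries its own dtype; Age holds integers.
print(pd.api.types.is_integer_dtype(df["Age"]))  # True
```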
Basic DataFrame Operations
Viewing Data
df.head() # first 5 rows
df.tail() # last 5 rows
Selecting Columns
df["Name"] # one column, returned as a Series
df[["Name", "Age"]] # several columns, returned as a DataFrame
Selecting Rows
df.loc[0] # by label
df.iloc[1] # by position
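With the default index (0, 1, 2, ...), labels and positions coincide, so `loc` and `iloc` appear to do the same thing. The difference shows up once the index is not the default. In this sketch the index labels 10, 20, 30 are arbitrary values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 28]},
    index=[10, 20, 30],  # custom labels, chosen for illustration
)

by_label = df.loc[20]     # the row whose index label is 20
by_position = df.iloc[1]  # the second row, whatever its label is
print(by_label["Name"])     # Bob
print(by_position["Name"])  # Bob

# df.loc[1] would raise KeyError here, because no row is labeled 1.
```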
Filtering Data
df[df["Age"] > 25] # rows where Age is greater than 25
Adding a New Column
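The filter works in two steps: the comparison produces a boolean Series (a mask), and indexing with that mask keeps only the rows marked True. The sketch below makes the mask explicit and then combines two conditions. The second condition on `City` is an added illustration, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Step 1: the comparison yields one True/False value per row.
mask = df["Age"] > 25
print(mask.tolist())  # [False, True, True]

# Step 2: indexing with the mask keeps the True rows.
older = df[mask]

# Combine conditions with & (and) or | (or); wrap each in parentheses.
older_not_mumbai = df[(df["Age"] > 25) & (df["City"] != "Mumbai")]
print(older_not_mumbai["Name"].tolist())  # ['Charlie']
```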
df["Country"] = "India"
print(df)
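Assigning a single value, as above, fills every row with that value. A new column can also be computed from existing ones. In this sketch, `Age_in_5_years` is a hypothetical column name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# A scalar is broadcast to every row.
df["Country"] = "India"

# A derived column (hypothetical name), computed row by row from Age.
df["Age_in_5_years"] = df["Age"] + 5
print(df["Age_in_5_years"].tolist())  # [30, 35, 33]
```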
Updating Values
df.loc[0, "Age"] = 26 # set Age in the row labeled 0
Deleting a Column
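The same `loc[row, column]` form also accepts a boolean condition in the row position, which updates every matching row at once. A minimal sketch, where the "aged 28 or older" rule is an arbitrary example:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

# Single cell: row label 0, column "Age".
df.loc[0, "Age"] = 26

# Every row matching a condition: add one year for anyone aged 28 or older.
df.loc[df["Age"] >= 28, "Age"] += 1
print(df["Age"].tolist())  # [26, 31, 29]
```

Selecting the rows and the column in a single `loc` call is generally safer than chaining two selections, since a chained assignment may end up modifying a temporary copy.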
df.drop("City", axis=1, inplace=True)
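`inplace=True` modifies `df` directly. An alternative sketch is to let `drop` return a new frame and keep the original untouched. Here `columns="City"` is an equivalent spelling of `"City", axis=1`, and `trimmed` is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "City": ["Delhi", "Mumbai", "Bangalore"],
})

# Without inplace=True, drop returns a new DataFrame.
trimmed = df.drop(columns="City")
print(trimmed.columns.tolist())  # ['Name', 'Age']

# The original frame still has all three columns.
print(df.columns.tolist())  # ['Name', 'Age', 'City']
```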
Sorting Data
df.sort_values(by="Age") # ascending; returns a new DataFrame
df.sort_values(by="Age", ascending=False) # descending; df itself is unchanged
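`sort_values` returns a sorted copy, so the result has to be assigned to be kept. The sketch below also sorts by two columns at once. The extra row for "Dev" is hypothetical data added so that two people share the same age.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dev"],  # "Dev" is an added example row
    "Age": [25, 30, 28, 30],
})

# Keep the sorted result by assigning it; df keeps its original order.
by_age = df.sort_values(by="Age")
print(by_age["Name"].tolist())  # ['Alice', 'Charlie', 'Bob', 'Dev']
print(df["Name"].tolist())      # ['Alice', 'Bob', 'Charlie', 'Dev']

# Oldest first, ties broken alphabetically by Name.
ranked = df.sort_values(by=["Age", "Name"], ascending=[False, True])
ranked = ranked.reset_index(drop=True)  # renumber rows 0..n-1
print(ranked["Name"].tolist())  # ['Bob', 'Dev', 'Charlie', 'Alice']
```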
Basic Aggregations
df["Age"].mean()
df["Age"].max()
df["Age"].min()
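Run against the ages from the first example (25, 30, 28), these calls give the values below. As a sketch of a more compact form, `agg` can compute several statistics in one call and returns them as a labeled Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
})

print(round(df["Age"].mean(), 2))  # 27.67
print(df["Age"].max())             # 30
print(df["Age"].min())             # 25

# Several aggregations at once; the result is indexed by statistic name.
summary = df["Age"].agg(["mean", "max", "min"])
print(summary["max"])  # 30.0
```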
Real-World Use Cases
Reading CSV / Excel data
Cleaning datasets
Business reports
Data analysis & visualization
Machine learning preprocessing