❮ Previous

Next ❯

Merging & Joining

Learn how to combine multiple datasets using merge, join, and concatenate in Pandas.

merge()
What is merge()?
merge() is used to combine two DataFrames based on a common column, similar to SQL JOIN.
It is the most powerful and flexible method for combining datasets.
Example Dataset

Creating Employees and Departments DataFrames

This code creates two Pandas DataFrames — one for employees and one for departments — which can later be used to perform merge (join) operations based on the common dept_id column.

import pandas as pd

employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Amit", "Sara", "John", "Riya"],
    "dept_id": [101, 102, 101, 103]
})

departments = pd.DataFrame({
    "dept_id": [101, 102, 103],
    "department": ["IT", "HR", "Finance"]
})

print(employees)
print(departments)

Inner Join (Default)
Returns only matching rows.

Performing Inner Join using Pandas

This code merges the employees and departments DataFrames using an inner join on dept_id, returning only the rows where matching department IDs exist in both tables.

result = pd.merge(employees, departments, on="dept_id")
print(result)

Left Join
Returns all rows from left DataFrame.

Performing Left Join using Pandas

This code merges the employees and departments DataFrames using a left join on dept_id, returning all rows from the employees DataFrame and matching rows from departments. If no match is found, missing values will appear as NaN.

result = pd.merge(employees, departments, on="dept_id", how="left")
print(result)

Right Join
Returns all rows from right DataFrame.

Performing Right Join using Pandas

This code merges the employees and departments DataFrames using a right join on dept_id, returning all rows from the departments DataFrame and matching rows from employees. If no match is found, missing values will appear as NaN.

result = pd.merge(employees, departments, on="dept_id", how="right")
print(result)

Outer Join
Returns all rows from both DataFrames.

Performing Outer Join using Pandas

This code merges the employees and departments DataFrames using an outer join on dept_id, returning all rows from both DataFrames. If there is no matching value in either table, the missing fields will be filled with NaN.

result = pd.merge(employees, departments, on="dept_id", how="outer")
print(result)

Merge on Different Column Names

pd.merge(df1, df2, left_on="id1", right_on="id2")

join()
What is join()?
join() is used to combine DataFrames based on their index.
It is simpler than merge but less flexible.
Example

Combining DataFrames Using join()

This code demonstrates how to combine two DataFrames based on their index using join(), creating a single DataFrame with both employee names and salaries.

emp = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Amit", "Sara", "John"]
}).set_index("emp_id")

salary = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "salary": [50000, 60000, 55000]
}).set_index("emp_id")

result = emp.join(salary)
print(result)

Join with how Parameter

emp.join(salary, how="inner")
emp.join(salary, how="outer")

concat()
What is concat()?
concat() is used to combine DataFrames along rows or columns.
Unlike merge, it does NOT require a common column.
Concatenate Rows (Default)

Concatenating DataFrames by Rows

This code uses pd.concat() to combine two DataFrames vertically (row-wise), stacking all rows from both DataFrames into a single DataFrame.

df1 = pd.DataFrame({
    "Name": ["Amit", "Sara"],
    "Age": [25, 30]
})

df2 = pd.DataFrame({
    "Name": ["John", "Riya"],
    "Age": [28, 32]
})

result = pd.concat([df1, df2])
print(result)

Reset Index After Concat

result = pd.concat([df1, df2], ignore_index=True)
print(result)

Concatenate Columns

result = pd.concat([df1, df2], axis=1)
print(result)

Difference Between merge(), join(), concat()
Feature
merge()
join()
concat()
Based On
Column
Index
Axis
SQL-style join
Yes
Limited
No
Combine Rows
Yes
No
Yes
Combine Columns
Yes
Yes
Yes
Most Flexible
Yes
Medium
Simple stacking

❮ Previous

Next ❯

Feature	merge()	join()	concat()
Based On	Column	Index	Axis
SQL-style join	Yes	Limited	No
Combine Rows	Yes	No	Yes
Combine Columns	Yes	Yes	Yes
Most Flexible	Yes	Medium	Simple stacking

merge()

What is merge()?

Creating Employees and Departments DataFrames

Performing Inner Join using Pandas

Performing Left Join using Pandas

Performing Right Join using Pandas

Performing Outer Join using Pandas

join()

What is join()?

Combining DataFrames Using join()

concat()

What is concat()?

Concatenating DataFrames by Rows

Login

Create Account