Aggregation & Vectorization

  • Learn aggregation and vectorization in NumPy to perform efficient calculations on arrays.
  • Aggregation Operations

    What are Aggregation Operations?

    Aggregation operations combine multiple values in an array to produce a single summarized result.

    These operations are commonly used to analyze datasets, calculate totals and averages, and identify trends.

    Common Aggregation Functions

    Function    Purpose
    sum()       Total of all values
    mean()      Average value
    min()       Smallest value
    max()       Largest value
    size        Number of elements (an array attribute; NumPy has no count() function)
    std()       Standard deviation
    var()       Variance

    Example – Basic Aggregation

import numpy as np
data = np.array([10, 20, 30, 40, 50])
print(np.sum(data))   # 150
print(np.mean(data))  # 30.0
print(np.min(data))   # 10
print(np.max(data))   # 50
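
    The remaining entries from the list of common functions can be sketched on the same data. This is a minimal illustration; it assumes NumPy's default of population statistics (ddof=0) and shows ddof=1 as the sample-based alternative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

print(data.size)              # 5 (number of elements)
print(np.var(data))           # 200.0 (population variance, ddof=0)
print(np.std(data))           # square root of the population variance
print(np.var(data, ddof=1))   # 250.0 (sample variance)
```

    Whether ddof=0 or ddof=1 is appropriate depends on whether the array holds a whole population or a sample drawn from one.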
  • Aggregation on 2D Arrays

    Aggregation can be applied row-wise or column-wise using the axis parameter.

matrix = np.array([[10, 20, 30],
                   [40, 50, 60]])

print(np.sum(matrix, axis=0))  # Column-wise sums: 50, 70, 90
print(np.sum(matrix, axis=1))  # Row-wise sums: 60, 150
  • Axis Summary

    • axis=0 → Aggregates down the rows, giving one result per column

    • axis=1 → Aggregates across the columns, giving one result per row
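
    One way to check which axis was collapsed is to compare shapes before and after. The sketch below reuses the 2 x 3 matrix from the example above; the variable names are illustrative only.

```python
import numpy as np

matrix = np.array([[10, 20, 30],
                   [40, 50, 60]])

col_means = np.mean(matrix, axis=0)  # one value per column
row_means = np.mean(matrix, axis=1)  # one value per row

# keepdims=True keeps the collapsed axis as a length-1 dimension
row_max = np.max(matrix, axis=1, keepdims=True)

print(matrix.shape)     # (2, 3)
print(col_means.shape)  # (3,)
print(row_means.shape)  # (2,)
print(row_max.shape)    # (2, 1)
```

    The axis passed to the function is the one that disappears from the shape, unless keepdims=True is set.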

    Why Aggregation Matters

    • Helps summarize large datasets

    • Used in reporting and dashboards

    • Essential for statistical analysis

    • Improves data-driven decisions


    Vectorization

    What is Vectorization?

    Vectorization is the process of performing operations on entire arrays at once, instead of looping over individual elements in Python.

    NumPy runs these operations in optimized, compiled C code, so vectorized code is typically much faster than an equivalent Python loop.

    Loop-based Approach (Slow)

result = []
for i in range(len(data)):
    result.append(data[i] * 2)
  • Vectorized Approach (Fast)

result = data * 2
  • Why Vectorization is Powerful

    • Eliminates explicit loops

    • Improves performance significantly

    • Code becomes shorter and cleaner

    • Ideal for large datasets
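
    The performance claim can be checked with a rough timing sketch like the one below. The array size, repeat count, and function names are arbitrary choices for illustration, and the measured times will vary by machine.

```python
import timeit
import numpy as np

data = np.arange(100_000)

def double_with_loop(values):
    # Element-by-element Python loop
    result = []
    for value in values:
        result.append(value * 2)
    return result

def double_vectorized(values):
    # Single array operation handled inside NumPy
    return values * 2

loop_time = timeit.timeit(lambda: double_with_loop(data), number=10)
vector_time = timeit.timeit(lambda: double_vectorized(data), number=10)

print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vector_time:.4f} s")
```

    Both functions produce the same values; only the time taken differs.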

    Example – Vectorized Operations

prices = np.array([100, 200, 300])
taxed_prices = prices * 1.18
print(taxed_prices)
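
    Vectorized operations also work element by element between two arrays of the same shape, and the result can be passed straight to an aggregation. In this sketch the quantities array is a made-up example, not part of the original data.

```python
import numpy as np

prices = np.array([100, 200, 300])
quantities = np.array([2, 1, 3])  # hypothetical order quantities

line_totals = prices * quantities  # element-wise multiplication
order_total = line_totals.sum()    # aggregation over the result

print(line_totals)  # per-item totals: 200, 200, 900
print(order_total)  # 1300
```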
  • Vectorization with Conditions

scores = np.array([45, 78, 88, 32, 90])
passed = scores >= 50
print(passed)
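
    The Boolean array produced by a comparison can be used as a mask. The following sketch shows three common follow-ups: filtering, counting, and labelling. The "pass" and "fail" labels are illustrative.

```python
import numpy as np

scores = np.array([45, 78, 88, 32, 90])
passed = scores >= 50

passing_scores = scores[passed]            # keep only scores where the mask is True
num_passed = np.count_nonzero(passed)      # count the True values
labels = np.where(passed, "pass", "fail")  # choose a label per element

print(passing_scores)  # the scores 78, 88, 90
print(num_passed)      # 3
print(labels)
```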
  • Vectorized Mathematical Functions

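angles = np.array([0, 30, 60, 90])   # angles in degrees
print(np.sin(np.radians(angles)))    # np.sin expects radians, so convert first
print(np.log(prices))                # natural logarithm of each price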
  • Real-World Use Cases

    • Data preprocessing

    • Financial modeling

    • Machine learning pipelines

    • Scientific simulations

    • Business analytics