Gated Recurrent Unit (GRU)

  • This lesson explains Gated Recurrent Unit networks and how they efficiently process sequential data in deep learning.

    GRU is a simplified and more efficient version of LSTM.

    It was introduced to solve:

    • Vanishing Gradient Problem

    • Long-term dependency issues

    but with a simpler architecture than LSTM.


    GRU vs LSTM

    | Feature      | LSTM                      | GRU                    |
    |--------------|---------------------------|------------------------|
    | Gates        | 3 (Forget, Input, Output) | 2 (Update, Reset)      |
    | Cell State   | Separate (Cₜ)             | No separate cell state |
    | Hidden State | Yes                       | Yes                    |
    | Parameters   | More                      | Fewer                  |
    | Speed        | Slower                    | Faster                 |
    | Performance  | Very strong               | Comparable             |

    GRU merges the cell state and the hidden state into a single hidden state, so it is computationally lighter.
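    The parameter savings can be made concrete with a quick count. A minimal sketch, assuming the classic GRU/LSTM formulations with one bias vector per gate (note that Keras' default GRU uses `reset_after=True` and carries an extra bias, so its exact count is slightly higher):

```python
# One gate/candidate block needs: n*d input weights + n*n recurrent weights
# + n biases, for input size d and hidden size n.
def block(d, n):
    return n * d + n * n + n

def lstm_params(d, n):
    return 4 * block(d, n)   # forget, input, output gates + cell candidate

def gru_params(d, n):
    return 3 * block(d, n)   # update gate, reset gate + candidate state

# For a layer like the GRU(50) used later in this lesson (d=1, n=50):
print(lstm_params(1, 50))  # 10400
print(gru_params(1, 50))   # 7800
```

    With the same hidden size, the GRU needs roughly three quarters of the LSTM's parameters, which is where the speed and memory advantages come from.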


    GRU Architecture

    GRU has:

    • Hidden State (hₜ)

    • 2 Gates:

      • Update Gate (zₜ)

      • Reset Gate (rₜ)

    There is no separate cell state like LSTM.

    GRU Flow

    Previous Hidden State (hₜ₋₁)

              ↓

         Reset Gate

              ↓

         Candidate State

              ↓

         Update Gate

              ↓

         New Hidden State (hₜ)


    Update Gate (zₜ)

    Purpose:

    Decides how much past information to keep.

    Formula:

    z_t = \sigma(W_z x_t + U_z h_{t-1})

    Range: 0 to 1

    • Close to 1 → Keep old memory

    • Close to 0 → Replace with new memory

    Hidden State Update:

    h_t = (1 - z_t)\tilde{h}_t + z_t h_{t-1}

    So the update gate controls memory retention.
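    This interpolation can be illustrated numerically. A small NumPy sketch (the state vectors here are made-up values for illustration):

```python
import numpy as np

h_prev = np.array([0.9, -0.5, 0.3])   # previous hidden state h_{t-1}
h_cand = np.array([0.1,  0.8, -0.6])  # candidate state h~_t

def blend(z, h_prev, h_cand):
    # h_t = (1 - z_t) * h~_t + z_t * h_{t-1}
    return (1 - z) * h_cand + z * h_prev

# z close to 1 -> result stays close to the old memory h_prev
print(blend(np.full(3, 0.95), h_prev, h_cand))
# z close to 0 -> result is mostly the new candidate h_cand
print(blend(np.full(3, 0.05), h_prev, h_cand))
```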


    Reset Gate (rₜ)

    Purpose:

    Decides how much past information to forget.

    Formula:

    r_t = \sigma(W_r x_t + U_r h_{t-1})

    Candidate hidden state:

    \tilde{h}_t = \tanh(W x_t + r_t \odot U h_{t-1})

    If reset gate ≈ 0:

    • Forget past completely

    If reset gate ≈ 1:

    • Use full past memory
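    Putting the two gates together, one full GRU step can be sketched in NumPy (the weights here are random placeholders; a trained network would learn them):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4                      # input size, hidden size

# Randomly initialised weight matrices (placeholders, not trained values)
W_z, U_z = rng.normal(size=(n, d)), rng.normal(size=(n, n))
W_r, U_r = rng.normal(size=(n, d)), rng.normal(size=(n, n))
W,   U   = rng.normal(size=(n, d)), rng.normal(size=(n, n))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)          # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)          # reset gate
    h_cand = np.tanh(W @ x_t + r * (U @ h_prev))   # candidate state
    return (1 - z) * h_cand + z * h_prev           # new hidden state

# Run the cell over a short random sequence
h = np.zeros(n)
for x in [rng.normal(size=d) for _ in range(5)]:
    h = gru_step(x, h)
print(h.shape)  # (4,)
```

    Because the new state is a gated blend of a tanh candidate and the previous state, every component of h stays in [-1, 1].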


    Why GRU is Efficient

    • Fewer parameters

    • Faster training

    • Requires less memory

    • Works well on smaller datasets

    • Simpler than LSTM


    Efficiency Comparison

    | Metric             | LSTM      | GRU       |
    |--------------------|-----------|-----------|
    | Training Speed     | Slower    | Faster    |
    | Memory Usage       | Higher    | Lower     |
    | Computational Cost | High      | Lower     |
    | Long Sequences     | Excellent | Very Good |

    In many practical tasks, GRU performs almost the same as LSTM but trains faster.


    GRU Code Example (Time Series)


This Python example demonstrates how to build and train a GRU (Gated Recurrent Unit) network for sequence prediction using TensorFlow Keras. The code reshapes the input data into 3D (samples, time steps, features), stacks a GRU layer and a Dense output layer, compiles the model with the Adam optimizer and mean squared error loss, trains it for 200 epochs, and makes predictions on the input sequences.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

# Dummy sequence data
X = np.array([[1,2,3],[2,3,4],[3,4,5]])
y = np.array([4,5,6])

# Reshape (samples, time steps, features)
X = X.reshape((X.shape[0], X.shape[1], 1))

model = Sequential([
    GRU(50, activation='relu', input_shape=(3,1)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)

# Predict on the training windows (for demonstration only)
prediction = model.predict(X, verbose=0)
print(prediction)