Gated Recurrent Unit (GRU)

  • This lesson explains Gated Recurrent Unit networks and how they efficiently process sequential data in deep learning.

    GRU is a simplified and more efficient version of LSTM.

    It was introduced to solve:

    • Vanishing Gradient Problem

    • Long-term dependency issues

    but with a simpler architecture than LSTM.


    GRU vs LSTM

    | Feature      | LSTM                      | GRU                    |
    |--------------|---------------------------|------------------------|
    | Gates        | 3 (Forget, Input, Output) | 2 (Update, Reset)      |
    | Cell State   | Separate (Cₜ)             | No separate cell state |
    | Hidden State | Yes                       | Yes                    |
    | Parameters   | More                      | Fewer                  |
    | Speed        | Slower                    | Faster                 |
    | Performance  | Very strong               | Comparable             |

    GRU merges the cell state and the hidden state into a single hidden state, so it is computationally lighter.
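    The parameter savings can be made concrete with a quick count. A minimal sketch, assuming the classic GRU/LSTM formulations with one bias vector per gate (note that Keras' default GRU uses `reset_after=True` and carries an extra bias, so its exact count is slightly higher):

```python
# One gate/candidate block needs: n*d input weights + n*n recurrent weights
# + n biases, for input size d and hidden size n.
def block(d, n):
    return n * d + n * n + n

def lstm_params(d, n):
    return 4 * block(d, n)   # forget, input, output gates + cell candidate

def gru_params(d, n):
    return 3 * block(d, n)   # update gate, reset gate + candidate state

# For a layer like the GRU(50) used later in this lesson (d=1, n=50):
print(lstm_params(1, 50))  # 10400
print(gru_params(1, 50))   # 7800
```

    With the same hidden size, the GRU needs roughly three quarters of the LSTM's parameters, which is where the speed and memory advantages come from.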


    GRU Architecture

    GRU has:

    • Hidden State (hₜ)

    • 2 Gates:

      • Update Gate (zₜ)

      • Reset Gate (rₜ)

    There is no separate cell state like LSTM.

    GRU Flow

    Previous Hidden State (hₜ₋₁)

              ↓

         Reset Gate

              ↓

         Candidate State

              ↓

         Update Gate

              ↓

         New Hidden State (hₜ)


    Update Gate (zₜ)

    Purpose:

    Decides how much past information to keep.

    Formula:

    z_t = \sigma(W_z x_t + U_z h_{t-1})

    Range: 0 to 1

    • Close to 1 → Keep old memory

    • Close to 0 → Replace with new memory

    Hidden State Update:

    h_t = (1 - z_t)\tilde{h}_t + z_t h_{t-1}

    So the update gate controls memory retention.
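    This interpolation can be illustrated numerically. A small NumPy sketch (the state vectors here are made-up values for illustration):

```python
import numpy as np

h_prev = np.array([0.9, -0.5, 0.3])   # previous hidden state h_{t-1}
h_cand = np.array([0.1,  0.8, -0.6])  # candidate state h~_t

def blend(z, h_prev, h_cand):
    # h_t = (1 - z_t) * h~_t + z_t * h_{t-1}
    return (1 - z) * h_cand + z * h_prev

# z close to 1 -> result stays close to the old memory h_prev
print(blend(np.full(3, 0.95), h_prev, h_cand))
# z close to 0 -> result is mostly the new candidate h_cand
print(blend(np.full(3, 0.05), h_prev, h_cand))
```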


    Reset Gate (rₜ)

    Purpose:

    Decides how much past information to forget.

    Formula:

    r_t = \sigma(W_r x_t + U_r h_{t-1})

    Candidate hidden state:

    \tilde{h}_t = \tanh(W x_t + r_t \odot U h_{t-1})

    If reset gate ≈ 0:

    • Forget past completely

    If reset gate ≈ 1:

    • Use full past memory
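    Putting the two gates together, one full GRU step can be sketched in NumPy (the weights here are random placeholders; a trained network would learn them):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4                      # input size, hidden size

# Randomly initialised weight matrices (placeholders, not trained values)
W_z, U_z = rng.normal(size=(n, d)), rng.normal(size=(n, n))
W_r, U_r = rng.normal(size=(n, d)), rng.normal(size=(n, n))
W,   U   = rng.normal(size=(n, d)), rng.normal(size=(n, n))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)          # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)          # reset gate
    h_cand = np.tanh(W @ x_t + r * (U @ h_prev))   # candidate state
    return (1 - z) * h_cand + z * h_prev           # new hidden state

# Run the cell over a short random sequence
h = np.zeros(n)
for x in [rng.normal(size=d) for _ in range(5)]:
    h = gru_step(x, h)
print(h.shape)  # (4,)
```

    Because the new state is a gated blend of a tanh candidate and the previous state, every component of h stays in [-1, 1].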


    Why GRU is Efficient

    • Fewer parameters

    • Faster training

    • Requires less memory

    • Works well on smaller datasets

    • Simpler than LSTM


    Efficiency Comparison

    | Metric             | LSTM      | GRU       |
    |--------------------|-----------|-----------|
    | Training Speed     | Slower    | Faster    |
    | Memory Usage       | Higher    | Lower     |
    | Computational Cost | High      | Lower     |
    | Long Sequences     | Excellent | Very Good |

    In many practical tasks, GRU performs almost the same as LSTM but trains faster.


    GRU Code Example (Time Series)


This Python example demonstrates how to build and train a GRU (Gated Recurrent Unit) network for sequence prediction using TensorFlow Keras. The code reshapes the input data into 3D (samples, time steps, features), stacks a GRU layer and a Dense output layer, compiles the model with the Adam optimizer and mean squared error loss, trains it for 200 epochs, and makes predictions on the input sequences.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

# Dummy sequence data
X = np.array([[1,2,3],[2,3,4],[3,4,5]])
y = np.array([4,5,6])

# Reshape (samples, time steps, features)
X = X.reshape((X.shape[0], X.shape[1], 1))

model = Sequential([
    GRU(50, activation='relu', input_shape=(3,1)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)

# Predict on the training windows (for demonstration only)
prediction = model.predict(X, verbose=0)
print(prediction)