Gated Recurrent Unit (GRU)
- This lesson explains Gated Recurrent Unit networks and how they efficiently process sequential data in deep learning.
GRU is a simplified and more efficient version of LSTM.
It was introduced to solve:
Vanishing Gradient Problem
Long-term dependency issues
but with a simpler architecture than LSTM.
GRU vs LSTM
GRU merges the LSTM's cell state and hidden state into a single hidden state,
so it is computationally lighter.
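The parameter savings can be seen directly from the gate counts: LSTM has four weight matrices per unit group (three gates plus the cell candidate), while GRU has three. A minimal sketch of the classic parameter-count formulas (ignoring implementation details such as Keras's extra `reset_after` bias term):

```python
# Rough parameter counts for one recurrent layer (classic formulation,
# biases counted once per gate; implementations may differ slightly).
def lstm_params(units, input_dim):
    # 4 weight groups: input, forget, output gates + cell candidate
    return 4 * (units * input_dim + units * units + units)

def gru_params(units, input_dim):
    # 3 weight groups: update gate, reset gate, candidate state
    return 3 * (units * input_dim + units * units + units)

print(lstm_params(50, 1))  # 10400
print(gru_params(50, 1))   # 7800
```

For the same hidden size, GRU needs roughly 25% fewer parameters than LSTM.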
GRU Architecture
GRU has:
Hidden State (hₜ)
2 Gates:
Update Gate (zₜ)
Reset Gate (rₜ)
There is no separate cell state like LSTM.
GRU Flow
Previous Hidden State (hₜ₋₁)
↓
Reset Gate
↓
Candidate State
↓
Update Gate
↓
New Hidden State (hₜ)
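The flow above can be sketched as a single NumPy step. This is a toy illustration with made-up dimensions and random weights (biases omitted for brevity), not a production GRU implementation:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    """One GRU time step following the flow above (biases omitted)."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev)            # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)            # reset gate
    h_tilde = np.tanh(W @ x_t + r * (U @ h_prev))  # candidate state
    return (1 - z) * h_tilde + z * h_prev          # new hidden state

rng = np.random.default_rng(0)
d, n = 1, 4                      # toy input dim and hidden size
x_t = rng.normal(size=d)
h_prev = np.zeros(n)
Wz, Wr, W = (rng.normal(size=(n, d)) for _ in range(3))
Uz, Ur, U = (rng.normal(size=(n, n)) for _ in range(3))

h_t = gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U)
print(h_t.shape)  # (4,)
```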
Update Gate (zₜ)
Purpose:
Decides how much past information to keep.
Formula:
zₜ = σ(W_z xₜ + U_z hₜ₋₁)
Range: 0 to 1
Close to 1 → Keep old memory
Close to 0 → Replace with new memory
Hidden State Update:
hₜ = (1 − zₜ) h̃ₜ + zₜ hₜ₋₁
So update gate controls memory retention.
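A tiny numeric illustration of this interpolation, with made-up values: when zₜ is close to 1, the new hidden state stays close to the old memory.

```python
import numpy as np

h_prev = np.array([1.0, -1.0])   # old memory
h_tilde = np.array([0.0, 0.0])   # candidate (new) memory
z = 0.9                          # update gate close to 1

h_new = (1 - z) * h_tilde + z * h_prev
print(h_new)  # ≈ [0.9, -0.9] -- mostly the old memory is retained
```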
Reset Gate (rₜ)
Purpose:
Decides how much past information to forget.
Formula:
rₜ = σ(W_r xₜ + U_r hₜ₋₁)
Candidate hidden state:
h̃ₜ = tanh(W xₜ + rₜ ∗ U hₜ₋₁)
If reset gate ≈ 0:
Forget past completely
If reset gate ≈ 1:
Use full past memory
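The two extremes can be checked numerically. In this sketch the weighted terms W xₜ and U hₜ₋₁ are stand-in vectors, not real learned weights:

```python
import numpy as np

wx = np.array([0.5, 0.5])       # stand-in for W @ x_t
Ux = np.array([2.0, -3.0])      # stand-in for U @ h_prev (past signal)

# Reset gate near 0: the past is wiped from the candidate state
r = 0.0
print(np.tanh(wx + r * Ux))     # depends only on the current input

# Reset gate near 1: full past memory flows into the candidate state
r = 1.0
print(np.tanh(wx + r * Ux))
```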
Why GRU is Efficient
Fewer parameters
Faster training
Requires less memory
Works well on smaller datasets
Simpler than LSTM
Efficiency Comparison
In many practical tasks:
GRU performs almost the same as LSTM
but trains faster.
GRU Code Example (Time Series)
GRU Time Series Prediction Example in Python using TensorFlow Keras
This Python example demonstrates how to build and train a GRU (Gated Recurrent Unit) network for sequence prediction using TensorFlow Keras. The code reshapes the input data into 3D (samples, time steps, features) and stacks a GRU layer followed by a Dense layer. The model is compiled with the Adam optimizer and Mean Squared Error loss, trained for multiple epochs, and then used to make predictions on the input sequence data.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
# Dummy sequence data
X = np.array([[1,2,3],[2,3,4],[3,4,5]])
y = np.array([4,5,6])
# Reshape (samples, time steps, features)
X = X.reshape((X.shape[0], X.shape[1], 1))
# GRU layer followed by a Dense output layer
model = Sequential([
    GRU(50, activation='relu', input_shape=(3, 1)),
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)
prediction = model.predict(X)
print(prediction)