Long Short-Term Memory (LSTM)
- This lesson explains Long Short-Term Memory networks and how they improve sequential data learning in deep learning models.
LSTM is a special type of RNN designed to solve the vanishing gradient problem and learn long-term dependencies.
Limitations of Basic RNN
Basic RNN suffers from:
1. Vanishing Gradient Problem
During Backpropagation Through Time (BPTT):
Gradient → very small
Result:
Model forgets old information
Cannot learn long sequences
Example:
“The movie I watched last year was not good.”
RNN may forget the word “not”.
2. Short-Term Memory Only
Basic RNN struggles with long sentences or long time-series patterns.
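The vanishing-gradient effect can be sketched numerically. In BPTT, the gradient for an early time step is a product of one factor per step; the factor 0.5 below is a made-up stand-in for those per-step derivatives, not a value from a real network:

```python
# Toy sketch of why gradients vanish in BPTT: the gradient through time
# is a product of per-step factors. With tanh units these factors are
# typically below 1, so the product shrinks exponentially with length.
factor = 0.5       # assumed per-step factor, for illustration only
gradient = 1.0
for step in range(50):   # 50 time steps back through the sequence
    gradient *= factor   # one chain-rule factor per step
print(gradient)          # vanishingly small: the signal from step 0 is gone
```

With 50 steps the gradient collapses to roughly 10⁻¹⁵, so the earliest inputs contribute essentially nothing to learning.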
LSTM Architecture
LSTM improves RNN by introducing:
Cell State (Long-term memory)
Gates (Control information flow)
LSTM Components
An LSTM cell contains:
Cell State (Cₜ) → Long-term memory
Hidden State (hₜ) → Short-term output
3 Gates:
Forget Gate
Input Gate
Output Gate
LSTM Flow
Previous Cell State (Cₜ₋₁)
↓
Forget Gate
↓
Input Gate
↓
Updated Cell State (Cₜ)
↓
Output Gate
↓
Hidden State (hₜ)
Forget Gate
Purpose:
Decides what information to remove from cell state.
Formula (bias terms omitted for simplicity):
fₜ = σ(W_f xₜ + U_f hₜ₋₁)
Output range: 0 to 1
0 → Forget completely
1 → Keep completely
Example:
In sentence:
“The movie was not good”
When processing “good”, the LSTM still remembers “not” because the forget gate chooses to keep it.
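A minimal numeric sketch of the forget gate, using made-up weights W_f, U_f and inputs (none of these values come from a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-dimensional input and hidden state (illustrative values).
W_f = np.array([[0.5, -0.3],
                [0.1,  0.8]])
U_f = np.array([[0.2,  0.4],
                [-0.5, 0.3]])
x_t    = np.array([1.0, 0.5])
h_prev = np.array([0.1, -0.2])

# f_t = sigmoid(W_f x_t + U_f h_{t-1}); the sigmoid squashes
# each entry into (0, 1)
f_t = sigmoid(W_f @ x_t + U_f @ h_prev)
print(f_t)  # entries near 1 keep that cell-state slot, near 0 erase it
```

Each entry of fₜ later multiplies the matching entry of Cₜ₋₁ element-wise, so the gate acts as a per-slot dimmer on the long-term memory.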
Input Gate
Purpose:
Decides what new information to store.
Two parts:
iₜ = σ(W_i xₜ + U_i hₜ₋₁)
C̃ₜ = tanh(W_c xₜ + U_c hₜ₋₁)
Updated Cell State:
Cₜ = fₜ ∗ Cₜ₋₁ + iₜ ∗ C̃ₜ
So:
Forget old info
Add new important info
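The cell-state update can be traced with small made-up numbers (illustrative only, not from a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only.
f_t     = np.array([0.95, 0.10])          # forget gate: keep slot 1, drop slot 2
C_prev  = np.array([2.0, -1.5])           # previous cell state C_{t-1}
i_t     = sigmoid(np.array([0.9, -0.4]))  # input gate activations
c_tilde = np.tanh(np.array([0.7, -1.2]))  # candidate values

# C_t = f_t * C_{t-1} + i_t * c_tilde  (all operations element-wise)
C_t = f_t * C_prev + i_t * c_tilde
print(C_t)  # slot 1 mostly preserved plus new info; slot 2 largely erased
```

The first slot keeps most of its old value and mixes in some candidate; the second slot shrinks toward zero because its forget-gate entry is small.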
Output Gate
Purpose:
Decides what to send as output.
oₜ = σ(W_o xₜ + U_o hₜ₋₁)
hₜ = oₜ ∗ tanh(Cₜ)
Hidden state is used for:
Next time step
Final prediction
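Continuing with made-up numbers, the output gate filters tanh(Cₜ) into the hidden state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (not from a trained model).
o_t = sigmoid(np.array([1.5, -0.5]))  # output gate activations
C_t = np.array([2.33, -0.48])         # updated cell state from the last step

# h_t = o_t * tanh(C_t): tanh bounds each entry in (-1, 1),
# and o_t decides how much of it is exposed as output.
h_t = o_t * np.tanh(C_t)
print(h_t)
```

Note that the cell state Cₜ itself is unbounded, but the hidden state hₜ always lies in (−1, 1) because of the tanh.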
Why LSTM Works Better
Preserves long-term memory
Avoids vanishing gradient
Controlled information flow
Works well for long sequences
Time Series Forecasting Using LSTM
LSTM is widely used for time-dependent data.
Example:
Stock Prices
Input:
[100, 105, 110, 115]
Model predicts:
Next price → 120
Real Applications
Stock market prediction
Weather forecasting
Energy demand forecasting
Sales prediction
Speech recognition
Text generation
RNN vs LSTM
Basic RNN: hidden state only, gradients vanish over long sequences, short-term memory.
LSTM: adds a cell state and three gates, controls information flow, learns long-term dependencies.
Core Idea
LSTM = Smart RNN with Memory Control System
It decides:
What to forget
What to remember
What to output
Simple LSTM Code Example (Time Series)
LSTM Time Series Prediction Example in Python using TensorFlow Keras
This Python example demonstrates how to build and train an LSTM (Long Short-Term Memory) network for simple time series prediction using TensorFlow Keras. The code reshapes the input data to 3D (samples, time steps, features), creates an LSTM layer followed by a Dense layer, compiles the model with the Adam optimizer and MSE loss, trains it, and makes predictions on the input data.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Dummy time series data
X = np.array([[1,2,3],[2,3,4],[3,4,5]])
y = np.array([4,5,6])
# Reshape for LSTM (samples, time steps, features)
X = X.reshape((X.shape[0], X.shape[1], 1))
model = Sequential([
    LSTM(50, activation='relu', input_shape=(3, 1)),
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)
prediction = model.predict(X)
print(prediction)