Long Short-Term Memory (LSTM)

  • This lesson explains Long Short-Term Memory networks and how they improve sequential data learning in deep learning models.
  • Long Short-Term Memory (LSTM)

    LSTM is a special type of RNN designed to solve the vanishing gradient problem and learn long-term dependencies.


    Limitations of Basic RNN

    Basic RNN suffers from:

    1. Vanishing Gradient Problem

    During Backpropagation Through Time (BPTT), gradients are multiplied across many time steps, so they shrink toward zero:

    Gradient → very small

    Result:

    • Model forgets old information

    • Cannot learn long sequences

    Example:

    “The movie I watched last year was not good.”

    RNN may forget the word “not”.

    2. Short-Term Memory Only

    Basic RNN struggles with long sentences or long time-series patterns.
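    The shrinkage during BPTT can be sketched with a toy calculation. The per-step factor of 0.5 below is a made-up assumption, standing in for a derivative with magnitude below 1:

```python
# Toy illustration of the vanishing gradient: BPTT multiplies the
# gradient by a per-step factor; if that factor is < 1, the gradient
# shrinks exponentially with sequence length.
grad = 1.0
for step in range(50):
    grad *= 0.5   # assumed per-step scaling factor (illustrative)

print(f"Gradient after 50 steps: {grad:.2e}")  # ~8.88e-16, effectively zero
```

    After only 50 steps the gradient is smaller than floating-point noise, so weight updates tied to early inputs vanish.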


    LSTM Architecture

    LSTM improves RNN by introducing:

    • Cell State (long-term memory)

    • Gates (control information flow)

    LSTM Components

    An LSTM cell contains:

    • Cell State (Cₜ) → Long-term memory

    • Hidden State (hₜ) → Short-term output

    • 3 Gates:

      • Forget Gate

      • Input Gate

      • Output Gate

    LSTM Flow

    Previous Cell State (Cₜ₋₁)

            ↓

         Forget Gate

            ↓

         Input Gate

            ↓

         Updated Cell State (Cₜ)

            ↓

         Output Gate

            ↓

         Hidden State (hₜ)


    Forget Gate

    Purpose:

    Decides what information to remove from cell state.

    Formula:

    fₜ = σ(W_f xₜ + U_f hₜ₋₁)

    Output range: 0 to 1

    • 0 → Forget completely

    • 1 → Keep completely

    Example:

    In sentence:

    “The movie was not good”

    When processing “good”, LSTM remembers “not” because forget gate keeps it.
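    A minimal NumPy sketch of the forget gate, using random (untrained, purely illustrative) weights and toy dimensions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy sizes and random weights -- illustrative only, not trained values
rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 3))   # input-to-gate weights
U_f = rng.standard_normal((4, 4))   # recurrent (hidden-to-gate) weights
x_t = rng.standard_normal(3)        # current input
h_prev = rng.standard_normal(4)     # previous hidden state h_{t-1}

# f_t = sigmoid(W_f x_t + U_f h_{t-1}); every entry lands in (0, 1)
f_t = sigmoid(W_f @ x_t + U_f @ h_prev)
print(f_t)
```

    Each entry of `f_t` scales one component of the old cell state: near 0 means "forget", near 1 means "keep".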


    Input Gate

    Purpose:

    Decides what new information to store.

    Two parts:

    iₜ = σ(W_i xₜ + U_i hₜ₋₁)

    C̃ₜ = tanh(W_c xₜ + U_c hₜ₋₁)

    Updated Cell State:

    Cₜ = fₜ * Cₜ₋₁ + iₜ * C̃ₜ

    So:

    • Forget old info

    • Add new important info
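    The full cell-state update can be sketched in NumPy. All weights are random placeholders (an assumption for illustration; in practice they are learned):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4                      # toy dimensions
x_t = rng.standard_normal(n_in)         # current input
h_prev = rng.standard_normal(n_hid)     # previous hidden state
C_prev = rng.standard_normal(n_hid)     # previous cell state

# Random placeholder weights for forget gate, input gate, and candidate
W_f, U_f = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))
W_i, U_i = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))
W_c, U_c = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))

f_t = sigmoid(W_f @ x_t + U_f @ h_prev)      # what to forget
i_t = sigmoid(W_i @ x_t + U_i @ h_prev)      # what to write
C_tilde = np.tanh(W_c @ x_t + U_c @ h_prev)  # candidate values

# C_t = f_t * C_{t-1} + i_t * C_tilde : forget old info, add new info
C_t = f_t * C_prev + i_t * C_tilde
print(C_t)
```

    Note both operations are elementwise: the gates act as per-component dials on the cell state.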


    Output Gate

    Purpose:

    Decides what to send as output.

    oₜ = σ(W_o xₜ + U_o hₜ₋₁)

    hₜ = oₜ * tanh(Cₜ)

    Hidden state is used for:

    • Next time step

    • Final prediction
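    The output step with small hand-picked numbers (the cell state and the gate pre-activation below are made up for illustration):

```python
import numpy as np

# Toy values: a cell state C_t and a raw output-gate pre-activation
# (standing in for W_o x_t + U_o h_{t-1})
C_t = np.array([0.8, -1.5, 0.2])
o_raw = np.array([2.0, -2.0, 0.0])

o_t = 1 / (1 + np.exp(-o_raw))   # sigmoid gate, entries in (0, 1)
h_t = o_t * np.tanh(C_t)         # hidden state, squashed into (-1, 1)

print(np.round(h_t, 3))
```

    Because tanh bounds the cell state and the sigmoid gate scales it, every component of hₜ stays in (-1, 1).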


    Why LSTM Works Better

    • Preserves long-term memory in the cell state

    • Avoids the vanishing gradient problem

    • Controls information flow through gates

    • Works well for long sequences


    Time Series Forecasting Using LSTM

    LSTM is widely used for time-dependent data.

    Example:

    Stock Prices

    Input:
    [100, 105, 110, 115]

    Model predicts:
    Next price → 120
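    Before training, a raw price series must be turned into (input window, next value) pairs. A minimal sketch, using a window size of 3 to match the example above:

```python
# Turn a 1-D series into supervised (window, next-value) training pairs.
def make_windows(series, window=3):
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])  # input: `window` consecutive values
        y.append(series[i + window])    # target: the value right after them
    return X, y

prices = [100, 105, 110, 115, 120]
X, y = make_windows(prices)
print(X)  # [[100, 105, 110], [105, 110, 115]]
print(y)  # [115, 120]
```

    The Keras example at the end of this lesson consumes exactly this kind of windowed data, reshaped to 3D.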

    Real Applications

    • Stock market prediction

    • Weather forecasting

    • Energy demand forecasting

    • Sales prediction

    • Speech recognition

    • Text generation

    RNN vs LSTM

    Feature            | RNN      | LSTM
    -------------------|----------|----------
    Long-term memory   | ❌ Weak   | ✅ Strong
    Vanishing gradient | High     | Very low
    Gates              | ❌ No     | ✅ Yes
    Performance        | Moderate | High

    Core Idea

    LSTM = Smart RNN with Memory Control System

    It decides:

    • What to forget

    • What to remember

    • What to output

    Simple LSTM Code Example (Time Series)


This Python example demonstrates how to build and train an LSTM (Long Short-Term Memory) network for simple time series prediction using TensorFlow Keras. The code reshapes the input data to 3D (samples, time steps, features), creates an LSTM layer followed by a Dense layer, compiles the model with the Adam optimizer and MSE loss, trains it, and makes predictions on the input data.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Dummy time series data
X = np.array([[1,2,3],[2,3,4],[3,4,5]])
y = np.array([4,5,6])

# Reshape for LSTM (samples, time steps, features)
X = X.reshape((X.shape[0], X.shape[1], 1))

model = Sequential([
    LSTM(50, activation='relu', input_shape=(3,1)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)

# Predict on the training windows (a toy sanity check, not a real evaluation)
prediction = model.predict(X, verbose=0)
print(prediction)