Long Short-Term Memory (LSTM)

  • This lesson explains Long Short-Term Memory networks and how they improve sequential data learning in deep learning models.
  • Long Short-Term Memory (LSTM)

    LSTM is a special type of RNN designed to solve the vanishing gradient problem and learn long-term dependencies.


    Limitations of Basic RNN

    Basic RNN suffers from:

    1. Vanishing Gradient Problem

    During Backpropagation Through Time (BPTT), gradients are multiplied across many time steps, so they shrink toward zero:

    Gradient → very small

    Result:

    • Model forgets old information

    • Cannot learn long sequences

    Example:

    “The movie I watched last year was not good.”

    RNN may forget the word “not”.

    2. Short-Term Memory Only

    Basic RNN struggles with long sentences or long time-series patterns.
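    The shrinkage during BPTT can be sketched with a toy calculation. The per-step factor of 0.5 below is a made-up assumption, standing in for a derivative with magnitude below 1:

```python
# Toy illustration of the vanishing gradient: BPTT multiplies the
# gradient by a per-step factor; if that factor is < 1, the gradient
# shrinks exponentially with sequence length.
grad = 1.0
for step in range(50):
    grad *= 0.5   # assumed per-step scaling factor (illustrative)

print(f"Gradient after 50 steps: {grad:.2e}")  # ~8.88e-16, effectively zero
```

    After only 50 steps the gradient is smaller than floating-point noise, so weight updates tied to early inputs vanish.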


    LSTM Architecture

    LSTM improves RNN by introducing:

    • Cell State (long-term memory)

    • Gates (control information flow)

    LSTM Components

    An LSTM cell contains:

    • Cell State (Cₜ) → Long-term memory

    • Hidden State (hₜ) → Short-term output

    • 3 Gates:

      • Forget Gate

      • Input Gate

      • Output Gate

    LSTM Flow

    Previous Cell State (Cₜ₋₁)

            ↓

         Forget Gate

            ↓

         Input Gate

            ↓

         Updated Cell State (Cₜ)

            ↓

         Output Gate

            ↓

         Hidden State (hₜ)


    Forget Gate

    Purpose:

    Decides what information to remove from cell state.

    Formula:

    fₜ = σ(W_f xₜ + U_f hₜ₋₁)

    Output range: 0 to 1

    • 0 → Forget completely

    • 1 → Keep completely

    Example:

    In sentence:

    “The movie was not good”

    When processing “good”, LSTM remembers “not” because forget gate keeps it.
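    A minimal NumPy sketch of the forget gate, using random (untrained, purely illustrative) weights and toy dimensions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy sizes and random weights -- illustrative only, not trained values
rng = np.random.default_rng(0)
W_f = rng.standard_normal((4, 3))   # input-to-gate weights
U_f = rng.standard_normal((4, 4))   # recurrent (hidden-to-gate) weights
x_t = rng.standard_normal(3)        # current input
h_prev = rng.standard_normal(4)     # previous hidden state h_{t-1}

# f_t = sigmoid(W_f x_t + U_f h_{t-1}); every entry lands in (0, 1)
f_t = sigmoid(W_f @ x_t + U_f @ h_prev)
print(f_t)
```

    Each entry of `f_t` scales one component of the old cell state: near 0 means "forget", near 1 means "keep".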


    Input Gate

    Purpose:

    Decides what new information to store.

    Two parts:

    iₜ = σ(W_i xₜ + U_i hₜ₋₁)

    C̃ₜ = tanh(W_c xₜ + U_c hₜ₋₁)

    Updated Cell State:

    Cₜ = fₜ * Cₜ₋₁ + iₜ * C̃ₜ

    So:

    • Forget old info

    • Add new important info
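    The full cell-state update can be sketched in NumPy. All weights are random placeholders (an assumption for illustration; in practice they are learned):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4                      # toy dimensions
x_t = rng.standard_normal(n_in)         # current input
h_prev = rng.standard_normal(n_hid)     # previous hidden state
C_prev = rng.standard_normal(n_hid)     # previous cell state

# Random placeholder weights for forget gate, input gate, and candidate
W_f, U_f = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))
W_i, U_i = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))
W_c, U_c = rng.standard_normal((n_hid, n_in)), rng.standard_normal((n_hid, n_hid))

f_t = sigmoid(W_f @ x_t + U_f @ h_prev)      # what to forget
i_t = sigmoid(W_i @ x_t + U_i @ h_prev)      # what to write
C_tilde = np.tanh(W_c @ x_t + U_c @ h_prev)  # candidate values

# C_t = f_t * C_{t-1} + i_t * C_tilde : forget old info, add new info
C_t = f_t * C_prev + i_t * C_tilde
print(C_t)
```

    Note both operations are elementwise: the gates act as per-component dials on the cell state.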


    Output Gate

    Purpose:

    Decides what to send as output.

    oₜ = σ(W_o xₜ + U_o hₜ₋₁)

    hₜ = oₜ * tanh(Cₜ)

    Hidden state is used for:

    • Next time step

    • Final prediction
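    The output step with small hand-picked numbers (the cell state and the gate pre-activation below are made up for illustration):

```python
import numpy as np

# Toy values: a cell state C_t and a raw output-gate pre-activation
# (standing in for W_o x_t + U_o h_{t-1})
C_t = np.array([0.8, -1.5, 0.2])
o_raw = np.array([2.0, -2.0, 0.0])

o_t = 1 / (1 + np.exp(-o_raw))   # sigmoid gate, entries in (0, 1)
h_t = o_t * np.tanh(C_t)         # hidden state, squashed into (-1, 1)

print(np.round(h_t, 3))
```

    Because tanh bounds the cell state and the sigmoid gate scales it, every component of hₜ stays in (-1, 1).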


    Why LSTM Works Better

    • Preserves long-term memory in the cell state

    • Avoids the vanishing gradient problem

    • Controls information flow through gates

    • Works well for long sequences


    Time Series Forecasting Using LSTM

    LSTM is widely used for time-dependent data.

    Example:

    Stock Prices

    Input:
    [100, 105, 110, 115]

    Model predicts:
    Next price → 120
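    Before training, a raw price series must be turned into (input window, next value) pairs. A minimal sketch, using a window size of 3 to match the example above:

```python
# Turn a 1-D series into supervised (window, next-value) training pairs.
def make_windows(series, window=3):
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])  # input: `window` consecutive values
        y.append(series[i + window])    # target: the value right after them
    return X, y

prices = [100, 105, 110, 115, 120]
X, y = make_windows(prices)
print(X)  # [[100, 105, 110], [105, 110, 115]]
print(y)  # [115, 120]
```

    The Keras example at the end of this lesson consumes exactly this kind of windowed data, reshaped to 3D.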

    Real Applications

    • Stock market prediction

    • Weather forecasting

    • Energy demand forecasting

    • Sales prediction

    • Speech recognition

    • Text generation

    RNN vs LSTM

    Feature            | RNN      | LSTM
    -------------------|----------|----------
    Long-term memory   | ❌ Weak   | ✅ Strong
    Vanishing gradient | High     | Very low
    Gates              | ❌ No     | ✅ Yes
    Performance        | Moderate | High

    Core Idea

    LSTM = Smart RNN with Memory Control System

    It decides:

    • What to forget

    • What to remember

    • What to output

    Simple LSTM Code Example (Time Series)


This Python example demonstrates how to build and train an LSTM (Long Short-Term Memory) network for simple time series prediction using TensorFlow Keras. The code reshapes the input data to 3D (samples, time steps, features), creates an LSTM layer followed by a Dense layer, compiles the model with the Adam optimizer and MSE loss, trains it, and makes predictions on the input data.

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Dummy time series data
X = np.array([[1,2,3],[2,3,4],[3,4,5]])
y = np.array([4,5,6])

# Reshape for LSTM (samples, time steps, features)
X = X.reshape((X.shape[0], X.shape[1], 1))

model = Sequential([
    LSTM(50, activation='relu', input_shape=(3,1)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)

# Predict on the training windows (a toy sanity check, not a real evaluation)
prediction = model.predict(X, verbose=0)
print(prediction)