Objective:
To understand how LSTMs address the problem of long-term dependencies in text generation, and to implement a simple word-based text generator using an LSTM network. Let us walk through a text generation example step by step.


Before LSTM: What Were the Problems?

In earlier approaches, simpler methods like Markov chains and n-grams could only use a short, fixed window of context, while standard RNNs suffer from the vanishing gradient problem and struggle to remember long-term dependencies. As a result, these models often fail when generating long sequences of text, since they forget (or never see) important context from earlier parts of the sequence.


What is an LSTM?

Long Short-Term Memory (LSTM) networks are a special kind of RNN designed to better retain and manage information over long sequences. Unlike standard RNNs, which have a simple recurrent cell, LSTMs have a more complex structure that includes gates to control the flow of information in and out of memory.

Key Components of LSTM:

  1. Forget Gate: Decides what part of the memory to forget.
  2. Input Gate: Decides what new information to store in the memory.
  3. Output Gate: Determines what information to output at the current time step.

This design allows LSTMs to remember important information for longer periods and discard irrelevant details.


How LSTMs Work (Step-by-Step)

  1. Input Processing: Similar to RNNs, LSTMs take an input sequence one step at a time. However, they use gates to decide what information from previous steps should be remembered or forgotten.
  2. Memory Cell: The key to an LSTM’s ability to remember long-term dependencies is its memory cell, which carries information across multiple steps. The gates control the flow of information into and out of this memory.
  3. Gates Mechanism:
    • Forget Gate: Removes irrelevant information from the memory.
    • Input Gate: Adds new, relevant information to the memory.
    • Output Gate: Decides what part of the memory should be output as the current hidden state.
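
To make the gate mechanism above concrete, here is a minimal NumPy sketch of a single LSTM time step. It is only an illustration of the update equations (with an assumed weight layout); the Keras LSTM layer used later in this article implements all of this, plus training, for you.

# A minimal NumPy sketch of one LSTM time step (illustrative only)
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the weights for the forget (f),
    input (i), candidate (g), and output (o) computations."""
    z = W @ x_t + U @ h_prev + b      # all pre-activations at once
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)                    # forget gate: what to erase from memory
    i = sigmoid(i)                    # input gate: what new information to store
    g = np.tanh(g)                    # candidate values for the memory cell
    o = sigmoid(o)                    # output gate: what to expose as hidden state
    c_t = f * c_prev + i * g          # updated memory cell
    h_t = o * np.tanh(c_t)            # new hidden state
    return h_t, c_t

# Toy sizes: input dimension 4, hidden dimension 3, random weights
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h.shape, c.shape)  # (3,) (3,)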

RNN vs. LSTM

Here’s a comparison table highlighting the key differences between RNNs and LSTMs:

Feature | RNN | LSTM
Architecture | Simple recurrent structure | Complex structure with memory cells and gates
Memory | Limited memory due to vanishing gradients | Better memory retention for long-term dependencies
Gates | No gates; only a single hidden state | Forget, input, and output gates manage information flow
Training Time | Generally faster due to simpler structure | Slower due to more complex computations
Handling Long Sequences | Struggles with long sequences | Handles long sequences effectively, though very long inputs remain challenging
Use Cases | Basic sequence prediction tasks | More complex tasks like text generation and language modeling
Gradient Flow | Prone to vanishing/exploding gradients | Designed to mitigate vanishing gradient issues
Complexity | Simpler, easier to implement | More complex, requires more tuning
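
In Keras, much of this table reduces to a single layer choice: the only structural difference between the two sketches below (with purely illustrative layer sizes) is whether the recurrent layer is a SimpleRNN or an LSTM.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, Dense

vocab_size = 1000  # illustrative vocabulary size

# Plain RNN: a single hidden state, no gates
rnn_model = Sequential([
    Embedding(vocab_size, 64),
    SimpleRNN(128),
    Dense(vocab_size, activation='softmax'),
])

# LSTM: the same pipeline, but the recurrent layer carries a memory cell and gates
lstm_model = Sequential([
    Embedding(vocab_size, 64),
    LSTM(128),
    Dense(vocab_size, activation='softmax'),
])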

How LSTMs Solved Earlier Problems

  1. Retaining Long-Term Dependencies: LSTMs are explicitly designed to retain information over long sequences. This makes them ideal for tasks like text generation, where context from earlier words can significantly impact the current word.
  2. Vanishing Gradient Problem: By introducing the gates and memory cell, LSTMs reduce the effect of vanishing gradients, allowing them to learn and remember over longer sequences.

Implementing Text Generation with LSTM

In this section, we’ll implement a word-based text generation system using an LSTM network. The goal is to train the model on a body of text and generate new sentences word by word.

Dataset Preparation

For our implementation, we’ll use a simple dataset containing the text:

“This is a simple example for text generation using LSTM in Python.”

This text is limited but will serve our purpose of demonstrating LSTM functionality.

1. Prepare the Dataset

We will tokenize the text, create input sequences, and prepare the labels for training.

# Import Libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Sample text data
data = "This is a simple example for text generation using LSTM in Python. "

# Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])

# Check tokens
tokenizer.word_index
# Check and calculate total words
total_words = len(tokenizer.word_index) + 1
total_words

# Output: 13
# Create input sequences (n-grams of increasing length)
token_list = tokenizer.texts_to_sequences([data])[0]
input_sequences = []
for i in range(1, len(token_list)):
    input_sequences.append(token_list[:i+1])

input_sequences
# Pad sequences to ensure uniform length
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
input_sequences
# Create predictors and labels
X, y = input_sequences[:,:-1], input_sequences[:,-1]
X
# check target
y
# convert target into categorical
y = to_categorical(y, num_classes=total_words)
y

Explanation:

  • We start by importing the necessary libraries.
  • The text is tokenized using the Tokenizer class from Keras, which converts words into unique integers.
  • We create input sequences by iterating through the tokenized words and collecting sequences of n-grams.
  • The sequences are padded to ensure they all have the same length, which is required for training the model.
  • Finally, we prepare the predictors (input sequences) and labels (next words) for training.
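
Before building the model, it is worth sanity-checking the shapes. For the 12-word sample sentence we get 11 n-gram sequences, max_sequence_len is 12, and total_words is 13 (12 words plus the reserved padding index), so the arrays should look like this:

# Sanity check for the toy corpus (values follow from the 12-word sample sentence)
print(X.shape)  # (11, 11): 11 sequences, each padded to max_sequence_len - 1
print(y.shape)  # (11, 13): one-hot labels over the 13-entry vocabulary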

2. Build the LSTM Model

Next, we’ll build the LSTM model using Keras.

# Import libraries
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
# Define the model architecture
model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len - 1))
model.add(LSTM(150, return_sequences=False))
model.add(Dense(total_words, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model (consider more epochs when using a larger corpus)
history = model.fit(X, y, epochs=100, verbose=1)
# Check the model summary
model.summary()

Explanation:

  • We define a sequential model and add layers to it.
  • The Embedding layer converts the integer sequences into dense vectors of fixed size.
  • An LSTM layer is added to capture long-term dependencies from the input sequences.
  • The final layer is a Dense layer with a softmax activation function to predict the next word in the sequence.
  • The model is compiled using categorical crossentropy loss and the Adam optimizer.
  • We then fit the model to the data for a specified number of epochs.
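
As a quick check on the architecture, the parameter counts that model.summary() reports can be reproduced by hand. The numbers below assume the toy corpus, i.e. total_words = 13 and an embedding dimension of 100.

# Reproduce the parameter counts reported by model.summary() for the toy corpus
embedding_params = 13 * 100                           # vocab_size * embedding_dim = 1,300
lstm_params = 4 * ((100 + 150) * 150 + 150)           # 4 * ((input_dim + units) * units + units) = 150,600
dense_params = 150 * 13 + 13                          # units * vocab_size + biases = 1,963
print(embedding_params + lstm_params + dense_params)  # 153,863 trainable parameters in total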

3. Generate Text with the LSTM

Once the model is trained, we can generate new text by providing a seed word and predicting the next word in the sequence.

def generate_text(seed_text, next_words, max_sequence_len):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = tf.keras.preprocessing.sequence.pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = model.predict(token_list, verbose=0)
        # Look up the word for the highest-probability index (index 0 is padding, hence .get)
        predicted_word = tokenizer.index_word.get(np.argmax(predicted), "")
        seed_text += " " + predicted_word
    return seed_text
# Generate new text
seed_text = "This is"
generated_text = generate_text(seed_text, 10, max_sequence_len)
print(generated_text)

I also trained the same model on a much larger corpus: a news article available at

https://www.newindianexpress.com/states/tamil-nadu/2024/Oct/14/tn-govt-declares-holiday-for-schools-colleges-in-chennai-and-other-districts-following-rain-alert

With the seed text "Why Holiday" and 50 next words, the generated text was quite impressive.

It is still far from perfect, but it is a clear improvement over the earlier language models.

Explanation:

  • We define a function generate_text that takes a seed text and the number of words to generate.
  • The seed text is tokenized and padded to match the model’s input shape.
  • The model predicts the next word, and the predicted index is converted back to a word using the tokenizer.
  • This process is repeated for the desired number of words to be generated.
  • Finally, we call the function with a seed text to generate new sentences.

Types of RNNs

Before we wrap up, it’s helpful to understand that the LSTM is just one type of RNN. Here are the other common types:

  1. Vanilla RNN: The simplest form, which struggles with long-term dependencies.
  2. GRU (Gated Recurrent Unit): A simplified variant of the LSTM with two gates (update and reset) instead of three; it often performs comparably while being cheaper to train.
  3. Bidirectional RNN: Processes the input sequence in both forward and backward directions to improve context understanding.
  4. Deep RNN: Stacks multiple RNN layers to capture complex patterns.

Among these, LSTM and GRU are the most commonly used due to their ability to handle long sequences.
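
For reference, each of these variants maps to a Keras layer, so swapping them into the model built earlier is straightforward; the layer sizes below are purely illustrative.

from tensorflow.keras.layers import SimpleRNN, GRU, LSTM, Bidirectional

SimpleRNN(128)              # vanilla RNN: single hidden state, no gates
GRU(128)                    # gated recurrent unit: fewer gates than an LSTM
Bidirectional(LSTM(128))    # reads the sequence forwards and backwards

# A "deep" RNN simply stacks recurrent layers; every layer except the last must
# return the full sequence so the next layer receives one vector per time step.
deep_layers = [LSTM(128, return_sequences=True), LSTM(128)]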


Limitations of LSTMs

  1. Training Time: LSTMs are computationally expensive and slow to train, especially on large datasets.
  2. Memory Consumption: Since they retain information for longer, they require more memory.
  3. Difficulty in Capturing Very Long Sequences: While LSTMs perform better than RNNs, they still struggle with extremely long dependencies.

Conclusion

LSTMs have revolutionized sequence-based tasks like text generation by addressing the limitations of traditional RNNs. They retain memory of important information over long sequences, making them particularly useful for generating human-like text. However, their computational expense remains a challenge, which newer architectures like Transformers aim to address. But for many tasks, LSTMs remain a powerful and effective tool.
