Text generation has become an integral part of many applications, including chatbots, content creation, and automated storytelling. Among the various models used for text generation, Long Short-Term Memory (LSTM) networks have played a significant role due to their ability to capture long-range dependencies in sequential data.

However, Bidirectional LSTMs (BiLSTMs) have emerged as a powerful enhancement over traditional LSTMs. This article explores the concept of Bidirectional LSTMs, their advantages over LSTMs, and how they can be effectively used for text generation tasks.

Understanding LSTMs

What is an LSTM?

LSTMs are a type of recurrent neural network (RNN) specifically designed to handle the vanishing gradient problem that standard RNNs face. They achieve this through a unique architecture that includes memory cells and gating mechanisms, which allow them to maintain information over long sequences.

How LSTMs Work

Each LSTM cell contains three gates that control the flow of information:

  1. Forget Gate: Decides what information to discard from the cell state.
  2. Input Gate: Determines which new information to store in the cell state.
  3. Output Gate: Decides what information to output from the cell state.

This architecture allows LSTMs to learn dependencies across time steps effectively, making them suitable for text generation tasks where context is crucial.
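
To make the gating concrete, the following is a minimal NumPy sketch of a single LSTM time step. The weight matrices and biases (W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c) are hypothetical placeholders rather than parameters taken from the Keras layers used later in this article; the sketch only shows how the three gates combine the previous hidden state and the current input to update the cell state.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c):
    """One LSTM time step with hypothetical weights, for illustration only."""
    z = np.concatenate([h_prev, x_t])       # previous hidden state and current input, stacked
    f_t = sigmoid(W_f @ z + b_f)            # forget gate: what to discard from the cell state
    i_t = sigmoid(W_i @ z + b_i)            # input gate: what new information to store
    o_t = sigmoid(W_o @ z + b_o)            # output gate: what to expose as the hidden state
    c_candidate = np.tanh(W_c @ z + b_c)    # candidate values for the cell state
    c_t = f_t * c_prev + i_t * c_candidate  # updated cell state
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t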

Limitations of LSTMs

While LSTMs are powerful, they still have limitations:

  1. Unidirectional Context: LSTMs process input sequences in a single direction (usually forward), so the representation at any time step carries no information about the words that come later in the sequence. When generating text, this means the model cannot use relevant context that appears further along.
  2. Training Time: Because the computation at each time step depends on the previous one, LSTMs cannot be parallelized across the sequence, which can lead to long training times.

Introducing Bidirectional LSTMs

What is a Bidirectional LSTM?

Bidirectional LSTMs are an extension of traditional LSTMs that process the input sequences in both forward and backward directions. This means that for each time step, the model takes into account the information from both past and future contexts.

How BiLSTMs Work

In a Bidirectional LSTM:

  • Two separate LSTMs are trained simultaneously:
    • One processes the input sequence from the start to the end (forward LSTM).
    • The other processes the input sequence from the end to the start (backward LSTM).
  • The outputs from both LSTMs are combined, typically by concatenating them, to create a comprehensive representation of the input at each time step.
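
Here is a minimal sketch of this wrapper in Keras, standalone and separate from the full text-generation model built later in this article; the input shape and unit count are arbitrary illustration values.

import numpy as np
from tensorflow.keras.layers import LSTM, Bidirectional

# Toy input: batch of 2 sequences, 5 time steps, 8 features per step (arbitrary values)
x = np.random.rand(2, 5, 8).astype("float32")

# Bidirectional wraps one LSTM and builds the forward and backward copies for you;
# the default merge_mode="concat" concatenates their outputs at every time step.
bilstm = Bidirectional(LSTM(units=16, return_sequences=True), merge_mode="concat")
output = bilstm(x)
print(output.shape)  # (2, 5, 32): 16 forward units + 16 backward units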

Advantages of Bidirectional LSTMs Over LSTMs

1. Enhanced Contextual Understanding

By processing sequences in both directions, BiLSTMs capture context more effectively. This allows the model to consider not only the previous words but also the subsequent words when making predictions. This is particularly useful in text generation, where the meaning of a word can depend significantly on its surrounding context.

2. Improved Performance in Sequence Prediction

The dual processing of sequences in BiLSTMs leads to better performance in tasks such as language modeling and text generation. By incorporating future context, BiLSTMs can generate more coherent and contextually relevant text, reducing issues like repetitiveness and ambiguity.

3. Versatility Across Applications

BiLSTMs are versatile and can be applied to various natural language processing (NLP) tasks, including sentiment analysis, named entity recognition, and machine translation. Their ability to understand context makes them suitable for a wide range of applications beyond just text generation.

4. Parallel Processing of the Two Directions

While a single LSTM processes a sequence strictly in one direction, the forward and backward LSTMs inside a BiLSTM are independent of each other and can be computed in parallel. Keep in mind that the total amount of computation roughly doubles, so a BiLSTM does not train faster than a comparable LSTM; the payoff is the richer two-sided context rather than speed.

Implementing Bidirectional LSTMs for Text Generation

Step-by-Step Implementation

Here’s a detailed implementation of a Bidirectional LSTM for text generation using TensorFlow and Keras. This example demonstrates how to train a model on a simple text dataset.

1. Import Libraries

First, we need to import the necessary libraries:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Embedding, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

Explanation:

  • numpy: A library for numerical operations in Python.
  • tensorflow: The deep learning framework we will use to build our model.
  • Sequential: A Keras model type that allows stacking layers linearly.
  • LSTM: The Long Short-Term Memory layer used in our model.
  • Bidirectional: A wrapper that allows LSTM to process input in both forward and backward directions.
  • Dense: A fully connected layer.
  • Embedding: A layer to transform integer sequences into dense vectors of fixed size.
  • Dropout: A regularization technique to prevent overfitting.
  • Tokenizer: A utility for converting text into sequences of integers.
  • pad_sequences: A utility to ensure all sequences have the same length.

2. Prepare the Data

Next, we need a dataset. For simplicity, let’s use a small corpus of text. You can replace this with any larger text dataset for better results.

data = ["CHENNAI: Following the very heavy rain alert for Chennai and surrounding districts, the state government has declared a holiday for schools and colleges in Chennai, Tiruvallur, Kancheepuram, and Chengalpattu districts on Tuesday. The decision was taken at a review meeting chaired by Chief Minister MK Stalin on Monday.",
"Additionally, the government has also asked private information technology companies to advise their employees to work from home from Tuesday till Saturday.",
"The School Education Department has issued a circular to all schools in the state, directing them to implement precautionary measures to ensure student safety during the rainy season. These measures include monitoring electricity cables, clearing drainage systems, pruning trees and ensuring that water does not stagnate on school rooftops. Schools have been advised to avoid using dilapidated or structurally weak buildings.",
"Schools can also utilise MGNREGA workers and the support of School Management Committee (SMC) members to clean the premises and rooftops during the holidays. Schools are also required to inspect their buildings for safety and carry out necessary maintenance work.",
"Headmasters and teachers have been instructed to inform parents not to allow their children near waterbodies. Students should be encouraged to use raincoats and umbrellas, the circular stated.",
"Furthermore, schools are required to provide details of the staff in charge to the revenue department in case the premises need to be used to accommodate rain-affected people, it further said.",
"CHENNAI: The Regional Meteorological Centre (RMC) has issued a red alert to nine districts in Tamil Nadu including Chennai and surrounding districts for Wednesday.",
"It said that heavy to very heavy rain at a few places with extremely heavy rain at one or two places is likely to occur over Tiruvallur, Chennai, Kancheepuram, Chengalpattu, Cuddalore, Villupuram, Mayiladuthurai, Nagapattinam and Tiruvarur districts, Puducherry and Karaikal area.",
"An extremely heavy rainfall alert is given when there is a possibility of some places in the region receiving more than 20.4 cm of rainfall.",
"This apart, an orange alert has been issued to Ranipet, Tiruvannamalai, Kallakurichi, Perambalur, Ariyalur and Thanjavur districts for heavy to very heavy rains (11.5 cm to 20.5 cm) for the same day.",
"The RMC has given an orange alert for the possibility of heavy to very heavy rain in a few places with extremely heavy rain at one or two places in Mayiladuthurai, Nagapattinam and Tiruvarur districts and Karaikal area on Tuesday.",
"Heavy to very heavy rain is likely to occur at isolated places over Ranipet, Tiruvallur, Chennai, Chengalpattu, Kancheepuram, Tiruvannamalai, Villupuram, Cuddalore, Kallakurichi, Ariyalur, Perambalur and Thanjavur districts and Puducherry on Tuesday.",
"RMC has also confirmed the formation of a low-pressure area over southeast Bay of Bengal which is expected to become well-marked and move towards north Tamil Nadu in the next two days."]
# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(data)
tokenizer.word_index

# Output - 
{'to': 1,
 'the': 2,
 'and': 3,
 'heavy': 4,
 'in': 5,
 'districts': 6,
 'a': 7,
 'chennai': 8,
 'rain': 9,
 'for': 10,
 'has': 11,
 'of': 12,
 'schools': 13,
 'places': 14,
 'very': 15,
 'alert': 16,
 'on': 17,
 'at': 18,
 'is': 19,
 'tuesday': 20,
 'also': 21,
 'tiruvallur': 22,
 'kancheepuram': 23,
 'chengalpattu': 24,
 'their': 25,
 'school': 26,
 'issued': 27,
 'been': 28,
 'or': 29,
 'rmc': 30,
 'extremely': 31,
 'two': 32,
 'over': 33,
 'area': 34,
 'an': 35,
 'cm': 36,
 'surrounding': 37,
 'state': 38,
 'government': 39,
 'work': 40,
 'from': 41,
 'department': 42,
 'circular': 43,
 'measures': 44,
 'safety': 45,
 'during': 46,
 'that': 47,
 'not': 48,
 'rooftops': 49,
 'have': 50,
 'buildings': 51,
 'premises': 52,
 'are': 53,
 'required': 54,
 'be': 55,
 'it': 56,
 'said': 57,
 'tamil': 58,
 'nadu': 59,
 'few': 60,
 'with': 61,
 'one': 62,
 'likely': 63,
 'occur': 64,
 'cuddalore': 65,
 'villupuram': 66,
 'mayiladuthurai': 67,
 'nagapattinam': 68,
 'tiruvarur': 69,
 'puducherry': 70,
 'karaikal': 71,
 'rainfall': 72,
 'given': 73,
 'possibility': 74,
 '20': 75,
 'orange': 76,
 'ranipet': 77,
 'tiruvannamalai': 78,
 'kallakurichi': 79,
 'perambalur': 80,
 'ariyalur': 81,
 'thanjavur': 82,
 '5': 83,
 'following': 84,
 'declared': 85,
 'holiday': 86,
 'colleges': 87,
 'decision': 88,
 'was': 89,
 'taken': 90,
 'review': 91,
 'meeting': 92,
 'chaired': 93,
 'by': 94,
 'chief': 95,
 'minister': 96,
 'mk': 97,
 'stalin': 98,
 'monday': 99,
 'additionally': 100,
 'asked': 101,
 'private': 102,
 'information': 103,
 'technology': 104,
 'companies': 105,
 'advise': 106,
 'employees': 107,
 'home': 108,
 'till': 109,
 'saturday': 110,
 'education': 111,
 'all': 112,
 'directing': 113,
 'them': 114,
 'implement': 115,
 'precautionary': 116,
 'ensure': 117,
 'student': 118,
 'rainy': 119,
 'season': 120,
 'these': 121,
 'include': 122,
 'monitoring': 123,
 'electricity': 124,
 'cables': 125,
 'clearing': 126,
 'drainage': 127,
 'systems': 128,
 'pruning': 129,
 'trees': 130,
 'ensuring': 131,
 'water': 132,
 'does': 133,
 'stagnate': 134,
 'advised': 135,
 'avoid': 136,
 'using': 137,
 'dilapidated': 138,
 'structurally': 139,
 'weak': 140,
 'can': 141,
 'utilise': 142,
 'mgnrega': 143,
 'workers': 144,
 'support': 145,
 'management': 146,
 'committee': 147,
 'smc': 148,
 'members': 149,
 'clean': 150,
 'holidays': 151,
 'inspect': 152,
 'carry': 153,
 'out': 154,
 'necessary': 155,
 'maintenance': 156,
 'headmasters': 157,
 'teachers': 158,
 'instructed': 159,
 'inform': 160,
 'parents': 161,
 'allow': 162,
 'children': 163,
 'near': 164,
 'waterbodies': 165,
 'students': 166,
 'should': 167,
 'encouraged': 168,
 'use': 169,
 'raincoats': 170,
 'umbrellas': 171,
 'stated': 172,
 'furthermore': 173,
 'provide': 174,
 'details': 175,
 'staff': 176,
 'charge': 177,
 'revenue': 178,
 'case': 179,
 'need': 180,
 'used': 181,
 'accommodate': 182,
 'affected': 183,
 'people': 184,
 'further': 185,
 'regional': 186,
 'meteorological': 187,
 'centre': 188,
 'red': 189,
 'nine': 190,
 'including': 191,
 'wednesday': 192,
 'when': 193,
 'there': 194,
 'some': 195,
 'region': 196,
 'receiving': 197,
 'more': 198,
 'than': 199,
 '4': 200,
 'this': 201,
 'apart': 202,
 'rains': 203,
 '11': 204,
 'same': 205,
 'day': 206,
 'isolated': 207,
 'confirmed': 208,
 'formation': 209,
 'low': 210,
 'pressure': 211,
 'southeast': 212,
 'bay': 213,
 'bengal': 214,
 'which': 215,
 'expected': 216,
 'become': 217,
 'well': 218,
 'marked': 219,
 'move': 220,
 'towards': 221,
 'north': 222,
 'next': 223,
 'days': 224}
# check total words
total_words = len(tokenizer.word_index) + 1  # +1 for padding
total_words

# 225

Explanation:

  • data: A list of sentences representing our text corpus. You can use a larger and more diverse dataset for better results.
  • Tokenizer: It learns the vocabulary of the text.
  • fit_on_texts(): It creates a word index based on the corpus.
  • total_words: The vocabulary size plus one (index 0 is reserved for padding), which we’ll need later for our model’s output layer.

3. Create Input Sequences

We will create input sequences and labels from the corpus for training the model.

# Create input sequences
input_sequences = []
for line in data:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)
input_sequences

# Output
[[8, 84],
 [8, 84, 2],
 [8, 84, 2, 15],
 [8, 84, 2, 15, 4],
 [8, 84, 2, 15, 4, 9],
 [8, 84, 2, 15, 4, 9, 16],
 [8, 84, 2, 15, 4, 9, 16, 10],
 [8, 84, 2, 15, 4, 9, 16, 10, 8],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38, 39],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38, 39, 11],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38, 39, 11, 85],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38, 39, 11, 85, 7],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38, 39, 11, 85, 7, 86],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38, 39, 11, 85, 7, 86, 10],
 [8, 84, 2, 15, 4, 9, 16, 10, 8, 3, 37, 6, 2, 38, 39, 11, 85, 7, 86, 10, 13],
...... (output truncated)
# Pad sequences
max_sequence_length = max(len(x) for x in input_sequences)
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length, padding='pre')
input_sequences
# Create predictors and labels
X, y = input_sequences[:,:-1], input_sequences[:,-1]
X
# check target
y
# convert target into categorical
y = to_categorical(y, num_classes=total_words)
y

Explanation:

  • input_sequences: This list will hold our sequences of integers that represent words.
  • token_list: Converts each line of text into a sequence of integers based on the word index.
  • n_gram_sequence: For each word in the line, it creates sequences that include all preceding words.
  • max_sequence_length: Determines the maximum length of the sequences.
  • pad_sequences: Ensures all sequences have the same length by padding shorter sequences.
  • X and y: Here, X holds the input sequences (all but the last word), and y holds the labels (the last word). We also one-hot encode the labels using to_categorical().

4. Define the Model

Now we will define a Bidirectional LSTM model.

# Build the Bidirectional LSTM model
model = Sequential()
model.add(Embedding(input_dim=total_words, output_dim=50, input_length=max_sequence_length - 1))
model.add(Bidirectional(LSTM(units=100, return_sequences=True)))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(units=100)))
model.add(Dropout(0.2))
model.add(Dense(total_words, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Explanation:

  • Embedding Layer: Converts the integer sequences into dense vectors of fixed size (50 in this case). The input_dim is the size of the vocabulary, and input_length is the length of the input sequences.
  • Bidirectional Layer: The first BiLSTM processes the input in both directions and returns sequences to feed into the next layer. The return_sequences=True argument ensures that we get outputs at each time step.
  • Dropout Layers: These layers randomly set a fraction (20% here) of the input units to 0 during training to prevent overfitting.
  • Dense Layer: This output layer predicts the next word in the sequence using the softmax activation function, providing probabilities for each word in the vocabulary.
  • model.compile(): Compiles the model with categorical cross-entropy loss (suitable for multi-class classification) and the Adam optimizer.
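
If you want to double-check the layer stack and parameter counts before training, Keras can print a summary (the exact numbers depend on the corpus and resulting vocabulary size, so the output is omitted here):

model.summary()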

5. Train the Model

Now we can train our Bidirectional LSTM model.

# Train the model
model.fit(X, y, epochs=100, verbose=1)  # increase epochs for better results on larger datasets
# Output

Epoch 1/100
Epoch 2/100
Epoch 3/100
...
Epoch 99/100
Epoch 100/100

Explanation:

  • model.fit(): Trains the model on the input sequences X and labels y for 100 epochs. The verbose=1 argument prints progress for each epoch. On this tiny corpus the model quickly memorizes the training data; an optional early-stopping sketch for larger datasets follows below.
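
This corpus is tiny, so 100 epochs will essentially memorize it. On a larger dataset, one optional safeguard against overfitting is an early-stopping callback; the sketch below is an alternative to the fit() call above, monitors the training loss (with a validation split you would monitor val_loss instead), and uses an arbitrary patience value for illustration.

from tensorflow.keras.callbacks import EarlyStopping

# Stop when the loss has not improved for 5 consecutive epochs and keep the best weights
early_stop = EarlyStopping(monitor='loss', patience=5, restore_best_weights=True)
model.fit(X, y, epochs=100, verbose=1, callbacks=[early_stop])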

6. Generate Text

After training, we can use the model to generate text based on a seed sequence.

def generate_text(seed_text, next_words, model, max_sequence_length):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_length - 1, padding='pre')
        predicted = model.predict(token_list, verbose=0)
        predicted_word_index = int(np.argmax(predicted, axis=-1)[0])  # index of the most probable word
        output_word = ""

        for word, index in tokenizer.word_index.items():
            if index == predicted_word_index:
                output_word = word
                break
                
        seed_text += " " + output_word
        
    return seed_text
# Generate text
seed_text = "It said that"
generated_text = generate_text(seed_text, next_words=10, model=model, max_sequence_length=max_sequence_length)
print(generated_text)

Explanation:

  • generate_text(): This function generates new text given a seed phrase.
    • It converts the seed text into a sequence of integers and pads it to match the input shape.
    • The model predicts the next word based on this padded sequence.
    • The predicted word is determined by finding the index with the highest probability.
    • The predicted word is appended to the seed text, and the process repeats for the specified number of next_words.
  • seed_text: The initial phrase provided to generate new text. You can change this to experiment with different starting points.
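
The function above always picks the single most probable word (greedy decoding), which on a small corpus tends to produce repetitive continuations. An optional variation, not part of the original code, is temperature sampling: rescale the predicted probabilities and sample from them instead of taking the argmax. The helper below could replace the argmax step inside generate_text(); the temperature value is a tunable assumption.

def sample_word_index(probabilities, temperature=0.8):
    """Sample a word index from the predicted distribution instead of taking the argmax."""
    probs = np.asarray(probabilities, dtype="float64")
    probs = np.log(probs + 1e-9) / temperature      # temperature < 1 sharpens, > 1 flattens
    probs = np.exp(probs) / np.sum(np.exp(probs))   # renormalize into a probability distribution
    return int(np.random.choice(len(probs), p=probs))

# Inside generate_text(), the argmax line could become:
# predicted_word_index = sample_word_index(model.predict(token_list, verbose=0)[0])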

7. Results

After running the code, you will see the generated text printed to the console. The output will be a continuation of the seed text, reflecting the patterns and context learned from the training corpus.

Limitations of BiLSTMs

Bidirectional LSTMs provide a significant advantage over traditional LSTMs, but they come with their own key limitations:

  1. Increased Computational Complexity: BiLSTMs require more computational resources (memory and processing power) due to processing sequences in both directions.
  2. Longer Training Time: Training BiLSTMs takes more time than standard LSTMs because of the dual processing of sequences.
  3. Dependency on Sequence Length: For very long sequences, BiLSTMs may still struggle with long-range dependencies, similar to regular LSTMs.
  4. Requires Complete Input Sequence: BiLSTMs need the entire input sequence beforehand, making them unsuitable for real-time tasks where future context is not available.
  5. Overfitting Risk: Due to the increased complexity and parameters, BiLSTMs are prone to overfitting, especially on small datasets.
  6. Limited Scalability: For very large datasets, scaling BiLSTMs efficiently can be challenging without optimized hardware or techniques like distributed computing.

These limitations can be mitigated by alternatives like Transformer models, which handle long-range dependencies more efficiently.

Conclusion

Bidirectional LSTMs provide a significant advantage over traditional LSTMs in terms of context awareness and performance in sequence prediction tasks. By incorporating both past and future information, they enhance the ability of models to generate coherent and contextually relevant text. This article demonstrated how to implement a Bidirectional LSTM for text generation using a simple dataset, illustrating the steps from data preparation to text generation. As you explore more complex datasets and applications, the benefits of BiLSTMs will become even more apparent.
