Text Generation with Markov Chains: How They Work and Their Limitations

Generative AI offers various approaches to creating new content, and one of the simplest and most widely used techniques for text generation is the Markov Chain model. Markov Chains are great for understanding and generating text sequences based on probabilities derived from the text they were trained on.

In this article, we will explore how Markov Chains work for text generation, understand their limitations, and see why more advanced techniques are needed.


What is a Markov Chain?

A Markov Chain is a mathematical system that transitions from one state to another based on certain probabilities. In the context of text generation, a “state” refers to a word or character, and the next word or character is chosen based on the previous one, according to a learned probability distribution.

Key Concepts:

  • State: Each unique word or character in the text is considered a “state”.
  • Transition: The probability of moving from one word/character to the next in a sequence.
  • Markov Property: The probability of each state depends only on the previous state (or a limited number of previous states).

In text generation using Markov Chains, the model learns how words or characters follow each other in the training data. It uses this information to predict the next word in a sequence, ultimately generating new sentences.


How Markov Chains Work for Text Generation

To understand how Markov Chains work for text generation, let’s look at a simple example:

  1. Training Data: Suppose our training text is:
    "The cat sits. The cat runs. The dog barks."
  2. Modeling Transitions: The model learns the probability of each word following another:
    • “The” → “cat” (appears twice), “dog” (appears once).
    • “cat” → “sits” (once), “runs” (once).
    • “dog” → “barks” (once).
    Based on this training data, the word “The” has about a 67% chance (2 out of 3) of being followed by “cat” and a 33% chance (1 out of 3) of being followed by “dog”.
  3. Text Generation: Using the learned probabilities, we can generate new sentences by randomly selecting the next word based on the current word’s transition probabilities.
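The transition counting in step 2 can be sketched in a few lines of Python. This is a minimal, self-contained illustration using the article's toy sentence; the variable names are just for this sketch:

```python
from collections import Counter, defaultdict

text = "The cat sits. The cat runs. The dog barks."
words = text.split()

# Count how often each word follows another
transitions = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    transitions[current][nxt] += 1

# Convert the raw counts for "The" into probabilities
counts = transitions["The"]
total = sum(counts.values())
probs = {w: c / total for w, c in counts.items()}
print(probs)  # {'cat': 0.66..., 'dog': 0.33...}
```

Note that `split()` keeps punctuation attached (e.g. `'sits.'`), which is fine for this toy example but something a real tokenizer would handle.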

Example Project: Text Generation using Markov Chains in Python

Let’s create a simple project where we generate text using Markov Chains.

Step-by-Step Guide:

Step 1: Import Libraries

First, we need to import the necessary Python libraries:

import random
from collections import defaultdict
from pprint import pprint

import matplotlib.pyplot as plt
import networkx as nx

Step 2: Prepare the Training Text

We’ll take a sample text for training the Markov Chain model. You can replace this with any text you want.

text = "The cat sits. The cat runs. The dog barks."
words = text.split()
print(words)  # note: split() keeps punctuation attached, e.g. 'sits.'

Step 3: Build the Markov Chain

We will now build the Markov Chain by creating a dictionary that maps each word to the words that follow it.

def build_markov_chain(words):
    markov_chain = defaultdict(list)
    
    for i in range(len(words) - 1):
        current_word = words[i]
        next_word = words[i + 1]
        markov_chain[current_word].append(next_word)
    
    return markov_chain

Explanation:
This function creates a dictionary where each word (state) is associated with a list of possible next words based on the training text.

Step 4: Generate Text

Now, let’s write a function to generate a random sequence of words based on the Markov Chain.

def generate_text(chain, start_word, length=10):
    word = start_word
    sentence = [word]
    
    for _ in range(length - 1):
        next_words = chain.get(word, [])
        if not next_words:  # dead end: no word ever followed this one
            break
        word = random.choice(next_words)
        sentence.append(word)
    
    return ' '.join(sentence)

Explanation:

  • The function starts with a start_word and uses the Markov Chain to randomly pick the next word based on probabilities. This process continues until we generate the desired number of words.
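One detail worth noting: because build_markov_chain keeps duplicate successors in each list, random.choice automatically samples according to the observed frequencies. Here is a quick self-contained check, with the successor list for “The” hard-coded to match the training text:

```python
import random
from collections import Counter

# chain["The"] stores duplicates, so random.choice samples proportionally
successors = ["cat", "cat", "dog"]

random.seed(0)
samples = Counter(random.choice(successors) for _ in range(10_000))
print(samples["cat"] / 10_000)  # roughly 0.67
```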

Step 5: Run the Model

We can now run the text generation process:

# Build the Markov Chain
markov_chain = build_markov_chain(words)
pprint(markov_chain)
# Visualize the Markov Chain as a directed graph
G = nx.DiGraph()

# Add edges to the graph from the Markov Chain
for word, next_words in markov_chain.items():
    for next_word in next_words:
        G.add_edge(word, next_word)

# Draw the graph
plt.figure(figsize=(12, 4))
pos = nx.spring_layout(G)  # Layout for visualizing the graph
nx.draw(G, pos, with_labels=True, node_color='skyblue', node_size=2000, font_size=10, font_weight='bold', edge_color='gray')
plt.title('Markov Chain Graph')
plt.show()

# Generate random text
start_word = "The"
generated_sentence = generate_text(markov_chain, start_word, length=6)
print(generated_sentence)

How It Works

  1. Training: The model learns from the input text and records which words tend to follow each other.
  2. Text Generation: Based on learned transitions, the model randomly generates text by starting with a word and selecting the next word based on the transition probabilities until it forms a sentence.

Markov Property:

  • The Markov Chain uses only the last word (state) to predict the next word. This is why it is a “memoryless” model—each transition depends only on the current state, not the entire history of the sequence.
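The Markov Property also allows for “higher-order” chains, where the state is the last two (or more) words instead of just one. As a sketch of what a second-order chain might look like on the same toy text (the function name is just for this illustration):

```python
import random
from collections import defaultdict

def build_order2_chain(words):
    """Second-order chain: the state is a pair of consecutive words."""
    chain = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        chain[(a, b)].append(c)
    return chain

words = "The cat sits. The cat runs. The dog barks.".split()
chain = build_order2_chain(words)
print(chain[("The", "cat")])  # ['sits.', 'runs.']
```

A higher order gives more context and more fluent output, but it needs far more training data, since each longer state is seen less often.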

Where Can Markov Chain Text Generation Be Used?

  • Text Prediction: Auto-completion tools can use simple Markov Chains to suggest the next word in a sentence.
  • Story Generation: Simple stories or dialogues can be created using basic Markov Chains.
  • Chatbots: Early chatbot systems used Markov Chains to generate responses based on previous conversations.
  • Procedural Content: Generating game dialogues or random in-game text events.

Limitations of Markov Chains

While Markov Chains are a simple way to generate text, they have several important limitations:

  1. Limited Context: Markov Chains only consider the last word (or a few words) to generate the next word. This leads to very shallow text understanding and generation.
    • Example: If the model only looks at the previous word, it won’t know how to maintain a logical flow over longer sentences.
  2. Repetitive Output: Markov Chains may generate repetitive or nonsensical sentences because they cannot “understand” the meaning of words.
    • Example: It might generate a sentence like “The dog barks. The dog barks. The dog barks.”
  3. No Semantic Understanding: Markov Chains do not have any understanding of the meaning of the words or context, leading to grammatically incorrect or illogical sentences.
  4. Difficulty in Long Sequences: For long sequences of text, Markov Chains tend to lose coherence since each word is only based on the immediate previous word.
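Limitation 2 is easy to demonstrate. In this self-contained sketch, the chain is trained on deliberately repetitive text, so each word has only one distinct successor and generation degenerates into a loop:

```python
import random
from collections import defaultdict

# Repetitive training text: every word has a single distinct successor
words = "the dog barks and the dog barks and the dog barks".split()

chain = defaultdict(list)
for cur, nxt in zip(words, words[1:]):
    chain[cur].append(nxt)

# Generate 8 words starting from "the" -- the output is a fixed loop
word, out = "the", ["the"]
for _ in range(7):
    word = random.choice(chain[word])
    out.append(word)
print(" ".join(out))  # the dog barks and the dog barks
```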

Conclusion

Markov Chains offer a simple and understandable way to generate text based on learned transitions between words. While they are effective for short and simple text generation tasks, their limitations become apparent when dealing with longer or more complex sequences. This is why more advanced models like RNNs, LSTMs, and Transformers are used today for generating high-quality text.

By starting with Markov Chains, you’ve learned the basic idea of text generation. The next step is to explore advanced models that can generate more coherent and meaningful text by remembering the context over longer sequences.
