Types of Data: How to Generate Structured, Semi-Structured, and Unstructured Data Using Python

Data is information that is collected and stored so it can be used for different purposes. It comes in many forms, and based on how it’s organized, we can classify data into different types. Understanding the different types of data is crucial because it helps us decide how to store, analyze, and make decisions based on that data.

In this article, we’ll learn about:

  • Types of data.
  • How to create different types of data (CSV, JSON, XML, text, audio, and image) using Python.
  • Comparing data types and their examples.

Types of Data

There are several types of data, but the most common ones are:

  1. Structured Data:
    • This type of data is organized in a table format (rows and columns), like in Excel or databases.
    • Example: Student details with columns like ID, Name, and Age.
  2. Semi-Structured Data:
    • This type of data doesn’t follow a strict table format but still has some structure, such as key-value pairs.
    • Examples: JSON, XML, and HTML.
  3. Unstructured Data:
    • This type of data has no predefined data model or organization. It can be anything from text files to audio, images, or videos.
    • Examples: A simple text file, an audio recording, a picture.

Comparison of Data Types

Data Type | Format | Example | Uses | How It’s Stored
Structured Data | Table (Rows/Columns) | CSV, Excel | Easy to store, analyze, and query | Databases, Spreadsheets
Semi-Structured Data | Key-Value, Hierarchical | JSON, XML | Flexibility with some structure | NoSQL databases, APIs
Unstructured Data | No specific structure | Text, Audio, Image, Video | Requires more processing to analyze | Files, Media Storage
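
To make the comparison concrete, here is a minimal sketch that expresses one employee in all three shapes. The record and the sentence are illustrative values chosen for this sketch, and only the standard library is used.

```python
import csv
import io
import json

# Illustrative employee record (values assumed for this sketch)
employee = {"ID": 1, "Name": "Alice", "Age": 25, "Department": "HR"}

# Structured: fixed columns, one row per record
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(employee), lineterminator="\n")
writer.writeheader()
writer.writerow(employee)
structured = buffer.getvalue()

# Semi-structured: self-describing keys, and nesting is allowed
semi_structured = json.dumps({**employee, "Skills": {"Programming": "Java"}})

# Unstructured: free text with no fields a program can address directly
unstructured = "Alice is 25 years old and works in the HR department."

print(structured)
print(semi_structured)
print(unstructured)
```

The same facts are present in each form; what changes is how much work a program has to do to get them back out.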

Examples of Different Types of Data

Here’s a quick look at the file formats we’ll use for each type of data. The next section shows how to create each one using Python:

  1. Structured Data:
    • CSV File: Like a table in Excel.
    • Example: A CSV file containing names, ages, and departments of employees.
  2. Semi-Structured Data:
    • JSON File: A collection of key-value pairs that describe something.
    • XML File: Similar to JSON but organized in a tag structure.
    • Example: A JSON or XML file containing details of a product (name, price, and features).
  3. Unstructured Data:
    • Text File: A file containing plain text, like a paragraph or a book.
    • Audio File: A sound file, such as a recording of a voice or music.
    • Image File: A picture or image, such as a photo or a drawing.
    • Example: A text file with some random information, an audio file of a speech, or an image of a landscape.

Step-by-Step Guide to Create Different Data Types Using Python

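Now, let’s walk through how to create these different types of data using Python. Below are the code snippets for each type. The CSV, audio, and image snippets rely on third-party packages (pandas, pydub, and Pillow), which you can install with: pip install pandas pydub Pillow. To run a snippet, save it to a file and use the command: python filename.py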


1. Creating Structured Data (CSV File)

CSV stands for Comma Separated Values. It’s a simple text format that stores data in rows and columns.

Here’s the Python code to create a CSV file:

import pandas as pd

# Create a sample structured DataFrame
data = {
    'ID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 35],
    'Department': ['HR', 'Finance', 'IT', 'Marketing']
}

# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)

# Save the DataFrame to a CSV file
df.to_csv('sample_structured_data.csv', index=False)

print("Structured data (CSV) created successfully.")

Explanation: This code creates a CSV file containing employee data like ID, Name, Age, and Department.
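
If pandas isn’t available, a similar file can be written with Python’s built-in csv module. The sketch below is one possible alternative, not a replacement for the pandas version; the file name sample_structured_data_stdlib.csv is a hypothetical name chosen so it doesn’t overwrite the file created above.

```python
import csv

rows = [
    {"ID": 1, "Name": "Alice", "Age": 25, "Department": "HR"},
    {"ID": 2, "Name": "Bob", "Age": 30, "Department": "Finance"},
]

# newline="" lets the csv module control line endings itself
with open("sample_structured_data_stdlib.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["ID", "Name", "Age", "Department"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back; note that every value comes back as a string
with open("sample_structured_data_stdlib.csv", newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(loaded[0]["Name"], loaded[0]["Age"])
```

Because CSV stores plain text, numbers such as Age need to be converted back (for example with int()) after reading.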


2. Creating Semi-Structured Data (JSON File)

JSON stands for JavaScript Object Notation. It’s used for storing and exchanging data in a key-value format.

Here’s the Python code to create a JSON file:

import json

json_data = [
    {
        "ID": 1,
        "Name": "Alice",
        "Age": 25
    },
    {
        "ID": 2,
        "Name": "Bob",
        "Age": 30,
        "Skills": {
            "Programming": "Java",
            "Experience": 5
        }
    },
    {
        "ID": 3,
        "Name": "Charlie",
        "Skills": {
            "Programming": "JavaScript",
            "Experience": 2
        }
    }
]

# Save the data to a JSON file
with open('sample_semi_structured_data.json', 'w') as f:
    json.dump(json_data, f, indent=4)

print("Semi-structured data (JSON) created successfully.")

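Explanation: This code creates a JSON file with details about employees and their programming skills. Notice that the records don’t all share the same fields (Alice has no Skills, and Charlie has no Age). This flexibility is what makes the data semi-structured.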
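
Because fields can be missing, code that reads semi-structured data usually has to handle absent keys. Here is a minimal sketch of one way to do that with dict.get; the summarize helper is a hypothetical function written for this illustration, and the two records are copied from the example above.

```python
import json

# Two records from the example above; note that their keys differ
records_json = """
[
    {"ID": 1, "Name": "Alice", "Age": 25},
    {"ID": 3, "Name": "Charlie", "Skills": {"Programming": "JavaScript", "Experience": 2}}
]
"""

def summarize(record):
    """Return a one-line summary, tolerating missing 'Age' and 'Skills'."""
    age = record.get("Age", "unknown")
    language = record.get("Skills", {}).get("Programming", "none listed")
    return f"{record['Name']}: age {age}, language {language}"

summaries = [summarize(r) for r in json.loads(records_json)]
for line in summaries:
    print(line)
```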


3. Creating Semi-Structured Data (XML File)

XML stands for eXtensible Markup Language. It stores data in a hierarchical format using tags.

Here’s the Python code to create an XML file:

import xml.etree.ElementTree as ET
from xml.dom import minidom

# Function to create an XML file
def create_semi_structured_xml():
    root = ET.Element('Employees')
    
    employees = [
        {"ID": 1, "Name": "Alice", "Age": 25, "Department": "HR"},
        {"ID": 2, "Name": "Bob", "Age": 30, "Department": "Finance"}
    ]
    
    for emp in employees:
        employee = ET.SubElement(root, 'Employee')
        ET.SubElement(employee, 'ID').text = str(emp["ID"])
        ET.SubElement(employee, 'Name').text = emp["Name"]
        ET.SubElement(employee, 'Age').text = str(emp["Age"])
        ET.SubElement(employee, 'Department').text = emp["Department"]

    xml_str = ET.tostring(root, 'utf-8')
    parsed_xml = minidom.parseString(xml_str)
    pretty_xml_str = parsed_xml.toprettyxml(indent="  ")

    # XML without a declared encoding is read as UTF-8, so write the file as UTF-8
    with open('sample_semi_structured_data.xml', 'w', encoding='utf-8') as f:
        f.write(pretty_xml_str)

    print("XML file created successfully.")

create_semi_structured_xml()

Explanation: This code creates an XML file containing employee data, structured with tags.
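
To see how the tag hierarchy is used when reading, here is a short sketch that parses XML of the same shape back into Python dictionaries. To keep it self-contained, it parses an inline string (assumed to match the structure produced above) instead of opening the file.

```python
import xml.etree.ElementTree as ET

# Inline XML with the same structure as the generated file
xml_text = """<Employees>
  <Employee>
    <ID>1</ID>
    <Name>Alice</Name>
    <Age>25</Age>
    <Department>HR</Department>
  </Employee>
  <Employee>
    <ID>2</ID>
    <Name>Bob</Name>
    <Age>30</Age>
    <Department>Finance</Department>
  </Employee>
</Employees>"""

root = ET.fromstring(xml_text)

employees = []
for node in root.findall("Employee"):
    # Element text is always a string, so numeric fields are converted explicitly
    employees.append({
        "ID": int(node.findtext("ID")),
        "Name": node.findtext("Name"),
        "Age": int(node.findtext("Age")),
        "Department": node.findtext("Department"),
    })

print(employees)
```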


4. Creating a Text File

Text files are the simplest form of unstructured data. Let’s create a simple text file.

Here’s the Python code:

import textwrap

# Function to create sample text data
def create_sample_text_data():
    # Sample text content; dedent() removes the indentation used to align the literal
    sample_text = textwrap.dedent("""\
        Welcome to Data AI Revolution!
        This platform is dedicated to sharing the latest information on AI, Data Science, Machine Learning, and related technologies.

        Topics covered:
        - Artificial Intelligence (AI)
        - Machine Learning (ML)
        - Deep Learning (DL)
        - Natural Language Processing (NLP)
        - Data Visualization
        - Databases and more!

        Join us to explore real-world projects and tutorials that help you stay ahead in the tech industry.

        Explore more at www.dataairevolution.com
        """)

    # Write the text content to a file
    with open('sample_text_data.txt', 'w', encoding='utf-8') as f:
        f.write(sample_text)
    
    print("Sample text data created successfully.")

# Call the function to generate the sample text data
create_sample_text_data()

Explanation: This code creates a text file with some simple content.
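
Unstructured text has no columns or keys to query, so even a simple question such as "which word appears most often?" needs some processing first. Here is a minimal sketch of that idea, using a few lines from the sample text and a deliberately simple rule for what counts as a word (runs of letters only, which is an assumption of this sketch).

```python
import re
from collections import Counter

# A few lines taken from the sample text above
text = """Topics covered:
- Artificial Intelligence (AI)
- Machine Learning (ML)
- Deep Learning (DL)
"""

# Lowercase the text and pull out runs of letters as words
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

print(counts.most_common(3))
```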


5. Creating an Audio File

You can generate sounds and save them as audio files. For this, we’ll use the pydub library.

Here’s the Python code:

from pydub.generators import Sine
from pydub import AudioSegment

# Function to create a sample audio file
def create_sample_audio_data():
    # Generate a 440 Hz sine wave tone (A4) for 5 seconds
    tone = Sine(440).to_audio_segment(duration=5000)  # 5 seconds duration

    # Optionally, create silence for 1 second after the tone
    silence = AudioSegment.silent(duration=1000)  # 1 second of silence

    # Concatenate tone and silence
    combined_audio = tone + silence

    # Export the audio to a file
    combined_audio.export("sample_audio_data.wav", format="wav")
    
    print("Sample audio data (WAV file) created successfully.")

# Call the function to create the sample audio file
create_sample_audio_data()

Explanation: This code generates a 5-second 440 Hz tone followed by 1 second of silence and saves it as a WAV file.
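
If you’d like to see what is inside a WAV file, a similar tone can be produced with only the standard library. This is a rough sketch rather than an equivalent of the pydub version: the sample rate, amplitude, one-second duration, and the file name sample_audio_stdlib.wav are all assumptions made for the example.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed; a common rate)
FREQUENCY = 440      # A4, as in the pydub example
DURATION = 1         # seconds, kept short for the sketch

def sine_frames(frequency, duration, sample_rate=SAMPLE_RATE, amplitude=0.5):
    """Return 16-bit mono PCM bytes for a sine tone."""
    n_samples = int(duration * sample_rate)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency * i / sample_rate))
        for i in range(n_samples)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer
    return b"".join(struct.pack("<h", s) for s in samples)

with wave.open("sample_audio_stdlib.wav", "wb") as wav:
    wav.setnchannels(1)            # mono
    wav.setsampwidth(2)            # 2 bytes = 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(sine_frames(FREQUENCY, DURATION))

print("WAV file written with the standard library.")
```

The audio itself is just a long sequence of numbers, which is why it counts as unstructured data: nothing in the file says what the sound means.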


6. Creating an Image File

To create images, we’ll use the Pillow library.

Here’s the Python code:

from PIL import Image, ImageDraw, ImageFont

# Function to create a sample image
def create_sample_image():
    # Create a blank image (RGB mode) with white background
    img = Image.new('RGB', (400, 400), color='white')
    # Initialize ImageDraw to draw on the image
    draw = ImageDraw.Draw(img)
    # Draw a rectangle (filled)
    draw.rectangle([(50, 50), (350, 200)], fill="lightblue", outline="black", width=3)
    # Draw a circle (filled)
    draw.ellipse([(150, 220), (250, 320)], fill="lightgreen", outline="black", width=3)
    # Add some text (choose a font if available)
    try:
        # Load a TTF font (adjust path as needed)
        font = ImageFont.truetype("arial.ttf", 20)
    except IOError:
        # Use default PIL font if TTF is not available
        font = ImageFont.load_default()
    draw.text((100, 350), "Sample Image", font=font, fill="black")
    # Save the image to a file
    img.save('sample_image.png')
    print("Sample image (PNG) created successfully.")

# Call the function to create the sample image
create_sample_image()

Explanation: This code creates a simple image with a rectangle, a circle, and some text, and saves it as a PNG file.
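
Under the hood, an image is a grid of pixel values. The sketch below shows this by writing a small image by hand in the PPM format, a very simple uncompressed format, without Pillow. It is only an illustration: the 40x40 size, the rectangle position, and the file name sample_image_stdlib.ppm are assumptions, and not every image viewer opens PPM files.

```python
WIDTH, HEIGHT = 40, 40
WHITE = (255, 255, 255)
LIGHT_BLUE = (173, 216, 230)  # RGB value of the "lightblue" color name

# Build the pixel grid row by row: a light-blue rectangle on a white background
pixels = bytearray()
for y in range(HEIGHT):
    for x in range(WIDTH):
        inside = 5 <= x < 35 and 5 <= y < 20
        pixels.extend(LIGHT_BLUE if inside else WHITE)

# A binary PPM file is a short text header followed by raw RGB bytes
header = f"P6\n{WIDTH} {HEIGHT}\n255\n".encode("ascii")
with open("sample_image_stdlib.ppm", "wb") as f:
    f.write(header)
    f.write(pixels)

print("PPM image written:", len(pixels), "bytes of pixel data")
```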


Conclusion

In this article, you learned how to create different types of data: structured (CSV), semi-structured (JSON and XML), and unstructured (text, audio, and images). Each of these data types has its unique use cases and benefits, and the examples provided will help you understand how to generate them programmatically.
