House Price Prediction Project: Stacking ML Models explained easily

Introduction

In the world of machine learning, improving prediction accuracy is always a goal. One powerful technique that helps achieve this is stacking, or stacked generalization. In this article, we’ll take you through a complete machine learning project using stacking, allowing you to understand not just what stacking is, but how it works in practice.

What is Stacking?

Stacking is an ensemble learning method that combines multiple different models to produce more accurate predictions. Instead of relying on a single model, stacking uses a collection of different models, referred to as base models. The predictions from these models are then combined using another model, known as a meta-model.

The key idea behind stacking is that different models can learn different patterns from the data, and combining their predictions can lead to better performance. It’s like gathering opinions from multiple experts before making a decision—the more diverse the perspectives, the more informed the final outcome.

How Does Stacking Work?

  1. Data Preparation: We start by splitting our dataset into training and testing sets.
  2. Training Base Models: We train multiple base models on the training data. These models can be of various types, such as decision trees, support vector machines, or linear regression.
  3. Generating Predictions: After training, we use the base models to make predictions on a validation set or the training set itself (often via cross-validation). This helps to create a new dataset of predictions.
  4. Training the Meta-Model: We then use the predictions from the base models as input features to train the meta-model, which learns how to best combine these predictions.
  5. Final Predictions: Finally, we use the trained meta-model to make predictions on new, unseen data.

Project Overview: Predicting California Housing Prices with Stacking

In this project, we will use the California housing dataset to predict house prices. We will implement stacking using three different base models and a linear regression meta-model.

Step-by-Step Implementation

1. Import Necessary Libraries

First, we need to import the libraries we’ll use in our project.

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.metrics import mean_squared_error

2. Load and Prepare the Dataset

Next, we load the California housing dataset and prepare it for training and testing.

# Load California housing dataset
california_housing = fetch_california_housing()
X = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)
y = pd.Series(california_housing.target)
housing_df = pd.concat([X, y], axis=1)
housing_df = housing_df.rename(columns={0:'Price'})
housing_df
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training set size:", X_train.shape[0])
print("Testing set size:", X_test.shape[0])

3. Define Base Models and Meta-Model

Now we define our base models and the meta-model that will combine their predictions.

# Define base models
base_models = [
    ('dt', DecisionTreeRegressor(random_state=42)),
    ('svr', SVR()),
    ('rf', RandomForestRegressor(random_state=42, n_estimators=100))
]

# Define meta-model
meta_model = LinearRegression()

4. Train Base Models and Show Individual Results

Before creating the stacking model, let’s train the base models individually and print their results.

# Train base models and show individual results
for name, model in base_models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f'Mean Squared Error of {name}: {mse:.2f}')

5. Create Stacking Regressor

We create the stacking regressor using our base models and meta-model.

# Create stacking regressor
stacked_model = StackingRegressor(estimators = base_models, final_estimator = meta_model)

6. Train the Stacking Model

We fit the stacking model using the training data.

# Fit the stacking model
stacked_model.fit(X_train, y_train)

After expanding each of above components

7. Make Predictions and Evaluate All Models

Now we can make predictions using all models and evaluate their performance.

# Check stacked model Mean Squared Error
y_stacked_pred = stacked_model.predict(X_test)
stacked_mse = mean_squared_error(y_test, y_stacked_pred)

print(f'\nMean Squared Error of Stacking Model: {stacked_mse:.2f}')
# Displaying individual model predictions
for name, model in base_models:
    y_pred = model.predict(X_test)
    print(f'Predictions from {name}:', y_pred[:5])  # Print first 5 predictions
print('Stacking Model Predictions:', y_stacked_pred[:5])  # Print first 5 predictions

Understanding the Results

After running the code, you will obtain Mean Squared Error (MSE) scores for each individual model and the stacked model. The lower the MSE, the better the model is at predicting house prices. By comparing the individual model results with the stacked model results, you can see how stacking improves prediction accuracy. You can experiment different different algorithms so more error will be reduced and hence accuracy will be improved.

Conclusion

In this project, we built a complete machine learning project using stacking. By combining multiple base models through a meta-model, we enhanced prediction accuracy and robustness. Stacking allows us to leverage the strengths of different algorithms, making it a powerful technique in the machine learning toolkit.

Future Work

To further enhance the performance of your stacking model, consider experimenting with different base models, meta-models, and hyperparameter tuning. Additionally, exploring other datasets and applying stacking in various scenarios can provide deeper insights into its effectiveness.

With this understanding, you should now have a clear grasp of what stacking is, how it works, and how to implement it in a complete machine learning project.

Leave a Reply