Predicting Customer Churn Using Machine Learning

Introduction

Customer churn is the loss of customers who stop using a service. Predicting it early helps a business retain customers and protect profitability. This project builds a machine learning classifier that predicts whether a customer will churn.

Business Use Case

Problem: A telecom company wants to predict which customers are likely to churn based on their usage patterns.
Importance: Identifying potential churners helps in targeted retention strategies.
Impact: Reduced customer attrition and increased customer lifetime value.

Learning Points

  • Understanding classification algorithms.
  • Preprocessing data for machine learning models.
  • Evaluating model performance using metrics like precision, recall, and F1-score.
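
To make the three metrics concrete, here is a minimal sketch that computes them by hand from raw labels. The function name precision_recall_f1 and the label lists are made up for illustration; they are not part of the original project.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class from raw labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Made-up labels for illustration only: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.667 recall=0.500 f1=0.571
```

Here the model flags three customers as churners and two of them really churned (precision 2/3), while it finds only two of the four actual churners (recall 1/2). In a churn setting, low recall means missed retention opportunities, so accuracy alone is rarely enough.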

Dataset

Structure: The dataset includes customer demographic information, usage patterns, and churn status.

Sample Data (100 records):

| CustomerID | Age | MonthlyCharges | Tenure | InternetService | Contract       | Churn |
|------------|-----|----------------|--------|-----------------|----------------|-------|
| 1          | 45  | 80             | 24     | Fiber optic     | Month-to-month | Yes   |
| 2          | 30  | 50             | 12     | DSL             | One year       | No    |
| ...        | ... | ...            | ...    | ...             | ...            | ...   |
| 100        | 55  | 95             | 36     | Fiber optic     | Two year       | No    |

Complete dataset: Available for download as a CSV file; the code below reads it directly from the URL stored in dataset_url.

Input Explanation:

  • Age: Age of the customer.
  • MonthlyCharges: Monthly charges paid by the customer.
  • Tenure: Number of months the customer has been with the company.
  • InternetService: Type of internet service subscribed (DSL, Fiber optic, etc.).
  • Contract: Type of contract (Month-to-month, One year, Two year).
  • Churn: Target variable indicating whether the customer churned (Yes) or not (No).
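
As a quick way to see how these columns look in pandas, the sketch below rebuilds only the three rows visible in the sample table and separates numeric columns from text columns. The variable names (sample, numeric_cols, text_cols) are illustrative choices, not part of the original code.

```python
import pandas as pd

# The three rows visible in the sample table; the real dataset has more rows.
sample = pd.DataFrame(
    {
        "CustomerID": [1, 2, 100],
        "Age": [45, 30, 55],
        "MonthlyCharges": [80, 50, 95],
        "Tenure": [24, 12, 36],
        "InternetService": ["Fiber optic", "DSL", "Fiber optic"],
        "Contract": ["Month-to-month", "One year", "Two year"],
        "Churn": ["Yes", "No", "No"],
    }
)

# Numeric columns can be fed to a model directly; text columns need encoding first.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Text:", text_cols)
print(sample["Churn"].value_counts())
```

Note that CustomerID shows up as numeric even though it is only an identifier, which is why it should be dropped before training.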

Techniques Used

  • Data Cleaning and Handling Missing Values
  • Feature Engineering (e.g., creating new features like total charges)
  • Classification with Random Forest (used in the code below) or Logistic Regression
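
The first two techniques can be sketched as follows. This is a minimal illustration under stated assumptions: the missing Age value is introduced on purpose to have something to clean, and TotalCharges is approximated as MonthlyCharges times Tenure, which assumes the monthly charge never changed. Neither is confirmed by the dataset itself.

```python
import numpy as np
import pandas as pd

# Rows from the sample table, with one Age deliberately blanked out
# to illustrate missing-value handling (the blank is not in the real data).
df = pd.DataFrame(
    {
        "Age": [45, np.nan, 55],
        "MonthlyCharges": [80, 50, 95],
        "Tenure": [24, 12, 36],
    }
)

# Data cleaning: fill missing numeric values with the column median.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Feature engineering: approximate total charges as monthly charges x tenure.
# Assumption: the monthly charge was constant over the whole tenure.
df["TotalCharges"] = df["MonthlyCharges"] * df["Tenure"]

print(df)
```

The median is a common default for filling numeric gaps because it is less sensitive to outliers than the mean; other strategies may suit your data better.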

Step-by-Step Information

  1. Load the dataset: Load the dataset into a pandas DataFrame.
  2. Data cleaning: Handle missing values and outliers.
  3. Feature engineering: Create new features like total charges or average monthly usage.
  4. Split the data: Divide the data into training and testing sets.
  5. Model training: Train a classification model on the training data. The code below uses a random forest; logistic regression is a common alternative.
  6. Model evaluation: Evaluate the model’s performance on the test data using accuracy score and classification report.
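
One detail of step 4 is worth a sketch: on a small dataset, a plain random split can leave the test set with very few churners. Passing stratify to train_test_split keeps the churn rate the same in both splits. The labels below are hypothetical (100 customers with a 30% churn rate) and only serve to show the effect.

```python
from sklearn.model_selection import train_test_split

# Hypothetical labels for illustration: 100 customers, 30 of whom churned (1).
y = [1] * 30 + [0] * 70
X = [[i] for i in range(100)]  # placeholder single-feature rows

# stratify=y preserves the class proportions in both the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train churn rate:", sum(y_train) / len(y_train))
print("Test churn rate:", sum(y_test) / len(y_test))
```

Both rates come out at 0.3 here, since 20% of each class goes to the test set.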

Code

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

# Step 1: Load dataset
dataset_url = "https://raw.githubusercontent.com/goradbj1/dataairevolution/main/datasets/customer_churn_dataset.csv"
df = pd.read_csv(dataset_url)

# Step 2: Data cleaning
# No cleaning is applied here. Check df.isnull().sum() and df.describe() first,
# and handle missing values or outliers if your copy of the data needs it.

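# Step 3: Feature engineering (here: drop the ID and encode text columns)
# CustomerID is an identifier with no predictive value, so drop it.
df = df.drop('CustomerID', axis=1)
# Encode text columns as integers. fit_transform refits the encoder on each call,
# so a single LabelEncoder instance can be reused across columns.
label_encoder = LabelEncoder()
df['InternetService'] = label_encoder.fit_transform(df['InternetService'])
df['Contract'] = label_encoder.fit_transform(df['Contract'])
df['Churn'] = label_encoder.fit_transform(df['Churn'])  # No -> 0, Yes -> 1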

# Prepare input (X) and output (y) variables
X = df.drop('Churn', axis=1)
y = df['Churn']

# Step 4: Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Model training
model = RandomForestClassifier(random_state=42)  # fixed seed for reproducible results
model.fit(X_train, y_train)

# Step 6: Model evaluation
# Predict on the held-out test set, then compare with the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy:.3f}')
print(f'Classification Report:\n{report}')
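
The logistic regression alternative mentioned earlier could look like the sketch below. It deliberately swaps in two things that differ from the code above: one-hot encoding instead of LabelEncoder (integer codes imply an order that a linear model would take literally) and a scikit-learn Pipeline. The data is randomly generated with the article's column names, and the churn rule is invented so the labels are learnable; none of it comes from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data for illustration only: random values, article's column names.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame(
    {
        "Age": rng.integers(18, 80, size=n),
        "MonthlyCharges": rng.integers(20, 120, size=n),
        "Tenure": rng.integers(1, 72, size=n),
        "InternetService": rng.choice(["DSL", "Fiber optic"], size=n),
        "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    }
)
# Invented rule: short-tenure month-to-month customers churn.
is_churner = (df["Contract"] == "Month-to-month") & (df["Tenure"] < 24)
df["Churn"] = np.where(is_churner, "Yes", "No")

numeric_cols = ["Age", "MonthlyCharges", "Tenure"]
categorical_cols = ["InternetService", "Contract"]

# Scale numeric columns and one-hot encode text columns in a single step.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ]
)
model = Pipeline(
    [("preprocess", preprocess), ("classifier", LogisticRegression(max_iter=1000))]
)

X = df.drop(columns="Churn")
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.3f}")
```

Because the preprocessing lives inside the pipeline, it is fitted on the training split only, which avoids leaking information from the test set.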

Future Work

  • Trying boosting-based ensemble methods such as Gradient Boosting or XGBoost, which may improve on the random forest used here.
  • Incorporating more customer interaction data (e.g., call logs, customer service interactions) to enhance prediction accuracy.
  • Using advanced feature selection techniques to identify the most predictive variables.
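
As a starting point for the first and third ideas, the sketch below trains scikit-learn's GradientBoostingClassifier and ranks features by importance. It runs on synthetic data from make_classification rather than the churn dataset, so the feature names (feature_0 and so on) and the scores are placeholders, not results for this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the churn dataset): 5 features, 2 informative.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(f"Test accuracy: {score:.3f}")

# Rank features by importance, a simple first pass at feature selection.
ranked = sorted(
    enumerate(model.feature_importances_), key=lambda pair: pair[1], reverse=True
)
for index, importance in ranked:
    print(f"feature_{index}: {importance:.3f}")
```

On real data, the same ranking could guide which columns to keep, though importance scores from tree models are only one of several selection signals.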

Conclusion

Predicting customer churn using machine learning enables businesses to proactively retain customers, thereby enhancing customer satisfaction and reducing revenue loss.

Data AI Admin

Senior AI Lead with 10+ years of overall experience in IT, Data Science, Machine Learning, AI, and related fields.
