Predicting Customer Churn Using Machine Learning
Introduction
Predicting customer churn is critical for businesses to retain customers and maintain profitability. This project focuses on predicting customer churn using machine learning techniques.
Business Use Case
Problem: A telecom company wants to predict which customers are likely to churn based on their usage patterns.
Importance: Identifying potential churners helps in targeted retention strategies.
Impact: Reduced customer attrition and increased customer lifetime value.
Learning Points
- Understanding classification algorithms.
- Preprocessing data for machine learning models.
- Evaluating model performance using metrics like precision, recall, and F1-score.
Dataset
Structure: The dataset includes customer demographic information, usage patterns, and churn status. Sample Data (100 records):
| CustomerID | Age | MonthlyCharges | Tenure | InternetService | Contract | Churn |
|------------|-----|----------------|--------|-----------------|--------------|-------|
| 1 | 45 | 80 | 24 | Fiber optic | Month-to-month | Yes |
| 2 | 30 | 50 | 12 | DSL | One year | No |
| ... | ... | ... | ... | ... | ... | ... |
| 100 | 55 | 95 | 36 | Fiber optic | Two year | No |
Complete dataset: Available on this link to download click here
Input Explanation:
- Age: Age of the customer.
- MonthlyCharges: Monthly charges paid by the customer.
- Tenure: Number of months the customer has been with the company.
- InternetService: Type of internet service subscribed (DSL, Fiber optic, etc.).
- Contract: Type of contract (Month-to-month, One year, Two year).
- Churn: Target variable indicating whether the customer churned (Yes) or not (No).
Techniques Used
- Data Cleaning and Handling Missing Values
- Feature Engineering (e.g., creating new features like total charges)
- Logistic Regression or Random Forest Classification
Step-by-Step Information
- Load the dataset: Load the dataset into a pandas DataFrame.
- Data cleaning: Handle missing values and outliers.
- Feature engineering: Create new features like total charges or average monthly usage.
- Split the data: Divide the data into training and testing sets.
- Model training: Train a classification model (e.g., logistic regression) on the training data.
- Model evaluation: Evaluate the model’s performance on the test data using accuracy score and classification report.
Code
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
# Step 1: Load dataset
dataset_url = "https://raw.githubusercontent.com/goradbj1/dataairevolution/main/datasets/customer_churn_dataset.csv"
df = pd.read_csv(dataset_url)
# Step 2: Data cleaning (if needed)
# No explicit data cleaning is done here but if required you can do it
# Step 3: Feature Engineering
df = df.drop('CustomerID', axis=1)
label_encoder = LabelEncoder()
df['InternetService'] = label_encoder.fit_transform(df['InternetService'])
df['Contract'] = label_encoder.fit_transform(df['Contract'])
df['Churn'] = label_encoder.fit_transform(df['Churn'])
# Prepare input (X) and output (y) variables
X = df.drop('Churn', axis=1)
y = df['Churn']
# Step 4: Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 5: Model training
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Step 6: Predictions
y_pred = model.predict(X_test)
# Step 7: Evaluation
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n {report}')
Future Work
- Implementing ensemble methods like Gradient Boosting or XGBoost for improved performance.
- Incorporating more customer interaction data (e.g., call logs, customer service interactions) to enhance prediction accuracy.
- Using advanced feature selection techniques to identify the most predictive variables.
Conclusion
Predicting customer churn using machine learning enables businesses to proactively retain customers, thereby enhancing customer satisfaction and reducing revenue loss.