AI-Powered Marketing Campaign Optimizer

Drive smarter marketing strategies with predictive modeling and A/B testing

Project Overview

This project uses machine learning and A/B testing to predict whether a customer will subscribe to a term deposit. By analyzing patterns in bank marketing data, we built an explainable model that identifies key customer traits, visualizes campaign effectiveness, and automates the entire workflow. It integrates feature selection, model training, evaluation, SHAP interpretation, and experimental design into one seamless pipeline.

Objective & Business Context

In an era of customer-centric marketing, traditional one-size-fits-all campaigns fail to achieve strong ROI. Businesses need intelligent systems to predict which customers are likely to convert and tailor strategies based on data-driven insights.

Marketing teams in banking often struggle to predict which customers are likely to respond to term deposit campaigns. Manual targeting can be inefficient and lead to missed conversions or over-marketing to uninterested users.

This project simulates an AI-powered system that:

  • Predicts which customers will subscribe to a term deposit (marketing success)

  • Identifies key features driving decisions (e.g., age, previous contact, balance)

  • Optimizes campaigns using feature-based segmentation and A/B/C group allocation

  • Evaluates campaign performance using statistical significance testing

  • Enhances explainability through SHAP plots for trust and insight

Business Value and Real-World Scope
  • Marketing Teams: Precision targeting, segment-specific messaging

  • Campaign Managers: Better allocation of effort, smarter promotions

  • Product Teams: Understand customer behavior patterns and bottlenecks

  • Strategy Leaders: Simulate and analyze A/B testing outcomes before investment

Implementation Flow
  • Source: UCI Bank Marketing Dataset (bank.csv)

  • Size: 11,162 rows, 17 columns

  • Target: deposit – binary classification (yes, no)

  • Notable Features:

    • Categorical: job, marital, education, contact, month, outcome

    • Numerical: age, balance, duration, campaign, previous

Dataset Overview
Data Preprocessing Workflow
  • Cleaned missing values

  • Label-encoded all categorical features

  • Scaled numerical columns using StandardScaler

  • Saved to bank_processed.csv

Feature Engineering
  • Performed Mutual Information analysis

  • Selected top 5 features influencing deposit behavior

  • Saved to bank_selected_features.csv

Model Training
  • Used RandomForestClassifier (100 trees)

  • Trained on selected features

  • Evaluated using accuracy score

  • Model saved as campaign_model.pkl

Model Evaluation
Campaign Optimization & SHAP Analysis
  • Generated classification_report and confusion_matrix

  • Accuracy: ~89%

  • Evaluation shown via CLI output

  • Extracted feature_importance and plotted

  • Used SHAP to explain model decisions

  • Visualized global summary plot for transparency

  • Output plots saved in reports/figures/

A/B/C Testing Simulation
  • Customers segmented into:

    • A: High Balance

    • B: General

    • C: Returning Customers

  • Campaign assignments made based on segments

  • Chi-Square test used to check statistical significance

  • Conversion rates plotted and saved

Automated Pipeline Execution
  • All scripts executed using automated_pipeline.py

  • Ensures reproducibility and streamlined deployment

Visual Analytics & Interpretations
  • feature_importance.png

    • Bar chart showing top 5 features influencing deposit predictions (e.g., duration, balance, month, etc.) via Random Forest feature importance.

  • shap_summary_campaign.png

    • SHAP interaction summary plot for the top feature (duration), revealing its effect on model output across all customers.

  • shap_summary_plot.png

    • SHAP interaction summary plot for another top feature (pdays), giving transparency into how the number of days since the last contact influences predictions.

  • ab_test_results.png

    • Bar plot of deposit conversion rates for Campaign Groups A, B, and C, derived from A/B/C segmentation strategy.

✅ A complete data preprocessing pipeline implemented in data_preprocessing.py that handles missing values, encodes categorical features, and scales numerical ones.
Feature selection logic using Mutual Information in feature_engineering.py with output saved to bank_selected_features.csv.
Model training script model_training.py that builds and saves a Random Forest model for predicting deposit conversions.
Model evaluation performed via model_evaluation.py with accuracy score, classification report, and confusion matrix printed for validation.
✅ A dedicated SHAP analysis pipeline (shap_explain.py) and integration in campaign_optimizer.py, generating deep visual explanations for model behavior.
✅ A campaign optimization script (campaign_optimizer.py) to visualize feature importance and top SHAP factors affecting customer decisions.
✅ An A/B/C testing module in ab_testing.py to simulate campaign performance and analyze conversion rates using Chi-Square statistics.
Automated end-to-end pipeline in automated_pipeline.py that executes all steps sequentially for ease of testing and reproducibility.
✅ A set of insightful visualizations saved under reports/figures/, including SHAP plots, feature importance charts, and A/B/C test results.
✅ All datasets (bank.csv, bank_processed.csv, bank_selected_features.csv) and trained models (campaign_model.pkl, campaign_model_pipeline.pkl) versioned and saved for reuse and deployment.
✅ Linked conceptual study document to explain the theory, design decisions, and real-world relevance of the entire solution.

Key Deliverables
Tools and Libraries Used
  • pandas – Data manipulation, cleaning, and loading CSV files

  • numpy – Numerical computations and array operations

  • scikit-learn – Machine learning algorithms (Random Forest), preprocessing (LabelEncoder, StandardScaler), evaluation metrics

  • matplotlib – Creating bar plots and SHAP visualizations

  • seaborn – Enhanced visualization styling for plots

  • shap – Model interpretability using SHAP value analysis

  • joblib – Saving and loading trained models

  • subprocess – Automating script execution in the pipeline

  • os – Handling file paths, directories, and validations

  • VS Code Terminal – Used to run individual scripts and monitor outputs step-by-step

  • Integrate automated hyperparameter tuning to boost model performance

  • Build a lightweight Streamlit dashboard for real-time marketing insights

  • Extend SHAP-based insights across customer segments

  • Containerize the full pipeline for scalable deployment

  • Introduce time-based trends to capture seasonal effects

Possible Next Steps & Conclusion
Conclusion

This project showcases how data-driven intelligence can elevate traditional marketing approaches by identifying what truly influences customer decisions. Through feature selection, predictive modeling, SHAP-based explainability, and conversion-focused A/B/C testing, we’ve built a robust and interpretable pipeline that mirrors real-world decision flows in campaign planning. Each module is modular, scalable, and geared for practical use, making it a solid fit for business teams and analysts. With clear outputs and actionable insights, this solution provides a strong foundation for smarter targeting, improved ROI, and long-term marketing efficiency.

Dive into the foundational concepts, algorithms, and real-world relevance behind this project. From machine learning principles to business strategy insights, this conceptual study bridges the gap between technical implementation and applied decision-making—helping you understand not just how it works, but why it matters.

Key Concepts
GitHub Repository

Want to dive deeper into how this project actually works?

We’ve made the complete codebase and resources available for you on GitHub

👉 Access the full repository here:

Whether you're a learner, recruiter, or collaborator — there's something for everyone.