Customer Category Predictor

Link to the GitHub repository

Customer Category Predictor - Process Overview

Below is an overview of the process used in this project:

  • Data Preparation
    • Imported the dataset (train.csv) and examined its structure.
    • Handled missing values by creating indicator (dummy) variables that flag whether each cell was NaN (see the data-preparation sketch after this list).
    • Converted categorical variables into multiple dummy features to capture all distinct categories.
  • Feature Engineering
    • Retained only relevant numerical and dummy variables for modeling.
    • Removed unnecessary identifiers (like customer_id) and original categorical columns.
  • Train/Test Split
    • Split the dataset into training (80%) and test (20%) subsets.
    • Used a fixed random state for reproducibility.
  • Modeling
    • Logistic Regression
      • Served as a baseline model to quickly assess predictive performance.
      • Provided straightforward interpretability of coefficients.
    • Support Vector Machine (SVM)
      • Conducted hyperparameter tuning with GridSearchCV to identify optimal C and gamma values.
      • Evaluated multiple parameter combinations through cross-validation to select the best-performing model (the train/test split and both models are sketched after this list).
  • Performance Evaluation
    • Computed confusion matrices to examine true positives, true negatives, false positives, and false negatives.
    • Calculated accuracy, recall, precision, and F1 scores to provide a comprehensive view of each model’s effectiveness (see the evaluation sketch after this list).
  • ROC Curve Analysis
    • Plotted ROC curves for both Logistic Regression and the SVM on the same graph (see the ROC sketch after this list).
    • Measured the Area Under the Curve (AUC) to summarize model performance at various classification thresholds.
    • A higher AUC indicates that the model is generally better at distinguishing between the classes across different thresholds.
    • The diagonal line (from (0,0) to (1,1)) in the ROC plot represents a random guess baseline. The closer a model’s ROC curve is to the top-left corner, the more effective it is at discriminating between positive and negative classes.
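
Illustrative Code Sketches

The snippets below are minimal sketches of the steps listed above, not the exact project code. Apart from train.csv and customer_id, the names are assumptions: the target column is called category here, and the random state and hyperparameter grid values are illustrative.

Data preparation and feature engineering: load train.csv, add NaN-indicator dummies, one-hot encode the categorical columns, and drop the identifier.

```python
import pandas as pd

# Load the raw data and inspect its structure ("category" as the target column
# name is an assumption; customer_id comes from the project description).
df = pd.read_csv("train.csv")
print(df.info())

y = df["category"]                                 # target to predict (assumed name)
X = df.drop(columns=["category", "customer_id"])   # drop target and identifier

# Missing-value handling: one indicator dummy per column that contains NaNs.
for col in X.columns[X.isna().any()]:
    X[f"{col}_was_nan"] = X[col].isna().astype(int)

# Categorical handling: expand each categorical column into dummy features,
# one per distinct category (NaN categories become all-zero rows).
X = pd.get_dummies(X, columns=X.select_dtypes(include="object").columns.tolist())

# Filling the remaining numeric NaNs with 0 is an assumption made here so the
# models below can consume the matrix; the project text only mentions the flags.
X = X.fillna(0)
```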
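
Train/test split and modeling: an 80/20 split with a fixed random state, a Logistic Regression baseline, and an SVM tuned with GridSearchCV. The specific random_state and the C/gamma grid are illustrative; the project text does not state which values were used.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# 80/20 split with a fixed random state for reproducibility (42 is illustrative).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline model: Logistic Regression.
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# SVM: grid search over C and gamma with cross-validation (grid values assumed).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]}
grid = GridSearchCV(SVC(probability=True), param_grid, cv=5)
grid.fit(X_train, y_train)
svm = grid.best_estimator_
print("Best SVM parameters:", grid.best_params_)
```

probability=True is enabled here only so that predict_proba is available for the ROC curves below; the project may equally have used decision_function scores.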
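
Performance evaluation: confusion matrices plus accuracy, recall, precision, and F1 for each model on the held-out test set. The metric calls assume a binary target encoded as 0/1, as the ROC analysis suggests; for string labels, pass pos_label.

```python
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
)

# Compare both fitted models on the 20% test set.
for name, model in [("Logistic Regression", log_reg), ("SVM", svm)]:
    y_pred = model.predict(X_test)
    print(f"--- {name} ---")
    print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
    print("Accuracy :", accuracy_score(y_test, y_pred))
    print("Recall   :", recall_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("F1 score :", f1_score(y_test, y_pred))
```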
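
ROC curve analysis: both models' ROC curves on one plot, each labelled with its AUC, plus the diagonal random-guess baseline. This again assumes a binary 0/1 target so that column 1 of predict_proba is the positive-class score.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

plt.figure()
for name, model in [("Logistic Regression", log_reg), ("SVM", svm)]:
    scores = model.predict_proba(X_test)[:, 1]   # positive-class probability
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")

# Diagonal from (0, 0) to (1, 1): the random-guess baseline.
plt.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Random guess")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curves: Logistic Regression vs. SVM")
plt.legend()
plt.show()
```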