Targeted Marketing Campaign
Targeted marketing identifies an audience likely to buy services or products and promotes those services or products to that audience. Once these specific groups are recognized, companies develop marketing campaigns and products tailored to those preferred market segments. In other words, promotional messages or advertisements are sent only to a specific group of people, which avoids the extra expense of sending promotions to segments whose characteristics do not match the product.
These characteristics depend on the product. For example, Glaceau, the Smartwater brand owned by Coca-Cola, used targeted marketing for its vitamin-enriched water. It narrowed down its customers using characteristics such as age and interest in health and fitness. This helped the brand grow by 28 percent in less than a year, and it is now the number one premium water brand in the US.
Targeted marketing is more effective than mass marketing since it reduces expenses while still converting a large portion of the audience.
We have built a model for this task: it goes through the data provided about the customers, examines the different features, and, after training, predicts the customers' responses on the test data.
We used the dataset from https://www.kaggle.com/rodsaldanha/arketing-campaign
The dataset has numerous columns such as 'Marital_Status' and 'Education' that describe the customers, and on the basis of these descriptions we find out how each of them relates to the response.
Exploratory Data Analysis
We analyse the different columns of the data against each other to extract valuable insights. For the 'Dt_Customer' column, which indicates the date a customer enrolled, we change the datatype from object to datetime format using pandas.
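A minimal sketch of this conversion, assuming the dataset has already been loaded into a pandas DataFrame called data:

import pandas as pd

# 'Dt_Customer' arrives as a plain object/string column; convert it to datetime
data['Dt_Customer'] = pd.to_datetime(data['Dt_Customer'])
print(data['Dt_Customer'].dtype)  # datetime64[ns]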
The numerical and categorical columns are then grouped into separate lists.
numerical = [features for features in data.columns if data[features].dtypes != 'O']
categorical = [features for features in data.columns if data[features].dtypes == 'O']
print('Numerical Features are: ', numerical)
print('Categorical Features are: ', categorical)
The purpose of doing this is to find the interdependencies between the different features and to see which features impact the response (target) variable. Let's check the relations between the features.
We first plot the categorical features, and then plot these categorical columns against the response variable.
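A minimal sketch of this plotting step, assuming seaborn and matplotlib are available; the exact plot style used in the original notebook may differ:

import matplotlib.pyplot as plt
import seaborn as sns

# plot each categorical column split by the Response variable
for col in categorical:
    plt.figure(figsize=(8, 4))
    sns.countplot(x=col, hue='Response', data=data)
    plt.title(f'{col} vs Response')
    plt.xticks(rotation=45)
    plt.show()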
From the above graphs we observe:
- Customers with Basic or 2n Cycle education have hardly responded to any campaign; only a few from Basic have responded.
- The proportion of responses is better for PhD as compared to Master and Graduation. Graduation also has a fair share of people who gave a positive response.
- The proportion of responses is better for Single as compared to Married and Together.
We also plot the same columns against the columns that indicate in which campaign the customer responded. With these graphs we can identify a target audience to send the marketing campaigns to. This increases profit, since expenses are reduced and we target only those customers who are likely to give a positive response. For this, we first filter the data to keep only customers with a positive response:
data_y = data[data.Response == 1]

pd.crosstab(index=data_y['AcceptedCmp1'], columns=data_y['Marital_Status']).plot(kind='bar', figsize=(8, 8), stacked=True)
plt.show()

pd.crosstab(index=data_y['AcceptedCmp1'], columns=data_y['Education']).plot(kind='bar', figsize=(8, 8), stacked=True)
plt.show()
From the above graphs we observe:
- Divorced (14%) and Together (16%): around 20% responded in the 1st attempt
- Single: 23% responded in the 1st attempt
- Married: 32% responded in the 1st attempt
- Widow: 21% responded in the 1st attempt
- 2n Cycle: 31% responded in the 1st attempt
- Graduation: 27% responded in the 1st attempt
- Master: 17% responded in the 1st attempt
- PhD: 20% responded in the 1st attempt
So, we can infer all these observations by plotting these two categorical columns against the different 'AcceptedCmp' columns. Refer to the EDA notebook for more such graphs.
However, the observations from those graphs are:
- 2n Cycle: 31% responded in the 1st attempt, almost 0% in the 2nd, 7% in the 3rd, 4% in the 4th, 4% in the 5th
- Basic: 0% in the 1st, 0% in the 2nd, 11% in the 3rd, 0% in the 4th, 0% in the 5th
- Graduation: 7% responded in the 1st attempt, 1.4% in the 2nd, 7% in the 3rd, 7% in the 4th, 7% in the 5th (graduates have a roughly equal chance of responding each time, so multiple attempts are useful)
- Master: 4% in the 1st, 0% in the 2nd, 10% in the 3rd, 12% in the 4th, 10% in the 5th (customers with a Master's respond better if we reach out to them multiple times)
- PhD: 6% responded in the 1st attempt, 2% in the 2nd, 8% in the 3rd, 9% in the 4th, 8% in the 5th (PhDs have a roughly equal chance of responding each time, so multiple attempts are useful)
Next, we start working on the numerical columns and perform a univariate analysis of them.
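A minimal sketch of such a univariate analysis, assuming the numerical list built earlier; the original notebook may use different plot types:

import matplotlib.pyplot as plt

# histogram of each numerical column to inspect its distribution
for col in numerical:
    plt.figure(figsize=(6, 3))
    data[col].hist(bins=30)
    plt.title(f'Distribution of {col}')
    plt.xlabel(col)
    plt.ylabel('Count')
    plt.show()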
We then group the data by age and view the graphs for customers with a positive response.
data['Age'] = 2020 - data['Year_Birth']
data['Age'].head()

x = data[data['Response'] == 1]
x_30 = x[x['Age'] <= 30]
x_60 = x[(x['Age'] > 30) & (x['Age'] <= 60)]
x_100 = x[x['Age'] > 60]
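The notebook then plots each of these groups. As an illustration, a sketch comparing the income distribution of positive responders in each age bracket; the choice of 'Income' and of a histogram is an assumption:

import matplotlib.pyplot as plt

# positive responders split into the three age brackets built above
for name, group in [('<= 30', x_30), ('31-60', x_60), ('> 60', x_100)]:
    plt.figure(figsize=(6, 3))
    group['Income'].hist(bins=20)
    plt.title(f'Income of positive responders aged {name}')
    plt.xlabel('Income')
    plt.ylabel('Count')
    plt.show()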
From the above three graphs, we can focus on a specific age group to drive our targeted marketing campaign.
Feature Engineering
One-hot encoding is performed on the categorical columns. We get indicator variables for each category as new columns and drop the original column.
def onehot_encode(df, column):
    df = df.copy()
    dummies = pd.get_dummies(df[column], prefix=column)
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df
We also preprocess the data. Missing values are filled in with the mean of their column. The 'Dt_Customer' column is converted to a datetime format, from which we extract the year, month and day as new columns. Using train_test_split, we obtain train and test sets, and we apply StandardScaler to both X_train and X_test.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def preprocess_data(df):
    df = df.copy()
    df = df.drop('ID', axis=1)
    df['Income'] = df['Income'].fillna(df['Income'].mean())

    df['Dt_Customer'] = pd.to_datetime(df['Dt_Customer'])
    df['Year_Customer'] = df['Dt_Customer'].apply(lambda x: x.year)
    df['Month_Customer'] = df['Dt_Customer'].apply(lambda x: x.month)
    df['Day_Customer'] = df['Dt_Customer'].apply(lambda x: x.day)
    df = df.drop('Dt_Customer', axis=1)

    for column in ['Education', 'Marital_Status']:  # categorical data
        df = onehot_encode(df, column=column)

    y = df['Response']
    X = df.drop('Response', axis=1)

    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

    return X_train, X_test, y_train, y_test
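Calling these helpers on the loaded DataFrame then yields the scaled train and test splits (a usage sketch):

X_train, X_test, y_train, y_test = preprocess_data(data)
print(X_train.shape, X_test.shape)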
Models
Now, we look at the different models that can be fitted to the training set to get accurate results on the test set. We tried several models: linear and logistic regression, decision trees, random forests, and a deep learning neural network. GridSearchCV helped us find the hyperparameters specific to each algorithm, from which we selected the best model for the dataset.
The evaluation metric we used is mean squared error. The maximum accuracy achieved on the test set was 89% using the deep learning neural network, followed by 88% using logistic regression, and 86% and 85% for the random forest and decision tree respectively.
We used GridSearchCV with each of the algorithms we applied. For the random forest, we tuned the max_depth and n_estimators hyperparameters, obtained the best parameter combination, and used it to build the final classifier.
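A minimal sketch of this tuning step, assuming scikit-learn; the parameter grid shown here is an assumption, not the grid actually searched:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# hypothetical parameter grid for illustration
param_grid = {
    'max_depth': [4, 8, 16, None],
    'n_estimators': [50, 100, 200],
}

grid = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=5)
grid.fit(X_train, y_train)

best_rf = grid.best_estimator_  # classifier rebuilt with the best parameters
print(grid.best_params_, grid.score(X_test, y_test))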
The same method was applied to the decision tree, linear regression, and logistic regression. Since this is not a regression problem, we obtained very low accuracy with linear regression even after using GridSearchCV to find the best hyperparameters.
For the other two models, however, this approach helped us find the configurations that achieve the high accuracies mentioned above.
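For completeness, a minimal sketch of the kind of dense neural network that could be used for this binary classification task, assuming Keras; the actual architecture and training settings used are not specified here:

import tensorflow as tf

# a small dense network predicting the binary Response variable
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=30, batch_size=32, validation_split=0.2, verbose=0)

loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Test accuracy: {acc:.2f}')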