Predicting Marketing Campaign with R

In my last blog post I created a mechanism to fetch data from Salesforce using rJava and SOQL. In this post I am going to use that mechanism to fetch ad campaign data from Salesforce and predict future ad campaign sales using R.

Let us assume that Salesforce has campaign data for the last eight quarters. This data is the total sales generated by Newspaper, TV and Online ad campaigns, together with the associated expenditure, as follows:

  Sales Newspaper   TV Online
1 16850      1000  500   1500
2 12010       500  500    500
3 14740      2000  500    500
4 13890      1000 1000   1000
5 12950      1000  500    500
6 15640       500 1000   1000
7 14960      1000 1000   1000
8 13630       500 1500    500

Thus, quarter 1 indicates that $1000, $500 and $1500 were spent on Newspaper, TV and Online ad campaigns respectively, and total sales during that quarter were $16,850.
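For readers without a Salesforce org, the table above can be typed directly into R. This is only a local stand-in for the Salesforce query, assuming the column names Sales, Newspaper, TV and Online; the variable total_spend is a name I made up for illustration.

```r
# Local stand-in for the Salesforce query: the eight quarters typed in by hand.
Campaigndata <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000,  500, 2000, 1000, 1000,  500, 1000,  500),
  TV        = c( 500,  500,  500, 1000,  500, 1000, 1000, 1500),
  Online    = c(1500,  500,  500, 1000,  500, 1000, 1000,  500)
)

# Total advertising spend per quarter across the three media (hypothetical helper).
total_spend <- rowSums(Campaigndata[, c("Newspaper", "TV", "Online")])

print(Campaigndata)
print(total_spend)
```

Printing the data frame should show the same eight rows as the table, which is a quick way to catch typing mistakes before any modelling.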

The first step is to find out if there is any relationship between sales and advertising expenditure. The tool I am going to use is regression analysis. In order to perform regression analysis, I am going to fetch the data using rJava as follows:


library(rJava)  # Load the rJava library
.jinit()        # Initialize the Java VM

sfObj=.jnew("SalesforceHelper") # Instantiate java object
CampaignVector=sfObj$queryObject("SELECT Sales__c,Newspaper__c,TV__c,Online__c from CampaignData__c") 

Campaigndata<-getSFDataFrame(CampaignVector) # Convert vector to R data frame

(For more information on how to integrate R and Salesforce, please refer to my previous blog post at: http://www.r-bloggers.com/r-and-salesforce/
or https://arungaikwad.wordpress.com/2012/02/25/r-and-salesforce/)

Let us ask R to perform regression analysis on Campaigndata using lm() function as follows:

attach(Campaigndata) 
Campaignmodel<-lm(Sales~Newspaper+TV+Online) # perform regression
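As an aside, attach() puts the columns on the search path, which can silently mask other variables with the same names. A common alternative is to pass the data frame through the data argument of lm(). The sketch below assumes the eight quarters have been typed in locally rather than fetched from Salesforce.

```r
# Local copy of the eight quarters (assumption: typed in by hand, not fetched).
Campaigndata <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000,  500, 2000, 1000, 1000,  500, 1000,  500),
  TV        = c( 500,  500,  500, 1000,  500, 1000, 1000, 1500),
  Online    = c(1500,  500,  500, 1000,  500, 1000, 1000,  500)
)

# Same formula, but the columns are looked up in Campaigndata via `data =`
# instead of relying on attach().
Campaignmodel <- lm(Sales ~ Newspaper + TV + Online, data = Campaigndata)

print(coef(Campaignmodel))
```

The only visible difference is that the Call line in the printed model will also show the data argument.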

After performing the regression analysis, R gives me the relationship in the form of the following equation:

Total Sales = (sales with no advertising) + (newspaper contribution per dollar*newspaper expenditure)+(TV contribution per dollar*TV expenditure)+(Online contribution per dollar*Online expenditure)

Sales with no advertising is called the intercept, while each per-dollar contribution is called a coefficient.
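To make the equation concrete, here is a small sketch that rebuilds the fitted sales by hand from the intercept and the coefficients and compares the result with what lm() itself reports. It assumes the eight quarters are typed in locally; the names b and manual are mine.

```r
# Local copy of the eight quarters (assumption: typed in by hand, not fetched).
Campaigndata <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000,  500, 2000, 1000, 1000,  500, 1000,  500),
  TV        = c( 500,  500,  500, 1000,  500, 1000, 1000, 1500),
  Online    = c(1500,  500,  500, 1000,  500, 1000, 1000,  500)
)
Campaignmodel <- lm(Sales ~ Newspaper + TV + Online, data = Campaigndata)

# Intercept and per-dollar contributions estimated by the regression.
b <- coef(Campaignmodel)

# Total Sales = intercept + sum of (coefficient * expenditure), quarter by quarter.
manual <- with(Campaigndata,
               b[["(Intercept)"]] +
               b[["Newspaper"]] * Newspaper +
               b[["TV"]]        * TV +
               b[["Online"]]    * Online)

# Compare the hand-built values with the model's own fitted values.
print(all.equal(manual, unname(fitted(Campaignmodel))))
```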

R will also give information on how meaningful or strong this relationship is, with R^2 (R squared).
As you can see, a campaign manager will be interested in the per-dollar contribution of each advertising medium. In other words, how much sales will be generated for each dollar of expenditure.

Let us find this information from our model:

> Campaignmodel

Call:
lm(formula = Sales ~ Newspaper + TV + Online)

Coefficients:
(Intercept)    Newspaper           TV       Online  
  9561.4286       1.2465       0.9193       3.5161  
>

Sales without advertising (rounded up) = $9562
Newspaper returns = $1.25 per $1 of expenditure
TV returns = $0.92 per $1 of expenditure
Online returns = $3.52 per $1 of expenditure (clearly a winner)

But how strong is the model? Let us find out

> summary(Campaignmodel)

Call:
lm(formula = Sales ~ Newspaper + TV + Online)

Residuals:
       1        2        3        4        5        6        7        8 
  308.32  -392.36   467.95 -1353.29   -75.59  1019.94  -283.29   308.32 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 9561.4286  1700.5869   5.622  0.00492 **
Newspaper      1.2465     0.8100   1.539  0.19865   
TV             0.9193     1.0766   0.854  0.44126   
Online         3.5161     0.9584   3.669  0.02141 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 938.2 on 4 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.7879,     Adjusted R-squared: 0.6289 
F-statistic: 4.954 on 3 and 4 DF,  p-value: 0.0781 

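Look at the t values for each advertising medium. As a rough rule of thumb, an absolute t value of more than 2 suggests a strong relationship. Online advertising has the strongest relationship with sales (t = 3.669, p = 0.021) and is the only medium significant at the 5% level, while TV has the weakest (t = 0.854). The Multiple R-squared of 0.7879 means the model explains about 79% of the variation in sales in this sample. With only eight quarters of data, the Adjusted R-squared of 0.6289 is the more conservative figure.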
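Another way to read the strength of each relationship is through confidence intervals on the coefficients. The sketch below uses confint() from base R on a model fitted to a locally typed copy of the eight quarters; the names ci and contains_zero are mine.

```r
# Local copy of the eight quarters (assumption: typed in by hand, not fetched).
Campaigndata <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000,  500, 2000, 1000, 1000,  500, 1000,  500),
  TV        = c( 500,  500,  500, 1000,  500, 1000, 1000, 1500),
  Online    = c(1500,  500,  500, 1000,  500, 1000, 1000,  500)
)
Campaignmodel <- lm(Sales ~ Newspaper + TV + Online, data = Campaigndata)

# 95% confidence interval for the intercept and each per-dollar return.
ci <- confint(Campaignmodel, level = 0.95)
print(round(ci, 2))

# An interval that contains 0 means the data cannot rule out "no effect".
contains_zero <- ci[, 1] < 0 & ci[, 2] > 0
print(contains_zero)

# R-squared and adjusted R-squared, pulled out of the summary object.
fit_summary <- summary(Campaignmodel)
print(c(r.squared = fit_summary$r.squared,
        adj.r.squared = fit_summary$adj.r.squared))
```

A wide interval around a coefficient is a sign that the per-dollar return for that medium is not pinned down well by eight data points.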

Obviously I am not expecting real-world campaign or account managers to analyze R output. So, I created a related custom object called Campaign_predictor__c with custom fields as follows:

Sales_without_ad__c NUMERIC initialized to 9562.00,
Newspaper_expenditure__c NUMERIC,
TV_expenditure__c NUMERIC,
Online_expenditure__c NUMERIC,
Predicted_sales__c FORMULA = (Sales_without_ad__c)+(3.52*Online_expenditure__c)+(1.25*Newspaper_expenditure__c)+(0.92*TV_expenditure__c),
Prediction_probability__c = 78
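The Predicted_sales__c formula can be mirrored in R to check it before handing it to managers. This is only a sketch: predicted_sales is a hypothetical function name, and the rounded coefficients are hard-coded exactly as in the formula field. As a check, it is applied to the expenditures of quarter 1 from the campaign data.

```r
# R mirror of the Predicted_sales__c formula field (hypothetical helper).
# The default intercept and the per-dollar returns are the rounded model values.
predicted_sales <- function(newspaper, tv, online, sales_without_ad = 9562) {
  sales_without_ad + 3.52 * online + 1.25 * newspaper + 0.92 * tv
}

# Quarter 1 spent $1000 on Newspaper, $500 on TV and $1500 on Online.
q1 <- predicted_sales(newspaper = 1000, tv = 500, online = 1500)
print(q1)  # 9562 + 5280 + 1250 + 460 = 16552; actual sales were 16850
```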

Now the managers just have to plug in the values to predict the sales. Suppose a manager has $3000 to spend on ad campaigns and, based on the model, decides to allocate $2000 to Online, $500 to Newspaper and $500 to TV. The predicted sales (from a model with an R-squared of 0.7879) are:

$9562 + (3.52*2000)+(1.25*500)+(0.92*500) = $17,687
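The same budget can also be passed to predict(), which uses the unrounded coefficients (so the figure differs slightly from the hand calculation) and can return a prediction interval around it. This is a sketch on a locally typed copy of the eight quarters; budget and pred are names I chose. Note that $2000 for Online is above anything in the sample, so this is an extrapolation.

```r
# Local copy of the eight quarters (assumption: typed in by hand, not fetched).
Campaigndata <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000,  500, 2000, 1000, 1000,  500, 1000,  500),
  TV        = c( 500,  500,  500, 1000,  500, 1000, 1000, 1500),
  Online    = c(1500,  500,  500, 1000,  500, 1000, 1000,  500)
)
Campaignmodel <- lm(Sales ~ Newspaper + TV + Online, data = Campaigndata)

# The manager's proposed allocation of the $3000 budget.
budget <- data.frame(Newspaper = 500, TV = 500, Online = 2000)

# Point prediction plus a 95% prediction interval (columns: fit, lwr, upr).
pred <- predict(Campaignmodel, newdata = budget,
                interval = "prediction", level = 0.95)
print(round(pred))
```

The lwr and upr columns give the manager a range rather than a single number, which is a more honest summary of what eight quarters of data can support.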

Thus we can move complex predictive analytics from the realm of super specialists and statisticians to marketing and sales managers using R and Salesforce.com.

16 thoughts on “Predicting Marketing Campaign with R”

  1. hello,
    nice work on the java side (previous post), but on the modeling side, your “prediction” seems quite dangerous to me…
    it misses error margin, checking of lm() hypotheses (sales following a normal distribution…), and above all interaction effects between different marketing channels and with previous sales and ad expenditures (time series modeling),
    furthermore, it is long known that sales do not respond to advertising in a linear way (check ADBUG model for instance)
    your sales manager has some chance to be disappointed…

    1. Dr. Willart,
      Thanks for your feedback.
      The purpose of this post was to demonstrate how one can harness power of R and integrate it with Sales and Marketing.
      If there is no linear relationship, then R will indicate it with t analysis.

      -Arun Gaikwad

      1. You could use this linear regression model for rough descriptive purposes, but as DrSylWil pointed out, it is not good for prediction. Because in that case, shouldn’t the manager focus only on online advertising to optimize the advertising expenditures?

  2. > Also Multiple R-squared of 0.7879 indicates that there is 79% probability with respect to predictability of the model.

    What? Can you clarify what you mean by that sentence please?

    1. Please see line# 21: Multiple R-squared: 0.7879
      R-square is between 0 and 1.
      R-Squared is a statistical term saying how good one term is at predicting another. If R-Squared is 1.0 then given the value of one term, you can perfectly predict the value of another term. If R-Squared is 0.0, then knowing one term does not help you know the other term at all. More generally, a higher value of R-Squared means that you can better predict one term from another.

      1. Yes I am aware of the definition of r-squared. I asked for an explanation of your wording, which – I am sorry to be blunt – makes no sense to me.

        Can you explain how any of that definition relates to “there is 79% probability with respect to predictability of the model” – which I still cannot follow.

        79% probability of *what*?

        what do you mean by ‘predictability of the model’ here?

  3. I think this is among the most important info for me. And i am glad reading your article. But wanna remark on few general things, The site style is ideal, the articles is really great : D. Good job, cheers

  4. Very interesting. I just started working with SFDC and also see a limitation in its analytic ability. What is your next step/goal for this R integration? Would it be possible to push information back up to SFDC and display with better graphics?

  5. A very interesting article and approach. I’m just starting on this journey so appreciate you sharing. Are you available for projects?
