In my last post I built a mechanism to fetch data from Salesforce using rJava and SOQL. In this post I am going to use that mechanism to fetch ad campaign data from Salesforce and predict future ad campaign sales using R.

Let us assume that Salesforce has campaign data for the last eight quarters. This data is the total sales generated by Newspaper, TV and Online ad campaigns, together with the associated expenditure, as follows:

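| Quarter | Sales ($) | Newspaper ($) | TV ($) | Online ($) |
|---|---|---|---|---|
| 1 | 16850 | 1000 | 500 | 1500 |
| 2 | 12010 | 500 | 500 | 500 |
| 3 | 14740 | 2000 | 500 | 500 |
| 4 | 13890 | 1000 | 1000 | 1000 |
| 5 | 12950 | 1000 | 500 | 500 |
| 6 | 15640 | 500 | 1000 | 1000 |
| 7 | 14960 | 1000 | 1000 | 1000 |
| 8 | 13630 | 500 | 1500 | 500 |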

For example, in quarter 1, $1000, $500 and $1500 were spent on Newspaper, TV and Online ad campaigns respectively, and total sales for that quarter were $16,850.

The first step is to find out whether there is any relationship between sales and advertising expenditure. The tool I am going to use is regression analysis. To perform it, I first fetch the data using rJava as follows:
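If you do not have a Salesforce org handy, the same eight quarters can be typed straight into R. This is only a sketch for following along: the name `campaign_local` is my own placeholder and is not produced by the Salesforce helper.

```r
# Campaign data entered by hand from the table above (no Salesforce needed).
# `campaign_local` is an illustrative name, not part of the original code.
campaign_local <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000, 500, 2000, 1000, 1000, 500, 1000, 500),
  TV        = c(500, 500, 500, 1000, 500, 1000, 1000, 1500),
  Online    = c(1500, 500, 500, 1000, 500, 1000, 1000, 500)
)

# Total ad spend per quarter, as a quick check that the rows were typed correctly
total_spend <- with(campaign_local, Newspaper + TV + Online)
print(cbind(campaign_local, total_spend))
```

The column names match the ones used in the regression formula later in the post, so the rest of the analysis can be run against this data frame as well.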

    library(rJava)                          # Load the rJava library
    .jinit()                                # Initialize the JVM
    sfObj <- .jnew("SalesforceHelper")      # Instantiate the Java helper object
    CampaignVector <- sfObj$queryObject("SELECT Sales__c,Newspaper__c,TV__c,Online__c from CampaignData__c")
    Campaigndata <- getSFDataFrame(CampaignVector)  # Convert the vector to an R data frame

(For more information on how to integrate R and Salesforce, please refer to my previous post at http://www.r-bloggers.com/r-and-salesforce/ or http://arungaikwad.wordpress.com/2012/02/25/r-and-salesforce/)

Let us ask R to perform regression analysis on Campaigndata using the lm() function as follows:

    attach(Campaigndata)
    Campaignmodel <- lm(Sales ~ Newspaper + TV + Online)  # Perform the regression

After performing the regression, R gives me the relationship in the form of the following equation:

Total Sales = (sales with no advertising) + (Newspaper contribution per dollar * Newspaper expenditure) + (TV contribution per dollar * TV expenditure) + (Online contribution per dollar * Online expenditure)

Sales with no advertising is called the intercept, while each per-dollar contribution is called a coefficient.

R also reports how strong this relationship is, through R^2 (R squared).

As you can see, a campaign manager will want to know the per-dollar contribution of each advertising medium. In other words, how much sales does each dollar of expenditure generate?

Let us find out this information from our model:

    > Campaignmodel

    Call:
    lm(formula = Sales ~ Newspaper + TV + Online)

    Coefficients:
    (Intercept)    Newspaper           TV       Online
      9561.4286       1.2465       0.9193       3.5161

Sales without advertising (rounded up) = $9562

Newspaper returns = $1.25 per $1

TV returns = $0.92 per $1

Online returns = $3.52 per $1 of expenditure. (Clearly a winner)
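The per-dollar figures can also be pulled out of the model programmatically instead of being read off the printout. The sketch below assumes the eight quarters are typed into a local data frame; `campaign_local`, `fit` and `returns` are illustrative names of my own.

```r
# Same eight quarters as in the post, entered by hand (illustrative data frame).
campaign_local <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000, 500, 2000, 1000, 1000, 500, 1000, 500),
  TV        = c(500, 500, 500, 1000, 500, 1000, 1000, 1500),
  Online    = c(1500, 500, 500, 1000, 500, 1000, 1000, 500)
)

# Passing data= avoids attach() and keeps the workspace clean
fit <- lm(Sales ~ Newspaper + TV + Online, data = campaign_local)

# Named vector: (Intercept), Newspaper, TV, Online, rounded to cents
returns <- round(coef(fit), 2)
print(returns)
```

Using `coef()` keeps the numbers in sync with the model if the campaign data is refreshed later.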

But how strong is the model? Let us find out

    > summary(Campaignmodel)

    Call:
    lm(formula = Sales ~ Newspaper + TV + Online)

    Residuals:
           1        2        3        4        5        6        7        8
      308.32  -392.36   467.95 -1353.29   -75.59  1019.94  -283.29   308.32

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 9561.4286  1700.5869   5.622  0.00492 **
    Newspaper      1.2465     0.8100   1.539  0.19865
    TV             0.9193     1.0766   0.854  0.44126
    Online         3.5161     0.9584   3.669  0.02141 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 938.2 on 4 degrees of freedom
      (1 observation deleted due to missingness)
    Multiple R-squared:  0.7879,    Adjusted R-squared:  0.6289
    F-statistic: 4.954 on 3 and 4 DF,  p-value: 0.0781

Look at the t values for each advertising medium. As a rule of thumb, a t value of more than 2 indicates a strong relationship. Again, Online advertising has the strongest relationship with sales while TV has the weakest. Also Multiple R-squared of 0.7879 indicates that there is 79% probability with respect to predictability of the model.

Obviously I do not expect real-world campaign or account managers to analyze R output. So I created a related custom object called Campaign_predictor__c with the following custom fields:
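The "more than 2" rule of thumb holds when there are many degrees of freedom. With only 4 residual degrees of freedom, as here, the two-sided 5% cutoff is noticeably higher. The sketch below computes that cutoff and the 95% confidence intervals for each coefficient. It assumes the eight quarters are typed into a local data frame; `campaign_local`, `fit`, `t_crit` and `t_values` are illustrative names.

```r
# Same eight quarters as in the post, entered by hand (illustrative data frame).
campaign_local <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000, 500, 2000, 1000, 1000, 500, 1000, 500),
  TV        = c(500, 500, 500, 1000, 500, 1000, 1000, 1500),
  Online    = c(1500, 500, 500, 1000, 500, 1000, 1000, 500)
)
fit <- lm(Sales ~ Newspaper + TV + Online, data = campaign_local)

# Two-sided 5% critical value of the t distribution for the residual df (4 here)
t_crit <- qt(0.975, df = df.residual(fit))

# t statistics from the coefficient table, compared against that cutoff
t_values <- summary(fit)$coefficients[, "t value"]
print(t_crit)
print(abs(t_values) > t_crit)

# 95% confidence interval for each coefficient
print(confint(fit, level = 0.95))
```

A coefficient whose interval contains zero cannot be distinguished from "no effect" at this sample size, which is worth keeping in mind before reallocating a budget.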

**Sales_without_ad__c NUMERIC initialized to 9562.00,**

**Newspaper_expenditure__c NUMERIC,**

**TV_expenditure__c NUMERIC,**

**Online_expenditure__c NUMERIC**

**Predicted_sales__c FORMULA = (Sales_without_ad__c)+(3.52*Online_expenditure__c)+(1.25*Newspaper_expenditure__c)+(0.92*TV_expenditure__c)**

**Prediction_probability__c = 78**
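Before handing the formula field to managers, it can be mirrored in R and checked against a quarter we already know. This is a sketch only: `predicted_sales` is a hypothetical helper that copies the rounded constants from the formula field, not something generated by Salesforce.

```r
# Hypothetical R mirror of the Predicted_sales__c formula field.
# The constants are the rounded values used in the custom object.
predicted_sales <- function(newspaper, tv, online, sales_without_ad = 9562) {
  sales_without_ad + 3.52 * online + 1.25 * newspaper + 0.92 * tv
}

# Quarter 1 from the campaign data: $1000 Newspaper, $500 TV, $1500 Online
q1_predicted <- predicted_sales(newspaper = 1000, tv = 500, online = 1500)
q1_actual <- 16850

print(q1_predicted)              # formula-field estimate for quarter 1
print(q1_actual - q1_predicted)  # gap between actual and estimated sales
```

Comparing the estimate with the actual figure for a known quarter gives a feel for how far off the formula can be.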

Now the managers just have to plug in the values to predict the sales. Suppose a manager has $3000 to spend on ad campaigns and, based on the model, decides to allocate $2000 to Online, $500 to Newspaper and $500 to TV. The predicted sales with 78% probability is:

**$9562 + (3.52*2000)+(1.25*500)+(0.92*500) = $17,687**

Thus we can move complex predictive analytics from the realm of super specialists and statisticians to marketing and sales managers using R and Salesforce.com.
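R can also produce this figure directly from the fitted model, together with a margin of error. The sketch below uses `predict()` with a 95% prediction interval for the $2000 Online / $500 Newspaper / $500 TV allocation. It assumes the eight quarters are typed into a local data frame; `campaign_local`, `fit`, `plan` and `pred` are illustrative names. Because it uses the unrounded coefficients, the point estimate differs slightly from the hand calculation.

```r
# Same eight quarters as in the post, entered by hand (illustrative data frame).
campaign_local <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000, 500, 2000, 1000, 1000, 500, 1000, 500),
  TV        = c(500, 500, 500, 1000, 500, 1000, 1000, 1500),
  Online    = c(1500, 500, 500, 1000, 500, 1000, 1000, 500)
)
fit <- lm(Sales ~ Newspaper + TV + Online, data = campaign_local)

# The manager's proposed allocation of the $3000 budget
plan <- data.frame(Newspaper = 500, TV = 500, Online = 2000)

# Returns a one-row matrix with columns fit (point estimate), lwr and upr
pred <- predict(fit, newdata = plan, interval = "prediction", level = 0.95)
print(pred)
```

Note that $2000 of Online spend is outside the range seen in the eight quarters (the maximum was $1500), so this is an extrapolation and the interval should be read with that in mind.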

hello,

nice work on the java side (previous post), but on the modeling side, your “prediction” seems quite dangerous to me…

it misses error margin, checking of lm() hypotheses (sales following a normal distribution…), and above all interaction effects between different marketing channels and with previous sales and ad expenditures (time series modeling),

furthermore, it is long known that sales do not respond to advertising in a linear way (check ADBUG model for instance)

your sales manager has some chance to be disappointed…

Dr. Willart,

Thanks for your feedback.

The purpose of this post was to demonstrate how one can harness the power of R and integrate it with Sales and Marketing.

If there is no linear relationship, then R will indicate it with t analysis.

-Arun Gaikwad

You could use this linear regression model for rough descriptive purposes, but as DrSylWil pointed out, it is not good for prediction. Because in that case, shouldn’t the manager then focus only on online advertising to optimize the advertising expenditures?

> Also Multiple R-squared of 0.7879 indicates that there is 79% probability with respect to predictability of the model.

What? Can you clarify what you mean by that sentence please?

Please see line# 21:

Multiple R-squared: 0.7879

R-squared is between 0 and 1.

R-squared is a statistical term saying how good one term is at predicting another. If R-squared is 1.0, then given the value of one term, you can perfectly predict the value of another term. If R-squared is 0.0, then knowing one term does not help you know the other term at all. More generally, a higher value of R-squared means that you can better predict one term from another.
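To make that definition concrete, here is a minimal sketch that recomputes R-squared by hand as one minus the ratio of the residual sum of squares to the total sum of squares, that is, the share of the variation in sales that the fitted line accounts for. It assumes the eight quarters from the post are typed into a local data frame; `campaign_local`, `fit`, `rss` and `tss` are illustrative names.

```r
# Same eight quarters as in the post, entered by hand (illustrative data frame).
campaign_local <- data.frame(
  Sales     = c(16850, 12010, 14740, 13890, 12950, 15640, 14960, 13630),
  Newspaper = c(1000, 500, 2000, 1000, 1000, 500, 1000, 500),
  TV        = c(500, 500, 500, 1000, 500, 1000, 1000, 1500),
  Online    = c(1500, 500, 500, 1000, 500, 1000, 1000, 500)
)
fit <- lm(Sales ~ Newspaper + TV + Online, data = campaign_local)

# Variation left unexplained by the model
rss <- sum(residuals(fit)^2)

# Total variation of sales around its mean
tss <- sum((campaign_local$Sales - mean(campaign_local$Sales))^2)

# Share of the variation in sales accounted for by the model
r_squared <- 1 - rss / tss
print(r_squared)
print(summary(fit)$r.squared)  # value reported by R, for comparison
```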

Yes I am aware of the definition of r-squared. I asked for an explanation of your wording, which – I am sorry to be blunt – makes no sense to me.

Can you explain how any of that definition relates to “there is 79% probability with respect to predictability of the model” – which I still cannot follow.

79% probability of *what*?

what do you mean by ‘predictability of the model’ here?


Very interesting. I just started working with SFDC and also see a limitation in its analytic ability. What is your next step/goal for this R integration? Would it be possible to push information back up to SFDC and display it with better graphics?

Thanks!

Yes. The goal is to create a model in SF, send it to R for analysis, and update SF.


A very interesting article and approach. I’m just starting on this journey so appreciate you sharing. Are you available for projects?