Apex and OAuth

OAuth has become a programming gateway to most of the applications such as Facebook, LinkedIn, Twitter, Klout or DropBox. Without OAuth it is impossible to invoke their APIs. OAuth provides a very convenient and safe way to authenticate 3rd party applications on your behalf. So for example if I want to develop an application to search Facebook and LinkedIn for a Contact just added in Salesforce and if found, insert the social information in contact object.

How can I do this?

Theoretically, I will write a trigger which will invoke an asynchronous web service call out to Facebook and LinkedIn APIs. But in order make this call, the user must authenticate my application. In other words user must permit my application to run on his/her behalf and query Facebook and LinkedIn. This can only be achieved by OAuth.

Using OAuth especially 1.0 version of OAuth could be extremely tedious assuming you are using the libraries such as Python and Java. Without these libraries it could be ten times more difficult. And unfortunately there are no libraries written in Apex for Salesforce developers. So I had decided to write my own and completed code for 2.0 version and tested for Facebook. I will be publishing my code in my future blogs. Please give me your suggestion. Constructive criticism is highly appreciated.

 

Predicting Marketing Campaign with R

In my last blog I created a mechanism to fetch data from Salesforce using rJava and SOQL. In this blog I am going to use that mechanism to fetch ad campaign data from salesforce and predict future ad campaign sales using R

Let us assume that Salesforce has campaign data for last eight quarters.  This data is Total Sales generated by Newspaper, TV and Online ad campaigns and associated expenditure as follows:

Sales            Newspaper   TV           Online

1 16850           1000           500           1500

2 12010             500           500             500

3 14740           2000           500             500

4 13890          1000          1000           1000

5 12950          1000            500             500

6 15640            500          1000           1000

7 14960          1000          1000           1000

8 13630            500          1500            500

Thus, quarter# 1 indicates that $1000, $500 and $1500 were spent on Newspaper, TV and Online ad campaigns respectively and total sales during that quarter was $16,850.

First step is find out if there is any relationship with sales and advertising expenditure. The tool I am going to use is Regression Analysis. In order to perform regression analysis, I am going to fetch data using rJava as follows:


library(rJava)  # Load rjava library
.jnit()         # Initialize java

sfObj=.jnew("SalesforceHelper") # Instantiate java object
CampaignVector=sfObj$queryObject("SELECT Sales__c,Newspaper__c,TV__c,Online__c from CampaignData__c") 

Campaigndata<-getSFDataFrame(CampaignVector) # Convert vector to R data frame

(For more information on how to integrate R and Salesforce, please refer my previous blog at: http://www.r-bloggers.com/r-and-salesforce/
or https://arungaikwad.wordpress.com/2012/02/25/r-and-salesforce/)

Let us ask R to perform regression analysis on Campaigndata using lm() function as follows:

attach(Campaigndata) 
Campaignmodel<-lm(Sales~Newspaper+TV+Online) # perform regression

After performing regression analysis, R is going to give me relationship in form of following equation:

Total Sales = (sales with no advertising) + (newspaper contribution per dollar*newspaper expenditure)+(TV contribution per dollar*TV expenditure)+(Online contribution per dollar*Online expenditure)

sales with no advertising is called Intercept while each contribution is called coefficient.

R will also gives information on how meaningful or strong this relationship is, with R^2(R squared).
As you can see that campaign manager will be interested to know per dollar contribution by each adverting medium. In other words, how much sales will be generated for each dollar of expenditure.

Let us find out this information from our model

> Campaignmodel

Call:
lm(formula = Sales ~ Newspaper + TV + Online)

Coefficients:
(Intercept)    Newspaper           TV       Online  
  9561.4286       1.2465       0.9193       3.5161  
>

Sales without advertising (Rounded) = $9562
Newspaper returns = $1.25 per $1
TV returns = $0.92 per $1
Online returns““= $3.52 per $1 of expenditure. (Clearly a winner)

But how strong is the model? Let us find out

> summary(Campaignmodel)

Call:
lm(formula = Sales ~ Newspaper + TV + Online)

Residuals:
       1        2        3        4        5        6        7        8 
  308.32  -392.36   467.95 -1353.29   -75.59  1019.94  -283.29   308.32 

Coefficients:
                    Estimate        Std.   Error     t value           Pr(>|t|)   
(Intercept)        9561.4286         1700.5869        5.622             0.00492 **
Newspaper             1.2465            0.8100        1.539             0.19865   
TV                    0.9193            1.0766        0.854             0.44126   
Online                3.5161            0.9584        3.669             0.02141 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 938.2 on 4 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.7879,     Adjusted R-squared: 0.6289 
F-statistic: 4.954 on 3 and 4 DF,  p-value: 0.0781 

Look at the t-values for each advertising medium. t value more than 2 is strong. Again Online advertising has strongest relationship with sales while TV has the weakest. Also Multiple R-squared of 0.7879 indicates that there is 79% probability with respect to predictability of the model.

Obviously I am not expecting real world campaign or account managers to analyze R output. So, I created one related custome object called Campaign_predictor__c with custom fields as follows:

Sales_without_ad__c NUMERIC initialized to 9562.00,
Newspaper_expenditure__c NUMERIC,
TV_expenditure__c NUMERIC,
Online_expenditure__c NUMERIC
Predicted_sales__c FORMULA = (Sales_without_ad__c)+(3.52*Online_expenditure__c)+(1.25*Newspaper_expenditure__c)+(0.92*TV_expenditure__c)
Prediction_probability__c = 78

Now the managers have to just plug in the values and predict the sales. Suppose the manager has $3000 to spend on ad campaigns and based on model decides to allocate $2000 to Online $500 to Newspaper and $500 to TV. The predicted sales with 78% probability is:

$9562 + (3.53*2000)+(1.25*500)+(0.92*500) = $17,707

Thus we can move complex predictive analytics from the realm of super specialists and statisticians to marketing and sales managers using R and Salesforce.com

R and Salesforce

Introduction

R is widely used among scientists and statisticians to perform statistical analysis while Salesforce.com is one of the leading CRM software packages used for Marketing and Salesforce automation. Salesforce.com contains vital information regarding Leads, Customers, Contacts, Opportunities and Cases. Currently this data is mainly used for operational purposes by Sales and Marketing professionals. How about using this data for predictive analysis or forecasting?

Unfortunately Salesforce.com built on a multi-tenant architecture, limits system usage heavily. Only way to achieve this is to read Salesforce data using Java Web Services API provided by Salesforce and feeding that data to R.

One way is to export salesforce data to csv file and importing the data into R.
But this will be quite tedious and time consuming. It would be nice if one can query Salesforce.com data directly from R and create a data frame or matrix for performing analyses.

rJava

Fortunately Java code can be directly invoked from R using rJava package. Values returned by Java methods are available in R session memory and can be mapped to equivalent R data structures. So, theoretically it must be possible to invoke a java method from R which connects to Salesforce.com and returns the required data.

Let us consider a trivial use case where a java method accepts String argument containing SOQL (Salesforce.com equivalent of SQL) and execute that statement returning data as a list of lists such as vector of vectors.

The first step is to install rJava as follows:

Make sure that Java 1.6 or higher is installed and all the environment variables are pointing to correct directories or folders. (Such as CLASSPATH and PATH)

Start R

install.packages(“rJava”)

Once the installation is successful, try

.jnit().

This command will initialize Java.

myRObj=.jnew(“myJavaobj”) will crate a java object from your java class.

For detailed referece, please refer: http://cran.r-project.org/web/packages/rJava/rJava.pdf

Now, I am going to develop a java class say, SalesforceHelper.java having a method queryObject.

GetObject method will accept 1 String argument:SOQL statement. (Salesforce.com Equivalent of SQL) such
as Select Id, Name From Lead.

Force.com Web Service Connector (WSC)

Salesforce.com provides extensive java api to perform database query or updates. In order to develop SalesforceHelper java class mentioned above, I am going to use WSC-20.jar along with enterprise.wsdl.

To download enterprise.wsdl, login to your salesforce.com org and click on Setup.

Click on Develop->API.

Click on enterprise.wsdl.

You may download wsc-20.jar from the project url:

http://code.google.com/p/sfdc-wsc/downloads/list

Once you download wsc-20.jar, you need to create enterprise.jar as follows:

java -classpath wsc-20.jar com.sforce.ws.tools.wsdlc enterprise.wsdl enterprise.jar

Add wsc-20.jar and enterprise.jar files to your java CLASSPATH environment variable.

Let us write a java class to connect to Salesforce.com, perform SOQL and return a Vector containing Vectors of rows which in turn contain the selected field values.
Thus if our query is SELECT Name, Salary__c from Lead LIMIT 20, then our java method will return Vector containing 20 Vectors, each containing 2 objects of type String and Double.

/*
Copyright (c) 2012, Ardent Software, LLC.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the Ardent Software, LLC. nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.

****

Note that this code may require some work before you can deploy it to a standard org.
*/
import com.sforce.soap.enterprise.Connector;
import com.sforce.soap.enterprise.DeleteResult;
import com.sforce.soap.enterprise.EnterpriseConnection;
import com.sforce.soap.enterprise.Error;
import com.sforce.soap.enterprise.QueryResult;
import com.sforce.soap.enterprise.SaveResult;
import com.sforce.soap.enterprise.sobject.SObject;
import com.sforce.soap.enterprise.sobject.AggregateResult;
import com.sforce.ws.ConnectionException;
import com.sforce.ws.ConnectorConfig;
import java.util.*;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class SalesforceHelper {

public class Main {

  static final String USERNAME = "USER";
  static final String PASSWORD = "PASSWD+SEC TOKEN";
  static final String ENDPOINT = "https://login.salesforce.com/services/Soap/c/23.0";
  static EnterpriseConnection connection;

  public static void main(String[] args) {

  public static Vector queryObject(String qry) throws Exception {
    ConnectorConfig config = new ConnectorConfig();
    config.setAuthEndpoint(ENDPOINT);
    config.setUsername(USERNAME);
    config.setPassword(PASSWORD);

    Vector rowContainer = new Vector();
    String delims = "[ ,]+"; //Parse query tokens with these delimiters
    String[] tokens = qry.split(delims);
    String tableName=null;
    ArrayList colNames = new ArrayList();
    ListIterator listItr = colNames.listIterator();
    for (int i=1;i<tokens.length;i++){ //Start with 1 because 0 will be Select
        if (tokens[i].equalsIgnoreCase("from")){
            tableName=tokens[i+1].trim();
            break;
       }else{
            colNames.add(tokens[i].trim());
       }
   }
   Object[] cols = colNames.toArray();
   System.out.println("columns:"+cols+" Object:"+tableName);
   connection = Connector.newConnection(config);

   QueryResult queryResults = connection.query(qry);

   if (queryResults.getSize() > 0) {
      for (int i=0;i<queryResults.getRecords().length;i++) {
           Vector vrow = new Vector();
           String obj="com.sforce.soap.enterprise.sobject."+tableName;
           SObject  sfObject = (SObject)queryResults.getRecords()[i];
           Class<?> cl = Class.forName(obj);
          for (int j=0;j<cols.length;j++){
               String methodName="get"+(String)cols[j]; //Method Name will be get<colname> such as getName()
               Method m = cl.getMethod(methodName);
               vrow.add(m.invoke(sfObject)); //Invoke method to get the field value
               System.out.println("Row#"+i+": "+vrow);
          }
         // Add row to row container
         rowContainer.add(vrow); //Add Row to the container vector.
      }
  }

return rowContainer;
 }

}

Now we have to write a function in R which will accept one argument and return a data frame as follows:

getSFDataFrame<-function(arr){
arr<-arr$toArray()
rows<-arr$length
cols<-arr[[1]]$size()
df<-data.frame()
for(i in 1:rows){
  for (j in 1:cols){
    if (arr[[i]]$toArray()[[j]] %instanceof% "java/lang/String"){
        df[i,j]<-as.character(arr[[i]]$toArray()[[j]]$toString())
     }else if (arr[[i]]$toArray()[[j]] %instanceof% "java/lang/Double"){
        df[i,j]<-as.double(arr[[i]]$toArray()[[j]]$toString())
     }
  }
 }
return(df)
}

Currently this function is handling only String and Double object types. But it can be very easily extended to handle all the object types.
Also, this function is using java reflection to execute methods such as toArray() and toString().
Using reflection in this manner is very inefficient. Better way is to use Java Native Interface (JNI).
Please refer rJava reference as mentioned above.

Now let us get our Lead data as follows:

library(rJava)
.jnit()

sfObj=.jnew("SalesforceHelper")
vec=sfObj$queryObject("SELECT Name, Salary__c From Lead LIMIT 20")

LeadDataFrame<-getSFDataFrame(vec)

We now have Lead data frame in R which can be saved for further analysis.

Next Step

Performing some meaningful analysis on Lead data such as Regression Analysis to find out if any lead attributes are directly related to conversion.

Salesforce.com and Analytics

Salesforce.com has become one of the most successful cloud applications. I am quite astounded by it’s mega hit penetration into myriad of industries.  It is being used by leading organizations not only to implement their customer relationship management system but also to develop their own applications running on cloud. But complete absence of meaningful analytical implementation flabbergasts me. Of course Salesforce.com has built in dashboards. But it’s analytical capability is very limited.

For example,if I need to segment my customers using clustering, how do I do that? How can I run simple regression analyses? Or how to perform predictive modeling? Suppose I have a historical data of my leads. Some of them were won while some were lost. The leads which were won, may have some common characteristics such as Age, Education, Job function etc. I can feed this data along with my current prospects to find out which of my current prospects are close to those leads and hence highly likely to be converted.

This is quite impossible to implement using native salesforce.com platform due to the limitations of Apex language and governor’s limit. However it is quite possible to take this data and feed it to R which may be running on another cloud such as Amazon Elastic Cloud and perform the required analysis.

Currently I am exploring the possibility of integrating salesforce data with R. I will keep you posted.