Home » Cloud » Phonetic Search for duplicates in Salesforce.com

Phonetic Search for duplicates in Salesforce.com

Soundex Search Screen

Introduction
Phonetic search is basically searching for terms which may be spelled differently but have similar pronunciation.
Though phonetic search  is quite common requirement, searches are often implemented in term of an exact match criterion with wildcards. I think if there is any application which must have phonetic search then it is CRM. Why?
Because CRM may contain contacts and leads with homo phonic (pronounced the same) names. Also, most of the times the data is manually entered by humans who are well known for the propensity to misspell. This creates a situation where duplicate contacts or leads are entered because there names are misspelled or a contact is not found because it is being searched for a misspelled name such as Smith instead of Smyth. As far as I know, Salesforce has neither implemented any phonetic search capability nor made any announcement to do so. But I am sure someday they will and for time being I have decided to implement it by using Soundex algorithm as shown above. I am searching for last name Branscomb and it has returned 4 records with Last names sounding like Branscomb.

Soundex Algorithm
Soundex algorithm is one of the simplest algorithms to implement. It was developed by Robert Russell and Margaret Odell in 1918. It generates four letter term (Alphabet+Three digits such as A230) for a given word called soundex code by applying following rules:

1. First letter of the word is the letter of the soundex code and not converted to a digit (number).  So for a Last name Smith it is “S”.

2.  Vowels are Not coded.

3.  b,f,p,v are converted to 1.

4. c, g, j, k, q, s, x, z are converted to 2.

5. d,t are converted to 3

6. l is converted to 4.

7. m,n are converted to 5.

8. r is converted to 6.

9 h,w are Not coded

10. Two adjacent letters with the same number are coded as a single number.
Letters with the same number separated by an h or w are also coded as a single number.

11. Continue until you have one letter and three numbers.
If length of the soundex code is less than 4, fill in zeroes until there are three numbers.

By applying these rules, Smith and Smyth both yield to soundex code S530.  Thus if you search for S530, both Smith and Smyth will be returned.

Though simplistic, soundex has limitations.

Design and Implementation

Salesforce design will be very straight forward and very similar to my last blog post to prevent duplicates.

Suppose we need to implement phonetic search for Lastname field.

1. A new custom field of type text and length 4 called LastNameSoundexKey__c will be defined.

2. Make sure that LastNameSoundexKey__c is defined as External Id. This way it is guaranteed to be  indexed.

3. Write an Apex class with a static method to convert any String to a Soundex Code.

4. Write a before insert/update trigger code to convert Lastname to soundex code and assign to LastNameSoundexKey__c.

5. Develop a simple Visual Force page as shown above where user may enter search term and a controller will convert the search term to a soundex code and make a SOQL query.  Thus if user is searching for Smith, controller will actually search for S530.

Code Snippet

Soundex Class:

/*
Copyright (c) 2012, Ardent Software, LLC.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the Ardent Software, LLC. nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.

****

Note that this code may require some work before you can deploy it to a standard org.
*/
public class Soundex{
public static integer soundexLength = 4;

private static map<String,String> soundexMap = new map<String,String>{'A'=>'0','B'=>'1','C'=>'2','D'=>'3','E'=>'0',
'F'=>'1','G'=>'2','H'=>'0','I'=>'0','J'=>'2',
'K'=>'2','L'=>'4','M'=>'5','N'=>'5','O'=>'0',
'P'=>'1','Q'=>'2','R'=>'6','S'=>'2','T'=>'3',
'U'=>'0','V'=>'1','W'=>'0','X'=>'2','Y'=>'0','Z'=>'2'};
public static string toSoundex(String input){
String prevChar = ' ';
if (input == NULL || input.length() == 0){
return input;
}
String normStr = input.toUpperCase();
//Append first character to encoded string
String soundexStr = normStr.substring(0,1);
integer strLength = normStr.length();
for (integer i=1; i<strLength && soundexStr.length()<soundexLength; i++){
String key=normStr.substring(i,i+1);
String soundexChar = soundexMap.get(key);

if (soundexChar != NULL && !soundexChar.equals('0') && !soundexChar.equals(prevChar)){
soundexStr = soundexStr+soundexChar;
prevChar = soundexChar;
}
}
//Pad soundex string if the length is less than 4
while (soundexStr.length() < soundexLength){
soundexStr = soundexStr+'0';
}
return soundexStr;
}
}

Trigger code:

trigger updSoundex on Lead (before insert, before update) {
for (Lead l:Trigger.New){
l.LastNameSoundexKey__c = Soundex.toSoundex(l.LastName);
}

Controller Methods:

public void setQuery(){

generatedKey = Soundex.toSoundex(lastName);

doQuery();
}

public PageReference doQuery(){
leads = [Select l.Lastname
From Lead l
where l.LastNameSoundexKey__c = :generatedKey
ORDER BY l.Lastname ASC];
return null;
}

Soundex class can also be used to flag probable duplicates and merge them.

Advertisements

11 Comments

  1. Phil Brining says:

    Hi, are you interested in developing more complex algorithms for SF deduping?

    thanks

    Phil

  2. Do you plan on putting this in an unmanaged package in the future? Can you elaborate on the soundex limitations you briefly mentioned?

    • Arun Gaikwad says:

      Since source code is already published, there is no need to crate an unmanaged package.
      Soundex treats Cathey and kathey differently though they sound similar. This can be fixed with stronger algorithms.

  3. Mudit Garg says:

    Hi, i really apperciate which you had written.but i have some problem in our coding. could you please elobrorate the line which i mentioned below.

    String soundexStr = normStr.substring(0,1);

    I could not understand the meaning of substring(0,1).

    • Arun Gaikwad says:

      It gets the first character of the String, e.g. if the string is “Hello” then the substring(0,1) will be “H”.

      • Daniel says:

        Hi
        I really appreciate you posting up some logic in regards to this

        I am trying to develop a de-duper Catcher , I looked at the conventional methods eg dupeblocker etc, however it isn’t fit for purpose so need to develop one my self .

        So far I have created the following apex, which only looks at company name and if its an exact match it doesnt allow you to save the record.I was wondering do you think it’s possible to incorporate your idea into the following code below ? And would it successfully work ?

        For eg if I have a lead in the system called Burger King limited
        And some tried to add Burger King ltd, would it identify that it’s the same lead ?

        Trigger DuplicateLeadPreventer on Lead
        (before insert, before update) {
        //Get map of record types we care about from Custom Setting
        Map leadrtmap = Manage_Lead_Dupes_C__c.getAll();

        //Since only certain leads will match, put them in a separate list
        List LeadstoProcess = new List ();

        //Company to Lead Map
        Map leadMap = new Map();

        for (Lead lead : Trigger.new) {

        //Only process for Leads in our RecordTypeMap
        if (leadrtmap.keyset().contains(lead.RecordTypeId) ) {

        // Make sure we don’t treat an Company name that
        // isn’t changing during an update as a duplicate.

        if (
        (lead.company != null) &&
        (Trigger.isInsert ||
        (lead.company != Trigger.oldMap.get(lead.Id).company))
        )
        {

        // Make sure another new lead isn’t also a duplicate

        if (leadMap.containsKey(lead.company)) {
        lead.company.addError(‘Another new lead has the ‘
        + ‘same company name.’);
        } else {
        leadMap.put(lead.company , lead);
        LeadstoProcess.add(lead);
        }
        }
        } //end RT If Check
        } //End Loop

        /*
        Using a single database query, find all the leads in
        the database that have the same company address as any
        of the leads being inserted or updated.

        */

        Set ExistingCompanies = new Set ();
        for (Lead l: [Select Id, Company from Lead WHERE Company IN :leadMap.keyset()
        AND RecordTypeId IN :leadrtmap.keyset()]) {
        ExistingCompanies.add(l.Company);
        }
        //Now loop through leads to process, since we should only loop if matches
        for (Lead l : LeadstoProcess) {
        if (ExistingCompanies.contains(l.company) ) {
        l.company.addError(‘A lead with this company ‘
        + ‘name already exists.’);
        }
        }
        }

      • Arun Gaikwad says:

        Daniel,
        Phonetic Search is for names or words which are misspelled or spelled differently but sound similar.
        Burger King Limited and Burger King Ltd can not be caught by phonetic searching but Burger Keeng limeeted can be.

      • Daniel says:

        Hi Arun

        Thank you for tour swift response.Do you know what search would best meet my criteria ?

  4. Mudit Garg says:

    okay i got it.
    Thanks alot

  5. Daniel says:

    Hi
    I really appreciate you posting up some logic in regards to this

    I am trying to develop a de-duper Catcher , I looked at the conventional methods eg dupeblocker etc, however it isn’t fit for purpose so need to develop one my self .

    So far I have created the following apex, which only looks at company name and if its an exact match it doesnt allow you to save the record.I was wondering do you think it’s possible to incorporate your idea into the following code below ? And would it successfully work ?

    For eg if I have a lead in the system called Burger King limited
    And some tried to add Burger King ltd, would it identify that it’s the same lead ?

    Trigger DuplicateLeadPreventer on Lead
    (before insert, before update) {
    //Get map of record types we care about from Custom Setting
    Map leadrtmap = Manage_Lead_Dupes_C__c.getAll();

    //Since only certain leads will match, put them in a separate list
    List LeadstoProcess = new List ();

    //Company to Lead Map
    Map leadMap = new Map();

    for (Lead lead : Trigger.new) {

    //Only process for Leads in our RecordTypeMap
    if (leadrtmap.keyset().contains(lead.RecordTypeId) ) {

    // Make sure we don’t treat an Company name that
    // isn’t changing during an update as a duplicate.

    if (
    (lead.company != null) &&
    (Trigger.isInsert ||
    (lead.company != Trigger.oldMap.get(lead.Id).company))
    )
    {

    // Make sure another new lead isn’t also a duplicate

    if (leadMap.containsKey(lead.company)) {
    lead.company.addError(‘Another new lead has the ‘
    + ‘same company name.’);
    } else {
    leadMap.put(lead.company , lead);
    LeadstoProcess.add(lead);
    }
    }
    } //end RT If Check
    } //End Loop

    /*
    Using a single database query, find all the leads in
    the database that have the same company address as any
    of the leads being inserted or updated.

    */

    Set ExistingCompanies = new Set ();
    for (Lead l: [Select Id, Company from Lead WHERE Company IN :leadMap.keyset()
    AND RecordTypeId IN :leadrtmap.keyset()]) {
    ExistingCompanies.add(l.Company);
    }
    //Now loop through leads to process, since we should only loop if matches
    for (Lead l : LeadstoProcess) {
    if (ExistingCompanies.contains(l.company) ) {
    l.company.addError(‘A lead with this company ‘
    + ‘name already exists.’);
    }
    }
    }

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: