Get Top Results in Fuzzy Matches with Python

Every couple of years a scenario inevitably comes up where I need to check how closely a word or phrase matches another one. Unfortunately, especially when dealing with thousands and thousands of possible matches, the best-scoring match is not always the correct match.

So I hacked together this admittedly ugly set of functions to do one thing: return the top 3 results and their match scores.

This is my early version of the code, so modify it to fit your needs. (I, for example, also needed to keep track of an ID number for my potentials, so each entry in my potentials list was itself a list rather than a plain string, and I matched against the string inside that inner list.) You, you do whatever you like 🙂


#!/usr/bin/env python3
from difflib import SequenceMatcher

def fuzzymatch(a, b):
    # run a SequenceMatcher ratio (0.0 to 1.0) on lowercased versions of the two values
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def bettermatch(percentage, matches):
    # loop through matches, counting positions from 1 so a hit is always truthy
    for index, match in enumerate(matches, start=1):
        # check to see if this percentage is better
        if percentage > match[0]:
            # we found a better match, return this 1-based index
            return index
    # nothing in the list was beaten
    return None

def addmatch(matches, matchdata, index):
    # be sure we have something actually better
    if not index:
        return matches
    # init new matches list
    newlist = []
    # init new index
    newindex = 0
    # loop through old matches
    for amatch in matches:
        # bump index
        newindex = newindex + 1
        # check to see if indices match
        if index == newindex:
            # they do! add in new matchdata
            newlist.append(matchdata)
        # add match to new list
        newlist.append(amatch)
    # drop the last (lowest) entry so the list keeps its original length
    newlist.pop()
    # finito! return new list of matches
    return newlist


def topfuzzymatches(choice, potentials, top=3):
    # init list of top N (default: 3) placeholder matches, each a [score, text] pair
    matches = [[0, ''] for _ in range(top)]
    # loop through the potentials
    for potential in potentials:
        # whats the chance of this potential match?
        chance = fuzzymatch(choice, potential)
        # See if this potential is a better match than what we have
        betterindex = bettermatch(chance, matches)
        if betterindex:
            # it is! add new entry to list
            matches = addmatch(matches, [chance, potential], betterindex)
    # return the top N matches, best first
    return matches
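
For a feel of what the score coming out of fuzzymatch() means: difflib computes ratio() as 2*M/T, where M is the number of matching characters and T is the combined length of both strings. Here is a minimal sketch, with sample strings made up purely for illustration:

```python
from difflib import SequenceMatcher

def fuzzymatch(a, b):
    # same helper as above: case-insensitive similarity between 0.0 and 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# identical apart from case, so the score is 1.0
identical = fuzzymatch("Compressor", "COMPRESSOR")
# "bcd" is shared: 2 * 3 matching characters / 8 total characters = 0.75
partial = fuzzymatch("abcd", "bcde")
# no characters in common, so the score is 0.0
unrelated = fuzzymatch("abc", "xyz")

print(identical, partial, unrelated)
```

So a "match percentage" here is really a ratio; multiply it by 100 if you want to show it as a percent.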

Basically, let’s say that in your code you have a device called “The Fraggle Widget ZT6578 Compressor”.

Let’s also presuppose you have a massive list of potential devices and you want the top 5 results.

Simply pass them into the topfuzzymatches() function and voila!

devicename = "The Fraggle Widget ZT6578 Compressor"
top5results = topfuzzymatches(devicename, potentialdevices, 5)

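That will return a list of the top 5 matches found in the potentialdevices list, in order from highest to lowest. Each entry is a [score, name] pair, where the score is a ratio between 0 and 1 (multiply by 100 for a percentage). If fewer than 5 potentials score above 0, the leftover slots stay as the [0, ''] placeholders.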
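
For comparison, the standard library's heapq.nlargest can do the same kind of ranking in a few lines, and it adapts easily to the ID-tracking variation mentioned earlier. This is only a sketch of an alternative, not the code above: the function name topmatcheswithids, the (id, name) tuple layout, and the sample devices are all assumptions made up for illustration.

```python
import heapq
from difflib import SequenceMatcher

def fuzzymatch(a, b):
    # case-insensitive similarity between 0.0 and 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def topmatcheswithids(choice, potentials, top=3):
    # ASSUMPTION: potentials is an iterable of (id, name) pairs
    scored = ((fuzzymatch(choice, name), ident, name) for ident, name in potentials)
    # keep the `top` highest scores, best first
    return heapq.nlargest(top, scored, key=lambda entry: entry[0])

# hypothetical sample data
potentialdevices = [
    (101, "Fraggle Widget ZT6578 Compressor"),
    (102, "Fraggle Widget ZT6578 Condenser"),
    (103, "Doozer Tower Crane"),
    (104, "The Fraggle Widget ZT6578 Compressor"),
]

results = topmatcheswithids("The Fraggle Widget ZT6578 Compressor", potentialdevices, 2)
for score, ident, name in results:
    print(f"{score:.2f}  {ident}  {name}")
```

One difference to keep in mind: this version never pads the result with placeholders, so a short potentials list simply gives back fewer entries than you asked for.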

Belisarius Smith consults as a software engineer, cloud engineer, and security adviser. He has a BSBA in Security Management and is currently completing graduate studies in the Engineering Department at Penn State University with a Masters of Software Engineering. When he isn't traveling, mountain climbing, or reading, he spends his spare time on personal side projects and studies.
