Exploring Fuzzy String Match In Various Database Management Systems.

Arnav Khandelwal
3 min readNov 30, 2022

A popular topic in computer science is fuzzy matching, often known as approximate string matching. Additionally, it is a technique that provides a better way to distinguish between two pieces of text, strings, or entries that are roughly similar but not the same.

In other words, if an exact match cannot be made between two phrases or sentences in a database, a fuzzy technique may be used. This approach looks for matches that are more numerous than the match percentage cutoff specified by the application.

Now, Why do we need this? We Need fuzzy matching to find similar data, as people can sometimes make mistakes in entering data or they might not know the exact term. Take this for example:-

Let’s consider my name, which is “Arnav”. Some people write “Arnab” or “Arnaav”. Apart from these, my name can also be represented as:

  1. Arnav Khandelwal
  2. Arnav K.
  3. Mr. A. Khandelwal

As you can see, there are many ways my name could be represented in a document. If we go with exact string matching, there is a high chance that my name would not show up. Hence, the requirement for fuzzy matching pops up.

In this article, we will be talking in brief about 2 major database systems, that can perform fuzzy searching: Redis and MongoDB.

For demonstration purposes, we will be using a dummy dataset for running queries.

Redis

Redis is an in-memory data structure store that is open source (BSD licenced), used as a database, cache, streaming engine, and message broker. Redis has a variety of data types, including bitmaps, hyperloglogs, geographic indexes, streams, lists, sets, and sorted sets with range queries. It is one of the fastest database as it uses In-memory datasets.

Fuzzy Matching is performed based on Levenshtein distance (LD) in redis. And its syntax is:

FT.SEARCH idx:consumers "@Name: (%%Ravi%%)"

Where idx:consumers represent the Index and the % corresponds to the edit distance. More number of % signs would mean more edit distance (As of v1.4.0).
Now, let us run the query:

Fig 1. Redis Query Result

You can see that the data with the Name field similar to “Ravi” have returned.

MongoDB

For storing large amounts of data, we can use MongoDB, which is a document-oriented NoSQL database. Unlike other database Management systems, MongoDB uses collection and documents, instead of rows and columns. Collections can be assumed as relational database tables, as they store sets of documents.
The data (document) is stored as key-value pairs in MongoDB.
Presently in MongoDB, Only its cloud service: MongoDB Atlas provides Fuzzy Searching capabilities, In its aggregation Text search.

db.details.aggregate([
{
$search: {
"index": 'usersidx',
"text": {
"path": "Name”,
"query": "Ravi",
"fuzzy": {"maxEdits": 2, "maxExpansions": 100}
}
}
},
{
$project: {
"_id": 0,
"Name": 1,
"Aadhar":1,
score: { $meta: "searchScore"}
}
}
])

Here, Path is the field name and query is the query we are searching for. In the fuzzy, you can see that there are two fields: maxEdits and maxExpensions.

maxExpansions: The most no. of variations that should be generated and searched for. This cap is based on a token-by-token basis. 50 is the default value and 100 is the max value.
maxEdits: This represents the maximum no. of edits that are required to match the term. Its default and max value is 2.
Now, Let’s run this query:

Fig 2. MongoDB Query Result

We can see that all results similar to “Ravi” are returned.

Conclusion

In this article, we have seen how we can perform fuzzy string matching in some of the most famous Database Management Systems. There are more DBMSs such as ElasticSearch, Postgress, etc which also provide Fuzzy Matching. More or less all database systems that provide fuzzy search follow similar patterns. You can choose which one to opt for as per your requirement.

--

--