Returning Related Results with Query Expansion

When working with non-search experts, I try to spend time explaining key search concepts so that we understand what’s possible with search and what trade-offs we can expect from different search strategies. One idea that most people seem to be familiar with is the use of synonyms for query terms. Query synonyms are an example of a more general technique called query expansion.

A query is a list of one or more words. But the words we read are actually placeholders for the underlying “meaning” of the word. Often words will be ambiguous. They may have multiple meanings only detectable by the context in which they’re used.

At the same time, computers are very literal-minded. A search engine will look at the words used in the query and look for documents containing those words, and only those words. Sometimes this conflicts with the expectations of human language users who know how to interpret the meaning of a given word, and expand that word to include other meanings or other words that could mean the same thing.

Sometimes, documents that use alternate words would be just as good, or relevant in some way to the query. Without knowing these alternate words, the search engine won’t be able to retrieve these documents. In that case, recall is harmed, since not all relevant documents are returned. This illustrates that recall is about the relevance of the documents to the underlying “information need”, rather than the expression of that need in the query keywords. So, we want to relax the restriction that these and only these terms express that underlying information need.
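
To make the recall problem concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three documents, the relevance judgments, and the helper names `literal_search` and `recall`. It is a toy exact-word matcher, not how any real search engine is implemented.

```python
# Toy corpus and relevance judgments (both invented for this example).
DOCS = {
    1: "dog bites man in park",
    2: "canine bites person at beach",
    3: "stock market falls",
}
RELEVANT = {1, 2}  # assumption: a human judged both bite stories relevant


def literal_search(query, docs):
    """Return ids of documents that contain every query word exactly."""
    terms = query.lower().split()
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    }


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant)


hits = literal_search("dog bites man", DOCS)
print(hits, recall(hits, RELEVANT))
```

Document 2 is about the same event but uses "canine" and "person", so the literal matcher never sees it and only half of the relevant documents come back.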

We can use synonyms (alternate words for the same thing), or conceptually related ideas, to expand the query. In practice, the query is augmented with additional terms that broaden the meaning of the query to capture more related content. Expanding the query has the effect of “relaxing” the criteria for inclusion in the search results, which improves recall.

There are various ways to implement query expansion using synonyms, but the basic idea is that the original query becomes more complex. For example:

"dog bites man" => "(dog OR canine OR pet) bites (man OR woman OR person)"

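As a rough sketch of how a per-term expansion like the one above could be produced, the Python below looks up each query word in a synonym table and wraps any matches in an OR group. The `SYNONYMS` table and the `expand_query` function are hypothetical names chosen for this example, and the output is just a boolean query string, not the syntax of any particular search engine.

```python
# Hypothetical synonym table, mirroring the example above.
SYNONYMS = {
    "dog": ["canine", "pet"],
    "man": ["woman", "person"],
}


def expand_query(query, synonyms):
    """Replace each term that has synonyms with an OR group of alternatives."""
    parts = []
    for term in query.split():
        alternates = synonyms.get(term.lower(), [])
        if alternates:
            # Keep the user's original term first, then its synonyms.
            parts.append("(" + " OR ".join([term] + alternates) + ")")
        else:
            # No synonyms configured: leave the term as typed.
            parts.append(term)
    return " ".join(parts)


print(expand_query("dog bites man", SYNONYMS))
# (dog OR canine OR pet) bites (man OR woman OR person)
```
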
But it’s not all roses. Terms that are synonymous in one context are not always synonymous in another. Synonyms are usually configured on a per-term basis: as each term is encountered, any synonyms are looked up and added to the query, regardless of the surrounding words. If we use a less restrictive rule, such as matching any of the terms, then the synonyms can lead to confusing results. Remember that the end user didn’t actually type these terms; we took the liberty of assuming they were equivalent. This isn’t always the case, and the top hits can include documents that match only the hidden terms. Be careful when you use synonyms. Search experts have been bitten by synonyms enough times that we use them with caution. As with all powerful tools, they can cut you.
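
Here is a small Python sketch of that failure mode, under stated assumptions: the synonym table, the documents, and the function name `match_any` are all invented for illustration, and "match any term" is modeled as plain word overlap with no scoring.

```python
# Hypothetical synonym table and toy documents.
SYNONYMS = {"dog": ["canine", "pet"], "man": ["woman", "person"]}
DOCS = {
    1: "dog bites man",
    2: "pet store hires new person",
    3: "weather forecast for tuesday",
}


def match_any(query, docs, synonyms):
    """Match documents containing ANY expanded term.

    Returns {doc_id: hidden terms that matched}, where "hidden" means
    terms added by expansion that the user never typed.
    """
    typed = set(query.lower().split())
    expanded = set(typed)
    for term in typed:
        expanded.update(synonyms.get(term, []))

    hits = {}
    for doc_id, text in docs.items():
        matched = set(text.lower().split()) & expanded
        if matched:
            hits[doc_id] = sorted(matched - typed)
    return hits


print(match_any("dog bites man", DOCS, SYNONYMS))
# {1: [], 2: ['person', 'pet']}
```

Document 2 has nothing to do with anyone being bitten, yet it matches purely on "pet" and "person", two words the user never entered. That is the kind of result that leaves people wondering why the search engine returned it.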

I hope this explanation clears up confusion you might have about this common technique in search relevance optimization.