Getting Started with Query Log Analysis

If you manage an ecommerce property and are responsible for site search performance, you may be wondering how to get started measuring search. Site search has some unique characteristics, but there are simple ways to get measurable insights you can use to track performance as you improve your search results. In this post, we’ll look at getting started with query log analysis.

Ecommerce content managers generally try to analyze user behavior on a site. The more data-oriented approaches to content management have had a big influence on the industry. But if you’re not coming from a deep analytics background, it can be hard to implement and use behavioral metrics. This can be extra daunting with site search. Many ecommerce directors I’ve spoken with have no idea what’s going on with site search. They’re just not sure customers are finding all the products on offer when they’re searching.

If site analytics are in place, it may be clear through click-tracking that abandonment in the search funnel is high, but why? Ad-hoc testing might show discrepancies between search results and the products the team knows for a fact are in the catalog. This is frustrating, which is why I think having a process for search quality is really important. It gives us a place to start looking at search in a systematic way and a method for making and measuring incremental improvements.

If you’re actually tracking site abandonment from search, you’re already in a good spot. The first thing you want to do is establish some boundaries around your site search as a funnel. That means you have to treat the search experience as a marketing channel and measure it accordingly. There are a number of things we can measure but most of the metrics I recommend require some level of click-tracking to understand what your customers are actually doing on-site.

The first step is to analyze your query logs. For this purpose, your query log is just the list of your customers’ searches with a count for each, sorted in descending order of frequency. So, in the case of apparel, you might have many searches for “jackets” and fewer for “bowtie”. You might have only one search for “purple bonnet”. In its simplest form, the list looks like this:

"search terms", "count"
jackets, 2019
bowtie, 138
purple bonnet, 1
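
If your search platform only gives you raw searches (one row per search), a frequency table like the one above can be built with a few lines of code. This is a minimal sketch, not a definitive implementation: the function names, the lowercase-and-trim normalization, and the optional `limit` cap are my own assumptions, and the sample searches are made up for illustration.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivially different searches count together."""
    return " ".join(query.lower().split())


def query_frequencies(raw_queries, limit=None):
    """Return (query, count) pairs in descending order of frequency.

    `limit` is an optional cap on the number of rows returned (None means no cap).
    """
    counts = Counter(normalize(q) for q in raw_queries if q.strip())
    return counts.most_common(limit)


# Made-up sample of raw searches, one entry per search.
raw = ["Jackets", "jackets ", "bowtie", "jackets", "purple  bonnet"]

for query, count in query_frequencies(raw):
    print(f"{query}, {count}")
```

How aggressively to normalize is a judgment call. Merging case and whitespace variants is usually safe, but merging plurals or misspellings can hide exactly the problems you want to find.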

This list can be very long, so I recommend you cap it at 1000 or 5000 queries depending on your search volume. This gives you enough data to analyze the common queries and a sample of uncommon ones. Typically, query frequency obeys a power law distribution, or, more specifically, Zipf’s Law. Put another way, most of the total searches done on your site will be for only a few common queries. This is really good news because it means you can divide the work into two separate areas of focus. The most frequent queries are called the “head”, while the remainder of the list is called the “tail”.

When you divide the query frequency table into head and tail, you’ll have a short list of head queries and a long list of tail queries. If 50% of your searches are for the same 100 queries, you can focus your efforts to improve search on the head, since it’s easier to analyze and optimize 100 queries than 4900.

So, for the price of fixing 100 queries, you’ll have improved the experience of 50% of your searches. This is just an example but the basic idea is to bring data-analytic thinking to the task. You have to be able to measure search if you want to manage it effectively.
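
To find out what the head looks like on your own site, you can measure how much of the total volume the top queries cover. The sketch below assumes the frequency table is a list of (query, count) pairs; the function names and the 50% target are illustrative choices, not a standard.

```python
def head_share(freq_table, head_size):
    """Fraction of total search volume covered by the `head_size` most frequent queries."""
    counts = sorted((count for _, count in freq_table), reverse=True)
    total = sum(counts)
    return sum(counts[:head_size]) / total if total else 0.0


def head_size_for(freq_table, target_share=0.5):
    """Smallest number of top queries whose combined volume reaches `target_share`."""
    counts = sorted((count for _, count in freq_table), reverse=True)
    total = sum(counts)
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if running >= target_share * total:
            return rank
    return len(counts)


# The three sample rows from the table earlier in this post.
table = [("jackets", 2019), ("bowtie", 138), ("purple bonnet", 1)]

print(f"Top query covers {head_share(table, 1):.1%} of searches")
print(f"Queries needed to reach 50% of volume: {head_size_for(table, 0.5)}")
```

If the second number comes back small relative to the length of your list, that is your head, and it is where the first round of work should go.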

I like to divide a query log into deciles (10 buckets of search terms), where each bucket represents 10% of overall query volume. That way, you can work left-to-right, analyzing and optimizing each bucket. This guarantees you get the most benefit for the greatest number of searches as you go. Otherwise, you run the risk of fixing queries that, while not performing well, really only represent a small fraction of search traffic. Better to invest where you’ll have the greatest impact.
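
Here is one way the decile bucketing could be sketched. It assumes the same list of (query, count) pairs, and it assigns each query to the bucket where its share of cumulative volume begins. That rule is my own simplification: a very popular query can span more than 10% of volume by itself, in which case the buckets it covers after the first are left empty. The counts in the example are invented.

```python
def volume_deciles(freq_table, buckets=10):
    """Group queries into `buckets` lists, each covering an equal slice of total volume.

    A query goes into the bucket where its cumulative volume starts.
    """
    rows = sorted(freq_table, key=lambda row: row[1], reverse=True)
    total = sum(count for _, count in rows)
    result = [[] for _ in range(buckets)]
    if total == 0:
        return result
    seen = 0  # volume accounted for by queries ranked above the current one
    for query, count in rows:
        index = min(buckets - 1, seen * buckets // total)
        result[index].append(query)
        seen += count
    return result


# Invented counts that add up to 100 so the percentages are easy to read.
table = [("a", 40), ("b", 20), ("c", 10), ("d", 10), ("e", 5),
         ("f", 5), ("g", 4), ("h", 3), ("i", 2), ("j", 1)]

for number, queries in enumerate(volume_deciles(table), start=1):
    print(f"decile {number}: {queries}")
```

The early buckets hold very few queries and the last bucket holds many, which is the head and tail pattern again, just at a finer grain.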

To sum up, the first step to data-driven search management is to measure your queries. From there, you can start to fix the most impactful queries and devise more general fixes for the long tail queries. Even though those queries may only appear once, there will be patterns in the data that can be addressed to cover a multitude of one-off queries. Beyond that, you can start to measure the effectiveness of the search funnel itself using metrics like:

These are just some of the ways to think about search quality in a data-centric way. Another big idea that most ecommerce analytics platforms can provide is revenue per search. Ideally your search click-tracking feeds into a good attribution model and you can attribute revenue (or partial revenue) to specific searches. If you have that, you can show another view of the query logs that buckets according to revenue rather than frequency.
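
If your analytics platform can export attributed revenue per search term, the revenue view can be sketched as follows. Everything here is an assumption for illustration: the two dictionaries stand in for whatever export your platform provides, the revenue figures are invented, and the attribution model itself is out of scope.

```python
def revenue_view(search_counts, attributed_revenue):
    """Return (query, searches, revenue, revenue_per_search) rows, highest revenue first."""
    rows = []
    for query, searches in search_counts.items():
        revenue = attributed_revenue.get(query, 0.0)
        per_search = revenue / searches if searches else 0.0
        rows.append((query, searches, revenue, per_search))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Search counts from the sample table; revenue numbers are made up.
search_counts = {"jackets": 2019, "bowtie": 138, "purple bonnet": 1}
attributed_revenue = {"jackets": 4038.0, "bowtie": 690.0}

for query, searches, revenue, per_search in revenue_view(search_counts, attributed_revenue):
    print(f"{query}: {searches} searches, {revenue:.2f} revenue, {per_search:.2f} per search")
```

Sorting by total revenue shows where the money is today, while the per-search figure can point to lower-volume queries that convert unusually well.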

I hope this post has given you some ideas about how to think about search quality. I believe the first step in search quality management is to find ways to quantify the search experience. It’s not very productive to fight small fires in search results one at a time. Often, the underlying issue with one query has far-reaching effects on other queries you haven’t looked at yet. You’ll never know until you start to analyze your data.