Thinking of Switching Search Engines

The big gorillas of open-source search engines are Solr and Elasticsearch. Solr is the older of the two and has been managed as an Apache open-source project for as long as I can remember. Elasticsearch is younger and has distinguished itself for its data analytics support as well as search. Both search engines use the same underlying Lucene index for the core search capabilities, and in many ways, they have feature parity. Nonetheless, many organizations may be considering a switch from Solr to Elasticsearch, or from Elasticsearch to Solr. Factors such as team skills or platform consolidation play a role in these initiatives. But what should a project team expect when planning to make the switch? What’s the resource commitment for a business to change search engines?

The answer depends on several factors. Solr and Elasticsearch are very similar products in terms of function, so if I were asked for my guidance, I’d want to know why the team wants to switch. If they’re unhappy with one or the other, it’d be better to try to fix the unhappiness than to just switch and hope things are better with the new product. Again, they perform the same function and their differences are pretty esoteric.

Another question is what level of investment the business has made in one or the other. If Solr or Elasticsearch is a line-of-business app, then there are likely to be other systems that depend on it. You’d have to identify those dependencies and have some kind of plan to switch them, too. If there are integrations that are custom to one or the other, they will not “just work”. Solr and Elasticsearch have very different APIs. Solr is typically queried through its own query syntax, passed as request parameters. That’s a domain-specific language (DSL) that is particular to Solr, not an industry standard. Elasticsearch, on the other hand, uses a JSON API, where each query is built as a JSON request body. So you’d have to translate any application logic that expects one or the other way of calling your search application. Solr has a newer JSON API as well, but it is limited compared to its DSL, so you may not be able to do everything you want with it. And it’s not compatible with Elasticsearch’s JSON API.
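To make the difference concrete, here is a minimal sketch of the same phrase search expressed both ways. It only builds the requests and does not contact a server. The collection/index name `products`, the field `title`, and both helper functions are hypothetical, and the two queries are only roughly equivalent, since scoring and analysis depend on how each engine is configured.

```python
import json
from urllib.parse import urlencode


def solr_request(collection, field, text, rows=10):
    """Build the URL for a Solr /select call using query-string parameters.

    Assumes `text` contains no double quotes or backslashes; real code
    would need to escape them for the Solr query parser.
    """
    params = {"q": f'{field}:"{text}"', "rows": rows}
    return f"/solr/{collection}/select?{urlencode(params)}"


def elasticsearch_request(index, field, text, size=10):
    """Build the path and JSON body for an Elasticsearch _search call."""
    body = {"query": {"match_phrase": {field: text}}, "size": size}
    return f"/{index}/_search", json.dumps(body)


if __name__ == "__main__":
    print(solr_request("products", "title", "red shoes"))
    path, body = elasticsearch_request("products", "title", "red shoes")
    print(path)
    print(body)
```

Every place in your applications that assembles a request like the first one would need to be rewritten to produce something like the second, or the reverse, and the response formats differ as well.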

So why might you want to switch? A couple of reasons come to mind. In some orgs, both Solr and Elasticsearch are used by different divisions and IT wants to standardize on one platform. This seems like the most legitimate reason. In that case, you have to consider which platform you have a deeper investment in and what your primary use cases are. Elasticsearch has a sweet spot in data analysis, whereas Solr is arguably stronger in very complex search scenarios. That said, it’s not a crime to run multiple platforms. The history of an organization, its projects, and acquisitions often lead to having competing platforms in house. If you don’t need to consolidate the data, consider whether it’s worth the effort to consolidate the technology.

If you don’t have an experienced in-house team, you have to consider who is going to maintain and manage the platform. You may have IT staff who know Solr or Elasticsearch, but they will have to transfer those skills to the other platform. Depending on what you’re doing, you may have significant DevOps requirements. Elasticsearch has a very good scaling and DevOps story, while Solr is a little less graceful from a DevOps point of view.

To sum up, think hard before switching between these platforms. They are very comparable across a number of different features. If you can justify switching, you’ll need someone to itemize all the dependencies on the search platform to ensure you don’t break anything. And you’ll need people who know how to work with your chosen platform and port your existing queries to the new format.

In the case where there is a legacy search engine in place and the business or tech team wants to move to either Solr or Elasticsearch, the switching costs are likely even greater than switching between those platforms. The common Lucene core of Solr and Elasticsearch means there are a lot of overlapping concepts and data structures. With non-Lucene search engines, you’ll need to ensure that all the features you’re using have a counterpart in the target platform and reconfigure your queries on that target platform accordingly. You’ll also need to transfer any technical expertise from the old platform to the new, which includes hosting and operational management of the new system.

And last, you’ll need to get your documents/SKUs into the new system. This is the process of reindexing. For either Solr or Elasticsearch, there is an expected format that the source data will need to be in. If you have access to the source documents, you’ll need to ensure the fields you want searchable are configured during reindexing. Solr has some support for extracting content from raw documents and mapping it into your target configuration. With Elasticsearch, you’ll need to do that on your own. If your existing platform already extracts source data into a structured format, you could piggyback on that by translating that format to the Solr or Elasticsearch format. In some cases, that could be simpler than starting with the raw documents. But if your data updates frequently, you’ll want to put an indexing process in place that can work with your source data. Solr has an edge with simple indexing scenarios, since it has pretty powerful content extraction capabilities out of the box, whereas you’ll have to set up a separate process (and maintain it) if you go with Elasticsearch.
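As a rough illustration of the "piggyback" approach, the sketch below takes records already extracted by an existing platform and reshapes them into a JSON array of documents, which Solr's update handler accepts, and into the newline-delimited action/document pairs that Elasticsearch's bulk endpoint expects. The legacy field names, the `FIELD_MAP` mapping, and the helper functions are all hypothetical, and the sketch assumes every record carries a `sku` that can serve as the document ID.

```python
import json

# Hypothetical mapping from legacy field names to target field names.
FIELD_MAP = {"sku": "id", "name": "title", "desc": "description"}


def translate(record):
    """Keep only mapped fields and rename them for the target schema."""
    return {new: record[old] for old, new in FIELD_MAP.items() if old in record}


def solr_update_body(records):
    """Body for a POST to Solr's update handler: a JSON array of documents."""
    return json.dumps([translate(r) for r in records])


def elasticsearch_bulk_body(records, index):
    """Body for Elasticsearch's bulk endpoint: an action line, then the
    document, one JSON object per line, ending with a newline."""
    lines = []
    for record in records:
        doc = translate(record)
        # Raises KeyError if the record has no "sku" (assumed always present).
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    legacy = [{"sku": "A1", "name": "Red shoes", "desc": "Leather", "legacy_rank": 3}]
    print(solr_update_body(legacy))
    print(elasticsearch_bulk_body(legacy, "products"), end="")
```

The translation step is the easy part. The real work is deciding which legacy fields matter, how they should be analyzed in the new schema or mapping, and how this process gets rerun whenever the source data changes.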

I hope that gives you some idea of what’s involved when switching search engines. It can definitely be done, but it’s a non-trivial amount of work and requires a lot of planning. Definitely not something to take on lightly.