Recently, we discussed moving away from our good ol’ Solr-based search engine to a more distributed environment built on ElasticSearch. The first thing I did was list all the features I was actively using in Solr and compare them one-to-one with their ElasticSearch counterparts. My full comparison is below.
Note that this is not a comprehensive list of all Solr features; rather, it’s a realistic view of a live Solr setup.
Legend
✓ | Feature is natively supported in ElasticSearch |
~ | Feature is supported in ElasticSearch using a plugin and/or workaround |
× | Feature is not supported in ElasticSearch |
Filters
Solr Feature | Exists in ElasticSearch | Notes |
Query by date range | ✓ | |
Text search across multiple fields (copyField) | ✓ | |
Filter on distance | ✓ | |
Filter in/out boolean value | ✓ | |
Filter in/out numeric value(s) | ✓ | |
Complex boolean statements (combinations of AND/OR) | ✓ | |
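As a rough sketch of how these filters look on the ElasticSearch side, here is a search body that combines them. It assumes the 1.x-era query DSL (the `filtered` query with a `bool` filter); the function name and every field name (`title`, `created_at`, `location`, and so on) are made up for illustration.

```python
import json


def build_filtered_search(text, start, end, lat, lon, radius_km):
    """Build a search body; all field names here are hypothetical."""
    return {
        "query": {
            "filtered": {
                # Text search across several fields (roughly what copyField gave us).
                "query": {
                    "multi_match": {"query": text, "fields": ["title", "description"]}
                },
                "filter": {
                    "bool": {
                        # AND: every clause in "must" has to match.
                        "must": [
                            {"range": {"created_at": {"gte": start, "lte": end}}},
                            {
                                "geo_distance": {
                                    "distance": "%dkm" % radius_km,
                                    "location": {"lat": lat, "lon": lon},
                                }
                            },
                            {"term": {"active": True}},
                        ],
                        # Filter out specific numeric values.
                        "must_not": [{"terms": {"category_id": [3, 7]}}],
                    }
                },
            }
        }
    }


body = build_filtered_search("pizza", "2013-01-01", "2013-12-31", 40.7, -74.0, 5)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a `_search` request.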
Sorting Flexibility
Solr Feature | Exists in ElasticSearch | Notes |
Ability to override sort by score | ✓ | |
Sort by field(s) | ✓ | |
Sort by custom function | ~ | Using scripting |
Sort by custom query | ~ | Using scripting |
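To illustrate the "using scripting" workaround, the sketch below builds a search body with a script-based sort followed by a plain field sort. It assumes the 1.x-era `_script` sort syntax; the `rating` and `created_at` fields and the `weight` parameter are hypothetical.

```python
import json


def build_script_sort(weight):
    """Sort by a scripted value first, then by a regular field."""
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "_script": {
                    # The script reads a document field and a named parameter.
                    "script": "doc['rating'].value * weight",
                    "type": "number",
                    "params": {"weight": weight},
                    "order": "desc",
                }
            },
            # Tie-breaker: an ordinary field sort.
            {"created_at": {"order": "desc"}},
        ],
    }


sort_body = build_script_sort(1.5)
print(json.dumps(sort_body, indent=2))
```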
Boost Functions and Custom Equations
Functions
Solr Feature | Exists in ElasticSearch | Notes |
Custom Query within functions | ✓ | Using Scripting |
Term Frequency (tf) | ✓ | Using Text scripting |
Inverse Document Frequency (idf) | ✓ | Per shard (like Solr), Using Text scripting |
sum, sub, div, product | ✓ | mvel scripting |
min, max | ✓ | mvel scripting |
sqrt, pow, exp | ✓ | mvel scripting |
abs | ✓ | mvel scripting |
duration (ms) | ✓ | mvel scripting – time() |
if-then-else block | ✓ | mvel scripting |
default values (def) | ✓ | mvel scripting |
distance | ✓ | mvel scripting – distance(), arcDistance(), distanceInKm(), arcDistanceInKm() |
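For a feel of how a Solr boost function translates, here is a minimal sketch of a scripted score. It assumes the `function_score` query with `script_score` as found in ElasticSearch 0.90.4+/1.x, with the default (mvel) script language; the `title` and `popularity` fields and the formula itself are invented for the example.

```python
import json


def build_boosted_query(text):
    """Replace the relevance score with a scripted combination of score and popularity."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "script_score": {
                    # Arithmetic plus sqrt(), as listed in the table above.
                    "script": "_score * sqrt(doc['popularity'].value + offset)",
                    "params": {"offset": 1},
                },
                # Use the script result as the final score.
                "boost_mode": "replace",
            }
        }
    }


boosted = build_boosted_query("laptop")
print(json.dumps(boosted, indent=2))
```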
Additional Features
Data Import Handler
Solr Feature | Exists in ElasticSearch | Notes |
Feed from Postgresql | ~ | External Plugin – JDBC River – Could not find support for complex joins or sub-entities. This Link might be useful though |
Feed from Solr | ~ | External Plugin – Solr River |
Feed from ElasticSearch | × | Couldn’t find elasticsearch-river-elasticsearch |
Custom functions and row modifications | × | All data should be mapped using the query (single-shot) |
Plugins and extensibility
Solr Feature | Exists in ElasticSearch | Notes |
Ability to create plugins | ✓ | |
Post-processing ($skipDoc) | × | |
Plugin: Conditional Entities within dih/river | × | |
Plugin: Cached Entities within dih/river | × | |
Faceting
Solr Feature | Exists in ElasticSearch | Notes |
Facet by field | ✓ | Term Facet |
Facet by query (multiple fields) | ✓ | Query Facets |
Custom labels for facet result | ✓ | |
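The three facet features above can be sketched in one request body. This assumes the facets API of ElasticSearch 0.90/1.x (terms and query facets); the facet names double as the custom labels, and all field names are hypothetical.

```python
import json


def build_facets():
    """Request a terms facet and a query facet, each under a custom label."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # Facet by field: counts per distinct value of "category".
            "by_category": {"terms": {"field": "category", "size": 10}},
            # Facet by query spanning multiple fields: a single count.
            "cheap_and_in_stock": {
                "query": {
                    "bool": {
                        "must": [
                            {"range": {"price": {"lt": 20}}},
                            {"term": {"in_stock": True}},
                        ]
                    }
                }
            },
        },
    }


facet_body = build_facets()
print(json.dumps(facet_body, indent=2))
```

The response would carry one entry per label (`by_category`, `cheap_and_in_stock`), which is how custom labels fall out for free.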
Grouping / Variety
- SOLR: Grouping works but breaks the result count (fixed by adding extra facets)
- ES: Field collapsing is not supported (the feature request has been open for four years)
Spell Checking
- SOLR: Solr implements an index-based spellchecker, which is considered rather weak.
- ES: Using Suggester component
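A minimal sketch of the suggester route, assuming the term suggester available in the `suggest` section of a search body (ElasticSearch 0.90+). The suggestion label `spelling` and the `body` field are hypothetical.

```python
import json


def build_term_suggestion(text, field):
    """Ask for per-term spelling corrections of `text` against one field."""
    return {
        "suggest": {
            # "spelling" is just a label; the response echoes it back.
            "spelling": {"text": text, "term": {"field": field}}
        }
    }


suggest_body = build_term_suggestion("elasticsaerch", "body")
print(json.dumps(suggest_body, indent=2))
```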
Text Analysis Chain
Other
Solr Feature | Exists in ElasticSearch | Notes |
Both cached and non-cached filters | ✓ | Using _cache:false |
Default field value (index-time) | ✓ | Using null_value |
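Both notes in this table map to small pieces of JSON. The sketch below assumes the 1.x-era filter syntax, where `_cache` sits inside the filter, and a `string` field mapping with `null_value`; the type name `product` and the field names are invented.

```python
import json


def uncached_term_filter(field, value):
    """A term filter that opts out of the filter cache."""
    return {"term": {field: value, "_cache": False}}


# Mapping fragment: index "unknown" whenever "status" is null at index time.
product_mapping = {
    "product": {
        "properties": {
            "status": {
                "type": "string",
                "index": "not_analyzed",
                "null_value": "unknown",
            }
        }
    }
}

print(json.dumps(uncached_term_filter("session_id", "abc123")))
print(json.dumps(product_mapping, indent=2))
```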
Debugging and Analysis
SOLR: Detailed scoring for each result, text analyzer emulator in admin
ES: Using Explain API or plugins
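One way to get Solr-like scoring detail is to set the `explain` flag on a regular search body, which asks ElasticSearch to attach a score explanation to each hit. The helper name and the sample query below are hypothetical.

```python
import json


def with_explain(search_body):
    """Return a copy of the search body with per-hit score explanations enabled."""
    body = dict(search_body)  # shallow copy so the caller's dict is left untouched
    body["explain"] = True
    return body


base_query = {"query": {"match": {"title": "laptop"}}}
explained = with_explain(base_query)
print(json.dumps(explained, indent=2))
```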
Conclusions
Features
Feature-wise, ElasticSearch is catching up to Solr very fast, and in some aspects it is surpassing it, especially with the new aggregation framework.
Speed
On a single node, speed is very similar, although the stability and maturity of Solr make it the more reliable choice for single-node applications.
Distributed Setup
I couldn’t test this thoroughly, and this will vary hugely based on server setup, ZooKeeper optimizations, and other factors. However, ElasticSearch has the better reputation in this domain since it is engineered from the ground up to support distributed and cloud-based environments.
Related
Solr encourages you to understand a little more about what you’re doing, and the chance of you shooting yourself in the foot is somewhat lower, mainly because you’re forced to read and modify the 2 well-documented XML config files in order to have a working search app.
I agree. Elasticsearch does hide a lot of these details (sometimes in a harmful way).
I recommend that people start learning with a do-it-yourself technology like Solr. Once they get the concepts and start feeling that Solr’s config-first approach is redundant, they can move to a more abstracted solution like Elasticsearch.
(This applies to other technologies too; bare-bones web development vs. full-fledged MVC frameworks is a good example.)