Our search index has grown by 20% in the last few months, and our JVM and Solr setups were beginning to groan under the weight of the data. I went through a few rounds of JVM tuning, which reduced garbage collection time to less than 2%, and with some Solr configuration changes I managed to bring our typical query back under 5 seconds. This felt like a major win, until I adjusted the query.
Looking at our search query parameters, I noticed we were using the “fq” parameter to specify the id of the particular site we were looking for. These queries were taking anywhere from 5 to 15 seconds across our 360GB index, and I suspected that we were pulling data into the JVM only to filter it away. The garbage collection graphs seemed to indicate this as well: the heap was growing very slowly, and our eden space was emptying very quickly even with 20G allocated to it.
When I changed from dismax to the standard target and specified the site id in the query itself, my search time went from 5 seconds to 0.06 seconds. So I started reading and came across an article on nested queries. My idea was that this would let me apply a constraint to the initial set of data returned, using the standard search target, and then perform a full-text search on that set using dismax, achieving the same results.
Original query (grossly simplified):
http://search-server/solr/select?fl=title%2Csite_id%2Ctext&qf=title%5E7+text&qt=dismax&fq=site_id:147&timeAllowed=2500&q=SearchTerm+&start=0&rows=20
Becomes the following nested query:
http://search-server/solr/select?fl=title%2Csite_id%2Ctext&qf=title%5E7+text&timeAllowed=2500&q=site_id:147+_query_:%22{!dismax}SearchTerm%22&start=0&rows=20
Original query time: 5 seconds
Nested query time: 87 milliseconds
Both queries return identical results. So if you are querying a large index and want to use dismax, you should try a nested query. You’re likely to see much better performance, particularly if you’re filtering based on a facet. It also gives you a relatively easy way to specify the value of a field while still using a dismax query for the full-text part.
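For anyone building these URLs in code rather than by hand, here is a minimal sketch of the rewrite from the filter-query form to the nested form. The function name `build_nested_query_url` is hypothetical, the host and field names are simply the ones from the example URLs, and the quote escaping is my own assumption about what is needed to keep a user-supplied term inside the quoted sub-query.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Host and path copied from the example URLs in this post.
SOLR_SELECT_URL = "http://search-server/solr/select"


def build_nested_query_url(site_id, search_term, base_url=SOLR_SELECT_URL):
    """Build the nested-query URL: a field constraint plus a dismax sub-query.

    Hypothetical helper for illustration; not part of any Solr client library.
    """
    # Assumption: escape backslashes and double quotes so the search term
    # cannot terminate the quoted _query_ clause early.
    escaped = search_term.replace("\\", "\\\\").replace('"', '\\"')

    # The standard parser handles site_id:<id>; the _query_ clause hands the
    # search term to dismax, which reads qf from the request parameters.
    q = f'site_id:{site_id} _query_:"{{!dismax}}{escaped}"'

    params = {
        "fl": "title,site_id,text",
        "qf": "title^7 text",
        "timeAllowed": 2500,
        "q": q,
        "start": 0,
        "rows": 20,
    }
    # Note: no fq and no qt=dismax here; that is the whole point of the rewrite.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_nested_query_url(147, "SearchTerm")
    print(url)
    # Decode the query string again to confirm what Solr would receive as q.
    print(parse_qs(urlsplit(url).query)["q"][0])
```

The encoded URL will look slightly different from the hand-written one above because `urlencode` percent-encodes the braces and colons as well, but it decodes to the same parameters.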