Pushing Content Out Fast with MongoDB 2.4.4
As some of you might know, Drupal is a great CMS. But because it’s so full-featured, it’s also a bit slow: not slow compared to other CMSs, but too slow to push content out fast on its own.
For the video data system we’re building on Drupal, we needed an API backend that could be queried with timestamps and complicated filters. Serving this straight from Drupal would have been very slow, on the order of 5-20 requests per second on a single node. To overcome the speed problem, we started indexing all our content into an Apache SOLR server and serving it out with a custom Node.JS application. This is fast, very fast even, averaging around 800 requests per second on a single node.
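As a rough sketch of what one of those timestamp-plus-filter API requests looks like when translated into a SOLR query (all field and parameter names here are illustrative, not the project’s actual schema):

```javascript
// Hypothetical sketch: turning API parameters into a SOLR select URL.
// `publishedAfter` and `category` are made-up field names for illustration.
function buildSolrQuery({ publishedAfter, category, page = 0, rows = 20 }) {
  const fq = [];
  if (publishedAfter) {
    // Range filter on a publish timestamp field
    fq.push(`published_ts:[${publishedAfter} TO *]`);
  }
  if (category) {
    fq.push(`category:${category}`);
  }
  const params = new URLSearchParams({
    q: '*:*',
    start: String(page * rows),
    rows: String(rows),
  });
  for (const f of fq) params.append('fq', f);
  return `/solr/select?${params.toString()}`;
}
```

The Node.JS app then only has to proxy such requests to SOLR and reshape the response, which is why a single node could sustain those numbers.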
The only drawback with Apache SOLR is that it’s very slow at updating its index. That wasn’t a problem until we were asked to add a popularity sort order to the API. Obviously the popularity of the videos changes constantly. The customer was fine with a lag of 10-30 minutes, but since a popularity change meant re-indexing all 200 000+ items, we couldn’t come close to that. We looked at a lot of different options, including SOLR’s new join feature, partial index updates and everything else we could find. None of them provided a solution.
Eventually we concluded that we needed another storage system for the content. MongoDB was already familiar to us and used in another role in the same system, so we started with Mongo 2.3.x and went from there. The plan was to index everything into both SOLR and MongoDB and use the SOLR index only for full-text searches. Mongo’s new full-text search did come to mind, but with the language being Finnish and a lot of complicated extra steps happening in the indexing phase, we decided to let SOLR do what it does best. Full-text search was a minor, separate feature anyway. MongoDB could easily handle all the other filters and then some, while letting us partially update the index very fast whenever we felt like it. And that was the plan we executed.
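The resulting split can be summed up in a few lines. This is only a sketch with assumed names, but it captures the routing decision:

```javascript
// Sketch of the backend split described above (names are assumptions):
// SOLR handles full-text search, MongoDB handles every other filter.
function chooseBackend(query) {
  // Any free-text search term routes the request to SOLR...
  if (query.searchTerm) return 'solr';
  // ...while timestamp/range/field filters are served from MongoDB,
  // whose documents we can partially update at any time.
  return 'mongodb';
}
```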
At first the results were promising: MongoDB was pushing around 500 requests per second on our test bench. Then we had to add the ‘total number of hits’ that SOLR returns with every request. In MongoDB that meant running a separate count query in addition to the actual query. Suddenly the performance was gone; we were down to 10-20 requests per second. We did a lot of index optimization and research into making counts fast in MongoDB, but couldn’t get past 40 requests per second.
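A minimal sketch of the pattern that killed the throughput, assuming a node-mongodb-native-style collection object (collection, field and function names here are illustrative):

```javascript
// One API request = two round trips: a find() for the page of documents
// and a separate count() for the total number of hits.
async function listVideos(collection, filter, { page = 0, rows = 20 } = {}) {
  const docs = await collection
    .find(filter)
    .sort({ popularity: -1 })
    .skip(page * rows)
    .limit(rows)
    .toArray();
  // This extra count query over the same filter is what collapsed
  // throughput before MongoDB 2.4.4 sped up counting.
  const total = await collection.count(filter);
  return { docs, total };
}
```

The paged find() was never the problem; the count over the full matching set was.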
It was clear we needed a workaround. After every MongoDB query, we ran the same query on SOLR just to get the count. And we were back in business, at around 280 requests per second. This was sufficient, but horribly ugly, as hacks sometimes are. Also, indexing in MongoDB and SOLR happens at a slightly different pace, so the counts didn’t always match the results. We decided we could live with that.
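The hack, in outline. Both fetchers here are stand-ins for the real MongoDB and SOLR calls, not actual project code:

```javascript
// Outline of the workaround: documents from MongoDB, then the matching
// hit count from SOLR. `mongoFind` and `solrCount` are placeholders
// for the real data-access functions.
async function listWithSolrCount(mongoFind, solrCount, filter) {
  const docs = await mongoFind(filter);  // page of results from MongoDB
  const total = await solrCount(filter); // same query on SOLR, count only
  // `total` can lag behind `docs` because the two indexes update
  // at a different pace — the mismatch we decided to live with.
  return { docs, total };
}
```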
But then came MongoDB 2.4.4. They said they had fixed count: ‘In certain cases you could see a significant gain in performance.’ Would we be one of those certain cases? Carefully, we installed the new MongoDB on our test bench. And lo and behold, throughput jumped from 40 to 480 requests per second instantly.
Finally we deployed the fresh MongoDB to production and removed the ugly workaround. The system has since been very stable, sending out content at incredible speed while still updating the popularity sort order every 10 minutes.
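That periodic refresh is exactly the kind of partial update MongoDB made cheap: a handful of $set operations instead of re-indexing whole documents. A sketch, with the field name assumed:

```javascript
// Sketch of the periodic refresh: build a partial update that touches
// only the popularity field. The `popularity` field name is an assumption.
function popularityUpdate(videoId, popularity) {
  return {
    filter: { _id: videoId },
    // $set rewrites just this one field; the rest of the document
    // stays untouched, which is why the refresh is so cheap.
    update: { $set: { popularity } },
  };
}
```

Each built operation would then be passed to the driver’s update call for the videos collection.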
The lesson of the story? Always choose the right tool for the job, and don’t get depressed if the tool doesn’t work right at first. It just might get fixed in the next update (and do submit a ticket if there isn’t one!).