Hello, we are having an issue with our application where CPU usage keeps spiking out of control, causing the app to return 499 or 502 responses until it recovers on its own.
At first we found that we were pushing many operations into the oplog and thought that was bogging down our system, so we lowered METEOR_OPLOG_TOO_FAR_BEHIND from 2000 to 250, and finally all the way down to 100. As we did this, the spikes eased a little, yet they were still happening pretty frequently over the course of a day.
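As a rough illustration of what that setting controls, here is a simplified model, not Meteor's actual implementation. The names `readThreshold`, `drainBacklog`, `applyEntry` and `repollAll` are hypothetical. The only parts taken from Meteor are the environment variable name and its default of 2000. The idea is that once the backlog of unprocessed oplog entries grows past the limit, the backlog is skipped and the observed queries fall back to a full re-poll.

```javascript
// Simplified model of an oplog backlog with a "too far behind" limit.
// NOT Meteor's real code: every function name here is hypothetical.
const DEFAULT_TOO_FAR_BEHIND = 2000;

// Read the limit from an env-like object, falling back to the default.
function readThreshold(env) {
  const parsed = Number.parseInt(env.METEOR_OPLOG_TOO_FAR_BEHIND, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TOO_FAR_BEHIND;
}

// Process pending oplog entries, or give up and re-poll if too many piled up.
function drainBacklog(backlog, threshold, applyEntry, repollAll) {
  if (backlog.length > threshold) {
    // Too far behind: drop the backlog and re-run the observed queries instead.
    const skipped = backlog.length;
    backlog.length = 0;
    repollAll();
    return { applied: 0, skipped };
  }
  let applied = 0;
  while (backlog.length > 0) {
    applyEntry(backlog.shift());
    applied += 1;
  }
  return { applied, skipped: 0 };
}
```

Under this model, a burst of 150 pending entries is processed one by one with the default limit, but triggers a full re-poll once the limit is 100. Lowering the value therefore trades per-entry work for more frequent re-polls, which may or may not be cheaper depending on the queries involved.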
Next, we found one update operation that was huge, literally eating up all of our oplog. We split it into smaller chunked updates that now also diff appropriately, which drastically reduced the size of the update. Again, fixing this seemed to help somewhat with the server spiking, yet it was still happening much too frequently.
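To make the chunking and diffing idea concrete, here is a minimal sketch. It assumes "diffing" means writing only the top-level fields that actually changed, and it is not the code from our app. `diffFields`, `chunk` and `buildUpdateBatches` are hypothetical helper names. The `{ updateOne: { filter, update } }` shape is the one the MongoDB driver's `bulkWrite` accepts.

```javascript
// Hypothetical helpers for illustration only.

// Build a MongoDB-style modifier containing only the top-level fields that changed.
function diffFields(oldDoc, newDoc) {
  const modifier = {};
  for (const key of Object.keys(newDoc)) {
    if (key === '_id') continue;
    // JSON comparison is a shortcut: it assumes plain JSON-safe values
    // and treats a different key order inside nested objects as a change.
    if (JSON.stringify(oldDoc[key]) !== JSON.stringify(newDoc[key])) {
      modifier.$set = modifier.$set || {};
      modifier.$set[key] = newDoc[key];
    }
  }
  for (const key of Object.keys(oldDoc)) {
    if (key !== '_id' && !(key in newDoc)) {
      modifier.$unset = modifier.$unset || {};
      modifier.$unset[key] = '';
    }
  }
  return Object.keys(modifier).length > 0 ? modifier : null;
}

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Turn [oldDoc, newDoc] pairs into batches of updateOne operations,
// skipping documents that did not change at all.
function buildUpdateBatches(pairs, batchSize) {
  const ops = [];
  for (const [oldDoc, newDoc] of pairs) {
    const update = diffFields(oldDoc, newDoc);
    if (update) ops.push({ updateOne: { filter: { _id: oldDoc._id }, update } });
  }
  return chunk(ops, batchSize);
}
```

Each batch could then be handed to the driver's `bulkWrite` (in Meteor, for example through `rawCollection()`), ideally with a short pause between batches so the oplog is not flooded in one burst.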
At this point, we have turned oplog tailing off entirely to see whether the spiking still happens. This is less than ideal, as we would love to keep the reactivity of oplog tailing; however, if it keeps crashing our app servers, it will not be worth it.
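One possible middle ground, offered as a sketch rather than something verified against this app, is to opt only the heaviest queries out of oplog tailing and leave it on for the rest. Meteor's server-side `find()` documents the `disableOplog`, `pollingIntervalMs` and `pollingThrottleMs` options. The `pollingOptions` helper, the `bigFeed` publication and the `BigFeed` collection below are hypothetical, and the default values are assumptions that should be checked against the Meteor docs.

```javascript
// Build cursor options that opt a single query out of oplog tailing.
// pollingOptions is a hypothetical helper; the defaults below are assumptions.
function pollingOptions({ intervalMs = 10000, throttleMs = 50 } = {}) {
  return {
    disableOplog: true,            // use poll-and-diff for this query only
    pollingIntervalMs: intervalMs, // how often to re-run the query
    pollingThrottleMs: throttleMs, // minimum gap between two re-polls
  };
}

// Intended use inside a Meteor server (not runnable outside Meteor;
// the publication and collection names are made up):
//
// Meteor.publish('bigFeed', function () {
//   return BigFeed.find({}, pollingOptions({ intervalMs: 5000 }));
// });
```

If the spikes disappear with only one or two publications switched to polling, that would point at those specific queries rather than at oplog tailing as a whole.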
The fact that we had to lower METEOR_OPLOG_TOO_FAR_BEHIND from 2000 to 250 and again to 100 without much improvement suggests that something else is going on within our application. After searching for over a week, with enhancements being pushed almost daily, we are running out of ideas about what could be causing this. We reduced the operation sizes, optimized queries, and offloaded application code, all to no avail. If anyone has any insight into this, it would be much appreciated.