Publication Strategies and Performance

I’ve been working on our app in Meteor since 2013 and have spent years optimizing, refactoring, etc. Presently, our app’s pub/sub and Meteor method usage is highly customized and has evolved to be very use-case specific. We use redis-oplog, rely heavily on its Vent functionality, do lots of caching, and use Mongo bulk inserts (with redis-oplog’s ability to then notify Meteor about outside mutations).

Everything works pretty well, but it’s a lot more involved and more code than core Meteor. I’ve always wanted to get off of redis-oplog and get back to oplog tailing. I’ve noticed that some users seem to use it successfully, presumably as long as their code is heavily optimized.

I’m wondering if these newer publication strategies could allow us to go back to non-Redis oplog and simplify our code base.

One of our largest hurdles is that anywhere from 1,000 to 5,000 simultaneous users (often more) all hit the app at the same time (think real-time game show). They all access the same publication and Mongo document (e.g. a game document). This needs to be reactive, but it’s literally 100% cursor reuse across all the participants.

Back in the day, before publication strategies, it was disclosed that the server kept a copy of each user’s subscription data in memory, regardless of publication cursor reuse, quickly scaling RAM to the ceiling on each server. It seems to me that an option like NO_MERGE_NO_HISTORY would work great for a single-document publication where all we care about is changes to fields (e.g. game states), with less focus on a list of documents being added and removed. Does this publication strategy thus eliminate the RAM overhead?

Our current approach relies heavily on cached Meteor methods that debounce the same “game document” to each participant (thus bypassing Mongo for the 99% majority), but then, annoyingly albeit effectively, relies on Redis Vent calls to keep reactive data in sync.
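
To illustrate, here’s a simplified sketch of the caching idea (the Games collection, method name, and TTL are just placeholders, not our actual code):

import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

const Games = new Mongo.Collection("games");

// Per-server in-memory cache: gameId -> { doc, fetchedAt }
const gameCache = new Map();
const CACHE_TTL_MS = 1000; // hit Mongo at most once per second per game

Meteor.methods({
  "games.get"(gameId) {
    const cached = gameCache.get(gameId);
    if (cached && Date.now() - cached.fetchedAt < CACHE_TTL_MS) {
      return cached.doc; // served from memory, Mongo is bypassed
    }
    const doc = Games.findOne(gameId);
    gameCache.set(gameId, { doc, fetchedAt: Date.now() });
    return doc;
  },
});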

Anyone have any thoughts or input on the above? Meteor methods are nice because they can be cached and don’t cause a lot of overhead on the server, but they’re not reactive. Pub/sub is reactive, but historically had too much overhead.

The second issue is the flood of participation. Presently, we accept participant responses in a method that collects the responses on the server and only bulk updates them to Mongo on a rolling interval. Then we notify Meteor of the bulk updates (that are actually outside mutations at this point) using Vent.
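
Roughly, the flow looks like this (a simplified sketch; the collection, method, interval, and Vent channel names are placeholders):

import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";
import { Vent } from "meteor/cultofcoders:redis-oplog";

const Responses = new Mongo.Collection("responses");

// Responses are queued in memory instead of being written one-by-one.
const pendingResponses = [];

Meteor.methods({
  "responses.submit"(gameId, answer) {
    pendingResponses.push({ gameId, userId: this.userId, answer, createdAt: new Date() });
  },
});

// On a rolling interval, flush the queue with one bulk write (an outside
// mutation as far as Meteor is concerned), then notify subscribers via Vent.
Meteor.setInterval(async () => {
  if (!pendingResponses.length) return;
  const batch = pendingResponses.splice(0, pendingResponses.length);

  await Responses.rawCollection().bulkWrite(
    batch.map((doc) => ({ insertOne: { document: doc } }))
  );

  Vent.emit("gameResponses", { count: batch.length });
}, 2000);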

It would be nice to continue using this approach, but without Redis Vent, is there any way to update Meteor about outside mutations? I’m betting bulk inserts don’t jibe with native Meteor oplog tailing.


There are a couple pieces to this.

At least with redis-oplog (it’s been a really long time since I used the regular oplog), there are n + 1 copies of a document in memory for any publication (assuming your publication takes no arguments), where n is the number of subscriptions: one copy per subscription for the merge box, plus one for the multiplexer. Moving to something like NO_MERGE_NO_HISTORY will reduce that to one copy.
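
For reference, opting a single publication into that strategy looks something like this (the publication name is just an example):

import { Meteor } from "meteor/meteor";
import { DDPServer } from "meteor/ddp-server";

// Only this publication skips the merge box and change history; everything
// else keeps the default SERVER_MERGE behaviour.
Meteor.server.setPublicationStrategy(
  "gamePublication",
  DDPServer.publicationStrategies.NO_MERGE_NO_HISTORY
);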

You could actually have achieved this even before the publication strategies:

Meteor.publish("myPublication", function() {
  Collection.find().observe({
    added: (doc) => {
      this._session.sendAdded(Collection._name, doc._id, doc);
    },
    changed: (newDoc) => {
      this._session.sendChanged(Collection._name, doc._id, doc);
    }
});

You can then actually do some funky stuff to remove the memory cost of the multiplexer’s copy (making some assumptions; this doesn’t always make sense, but it sometimes does).

The use of publication strategies doesn’t mean you can’t continue to use redis-oplog if you want.

Bulk updates should work with the regular oplog, either “just working” or with minimal tweaks to the oplog package. Mongo’s oplog is used to synchronise data between replica set members, so it doesn’t matter what operation you use: if it modifies data, the oplog will contain all the information necessary to replicate that modification. The only time I’ve found weirdness with the oplog’s representation of an action is with something like this:

Collection.rawCollection().update({}, [{ $set: { x: "$y" } }])

E.g., using aggregation stages in an update - the resulting entry in the oplog will be a replace operation (i.e., an update where the entire document is replaced).

While redis-oplog probably isn’t the right fit for the singleton-document use case you outline above, its value becomes more apparent when dealing with disjoint sets of observers, e.g., a “multi-tenant” (not necessarily tenant) system where different servers observe small subsets of a collection. The cost of observing an oplog notification just to throw it away because this server doesn’t care about the update is quite high. It’s this cost that causes the regular oplog not to scale well: as you increase the number of servers, you increase the number of wasted oplog notifications observed. This is redis-oplog’s primary value; its use of channels (and namespaces) massively limits this.
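
As a rough sketch of how the channels help (assuming a Games collection; the channel naming is just illustrative):

import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

const Games = new Mongo.Collection("games");

// The reactive query listens on a dedicated Redis channel for this game...
Meteor.publish("gameById", function (gameId) {
  return Games.find({ _id: gameId }, { channel: `game::${gameId}` });
});

// ...and mutations publish to that same channel, so servers that aren't
// observing this game never see the message.
Meteor.methods({
  "games.setState"(gameId, state) {
    Games.update({ _id: gameId }, { $set: { state } }, { channel: `game::${gameId}` });
  },
});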

Depending on your exact use case - you might be able to use change streams instead of either redis-oplog or the regular oplog package - this is more targeted than the regular oplog package, but works off of the oplog in the same way:

Collection.rawCollection()
  .watch([{ $match: someSelectorThisServerCaresAbout }])
  .on("change", (event) => {
    // handle the event
  });

There are substantial performance implications (from the mongo side) of using change streams - so you’d want to only use them for specific situations. The best use case would be if you had some noisy collections you wanted to exclude (or the inverse). A secondary use case would be something like if you had a multi-tenant database but single tenant servers (for some strange reason) - each server could then observe the oplog for a specific tenancy.

Depending on which piece of the redis-oplog implementation you don’t like (e.g., the observe or the Vent functionality), you might be able to use this to auto-publish the events to Redis (negating the need for Vent, but keeping the same per-server scalability).


Hi @evolross,

did you ever consider this: meteor-streamer/README.md at master · RocketChat/meteor-streamer · GitHub


Seems the effort between meteor-streamer and redis-oplog’s Vent is about the same. The only difference is that one keeps data in memory (Redis) while the other always broadcasts to all.


https://jira.mongodb.org/plugins/servlet/mobile#issue/SERVER-42116

I’ve never used Redis Oplog’s Vent, but I have used Redis Oplog quite extensively. I treat it as a drop-in replacement, and to make that truly possible, I suggest trying out oplogtoredis, which @znewsham already linked. (It uses virtually no CPU whatsoever, really.)
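
If it helps, the relevant bit of Meteor settings when oplogtoredis is the one publishing to Redis looks roughly like this (host/port are placeholders, and I may be misremembering the exact key names; the important part is externalRedisPublisher, so redis-oplog only consumes messages and never publishes mutations itself):

{
  "redisOplog": {
    "redis": { "host": "127.0.0.1", "port": 6379 },
    "externalRedisPublisher": true
  }
}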

I’ve never used different publication strategies either, and honestly don’t see a need for that. One of the apps I take care of is truly publications-heavy and yet it works with the stock configuration + Redis Oplog + oplogtoredis.

Last thing that I’d suggest is making sure that you’re actually reusing observers. It’s something I already covered in 🚀 Meteor Scaling/Performance Best Practices - #3 by radekmie.


Copy all. It doesn’t sound like publication strategies help with the traditional oplog tailing problem. Thanks for the heads up on oplogtoredis. Will look into that.

For this single publication, could you do:

  1. NO_MERGE_NO_HISTORY
  2. Switch the cursor to use polling mode
  3. Set pollingIntervalMs as low as required (this will determine the maximum amount of time the published data could be out-of-date)
  4. Increase/decrease pollingThrottleMs as required (this will make it so that even if a server does a bunch of db updates then that server still won’t immediately republish)
  5. Disable redis oplog for this collection (configured using SomeCollection.configureRedisOplog as shown in the redis oplog docs)

Whether or not this strategy works is dependent on (a) how expensive the polling is and (b) how much latency in the reactivity is acceptable.
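
A rough sketch of what points 2-4 might look like on the cursor itself (publication and collection names are placeholders; point 1 would be the setPublicationStrategy call shown earlier in the thread, and point 5 is per the redis-oplog docs):

import { Meteor } from "meteor/meteor";
import { Mongo } from "meteor/mongo";

const Games = new Mongo.Collection("games");

Meteor.publish("gameById", function (gameId) {
  return Games.find(
    { _id: gameId },
    {
      disableOplog: true,     // core Meteor option: drive this observer by polling
      pollingIntervalMs: 200, // poll every 200ms, i.e. the maximum staleness of published data
      pollingThrottleMs: 100, // but never re-poll more often than every 100ms after writes
    }
  );
});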

The main game document publication needs low latency, as the host triggers views, game states, etc., and those need to reach the participants as quickly as possible, so polling wouldn’t work for that. It makes sense for updating more passive fields (e.g. response counts) but not for others.

I guess I was wanting to check in on the state of best practices in Meteor. I was getting the impression from bug reports (like this) that people were using plain oplog tailing again and wanted to check into that. Sounds like everyone is still leveraging redis-oplog for reactivity.