Meteor Scaling - Redis Oplog [Status: Prod ready]

But regardless, two hits to the DB are costly. The cost of redis-oplog should be lighter.

After some digging I have figured out that my reactive publishers are re-triggering wayyy too often.

For example, I have a Meteor method that pings the server every once in a while and updates a lastSeenAt field on my user. For some reason this is invalidating a publication autorun that specifically doesn’t use lastSeenAt! The computation only uses one field on my user that almost never changes, but when there is an update to my user on an unrelated field the autorun triggers anyway.
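To make the setup concrete, here is a rough sketch of the shape of that code (the collection, field, and publication names are illustrative, and this.autorun in a publication comes from a reactive-publish style package):

Meteor.methods({
    ping() {
        // The method only touches lastSeenAt on the current user...
        Meteor.users.update(this.userId, { $set: { lastSeenAt: new Date() } });
    }
});

Meteor.publish("currentOrganization", function () {
    this.autorun(() => {
        // ...yet this autorun, which only reads organizationId, re-runs anyway.
        const user = Meteor.users.findOne(this.userId, { fields: { organizationId: 1 } });
        return Organizations.find({ _id: user && user.organizationId });
    });
});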

This is almost certainly what is causing my problems. Every time one of these reactive publishers triggers I get “Performing initial add for observer. Completed initial add for observer. Re-using existing publication” from redis-oplog. Looking at the source it seems that “performing initial add” might be a fairly expensive operation as it loops over all docs.

Is it possible that redis-oplog is invalidating my reactive publish computations on any update to the channel instead of only updates to fields that the computation uses?

Aha! I had {fields: {}} in some places instead of {fields: {_id: 1}}. I also started using the following format for my reactive publications with peerlibrary:computed-field:

// actually, this does NOT work -- see the edit below
Meteor.publish("myReactivePub", function () {
    console.log("Started myReactivePub");

    // Reactively compute just the matching _ids; passing EJSON.equals means the
    // computed field only invalidates when the list of ids actually changes.
    const intermediateIds = new ComputedField(function () {
        console.log("PUBLICATION recomputing intermediateIds");
        return SomeCollection.find({ someField: true }, { fields: { _id: 1 } })
            .fetch()
            .map(d => d._id);
    }, EJSON.equals);

    this.autorun(function () {
        console.log("PUBLICATION rerunning myReactivePub");
        return SomeCollection.find({ _id: { $in: intermediateIds() } });
    });
});

This results in the minimal amount of recomputation. Something can cause intermediateIds to recompute, but if its output does not change, the autorun is not invalidated and the publication does not change.

edit: hmm this actually does NOT work. I need to do some more digging to figure out what is going wrong.

I am still a little unclear on controlling reactivity in my meteor methods. @diaconutheodor Could you provide an example of how to do some updates with {pushToRedis: false} and then later update the subscriptions?

@imagio

I am guessing the reason the method call (and subsequent db change) impacted your reactive pubs is your choice of fields to push in your pubs. Once we tightened that and pushed only the really needed fields, we reduced needless computes. And also, {reactive: false} is critical for us as we optimize what should be reactive vs. non-reactive.

An idea for you is to have smart pubs: pass arguments so that the returned fields are exactly what you need (in case you have similar pubs). You can also look at @diaconutheodor’s graphql library if you want to heavily customize your pub fields.
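For what it’s worth, a rough sketch of that kind of parameterized pub (the publication name, the role field, and the validation via the check package are all just illustrative):

Meteor.publish("usersByRole", function (role, fieldNames) {
    check(role, String);
    check(fieldNames, [String]);

    // Build the projection from the arguments so several similar screens can
    // share one publication without over-publishing fields they don't need.
    const fields = {};
    fieldNames.forEach((name) => { fields[name] = 1; });

    return Meteor.users.find({ role }, { fields });
});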


@diaconutheodor any idea why my technique using peerlibrary:computed-field to constrain reactivity did not work? With that technique it seemed that the publication would reactively compute only on the first update and then would stop.

Try the branch that’s ready. It should work. (https://github.com/cult-of-coders/redis-oplog/pull/283)

I tried that branch but it doesn’t work at all with reactive publishers. I filed an issue on GitHub. Docs get added to the client and then immediately removed.

That sounds like an optimistic UI issue; make sure that you don’t have it disabled or something.

I don’t have it disabled. I don’t think it is optimistic UI, because this is happening upon initial subscription. What I mean is that upon initial subscription the server sends all the correct docs to the client and subscribes to the correct redis channels, but the docs then immediately get removed from the client. This only happens on subscriptions using peerlibrary:reactive-publish.

We stopped using reactive-publish a while back. We found it better to just write publications that join things manually, using this.added, this.changed, and this.removed in the publish function. More here: Publish and subscribe | Meteor API Docs, and Google around for more examples. It’s easy enough to write your own detailed, optimal publications, but there will be more code.

Once you master the above though, you can write any kind of publication you want with a ton of flexibility.
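For anyone following along, here is a rough sketch of that style of hand-rolled publication (Posts/Comments and the join field are made up; the low-level this.added/this.changed/this.removed API is the one documented on the Publish and subscribe page linked above):

Meteor.publish("postWithComments", function (postId) {
    // Mirror the post document into the client's "posts" collection.
    const postHandle = Posts.find({ _id: postId }).observeChanges({
        added: (id, fields) => this.added("posts", id, fields),
        changed: (id, fields) => this.changed("posts", id, fields),
        removed: (id) => this.removed("posts", id)
    });

    // Manually "join" in the comments for that post.
    const commentHandle = Comments.find({ postId }).observeChanges({
        added: (id, fields) => this.added("comments", id, fields),
        changed: (id, fields) => this.changed("comments", id, fields),
        removed: (id) => this.removed("comments", id)
    });

    this.ready();

    this.onStop(() => {
        postHandle.stop();
        commentHandle.stop();
    });
});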


@diaconutheodor Looking at the DDP messages it seems there must be a bug in that branch. I get DDP added messages for each item in my publication followed by a ready message followed immediately by a remove message for each document that was just added.

I also added an observeChanges to the cursor I am publishing, and the added callback gets called for each doc but the removed callback doesn’t. This indicates to me that redis-oplog is sending remove events to the client when it should not be doing so.

This behavior only shows up when I switch to this branch of redis-oplog. Everything works fine (regarding this behavior) on the master branch or with redis-oplog disabled. I’m still digging to see if I can figure out what is going on but it is taking me some time to understand all the redis-oplog code.

Edit: After some more poking around I can’t find where the DDP remove messages are coming from! I added logging to all of the remove operations in redis-oplog and they aren’t flowing through there. I guess I have to dive inside meteor to figure this out.

Edit2: OK so I have figured out that reactive-publish is removing the docs in its computation.afterRun callback, but I’m not sure why yet.

Edit3: The plot thickens. reactive-publish is removing the docs because upon first run of the publication it appears to be missing a tracker context. I tried removing the Tracker.nonReactive from redis-oplog but this didn’t do anything. Both redis-oplog and reactive-publish overwrite MongoInternals.Connection.prototype._observeChanges… perhaps they are stepping on each other?

Edit4: Finally figured it out. It was the extension of _observeChanges happening out of order. Because I was importing and manually calling RedisOplog.init it was always getting set up after reactive-publish. Whew. That took me way too long to figure out!
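In other words, letting redis-oplog configure itself from settings at package load time, rather than importing it and calling RedisOplog.init() from app code, should avoid the ordering issue. Roughly, in the settings.json passed with --settings (connection values are placeholders):

{
    "redisOplog": {
        "redis": {
            "host": "127.0.0.1",
            "port": 6379
        }
    }
}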


Things seem to be working better now. My production server is still hitting the Mongo oplog, however… With disable-oplog, will Meteor still use the Mongo oplog for some things?

Not sure, but you should be able to safely comment out the oplog URL in your settings if you’re using disable-oplog.
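(For context: Meteor’s own oplog tailing is driven by the MONGO_OPLOG_URL environment variable, so “commenting out the oplog URL” usually means leaving that unset while keeping MONGO_URL. The values below are placeholders.)

# Typical server environment (placeholder values)
export MONGO_URL="mongodb://user:pass@db-host:27017/myapp"
# Leave MONGO_OPLOG_URL unset when relying on redis-oplog + disable-oplog:
# export MONGO_OPLOG_URL="mongodb://user:pass@db-host:27017/local"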

I was considering doing that but I would like to figure out why the oplog is still being read.

I have had to roll back in production. Something is still wrong. Users were getting 60k ms+ response times. Lots of wait and DB time in Kadira that doesn’t happen without redis-oplog… Gotta do some more digging.

@diaconutheodor can you clarify a couple of things about redis-oplog for me please?

  1. The optimistic UI docs aren’t really clear to me. How is data flow affected with optimistic: false?
  2. Will the UI become inconsistent if I set optimistic: false and still have some methods that run on both the client and the server (all of my data mutation is in methods), or will the only outcome be possible flickering of the UI until server state overwrites client state?
  3. I can’t quite figure out how to use {pushToRedis: false} and then send the updates to the client later. Can you provide an example of a meteor method where a bunch of data is modified with {pushToRedis: false} and then the updates are sent at the end of the method?
  4. I have disable-oplog added to my project but I’m still seeing traffic hitting getmore oplog.rs on my mongo instance. Any idea where this might come from? Is it normal?

Thanks!

@evolross, we never cease learning! Thanks so much for this. We just started pulling out reactive-publish and are already seeing the positive effects (speed, fewer needless recomputes, fewer redis-oplog errors, better control).

The needless recomputes were killing us! Now we just added 20% more users to our servers (mind you, lots more code and testing before we push to prod)

@evolross can you provide an example of how you write your custom publications? I’m not sure I understand how manually adding/updating/removing would be more efficient than a well written reactive-publish.

Qualia rolled out redis oplog last night and our average db query execution time went from 300ms -> 3ms… And that’s without using any of the special redis oplog features. Thanks @diaconutheodor!

We run a multi-tenant architecture, so we went from having thousands of open oplog cursors to <10 (will be zero soon). So deploying redis oplog was particularly transformative for us.


Wow @veered that’s an awesome improvement! I have been trying redis-oplog out on my beta servers for a couple of days but it is actually slower than normal oplog for us. I’ve fixed some issues with my reactive publishers but method response time is still slower than normal oplog. It also hasn’t helped with my servers locking up when the CPU load gets too high… If I get more than 70 users on a pod the CPU usage suddenly jumps to 100% and stays there until the pod is killed by my health checker. I’ve got some more debugging to do haha.

What version of meteor and node are you using? I’m thinking that node 8.12 might be a problem per this issue https://github.com/meteor/meteor/issues/10216. I initially upgraded to 8.12 because I thought I was running into the fiber explosion problem, but perhaps something else is going on.


We’re on Meteor 1.6 and Node 8.12.

Redis oplog didn’t have any impact on performance when deployed to a single server. But when we deployed redis oplog to all of our servers, the load on the DB decreased dramatically. And that’s why we had such massive improvements.

Edit: It’s worth noting that each server is connected to a different db and a different Redis server. Our performance improvements were because we no longer needed to have open oplog cursors on the db.