Meteor Scaling - Redis Oplog [Status: Prod ready]

msavin · October 29, 2017, 3:33pm

Yeah, I’ve spoken to people on the team. IIRC, they used Meteor on the server for some time until they could no longer do it. Either way, it’s a common story. I do have an exception story, coming soon.

I do think you have a good point though - that a lot of the issues with Meteor have been fixed. We even have code splitting for Blaze. Now if only we can fix the data layer.

diaconutheodor · October 30, 2017, 5:57am

On small apps you will simply not see any benefit. It makes no sense to add it if you don’t have a lot of traffic.

@alawi
Imagine a cloud with 20 Meteor instances running behind a load balancer, where every change you make in the database is sent to all instances (and gets processed by each of them). Now imagine 100 instances and high traffic: at some point you can no longer scale at all, and your instances sit at 100% CPU all the time, with network bandwidth saturated as well. RedisOplog is a solid solution for those problems, making use of namespaces and custom channels.
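
To make the fan-out argument concrete, here is a rough sketch, not RedisOplog's actual code. It uses a tiny in-memory bus as a stand-in for Redis pub/sub; `createBus`, `countProcessed`, the channel names and the instance counts are all made up for illustration. It counts how many instances have to process one change when everyone listens to a single shared feed versus when each instance subscribes only to the channel it serves.

```javascript
// Minimal in-memory stand-in for a pub/sub server (hypothetical, not Redis itself).
function createBus() {
  const subscribers = new Map(); // channel -> array of callbacks
  return {
    subscribe(channel, callback) {
      if (!subscribers.has(channel)) subscribers.set(channel, []);
      subscribers.get(channel).push(callback);
    },
    publish(channel, message) {
      (subscribers.get(channel) || []).forEach(callback => callback(message));
    },
  };
}

// Simulate `instanceCount` app servers and a single write.
// pickChannel(i) decides which channel instance i listens on.
function countProcessed(instanceCount, pickChannel, publishChannel) {
  const bus = createBus();
  let processed = 0;
  for (let i = 0; i < instanceCount; i++) {
    bus.subscribe(pickChannel(i), () => { processed += 1; });
  }
  bus.publish(publishChannel, { op: 'update', _id: 'abc' });
  return processed;
}

// Oplog-style: every instance listens to the same global feed.
const globalFeed = countProcessed(100, () => 'oplog', 'oplog');

// Namespaced: instance i only serves company (i % 10), so it only listens there.
const namespaced = countProcessed(100, i => 'company::' + (i % 10), 'company::3');

console.log({ globalFeed, namespaced }); // { globalFeed: 100, namespaced: 10 }
```

With the shared feed all 100 simulated instances do work for a single write; with per-company channels only the 10 that serve that company do.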

@msavin what’s wrong with the data layer ?

alawi · October 30, 2017, 7:32am

@diaconutheodor

Thank you for clearly explaining the scaling limitation. So given what you stated and the GitHub discussion on the Change Stream limitation, it seems that if the diffing logic is placed on the application server it will choke at large scale (100+ instances), and if it’s placed in the DB it might impact other operations even if the current Change Stream limitation is resolved. Therefore, giving the developer control over how to deploy and manage those channels seems like the right way to go.

It seems to me RedisOplog is the right way to go, even with Change Streams, don’t you think?

Also, what do you think about the idea of having “Reactive Methods”? Basically, instead of defining publications on the server, we could define triggers (which could be implemented using a Redis server), and those triggers would then call methods on the client with a trigger parameter. It’s just like passing an event via socket to trigger a refetch, but done the Meteor way…

diaconutheodor · October 30, 2017, 8:10am

@alawi

ChangeStreams, as they were explained, should have solved many of our problems, but they don’t. However, if you listen to only one document by “_id”, it is efficient enough to scale (the Oplog implementation does this also; for an _id or an array of _ids it’s super efficient), but is that the only kind of reactivity you need?
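
The “_id” case can be sketched like this. It is an assumption-heavy illustration: `channelsFor` is a hypothetical helper, and the `collection::id` channel naming is only a convention chosen for the example. The idea is that a selector pinned to specific ids can listen on one channel per document instead of the collection-wide channel.

```javascript
// Hypothetical helper: pick the channels a query needs to listen on.
function channelsFor(collectionName, selector = {}) {
  const id = selector._id;
  // Single id: listen only to that document's channel.
  if (typeof id === 'string') {
    return [collectionName + '::' + id];
  }
  // Array of ids via $in: one channel per document.
  if (id && Array.isArray(id.$in) && id.$in.every(x => typeof x === 'string')) {
    return id.$in.map(x => collectionName + '::' + x);
  }
  // Anything else: fall back to the collection-wide channel.
  return [collectionName];
}

console.log(channelsFor('users', { _id: 'a1' }));
console.log(channelsFor('users', { _id: { $in: ['a1', 'b2'] } }));
console.log(channelsFor('users', { age: { $gt: 18 } }));
```

Only the last query has to see every change to the collection; the first two can ignore everything except messages about their own documents.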

Adding such an event using Redis is absolutely trivial, and I have already discussed it with my team. Something like:

Prototype API:

// server
RedisOplog.listeners({
    'users'() {
        // forward every custom event to clients listening on 'users'
        this.on('custom.event.string', data => {
            this.push(data);
        });
    }
});

// somewhere on the server
RedisOplog.emit('custom.event.string', data);

// client
const handle = RedisOplog.on('users', function (data) {
   // do something
});
// handle.stop()

Using the simple thing I just showed you above really gives you the flexibility to do anything you want.
You can also publish changes from outside RedisOplog, so Meteor can be reached from other systems:

redis.dispatch('redis.oplog.custom.event.string', 'EJSONstring');

As you can see I have big plans for RedisOplog, now that I know for a fact ChangeStreams won’t solve scaling issues.
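
Since the API above is only a prototype, here is a toy emulation using Node's built-in `EventEmitter` to show the intended flow. Nothing in it is a real RedisOplog call: `FakeRedisOplog`, its methods and the in-process "transport" are stand-ins made up for the sketch.

```javascript
const { EventEmitter } = require('events');

// Toy, in-process stand-in for the prototype API (hypothetical, not the real package).
class FakeRedisOplog {
  constructor() {
    this.serverEvents = new EventEmitter(); // plays the role of Redis channels
    this.clientFeeds = new EventEmitter();  // plays the role of the link to clients
  }

  // Server: run one setup function per named feed.
  listeners(definitions) {
    Object.keys(definitions).forEach(name => {
      const context = {
        on: (event, handler) => this.serverEvents.on(event, handler),
        push: data => this.clientFeeds.emit(name, data),
      };
      definitions[name].call(context);
    });
  }

  // Server: emit a custom event.
  emit(event, data) {
    this.serverEvents.emit(event, data);
  }

  // Client: listen to a named feed; returns a handle with stop().
  on(name, callback) {
    this.clientFeeds.on(name, callback);
    return { stop: () => this.clientFeeds.removeListener(name, callback) };
  }
}

const oplog = new FakeRedisOplog();
oplog.listeners({
  users() {
    this.on('custom.event.string', data => this.push(data));
  },
});

const received = [];
const handle = oplog.on('users', data => received.push(data));

oplog.emit('custom.event.string', { refetch: true }); // delivered to the client
handle.stop();
oplog.emit('custom.event.string', { refetch: true }); // ignored after stop()

console.log(received.length); // 1
```

The client callback is where a refetch (for example a method call) would go, which is roughly the “Reactive Methods” idea from earlier in the thread.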

alawi · October 30, 2017, 8:23am

Wow, thanks! I knew you guys wouldn’t miss something like that.

Looking at all the options out there I personally think Redis Oplog is the right approach for Meteor at scale. Thanks again for taking this initiative.

diaconutheodor · October 30, 2017, 3:25pm

No worries, the problem is simple, but we made it very difficult to understand, because we dug deep into implementation details.

RedisOplog aims to emulate DDP (the reactivity layer) in the following way:

- Move to NPM
- Implement a smart event manager like described above
- Create a bridge NPM -> Meteor’s DDP, same package name (redis-oplog)
- Make it as efficient as possible
- Make it compatible with ANY TYPE of database
- Make it easy to integrate with Apollo

The big thing this time is, I want to involve more community members:

If you have time, please help by filling out this form:

It will only take you 1 minute (maximum) and it will help a lot!

ramez · October 30, 2017, 8:44pm

We are an exception to the story too (classroomapp.com). We stuck (and are sticking) with Meteor all the way and are scaling up aggressively, with servers in the US, Europe and soon the Middle East, and tens of thousands of simultaneous users.

We are planning our migration to RedisOplog soon too as we see it as the only viable way to scale up (no opinion yet on Mongo change stream).

diaconutheodor · November 1, 2017, 12:40pm

Updated to 1.2.1

RedisOplog is now more stable and better tested than ever before!

I tried to clear all the bugs I could identify, the rest are either irreproducible by me or I don’t have enough information.

In this release we managed to remove the following dependencies:

No longer rely on a custom publishComposite package for optimistic ui (Thanks @mitar for publish-context)
Removed ‘sift’ npm package
Removed ‘lodash.cloneDeep’ npm package

Based on the document above (out of 10 responses), 75% want to contribute financially, and 50% have time to offer.
Thank you very much for your openness to contribute. If you want to contribute at this moment: just set it up and try to break it!

Cheers.

martineboh · November 1, 2017, 1:09pm

Thank you @diaconutheodor and fellow contributors for this great work.

ixdi · November 2, 2017, 3:35pm

“This cursor does not have a _cursorDescription field. Observe changes will work unex”

Does anyone know what this server log refers to? We are using Galaxy with Meteor 1.6 and redis-oplog.

This warning appeared yesterday after updating the redis-oplog package to the latest version, 1.2.1_1, and it repeats constantly, but we did not touch the code. This is why we think it comes from this package.

Thanks

diaconutheodor · November 2, 2017, 3:53pm

@ixdi indeed, I added that warning in 1.2.1. I don’t know why it shows up, but I think it’s related to Meteor’s clientVersions, which is a different type of collection. It’s not something to worry about right now. I will remove that warning in the next version. Can you reproduce this locally and find out where it is coming from? That would be very helpful.

Btw, do you have any metrics on how it compares with non-redis-oplog?

Everyone:
Stay tuned, I have a huge surprise ready. It’s going to make redis-oplog even more performant!

ramez · November 2, 2017, 5:23pm

@diaconutheodor, I am getting that same message. I am having a hard time tracing it. Do you want me to try seeing where it comes from?

diaconutheodor · November 2, 2017, 7:13pm

@ramez @ixdi this only happens in prod mode, and it’s because of the autoupdate package, which listens to a local collection. Nothing to worry about; I just removed the warning, so all should be fine.

Updated to 1.2.2

Some minor fixes and…
A huge stepping stone for redis-oplog, because it finally solves:
https://github.com/cult-of-coders/redis-oplog/issues/182 while not having race conditions.

dirkgently · November 3, 2017, 12:43am

I think the problem with Meteor is understanding what’s going on with pub/sub. Performance is hard and not very obvious once your app is no longer trivial. We use Kadira and often see queries taking way too long that were fine before, with long wait times etc., and have no clue why it’s happening.

So not only is it hard to tell what’s wrong and how to fix it, it’s also hard to tell whether RedisOplog will be of help. Meteor badly needs performance counters and an explain like in MongoDB, plus some guidelines.

ramez · November 3, 2017, 2:24am

@dirkgently
You are definitely right there is a black box effect when things get too complex with too many things happening asynchronously.

However, it is well understood that oplog tailing is a drag on performance when scaling. The theory says so, and in practice we have experienced it, as have many others. Not all apps are meant to be used simultaneously by many users with a lot of reactive data, but those that have that need will see benefits with redis-oplog.

dirkgently · November 3, 2017, 2:34am

Yes no doubt redis-oplog will help. I just added this to our project and a simple test (with verbose turned on) shows some exceptions (TypeError) which I’m guessing were happening in Meteor before but not being shown.

diaconutheodor · November 3, 2017, 5:56am

TypeErrors aren’t something that should happen. Can you show me a stack trace? Please file an issue on GitHub.

Regarding Meteor performance metrics, once you get rid of the MongoDB oplog, you’ll see that your app behaves more predictably.

dirkgently · November 3, 2017, 7:46am

I see something like this -

Exception from task: TypeError: Cannot read property 'name' of undefined
I20171103-07:35:40.253(0)?     at PublicationFactory.getPublicationId (packages/cultofcoders:redis-oplog/lib/cache/PublicationFactory.js:53:52)
I20171103-07:35:40.253(0)?     at PublicationFactory.create (packages/cultofcoders:redis-oplog/lib/cache/PublicationFactory.js:21:23)
I20171103-07:35:40.254(0)?     at cursors.forEach.cursor (packages/cultofcoders:redis-oplog/lib/publishWithRedis.js:41:40)
I20171103-07:35:40.254(0)?     at Array.forEach (<anonymous>)
I20171103-07:35:40.254(0)?     at PublicationFactory.queue.runTask (packages/cultofcoders:redis-oplog/lib/publishWithRedis.js:39:21)
I20171103-07:35:40.254(0)?     at runWithEnvironment (packages/meteor.js:1188:24)
I20171103-07:35:40.254(0)?     at Object.task (packages/meteor.js:1201:14)
I20171103-07:35:40.255(0)?     at Meteor._SynchronousQueue.SQp._run (packages/meteor.js:819:16)
I20171103-07:35:40.255(0)?     at packages/meteor.js:796:1234:

and also

 Exception from sub sub_name id 9TfYyCWT2gDTmA3G6 { stack: 'TypeError: Cannot read property \'name\' of undefined\n  
at PublicationFactory.getPublicationId (packages/cultofcoders:redis-oplog/lib/cache/PublicationFactory.js:53:52)\n    
at PublicationFactory.create (packages/cultofcoders:redis-oplog/lib/cache/PublicationFactory.js:21:23)\n    
at cursors.forEach.cursor (packages/cultofcoders:redis-oplog/lib/publishWithRedis.js:41:40)\n    
at Array.forEach (<anonymous>)\n    at PublicationFactory.queue.runTask 
(packages/cultofcoders:redis-oplog/lib/publishWithRedis.js:39:21)\n    
at runWithEnvironment (packages/meteor.js:1188:24)\n    at Object.task (packages/meteor.js:1201:14)\n    
at Meteor._SynchronousQueue.SQp._run (packages/meteor.js:819:16)\n    at packages/meteor.js:796:12',

I looked at the publish fn for sub_name and it doesn’t use anything called ‘name’. This is the redis-oplog code -

https://github.com/cult-of-coders/redis-oplog/blob/b3de52786ba6b472bad56bf2739661de9159c6ea/lib/cache/PublicationFactory.js#L51


 */
remove(id) {
    this.store.remove(id);
}

/**
 * Gets an unique id based on the cursors selector and options
 * @param cursor
 * @returns {string}
 */
getPublicationId(cursor) {
    if (!cursor._cursorDescription) {
        return 'special::' + cursor.collection.name;
    }

    const description = cursor._cursorDescription;

    const selector = description.selector || {};
    const options = description.options || {};

    let collectionName = description.collectionName;

I’m wondering why that’s throwing an exception. Is it somehow a blank collection that shouldn’t be in my code?

diaconutheodor · November 3, 2017, 7:47am

Show me your subscription please.

dirkgently · November 3, 2017, 7:51am

Meteor.publish('sub_name', function(taskId) {
    return new Counter('count_' + taskId, Tasks.find({
        taskId:taskId
    }));
});

Counter is from here - https://github.com/nate-strauser/meteor-publish-performant-counts
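
One possible reading of the trace, sketched below with stand-in objects. The assumption (not verified against the Counter package's source) is that the object returned from the publication is not a Mongo cursor, so it has neither `_cursorDescription` nor `collection`, and the fallback branch then reads `.name` from `undefined`. `getPublicationId` here is a simplified copy of the excerpt above; the id format in its last line and the `safePublicationId` guard are hypothetical.

```javascript
// Simplified copy of the excerpt; the final return format is made up for the sketch.
function getPublicationId(cursor) {
  if (!cursor._cursorDescription) {
    // Throws a TypeError when cursor.collection is undefined.
    return 'special::' + cursor.collection.name;
  }
  const description = cursor._cursorDescription;
  const selector = description.selector || {};
  const options = description.options || {};
  return description.collectionName + '::' + JSON.stringify(selector) + JSON.stringify(options);
}

// Hypothetical guard: report "not a cursor" with null instead of throwing.
function safePublicationId(cursor) {
  if (!cursor._cursorDescription && !cursor.collection) {
    return null;
  }
  return getPublicationId(cursor);
}

// Stand-in for a non-cursor object returned from Meteor.publish (assumed shape).
const counterLike = { name: 'count_42' };

let error = null;
try {
  getPublicationId(counterLike);
} catch (e) {
  error = e;
}

console.log(error instanceof TypeError); // true
console.log(safePublicationId(counterLike)); // null
console.log(safePublicationId({ collection: { name: 'tasks' } })); // special::tasks
```

If that assumption holds, the exception comes from the publication returning something other than a plain cursor, not from a blank collection in the app code.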