Real-time not available with horizontal scaling

I’ve chosen to move forward with deploying Meteor on Kubernetes, mainly for the uptime advantage of running across multiple pods and nodes. However, I’ve recently stumbled on an issue.

The issue stems from the behaviour described in this article: https://www.discovermeteor.com/blog/scaling-meteor-the-challenges-of-realtime-apps/

Specifically:

At this point, it’s important to remember that while data changes will usually happen through the Meteor app itself, it’s also possible for external data changes to occur. These could be triggered by another instance of the same app (when scaling horizontally), a different app altogether, or even manual changes to the database through the Mongo shell.

So every 10 seconds (or whenever the server thinks that the Posts collection may have changed), the LRS will re-run the Posts.find() query against Mongo (with a corresponding hit to your Mongo server).

This is a big issue, as I want to be able to deploy any number of pods. I’m currently running 3 nodes on Kubernetes with a total of 3 pods, and the issue is easy to reproduce when I scale out to a larger number of pods (e.g. 15).

One client (client A) connects to one pod, and another client (client B) connects to a different pod, but changes made by client A are not immediately reflected for client B. This defeats the purpose of using Meteor and the oplog for me, as real-time is needed.

One possible solution could be to direct all clients in one area/physical location to the same pod, however I’m not entirely sure I can currently do that with Kubernetes. Any clues?
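
The closest thing I’ve found so far is session affinity on the Service, which (as far as I understand it) only pins a given client to one pod rather than syncing data between pods. Roughly something like this, where meteor-app is just a placeholder Service name:

# pin each client to the pod it first connected to
kubectl patch service meteor-app -p '{"spec":{"sessionAffinity":"ClientIP"}}'

That would keep each client talking to the same pod, but it wouldn’t make client A’s writes show up for client B on another pod, so I’m not sure it actually solves this.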

Cheers,
Mark


You should read the rest of the article; it covers the newer system called oplog tailing, which was specifically designed to counter the issue you quoted.

If you’re seeing a lag of up to 10 seconds in the situation you’ve described, make sure your mongo deployment is running as a replica set and that you’re starting your app instances with the MONGO_OPLOG_URL environment variable set correctly.
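
If you want to sanity-check the replica set from outside Meteor, something along these lines should work (host and port are placeholders, and you may need to pass credentials):

# should print the replica set members; it errors out if mongod was not started with --replSet
mongo --host 127.0.0.1 --port 27017 --eval 'printjson(rs.status())'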

Real-time is definitely available with horizontal scaling.

@sashko @babrahams That’s the thing: oplog tailing appears to be all set up properly. How can I confirm the oplog is working correctly? I’m using Kadira and it’s showing oplog notifications.

When I scale down to one pod (vs. 3 or 15), everything syncs instantly. When I scale to something like 15 pods, which lets me reproduce the issue nearly every time, I see the delay. That would suggest the oplog isn’t working, yet Kadira is showing oplog notifications, and when everyone is connected to a single pod everything syncs instantly.

FYI, I am running a MongoDB replica set, and MONGO_OPLOG_URL is set.
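
For anyone checking the same thing, the variable can be confirmed inside a running pod with something like this (the pod name is just a placeholder):

# print the oplog URL the Meteor process actually sees
kubectl exec meteor-app-pod -- printenv MONGO_OPLOG_URL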

Do you have local as the database for the MONGO_OPLOG_URL, and not the name of the database your data is stored in?

For example:

MONGO_URL=mongodb://127.0.0.1:27017/myDbName?replicaSet=myReplSetName
MONGO_OPLOG_URL=mongodb://127.0.0.1:27017/local?replicaSet=myReplSetName
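
You can also check that the oplog itself is being written by looking at the newest entry in the local database (placeholder host again, and you’ll need a user that can read local):

# the latest oplog entry; after a write from your app, its ts/op fields should update
mongo 127.0.0.1:27017/local --eval 'printjson(db.oplog.rs.find().sort({$natural: -1}).limit(1).next())'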

Yes, here are my connection string examples:

MONGO_URL=mongodb://123.456.789.0:27017,123.456.789.1:27017/mydbname?replicaSet=rs0&readPreference=primaryPreferred&w=majority
MONGO_OPLOG_URL=mongodb://123.456.789.0:27017,123.456.789.1:27017/local?authSource=mydbname

I believe that on the oplog URL you need to set authSource to the db name, and that the replicaSet param isn’t needed.

@sashko @babrahams I’ve confirmed the oplog is working and set up properly. I installed the facts package, and here is what it reports:

I have observe-drivers-oplog values, and no observe-drivers-polling records. Something else is going on here.

info from: Oplog Observe Driver · meteor/meteor Wiki · GitHub
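
For anyone else wanting to run the same check, the steps from that wiki page are roughly (this is the older facts package; if I recall correctly, newer Meteor releases split it into facts-base and facts-ui):

# adds the server-side counters that report which observe driver each query uses
meteor add facts

plus, if I’m remembering the wiki correctly, a Facts.setUserIdFilter call on the server and the serverFacts template on an admin page to actually see the counters.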
