What happens to Meteor during a MongoDB failover event?


#1

I’m connecting my Meteor app to a 3-member replica set: one primary and two secondaries. I have readPreference=primaryPreferred and w=majority in my MONGO_URL (following these guidelines).
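For reference, the URL looks roughly like this (the hosts, credentials, and replica set name below are placeholders, not my real values):

```
mongodb://user:pass@db1.example.com:27017,db2.example.com:27017,db3.example.com:27017/myapp?replicaSet=rs0&readPreference=primaryPreferred&w=majority
```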

If the primary goes away, will the app automatically start sending writes to whichever secondary is elected primary? How long might this take? Up to a minute? Could I lose data during the failover?

Can Meteor do anything with secondaries?

My MONGO_URL includes all three set members. Does the order of these three hosts matter?

Failover seems unlikely, but I just saw it happen when I used the MongoDB Cloud Manager to upgrade my replica set to 3.0.6.


#2

I’m also interested in this, as I just updated my Mongo version too (to 3.0.8). The process causes the primary to fail over: it updates the primary and then restarts it. During the process my logs filled up with exceptions from observe. Not finding the primary is hardly cause for an exception, is it? And what happens if the primary stays down for a longer period of time?
In my case I’m using Kadira, so I’m not sure whether its wrapper around observe has anything to do with it. It certainly doesn’t look like it.

```
Internal Server Error - Exception while polling query {"collectionName":"drinks","selector":{"_id":"sPwqQevE4EPgs6xLb"},"options":{"transform":null,"limit":1,"fields":{"createdAt":0,"updatedAt":0}}}: Error: No replica set primary available for query with ReadPreference PRIMARY
    at Object.Future.wait (/opt/emp/app/programs/server/node_modules/fibers/future.js:395:16)
    at [object Object]._.extend._nextObject (packages/mongo/mongo_driver.js:986:1)
    at [object Object]._.extend.forEach (packages/mongo/mongo_driver.js:1020:1)
    at [object Object]._.extend._getRawObjects (packages/mongo/mongo_driver.js:1069:1)
    at [object Object]._.extend._pollMongo (packages/mongo/polling_observe_driver.js:147:1)
    at [object Object].proto._pollMongo (packages/meteorhacks_kadira/lib/hijack/wrap_observers.js:63:1)
    at Object.task (packages/mongo/polling_observe_driver.js:85:1)
    at [object Object]._.extend._run (packages/meteor/fiber_helpers.js:147:1)
    at packages/meteor/fiber_helpers.js:125:1
    - - - - -
    at [object Object].ReplSet.checkoutReader (/opt/emp/app/programs/server/npm/npm-mongo/node_modules/mongodb/lib/mongodb/connection/repl_set/repl_set.js:619:14)
    at Cursor.nextObject (/opt/emp/app/programs/server/npm/npm-mongo/node_modules/mongodb/lib/mongodb/cursor.js:748:48)
    at [object Object].fn [as _synchronousNextObject] (/opt/emp/app/programs/server/node_modules/fibers/future.js:89:26)
    at [object Object]._.extend._nextObject (packages/mongo/mongo_driver.js:986:1)
    at [object Object]._.extend.forEach (packages/mongo/mongo_driver.js:1020:1)
    at [object Object]._.extend._getRawObjects (packages/mongo/mongo_driver.js:1069:1)
    at [object Object]._.extend._pollMongo (packages/mongo/polling_observe_driver.js:147:1)
    at [object Object].proto._pollMongo (packages/meteorhacks_kadira/lib/hijack/wrap_observers.js:63:1)
    at Object.task (packages/mongo/polling_observe_driver.js:85:1)
    at [object Object]._.extend._run (packages/meteor/fiber_helpers.js:147:1)
```
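If it really is just the PRIMARY read preference that makes reads fail while no primary is elected, then for one-off reads something like this might serve as a stopgap. This is only a sketch: the Drinks collection is made up for the example, and whether the raw driver’s findOne returns a promise or wants a callback depends on your driver version.

```js
// Sketch only: fall back to a secondary read while no primary is elected.
// `Drinks` is a made-up example collection; readPreference is a standard
// Node-driver query option.
import { Mongo } from 'meteor/mongo';

const Drinks = new Mongo.Collection('drinks');

async function findDrinkDuringFailover(id) {
  // rawCollection() exposes the underlying Node driver collection,
  // bypassing Meteor's own read-preference handling for this one query.
  return Drinks.rawCollection().findOne(
    { _id: id },
    { readPreference: 'secondaryPreferred' }
  );
}
```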


"Exception while polling query" with MLab and Meteor 1.4
#3

+1 I was also wondering what happens in such a case, I’m running a replica set on Compose.


#4

My Meteor apps now connect to MongoDB 3.2.1 with the WiredTiger storage engine: a 3-member replica set, no sharding. Each replica set member is an AWS EC2 instance, and I monitor the replica set with MongoDB Cloud Manager.

I did some tests during low traffic. Failover seems to take a few seconds, and I see some of these errors in my Meteor app logs:

```
Got exception while polling query: MongoError: not master and slaveOk=false
```

These errors abate within about 10 seconds of the failover.
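One thing I’m considering is wrapping critical writes in a small retry helper so they can ride out the election window. A rough, untested sketch; the matched error strings and the attempt/delay numbers are guesses on my part, not tuned values:

```js
// Rough sketch: retry an operation across a failover window.
// The error-string matching and the retry/delay values are guesses, not tuned.
async function withFailoverRetry(op, attempts = 5, delayMs = 2000) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Only retry errors that look like a transient loss of the primary.
      const transient = /not master|no replica set primary/i.test(String(err.message));
      if (!transient || attempt >= attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Example (Posts is a hypothetical collection):
// await withFailoverRetry(() => Posts.rawCollection().insertOne(doc));
```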

I don’t have answers to the other questions in my original post.


#5

+1. I am running MongoDB 3.2 on Compose, which provides two mongos routers with no replica set, and handling failover does not seem to be possible with Meteor at the moment.


#6

Also having problems with this on mLab. Has anyone managed to make the failover work?


#7

Did anyone ever get to the bottom of this? I get hit by this occasionally and have to restart Meteor so that it can reconnect to the database, which is far from ideal.


#8

I’m using Atlas. When AWS was unstable and, for some reason, we kept losing the connection to the primary, Mongo elected one of the secondaries. Sometimes this happened several times within 10 minutes.

I’ve seen that the server log sometimes shows an error like "Connection 15 with DB lost".
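To get more detail than "connection lost", I’ve been thinking about logging the driver’s own topology events. A sketch only, assuming a Meteor/driver version that exposes the underlying MongoClient and these SDAM event names:

```js
// Sketch: log the Node driver's server-discovery (SDAM) events on the server.
// Assumes MongoInternals exposes the MongoClient as `client`; the event names
// follow the driver's SDAM spec and may vary by driver version.
import { Meteor } from 'meteor/meteor';
import { MongoInternals } from 'meteor/mongo';

Meteor.startup(() => {
  const { client } = MongoInternals.defaultRemoteCollectionDriver().mongo;
  client.on('serverDescriptionChanged', (event) => {
    // e.g. "db1.example.com:27017: RSPrimary -> Unknown" during a failover
    console.log(`[mongo] ${event.address}: ${event.previousDescription.type} -> ${event.newDescription.type}`);
  });
  client.on('topologyDescriptionChanged', (event) => {
    console.log(`[mongo] topology is now ${event.newDescription.type}`);
  });
});
```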

In general the app worked just fine, but there were two cases where I had to kill the container because it had become unresponsive: the process was online, but you couldn’t use the app.

Overall, it worked just fine.

PS: No sharding, 3-member replica set.


#9

I’m trying out a change to our database connection string on our test server so that it doesn’t use a replica set. We blocked access to Compose and saw the usual timeout errors appear in the logs; when we unblocked the IP address, the Meteor application reconnected to the database.
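Concretely, the change was just dropping the extra members and the replicaSet option from the URL (the hosts, port numbers, and set name here are placeholders, not our real values):

```
# Before: full replica set URL.
mongodb://user:pass@host1.example.com:10001,host2.example.com:10002/myapp?replicaSet=set-abc123
# After: single host, no replicaSet option.
mongodb://user:pass@host1.example.com:10001/myapp
```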

It’s not an ideal situation, because Compose do occasionally have failovers, but at least I get emailed when that happens, versus, for example, Linode having a brief outage that breaks the app silently.

I’ll be doing more testing before we apply this to the production system, but it seems like a good compromise until a proper solution is (hopefully) found.