Unknown problem affects availability


#1

My Meteor app is deployed on an EC2 instance (t2.nano) using mupx.
It had been running just fine since July, but one day in December it just stopped working. I restarted the instance, and it worked again.

Two weeks later the same thing happened, and I restarted the instance again.

But this Sunday it stopped working many times, and even after I restarted the instance, the problems occurred again after a few minutes. The app did get a large amount of traffic compared to other days, but I don’t believe that’s the problem: I created another EC2 instance (t2.micro) with the same app and imported the same database, and the app was slow there too. The website loads slowly, and the data isn’t available until 30 seconds or so later (sometimes more than a minute).

When I checked the mupx logs, I found that this error is repeated a lot:

`Exception while polling query {"collectionName":"products","selector":{"$or":[{"type_id":{"_str":"a2e73bad917429a782169256"},"brand.id":{"_str":"a734c999ad5dae4954faa3ff"}},{"user_id":"zJt64DR7ijEGnPtMs"}]},"options":{"transform":null,"sort":{"createdAt":-1}}}: MongoError: connection 15 to mongodb:27017 timed out
    at Object.Future.wait (/bundle/bundle/programs/server/node_modules/fibers/future.js:449:15)
    at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1024:47)
    at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1058:22)
    at SynchronousCursor.getRawObjects (packages/mongo/mongo_driver.js:1107:12)
    at PollingObserveDriver._pollMongo (packages/mongo/polling_observe_driver.js:152:48)
    at PollingObserveDriver.proto._pollMongo (packages/meteorhacks_kadira.js:2985:23)
    at Object.task (packages/mongo/polling_observe_driver.js:90:12)
    at [object Object]._.extend._run (packages/meteor.js:807:18)
    at packages/meteor.js:785:14
    - - - - -
    at Function.MongoError.create (/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/error.js:29:11)
    at Socket.<anonymous> (/bundle/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:176:20)
    at Socket.g (events.js:260:16)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at Socket._onTimeout (net.js:333:8)
    at _runOnTimeout (timers.js:524:11)
    at _makeTimerTimeout (timers.js:515:3)
    at Timer.unrefTimeout (timers.js:584:5)`

My best guess is that there is a problem with the connection to MongoDB.
What do you think about this case? How can I figure out where the problem comes from and how to solve it?

Some helpful information:

  • The biggest collection in the DB has ~40K documents, but each user is subscribed to only about 200
  • Sometimes, I notice the docker container of the app (not mongodb, but the app) restarts on its own

#2

Check whether your collection has indexes. Use the MongoDB commands to analyze the query:

{"collectionName":"products","selector":{"$or":[{"type_id":{"_str":"a2e73bad917429a782169256"},"brand.id":{"_str":"a734c999ad5dae4954faa3ff"}},{"user_id":"zJt64DR7ijEGnPtMs"}]},"options":{"transform":null,"sort":{"createdAt":-1}}}
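One way to analyze the query is to run it with explain and look for a full collection scan. The sketch below assumes the classic explain layout (`queryPlanner.winningPlan` with nested `inputStage` / `inputStages`); the helper names `collectStages` and `usesCollectionScan` are made up for illustration, and the layout may differ on other MongoDB versions.

```javascript
// Hypothetical helpers: walk an explain() result and list the plan stages.
// Assumes the classic layout: queryPlanner.winningPlan with inputStage / inputStages.
function collectStages(plan) {
  if (!plan) return [];
  const children = (plan.inputStages || []).concat(
    plan.inputStage ? [plan.inputStage] : []
  );
  // Start with this node's stage, then append the stages of every child.
  return children.reduce(
    (stages, child) => stages.concat(collectStages(child)),
    [plan.stage]
  );
}

// True when the winning plan reads the whole collection (stage "COLLSCAN").
function usesCollectionScan(explainOutput) {
  const plan = explainOutput.queryPlanner.winningPlan;
  return collectStages(plan).indexOf('COLLSCAN') !== -1;
}
```

In a Meteor app the explain output could probably be obtained on the server with something like `Products.rawCollection().find(selector).sort({ createdAt: -1 }).explain()`, where `Products` stands for whatever variable holds the `products` collection.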

I had my site crash because of a missing index, which meant the node process had to read the entire collection into memory (or something like that)
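For the query in the log, here is a minimal sketch of indexes that might help. It assumes each `$or` branch needs its own index and that ending each index with `createdAt` helps the sort; check this with explain rather than taking it as given. `PRODUCT_INDEXES` and `ensureProductIndexes` are hypothetical names. `createIndex` is the Node MongoDB driver method, which Meteor exposes through `rawCollection()`.

```javascript
// Hypothetical index specs, one per $or branch of the logged selector.
// The trailing createdAt key is an assumption meant to support the sort.
const PRODUCT_INDEXES = [
  { type_id: 1, 'brand.id': 1, createdAt: -1 },
  { user_id: 1, createdAt: -1 },
];

// Accepts any object with a driver-style createIndex(keys) method that
// returns a promise, e.g. Products.rawCollection() in a Meteor server.
function ensureProductIndexes(collection) {
  return Promise.all(
    PRODUCT_INDEXES.map((keys) => collection.createIndex(keys))
  );
}
```

This could be called once at server startup, for example from `Meteor.startup`, with `Products` again standing for your own collection variable.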


#3

Thank you for your suggestion.
I don’t think that the indexes can cause the crash, because all the documents in all the collections in my DB take only ~7 MB of disk (I know this because I dumped the content when I was creating the dev instance).

But I guess it’s worth looking into.
I will let you know the results when I’m done.


#4

Thanks @jamgold ! Using indexes really did make the app faster.
And after reviewing the queries in the code, I found a mistake in one of them, and it was because of this mistake that the database was overworked.

But I still have a problem understanding why the docker container of the app server restarts on its own: shouldn’t the app just become slow when the database gets a big query?


#5

I don’t think so, not if the database stops answering because of a big query. In that case the timeout fires and your Meteor app restarts.

You can try adding some timeout parameters to your Mongo URL, but I don’t know whether Meteor will override them with its own values:

MONGO_URL="XXX:XXX?connectTimeoutMS=60000&socketTimeoutMS=60000"
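If you would rather build that URL in code than by hand, here is a small sketch. `withTimeouts` is a hypothetical helper, and the database name `app` in the example is a placeholder; `connectTimeoutMS` and `socketTimeoutMS` are standard MongoDB connection string options. Whether Meteor keeps these values is still the open question above.

```javascript
// Hypothetical helper: append timeout options to a MongoDB connection string.
function withTimeouts(mongoUrl, timeouts) {
  const params = Object.keys(timeouts)
    .map((name) => name + '=' + encodeURIComponent(timeouts[name]))
    .join('&');
  // Use '&' when the URL already carries options, '?' otherwise.
  const separator = mongoUrl.indexOf('?') === -1 ? '?' : '&';
  return mongoUrl + separator + params;
}

// Example: host taken from the log above, database name is a placeholder.
const exampleUrl = withTimeouts('mongodb://mongodb:27017/app', {
  connectTimeoutMS: 60000,
  socketTimeoutMS: 60000,
});
```

The result would then be passed as `MONGO_URL` in the environment section of your deployment config.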

#6

Shouldn’t Meteor handle this exception without restarting?