Every morning our app refreshes ~30k+ documents (~30MB of JSON), and this number can continue to grow. The sync:
1. Gets a lot of data for each doc from API requests
2. Does a huge bulkUpsert using rawCollection
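Roughly this shape, simplified (field and collection names here are just illustrative):

```js
// build one big bulk op against the raw collection and fire it once
const bulk = MyDocs.rawCollection().initializeUnorderedBulkOp();
for (const doc of fetchedDocs) {
  bulk.find({ externalId: doc.externalId }).upsert().updateOne({ $set: doc });
}
await bulk.execute(); // all ~30k ops in a single execute
```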
This has recently started causing a Mongo stack overflow error when attempting the bulkUpsert. My suspicion is that some users have pages open to the app with inefficient subscriptions to some of these 30k+ docs, and that these things happening in parallel are crashing the server.
My exact error is:
```
/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:691
throw error;
RangeError: Maximum call stack size exceeded
at ServerSessionPool.acquire (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/sessions.js:623:12)
at ClientSession.get serverSession [as serverSession] (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/sessions.js:113:47)
at /var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/sessions.js:148:19
at maybePromise (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:685:3)
at ClientSession.endSession (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/sessions.js:130:12)
at Cursor._endSession (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/cursor.js:392:15)
at done (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/cursor.js:448:16)
at /var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/cursor.js:536:11
at /var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:688:9
at /var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/operations/execute_operation.js:82:7
at maybePromise (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:685:3)
at executeOperation (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/operations/execute_operation.js:34:10)
at Cursor._initializeCursor (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/cursor.js:534:7)
at Cursor._initializeCursor (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/cursor.js:186:11)
at Object.callback (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/cursor.js:439:14)
at processWaitQueue (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/sdam/topology.js:1049:21)
at NativeTopology.selectServer (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/sdam/topology.js:449:5)
at executeWithServerSelection (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/operations/execute_operation.js:131:12)
at /var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/operations/execute_operation.js:70:9
at maybePromise (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:685:3)
at executeOperation (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/operations/execute_operation.js:34:10)
at Cursor._initializeCursor (/var/app/current/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/cursor.js:534:7)
```
Just wondering if there is a potential quick fix here, e.g. beefing up the Meteor server and/or the MongoDB server so they can handle the above situation (if that is in fact the cause)?
Otherwise, I'm thinking of delegating this morning sync to a separate server made specifically for running SyncedCron jobs. Any insights or recommendations on the easiest way to do this? I'd rather not rewrite any code; all I really need is my SyncedCron module and a server that points to the same remote Mongo instance (there's just one). I don't need a full copy of the app serving client code, of course, but I still want to use a Meteor server since I'm relying on proprietary Meteor code, collections, etc.
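What I have in mind is a stripped-down Meteor server on its own box, started with the same MONGO_URL as the main app, that does nothing but register the cron job. Roughly (the job name and runMorningSync are placeholders for my real sync code):

```js
// server/main.js on the cron-only instance
// started with the same MONGO_URL=mongodb://... so it writes to the same database
import { Meteor } from 'meteor/meteor';
import { runMorningSync } from './morning-sync'; // placeholder for the existing sync code

// SyncedCron comes from the percolate:synced-cron package (exposed as a global)
SyncedCron.add({
  name: 'Morning document refresh',
  schedule(parser) {
    return parser.text('at 5:00 am'); // later.js text schedule
  },
  job() {
    runMorningSync();
  },
});

Meteor.startup(() => SyncedCron.start());
```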
Otherwise, I'm thinking of delegating this morning sync to a separate server made specifically for running SyncedCron jobs
My thoughts exactly; I think this will definitely help alleviate the problem. As to how you should do it, I've not done this before, but maybe look into serverless, as it can help reduce costs? Maybe something like this, but fetching and inserting into the DB instead. But I think there's still some investigation to do before rushing to solutions.
You've mentioned that there are subscriptions draining your app and causing it to overload.
My suspicion is that some users have pages open to the app with inefficient subscriptions to some of these 30k+ docs, and that these things happening in parallel are crashing the server.
Did you try cutting down the number of documents that get fetched/updated to see if it runs smoothly? Which subscriptions specifically? Honestly, if real-time isn't crucial to your business model, replacing pub/sub with methods is the way to go. Many Meteor applications have to do this in order to scale, as pub/sub is an expensive feature that shouldn't be used when it's not needed.
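For example, a page that currently subscribes to a big list could fetch it once through a method instead, something like this (collection, method, and field names are made up):

```js
import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';

// server: return the data on demand instead of keeping a live observer open
Meteor.methods({
  'reports.fetch'(limit) {
    check(limit, Number);
    return Reports.find(
      {},
      { fields: { title: 1, status: 1 }, limit } // only what the page needs
    ).fetch();
  },
});

// client: one round trip, nothing left running on the server afterwards
Meteor.call('reports.fetch', 100, (err, reports) => {
  if (!err) {
    // render `reports` however the page normally would
  }
});
```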
I agree with you wholeheartedly that investigation should be done before rushing. There's a lot to refactor around the inefficiency of these subscriptions, so I was kinda hoping for a quick temporary fix.
Moving away from pub/sub to methods makes sense; pub/sub isn't really needed in lots of places in the app, but I inherited this codebase and it's tough to refactor everything as the only dev. But… one thing at a time.
I love your idea about serverless architecture to reduce costs. I’ll have to look into it. We’ve adopted an AWS stack and I’d like to keep whatever solution I go with in there.
ddp-health-check seems cool, although I’d have to play around with it to see if it’ll work well with my problem.
I don't mind setting up another Meteor instance just to handle these syncs. The challenge is that I only want a fraction of my whole codebase (no client code)… but I'd need some Collection classes, helpers, etc. to avoid any sort of refactoring.
Make a Node script (or any language you want) outside of the app that handles your data functionality, and don't use upsert; the performance is not good at all.
For Mongo (and really any database engine) you should use insert and then delete the old rows using an indexed timestamp in order to work with large datasets of over 1 million records (even above 100k you won't want to be doing updates).
Updates (and upserts are updates) require two queries for every command: first a lookup to find the row, then the update itself. To avoid this, insert the new data and delete the old. On the display layer, add a condition so it only selects the newest, and voilà, you have a working solution good for as many rows as you need (into the billions).
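A rough sketch of that pattern with the Node driver, assuming db is a connected database handle and fetchedDocs is the freshly fetched array (collection and field names are made up):

```js
// one-time setup: an index on the timestamp so the cleanup stays cheap
// db.collection('reports').createIndex({ syncedAt: 1 })

const syncedAt = new Date();

// 1. insert the new generation instead of updating rows in place
await db.collection('reports').insertMany(
  fetchedDocs.map((doc) => ({ ...doc, syncedAt }))
);

// 2. delete the previous generation via the indexed timestamp
await db.collection('reports').deleteMany({ syncedAt: { $lt: syncedAt } });

// display layer: only read the newest generation
const latest = await db.collection('reports').find({ syncedAt }).toArray();
```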
Are you doing these 30k operations in a single bulk insert? A trivial fix might be to batch these into, say, 10k-item batches (see the sketch at the end of this post). I'd be a little surprised if the observers caused a stack overflow; in general the observer is either iterative or asynchronously recursive, which I don't think can cause a stack overflow since it isn't true recursion (this might be different when combined with fibers, though). If it were the observer, I'd expect long-running Meteor servers to hit these stack overflow errors more often.
It's worth noting that even just arr.push(...[lots of items]) can trigger a stack overflow, so anything that behaves like that (which bulk operations might) could cause it.
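For instance, this alone is enough to throw the same RangeError in Node once the array is big enough (the exact threshold depends on the engine):

```js
const target = [];
const huge = new Array(500000).fill({ x: 1 });

// target.push(...huge); // RangeError: Maximum call stack size exceeded
for (const item of huge) target.push(item); // safe: one element per call
```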
Similarly, you may find you need this even without the stack overflow. I'm not 100% sure how the mongo driver handles bulk operations internally, but if everything gets serialized into a single mongo command you'll be limited by the 16MB BSON document limit.
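The batching could look something like this, keeping the upsert semantics but issuing one bulkWrite per chunk (batch size, collection, and field names are placeholders):

```js
const BATCH_SIZE = 10000; // tune this; drop to 1k if it's still struggling
const raw = MyDocs.rawCollection();

for (let i = 0; i < fetchedDocs.length; i += BATCH_SIZE) {
  const ops = fetchedDocs.slice(i, i + BATCH_SIZE).map((doc) => ({
    updateOne: {
      filter: { externalId: doc.externalId },
      update: { $set: doc },
      upsert: true,
    },
  }));
  // one driver call per batch instead of a single 30k-op bulk
  await raw.bulkWrite(ops, { ordered: false });
}
```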
I put all 30k into one bulk operation and execute it with a single bulk.execute. Mongo says it automatically chunks into groups of 1k docs, though.
That being said, I do have some code set aside that manually chunks and runs separate bulk.execute calls; I haven't tested it though.
My theory is that it's something to do with the observers and the execute running at the same time.
Please look through the stack trace again. I put some logging in some of those files and verified that it was happening continuously (as opposed to just once for the single bulk.execute call). That's why I'm leaning toward it being due to the observer.
From personal experience, it can't handle over 1k in a bulk; even with a 16-core overclocked gaming rig with 32GB RAM on FreeBSD it was still going to loads of over 10.
But if you just do insert and then remove, it doesn't even get warm; it barely notices.
Nope, nothing crashes, the load is just insane. My update is across 10 fields and I match on one indexed key, yet it still takes my load up to over 10. I need to update a growing collection of 1.4m records every 33 seconds from a multithreaded process, so batching into 1000s was the sweet spot after a lot of testing.
I use the bulk.update syntax directly though and don't do the find; it's slightly different, and I'm not sure if that affects it. With bulk.update you can add each update to a bulk object and then call execute at the end of the iteration. Mongo's internals default to splitting operations into groups of 1k, so that's probably why it seems to handle it fastest when I set it to 1k batches. Then it runs smoothly alongside the rest of the process that's doing the acquisition for the next batch. The end result is the load is now 0.1, a huge difference, and it means we can run everything on one server, including the actual app and API.
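For reference, the usual shape of that "queue updates on a bulk object, execute every 1k" flow is roughly this (names are illustrative; my exact bulk.update syntax differs slightly):

```js
let bulk = collection.initializeUnorderedBulkOp();
let pending = 0;

for (const row of rows) {
  // queue one update per row, matching on the single indexed key
  bulk.find({ key: row.key }).update({ $set: row.fields });
  if (++pending === 1000) {
    await bulk.execute(); // flush every 1k queued ops
    bulk = collection.initializeUnorderedBulkOp();
    pending = 0;
  }
}
if (pending > 0) await bulk.execute(); // flush the remainder
```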
In Linux your load average is displayed in top; just run top at a shell and it's on the first line of the output. Here's an article with more info. I monitor mainly in top, or at the DNS level, or with RRD in Cacti. My app is Meteor and React and has an API which powers a plug-in that gets around 120k daily uniques.
Do you actually shell into your Prod server and run top? What do you mean by monitoring at the DNS level?
FWIW, I monitored top while hitting the most expensive subscription and saw that it was actually the CPU% for the node process that spiked, from around 0.7% to 100+% (across 2 cores, though). Memory didn't really exceed 15%. So… it seems like CPU could actually be the issue.