Meteor/Mongo/DDP crashes: app stays in limbo, no data is shipped to clients anymore

Today I ran into a very weird error for the first time.
These steps happened right before the error:

  1. Importing and matching 10,000 images into my ostrio:files collection is a very CPU-intensive operation, so I did it on my local machine. I first removed all items from my local images collection, then imported all the files and uploaded them to AWS S3.
  2. I exported my images collection via mongodump to images.bson.
  3. I copied the BSON file to my production server and ran mongorestore on it, without --drop of course.
  4. I ran my "image matcher" on the production server, which matched the 10,000 images to several items in another collection. … Now the weird stuff started:
  5. After the matching had finished, the node process kept the CPU at 100% for about 1-2 minutes. No idea why.
  6. From then on, the whole app behaved very strangely: at some point the connection to MongoDB seemed to crash completely, the node process spiked to 100% CPU several times, and no client could receive data from Mongo anymore. Some clients threw errors like this one.
  7. I tried:
    - restarting both of my Docker containers.
    - restarting my mongod process.
    Neither fixed the problem: after several seconds, or maybe a minute, the whole thing "crashed" again.

What fixed it in the end was running mongorestore --drop on my items and images collections. Afterwards the images collection was missing the additional 10,000 images, and about 10,000 documents in my items collection were again without an imageUrl attribute, but that's all.

I just don't understand it … it's all so weird. I feel like it may have something to do with indexes, or with some mismatch behind the scenes between what the ostrio:files package expects to find in the images collection and what is actually there.

Fun fact: neither the node process nor the mongod process threw ANY errors. Nothing.

Any ideas on this?

Best, Patrick

ping @dr.dimitru
Any ideas? Are there perhaps additional collections that need to be copied as well?

I can now reproduce this on my dev machine. The node process spikes to 100% CPU for many seconds, which makes the app completely unusable for all users during that time. Any idea how to find out what causes this slowdown?

Okay, I think I found the bug, and it turns out it was all my fault:
Because of some very old code, my admin view subscribed to the whole images collection. With 10,000 new images after the import, and without waiting for this subscription to be ready, the app behaved like this:
For some reason, entering the admin (React) view wasn't a problem, but every time I left the view and navigated somewhere else, the server spiked to 100% CPU for about 20 seconds.
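For anyone hitting the same thing: the general remedy is to never publish the entire collection to a view that only shows part of it. Below is a minimal sketch, not the code actually used here, of a hypothetical helper that turns client-supplied paging input into Mongo-style `find` options (`limit`, `skip`, `sort`). `adminImagesPageOptions` and `MAX_PAGE_SIZE` are made-up names, and the cap of 50 is an assumed value.

```typescript
// Hypothetical helper: instead of publishing every image document,
// the admin publication requests one bounded page at a time.
// MAX_PAGE_SIZE is an assumed cap, not a Meteor or ostrio:files setting.
const MAX_PAGE_SIZE = 50;

interface PageOptions {
  limit: number;
  skip: number;
  sort: Record<string, 1 | -1>;
}

// Clamp client-supplied paging input so a buggy or stale client
// can never request the whole collection in one subscription.
function adminImagesPageOptions(page: number, pageSize: number): PageOptions {
  const safePage = Number.isInteger(page) && page >= 0 ? page : 0;
  const safeSize =
    Number.isInteger(pageSize) && pageSize > 0
      ? Math.min(pageSize, MAX_PAGE_SIZE)
      : MAX_PAGE_SIZE;
  return {
    limit: safeSize,
    skip: safePage * safeSize,
    sort: { _id: 1 }, // stable order so consecutive pages don't overlap
  };
}
```

In a publication this would be used roughly as `Images.collection.find({}, adminImagesPageOptions(page, pageSize))`, assuming `Images` is the `FilesCollection` and `.collection` is its underlying Mongo collection. Clamping on the server matters because the client controls the publication arguments.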

@ MDG: Any chance of implementing a warning or something similar when this happens? There are no errors whatsoever, so getting a warning when (accidentally) subscribing to a huge collection would be awesome! :slight_smile:
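Until something like that exists, a warning can be approximated in app code. The sketch below is a hypothetical guard, not an existing Meteor API: it counts a cursor's documents before the publication returns it and logs a warning above an assumed threshold. `warnIfLargePublication`, `CountableCursor` and the threshold of 1000 are illustrative names and values.

```typescript
// Hypothetical guard, not part of Meteor: warn when a publication
// is about to send more documents than expected.
// Assumes a cursor-like object exposing a synchronous count().
interface CountableCursor {
  count(): number;
}

const DEFAULT_WARN_THRESHOLD = 1000; // assumed value, tune per app

function warnIfLargePublication<C extends CountableCursor>(
  publicationName: string,
  cursor: C,
  threshold: number = DEFAULT_WARN_THRESHOLD,
  warn: (message: string) => void = console.warn,
): C {
  const n = cursor.count(); // note: this costs an extra count query
  if (n > threshold) {
    warn(
      `Publication "${publicationName}" is sending ${n} documents (threshold ${threshold}).`,
    );
  }
  return cursor; // the cursor is passed through unchanged
}
```

The cursor is returned untouched, so the guard can wrap an existing `return` in a publication without changing behavior. On newer Meteor versions, where server-side cursors use `countAsync()`, the guard would need to become async.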
