Meteor recovery from errors

Hi,

I was hoping someone could shed some light on issues we've seen in our app. We currently use a microservice architecture with MongoDB running in an AWS cluster, Meteor 1.2.1, and React front ends. When Mongo has problems, such as a failure between replicas, the front end goes into a bad state in which all users are unable to perform some or all functions. Errors about the replica issues appear in Kadira, but the front end never recovers until all the services are restarted.

As an example, today the AWS data center in Virginia had issues. Even after those issues resolved themselves, we stayed down until we restarted the services. One service that writes to Mongo was throwing an error involving a null object. I know this is vague, but we have seen all sorts of stability issues, and I was hoping someone could point me in the right direction or shed some light on known DDP issues. Right now we can't even detect when we are hosed, because we have no way to put any kind of monitoring on the front end to catch it.
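For the front-end monitoring gap, here is a minimal sketch of one possible client-side watchdog, written in TypeScript (the types can be stripped for a plain-JavaScript Meteor 1.2 client). `Meteor.status()`, `Meteor.call`, and `Meteor.setInterval` are real Meteor client APIs, but the server method name `health.ping`, the 30-second threshold, and the `reportUnhealthy` hook are hypothetical placeholders. The idea is that `Meteor.status()` alone can still report `connected` while the server is stuck behind a failed replica set, so the check also requires a recent successful round trip through a method that touches Mongo.

```typescript
// Sketch of a client-side health watchdog (assumptions noted inline).
// Assumption: a hypothetical server method "health.ping" does a cheap Mongo
// read, so a successful reply proves both DDP and the database path work.

type DdpStatus = "connected" | "connecting" | "failed" | "waiting" | "offline";

interface HealthVerdict {
  healthy: boolean;
  reason?: string;
}

class HeartbeatMonitor {
  private readonly maxAgeMs: number;
  private lastOkAt: number;

  constructor(maxAgeMs: number, startedAt: number) {
    this.maxAgeMs = maxAgeMs;
    // Treat startup as a successful beat so we don't alarm before the first ping returns.
    this.lastOkAt = startedAt;
  }

  // Call from the heartbeat method's callback when it returns without error.
  recordSuccess(now: number): void {
    this.lastOkAt = now;
  }

  // Unhealthy if DDP itself is down, or if DDP looks "connected" but no
  // heartbeat has completed recently (the stuck-after-failover case).
  check(status: DdpStatus, now: number): HealthVerdict {
    if (status !== "connected") {
      return { healthy: false, reason: `DDP status is ${status}` };
    }
    const age = now - this.lastOkAt;
    if (age > this.maxAgeMs) {
      return { healthy: false, reason: `no successful heartbeat for ${age} ms` };
    }
    return { healthy: true };
  }
}

// Hypothetical wiring on a Meteor client (commented out so the sketch runs standalone):
// const monitor = new HeartbeatMonitor(30000, Date.now());
// Meteor.setInterval(() => {
//   Meteor.call("health.ping", (err) => {
//     if (!err) monitor.recordSuccess(Date.now());
//   });
//   const verdict = monitor.check(Meteor.status().status, Date.now());
//   if (!verdict.healthy) reportUnhealthy(verdict.reason); // hypothetical hook
// }, 10000);
```

Because a method call that is stuck waiting on Mongo may never invoke its callback, staleness of the last successful reply, rather than an explicit error, is what flags the "connected but hosed" state.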