Does anyone know how I would tell if a Galaxy container hits 100% CPU for just a second or so?
I have a real-time game app with thousands of users, and I’m seeing what appear to be dropped DDP messages: players say their app doesn’t get updated and goes out of sync. It only happens for a few players, very intermittently, and I think I’ve traced it to periods of heavy usage, when thousands of players all hit the servers at the same time while joining a game. The CPU gets spiky, but Meteor APM and Galaxy never show it hitting 100%. It usually hangs around 15% CPU usage, with quick spikes to 40% - 60%. Never even close to 100%.
Galaxy and Meteor APM show a general CPU usage chart, but I don’t think it has the fidelity to show whether your container peaks just momentarily. For one, the server gets so busy that it probably doesn’t register itself at 100%; for another, the chart doesn’t have the detail to show a spike lasting just a few seconds even if it could.
The same goes for Galaxy’s charts. Even on the 5m setting it looks like they sample every five seconds, which could miss the CPU being momentarily at 100%. And I usually only hear about issues well after the fact, by which time the five-minute window for viewing any CPU peaks in Galaxy has passed.
It feels like we need some better tools here. Are there any tools that give true real-time CPU usage? I don’t think you can remote into a Galaxy container to use command-line tools, right?
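In the meantime, one thing that could be tried from inside the app is sampling the Node process itself at a short interval and logging any spike. The sketch below is only that: `cpuPercent`, `startCpuSampler`, and the interval and threshold values are names and numbers made up for illustration, and only `process.cpuUsage()` and `process.hrtime.bigint()` are real Node APIs. It measures the server process, not the whole container, and the result can exceed 100% when helper threads (GC, the libuv thread pool) are busy.

```javascript
// Percent of one CPU core used, given CPU time consumed (microseconds)
// over an elapsed wall-clock window (microseconds).
function cpuPercent(deltaUsage, elapsedMicros) {
  if (elapsedMicros <= 0) return 0;
  const busyMicros = deltaUsage.user + deltaUsage.system;
  return (busyMicros * 100) / elapsedMicros;
}

// Hypothetical helper: samples process CPU every `intervalMs` and calls
// `onSpike` when usage over the last window reaches `thresholdPercent`.
function startCpuSampler({
  intervalMs = 250,        // assumed sampling interval
  thresholdPercent = 90,   // assumed spike threshold
  onSpike = (info) => console.warn("CPU spike", info),
} = {}) {
  let lastUsage = process.cpuUsage();      // cumulative user/system micros
  let lastTime = process.hrtime.bigint();  // monotonic clock, nanoseconds

  const timer = setInterval(() => {
    const nowUsage = process.cpuUsage();
    const nowTime = process.hrtime.bigint();

    // Use the real elapsed time, so a late timer (blocked event loop)
    // still produces a correct average for the longer window.
    const elapsedMicros = Number(nowTime - lastTime) / 1000;
    const delta = {
      user: nowUsage.user - lastUsage.user,
      system: nowUsage.system - lastUsage.system,
    };
    const percent = cpuPercent(delta, elapsedMicros);

    lastUsage = nowUsage;
    lastTime = nowTime;

    if (percent >= thresholdPercent) {
      onSpike({
        percent: Math.round(percent),
        windowMs: Math.round(elapsedMicros / 1000),
        at: new Date().toISOString(),
      });
    }
  }, intervalMs);

  timer.unref(); // do not keep the process alive just for sampling
  return () => clearInterval(timer);
}
```

On the server this could be started once at boot (for example inside `Meteor.startup`), with the spike entries going to whatever log output is already being collected, so they can be matched against player reports after the fact.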
Has anyone had any experience with Meteor seeming to drop DDP calls? I’m not sure if the problem is the server not sending them or the clients not receiving them, or both. I use redis-oplog and I’m pretty sure the problem isn’t Redis, as its CPU and activity are very low, even at peak times.
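To help narrow down whether the server side is stalling at all, one option might be to record event loop delay, since a Node process that is pinned on CPU will be late running its timers and socket writes. This is a rough sketch under assumptions: `summarizeDelay`, `startLoopDelayMonitor`, and the thresholds are hypothetical names and values, and it relies on `monitorEventLoopDelay` from Node’s `perf_hooks` module, which needs a reasonably recent Node version. A long delay would not prove that DDP messages are being dropped, only that the server was too busy to service the loop at that moment.

```javascript
const { monitorEventLoopDelay } = require("perf_hooks");

// Convert a delay histogram (values in nanoseconds) to milliseconds.
function summarizeDelay(histogram) {
  const toMs = (ns) => ns / 1e6;
  return {
    maxMs: toMs(histogram.max),
    meanMs: toMs(histogram.mean),
    p99Ms: toMs(histogram.percentile(99)),
  };
}

// Hypothetical helper: every `reportEveryMs`, log a summary if the worst
// event loop delay in that window reached `warnAboveMs`, then reset.
function startLoopDelayMonitor({
  reportEveryMs = 5000, // assumed reporting window
  warnAboveMs = 200,    // assumed "the loop was stalled" threshold
  log = (summary) => console.warn("event loop delay", summary),
} = {}) {
  const histogram = monitorEventLoopDelay({ resolution: 10 }); // sample every 10 ms
  histogram.enable();

  const timer = setInterval(() => {
    const summary = summarizeDelay(histogram);
    if (summary.maxMs >= warnAboveMs) {
      log({ ...summary, at: new Date().toISOString() });
    }
    histogram.reset(); // start a fresh window
  }, reportEveryMs);

  timer.unref(); // do not keep the process alive just for monitoring
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

If the logged stalls line up with the times players report going out of sync, that would point at the server; if the loop stays responsive during those reports, the client or network side becomes the more likely suspect.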
I was wondering: if it turns out that my temporary CPU spikes cause the servers to drop DDP messages, momentarily breaking my app for a few users, would switching to raw AWS and using their burstable-CPU containers be an approach to solving this? Does Galaxy perhaps already use these container types?