At first glance it looks like the map-reduce processes are being queued up faster than they’re completing. I’ll look into it in more detail later in the week, but my gut feeling is that the DB has now grown to a size where each map-reduce run takes longer than it should, and Kadira might not be waiting for existing calls to finish before triggering the next…?
I guess the options there would be to discard old data or beef up the DB instance’s CPU.
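If anyone wants to check the same thing, this is roughly how I’d sanity-check it from the mongo shell (a rough sketch that just string-matches the in-progress ops):

// List in-progress ops that look like map-reduce jobs; if several show up
// with a large secs_running, the jobs are stacking up.
db.currentOp().inprog.filter(function (op) {
  return JSON.stringify(op).toLowerCase().indexOf('mapreduce') !== -1;
});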
Does anyone have a command or snippet to remove all Kadira records older than X days? I tried doing a manual delete on the big collections (methodTraces etc.) but that didn’t go so well and I had to nuke and set up again.
I’m sure the real Kadira had scheduled scripts for this kind of thing.
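One approach I haven’t tried yet: a MongoDB TTL index, so mongod expires old documents itself. A rough sketch for one collection, assuming methodTraces has a startTime BSON date field like pubTraces does:

// Untested idea: TTL index so mongod deletes old traces automatically.
// 1209600 seconds = 14 days; this only works if startTime is a real date.
db.methodTraces.createIndex({ startTime: 1 }, { expireAfterSeconds: 1209600 });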
I run Kadira on an EC2 t2.small instance with a 20 GB disk (EBS, the default, so it’s not fast). We have 3-5 instances using it. After a few weeks it stops responding. I asked a question above about pruning it. My current solution is to just nuke and restart:
docker-compose stop          # stop the Kadira containers
docker system prune -af      # remove stopped containers, unused networks, and all unused images (-f skips the prompt)
docker-compose up -d         # pull fresh images and start clean
I’ll probably have time to look at this in the next couple of days; if I can figure out a clean way to prune the old records from the DB I’ll post it here.
db.getCollection('prodStats').remove({"time": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('pubMetrics').remove({"_id.time": {"$lte": new ISODate("2017-10-15T00:00:00.000Z")}})
db.getCollection('pubTraces').remove({"startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawErrorMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawMethodsMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawPubMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawSystemMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rmaLogs').remove({"startedAt": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('systemMetrics').remove({"_id.time": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
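And here’s a parameterized version of the same queries so you don’t have to edit nine dates by hand - a rough sketch using the same collection/field pairs as above (test it on a copy of the DB first):

// Prune all Kadira collections older than a cutoff in one pass.
var days = 14; // remove anything older than this many days
var cutoff = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
[
  ['prodStats',         'time'],
  ['pubMetrics',        '_id.time'],
  ['pubTraces',         'startTime'],
  ['rawErrorMetrics',   'value.startTime'],
  ['rawMethodsMetrics', 'value.startTime'],
  ['rawPubMetrics',     'value.startTime'],
  ['rawSystemMetrics',  'value.startTime'],
  ['rmaLogs',           'startedAt'],
  ['systemMetrics',     '_id.time']
].forEach(function (pair) {
  var query = {};
  query[pair[1]] = { $lte: cutoff };
  var res = db.getCollection(pair[0]).remove(query);
  print(pair[0] + ': removed ' + res.nRemoved);
});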
Change the date to whatever you want. My instance normally gets sluggish after 2 weeks (running only 1 host), so I just run the above queries and that has resolved it for me.
I’ve started consolidating the Kadira APM projects into a single Meteor project.
You only have to run Meteor with a Mongo replica set (a single-node set is fine; see the sketch below).
So Docker is not required.
The Slack alerts are working and some other optimizations were made.
But there’s still a lot of work to do… Looking for contributors.
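For anyone who hasn’t set up a single-node replica set before, a minimal sketch (assuming mongod was started with --replSet rs0; the paths and the db name in the URL are just placeholders):

// Start mongod with: mongod --dbpath /data/db --replSet rs0
// then initialize the single-node replica set once from the mongo shell:
rs.initiate();
rs.status(); // the node should report itself as PRIMARY shortly after
// Point the app at it with something like:
// MONGO_URL=mongodb://localhost:27017/kadira?replicaSet=rs0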
I created a pull request on vladgolubev/kadira to integrate the AWS configuration from lampe/kadira-server - both of these are really good repos for getting Kadira running.
It would be great if we could all work on a single repo with our improvements and bug fixes - I just happened to use vladgolubev/kadira first, so I wanted to integrate those changes there.
With that repo I have the APM and the debugger working on a complex setup (Docker Swarm behind an Nginx load balancer).
Where I’m currently stuck: if I run Kadira.profileCpu(10) in the browser console of my app, I get the following error on the server:
Exception while invoking method 'kadira.profileCpu' Error: Error relocating /bundle/programs/server/npm/node_modules/meteor/meteorhacks_kadira-binary-deps/node_modules/v8-profiler/build/profiler/v5.6.5/node-v48-linux-x64/profiler.node: __sprintf_chk: symbol not found
at Error (native)
I think it’s because I’m using an Alpine-based Meteor image, which ships musl instead of glibc (__sprintf_chk is a glibc symbol) - anyone have any insight?
Thanks for the great effort by all so far - hope to contribute further.
Hi @vladgolubev, thanks a lot for your work.
I have an issue when I create a new app in kadira-gui:
Exception while invoking method 'apps.create' TypeError: Cannot read property 'name' of null
at [object Object].MongoShardedCluster.pickShard (/home/meteor/www/bundle/programs/server/npm/node_modules/meteor/local_kadira-data/node_modules/mongo-sharded-cluster/lib/index.js:54:26)
at [object Object].Meteor.methods.apps.create (server/methods/apps.js:13:41)
at packages/check.js:130:16
at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
at Object.exports.Match._failIfArgumentsAreNotAllChecked (packages/check.js:129:41)
at maybeAuditArgumentChecks (packages/ddp-server/livedata_server.js:1734:18)
at packages/ddp-server/livedata_server.js:719:19
at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
at packages/ddp-server/livedata_server.js:717:40
at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
For anyone who’s running their own Kadira, this is what I’ve found - I’m using @vladgolubev’s excellent Docker images, without which this wouldn’t be possible, so a big thanks.
I’m also running MongoDB on the same instance via Docker; buying dedicated Mongo on Compose/Atlas gets too expensive.
The majority of resources and CPU are used by Mongo (there are some huge MR jobs running all the time!). ‘docker stats’ routinely shows that Mongo is using >200% CPU while the other containers are idle. So I also didn’t see any perf increase from running Mongo on a separate EC2 instance (which also increases costs).
We have 4-5 servers with moderate traffic. Right now Kadira is running on a t2.large/c4.medium, and after a few weeks it still crawls to a total stop and things fail to load. This is due to all the MR jobs and the CPU used by Mongo. The only fix then is to:
- remove the containers and re-up (which means new appIds)
- delete all current data
Both are not really ideal but I haven’t found a better way. I have to wonder what the prod Kadira and Galaxy APM run on, because I don’t see any way to scale this horizontally.
We’ve been working on a version of Meteor APM that you can self-host using MUP, and we recently fixed some issues with missing indexes that were causing the massive Mongo CPU usage you’ve been seeing.
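If you’re stuck on the older images in the meantime, this is the generic way to spot a missing index from the mongo shell (a rough sketch, not the actual fix that shipped; the methodTraces/startTime example is hypothetical):

// Log ops slower than 200 ms, then look for collection scans,
// which usually indicate a missing index.
db.setProfilingLevel(1, 200);
// ...let it run under load for a while, then:
db.system.profile.find({ planSummary: 'COLLSCAN' }).sort({ ts: -1 }).limit(5);
// e.g. if methodTraces is being scanned on startTime, index it:
db.methodTraces.createIndex({ startTime: 1 });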
I will be launching a hosted Meteor APM service, and I’m looking for a few people to try it during the private beta. All of the plans will be free until it is made public, and provide up to 1 month of data retention.
Email, webhook, and Slack alerts are enabled. CPU profiling will be turned on during the private beta. Support for production source maps has been added to the error tracker for client errors. The feature is limited right now, but it will be greatly improved over the next couple of weeks.
After the beta, there will be a free plan with data retention of under a day, along with paid plans starting at $5/server for 1 week of data retention. The number of servers is calculated the same way Kadira did it: using the median number of servers over the past month (so brief scaling spikes don’t affect the bill).
The product will be ready for the private beta later today. Please send me a message if you’re interested in joining, or if you have any questions.