Running your own Kadira instance. Update: now with a guide!

At first glance it looks like the mapreduce processes are being queued up faster than they’re completing. I’ll look into it in a bit more detail later in the week, but my gut feeling is that the db has now grown to a size where the mapreduce process takes longer than it should, and Kadira might not be waiting for existing calls to finish before triggering the next…?

I guess the options there would be to discard old data or beef up the db instance’s CPU.

Does anyone have a command or snippet to remove all Kadira records older than X days? I tried doing a manual delete on the big collections (methodTraces etc.) but that didn’t go so well and I had to nuke everything and set it up again.

I’m sure the real Kadira had scheduled scripts for this kind of thing.

I run Kadira on an EC2 t2.small instance with a 20 GB disk (EBS, the default, so it’s not fast). We have 3-5 instances using it. After a few weeks it stops responding. I asked a question above about pruning it. My current solution is to just nuke and restart:

docker-compose stop
docker system prune -af
docker-compose up -d

Then it all works (of course :) )

Yeah it’s a bit of a pain.

I’ll probably have time to look at this in the next couple of days. If I can figure out a clean way to prune the old records from the db, I’ll post it up.

Hey guys,

Many of you asked about publishing the source of my Docker images, and here you are: https://github.com/vladgolubev/kadira

Sorry for the wait. I kept it private because I had removed too much code from kadira-ui (payment-related code) and it didn’t show data older than 2 weeks.

But I never got around to fixing that, so it’s open now.

Hey @mrrk did you figure out a clean way to prune the old records? This is a problem I was having as well.

mongo
use kadiraData

db.getCollection('prodStats').remove({"time": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('pubMetrics').remove({"_id.time": {"$lte": new ISODate("2017-10-15T00:00:00.000Z")}})
db.getCollection('pubTraces').remove({"startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawErrorMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawMethodsMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawPubMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rawSystemMetrics').remove({"value.startTime": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('rmaLogs').remove({"startedAt": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})
db.getCollection('systemMetrics').remove({"_id.time": {"$lte": new ISODate("2017-08-15T00:00:00.000Z")}})

Change the date to whatever you want. My instance normally gets sluggish after 2 weeks (running only 1 host), so I just run the above queries and that has resolved it for me.
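
For anyone who would rather think in "older than X days" than edit the date by hand, here is a rough sketch that builds the same filters from a retention window. The collection names and time fields are copied from the queries above; `buildPruneFilters`, `TIME_FIELDS` and `retentionDays` are made-up names for illustration, and this has not been run against a real Kadira database.

```javascript
// Time field used by each collection, taken from the queries above.
var TIME_FIELDS = {
  prodStats: 'time',
  pubMetrics: '_id.time',
  pubTraces: 'startTime',
  rawErrorMetrics: 'value.startTime',
  rawMethodsMetrics: 'value.startTime',
  rawPubMetrics: 'value.startTime',
  rawSystemMetrics: 'value.startTime',
  rmaLogs: 'startedAt',
  systemMetrics: '_id.time'
};

// Hypothetical helper: returns one { collection, filter } pair per collection,
// matching documents whose time field is at or before (now - retentionDays).
function buildPruneFilters(retentionDays, now) {
  now = now || new Date();
  var cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  return Object.keys(TIME_FIELDS).map(function (collection) {
    var filter = {};
    filter[TIME_FIELDS[collection]] = { $lte: cutoff };
    return { collection: collection, filter: filter };
  });
}

// In the mongo shell, after `use kadiraData`, the pairs could be applied with:
//   buildPruneFilters(14).forEach(function (p) {
//     db.getCollection(p.collection).remove(p.filter);
//   });
```

It is written in plain ES5 so it should paste into older mongo shells as well as Node. Try it on a copy of the data first, since the deletes are not reversible.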

I started to reduce the Kadira APM projects to one Meteor project.
You only have to run Meteor with a Mongo replica set (a standalone one is fine),
so Docker is not required.

The Slack alerts are working and some other optimizations were made.
But there is still a lot of work to do… Looking for contributors.

I am very much interested in this.

I created a pull request on vladgolubev/kadira to integrate the AWS configuration from lampe/kadira-server - both of these are really good repos to get Kadira running.

Would be great if we can all work on a single repo with our improvements and bug fixes - I just happened to use vladgolubev/kadira first so I wanted to integrate those changes there.

With that repo I have the APM and debugger working on a complex setup (Docker Swarm behind an Nginx load balancer).

Where I am currently stuck: if I run Kadira.profileCpu(10) in the browser console of my app, I get the following error on the server:

Exception while invoking method 'kadira.profileCpu' Error: Error relocating /bundle/programs/server/npm/node_modules/meteor/meteorhacks_kadira-binary-deps/node_modules/v8-profiler/build/profiler/v5.6.5/node-v48-linux-x64/profiler.node: __sprintf_chk: symbol not found
    at Error (native)

I think it’s because I’m using an Alpine Meteor base image and it lacks some dependencies. Does anyone have any insight?

Thanks for the great effort by all so far - hope to contribute further.

The Docker images are not working due to the DDoS attack on Kadira. How do I update the meteorhacks:kadira image to include the new patches?

Hi @vladgolubev, thanks a lot for your work.
I have an issue when I create a new app in kadira-gui:

Exception while invoking method 'apps.create' TypeError: Cannot read property 'name' of null
    at [object Object].MongoShardedCluster.pickShard (/home/meteor/www/bundle/programs/server/npm/node_modules/meteor/local_kadira-data/node_modules/mongo-sharded-cluster/lib/index.js:54:26)
    at [object Object].Meteor.methods.apps.create (server/methods/apps.js:13:41)
    at packages/check.js:130:16
    at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
    at Object.exports.Match._failIfArgumentsAreNotAllChecked (packages/check.js:129:41)
    at maybeAuditArgumentChecks (packages/ddp-server/livedata_server.js:1734:18)
    at packages/ddp-server/livedata_server.js:719:19
    at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
    at packages/ddp-server/livedata_server.js:717:40
    at [object Object]._.extend.withValue (packages/meteor.js:1122:17)

Can you help me?

Hi @eportico,

Unfortunately, I don’t work with Kadira anymore and don’t plan to in the future.

For anyone who’s running their own Kadira, this is what I’ve found. I’m using @vladgolubev’s excellent Docker images, without which this wouldn’t be possible, so a big thanks.

I am also running MongoDB on the same instance via Docker, since buying dedicated Mongo on Compose/Atlas gets too expensive.

The majority of resources and CPU is used by Mongo (there are some huge MR jobs running all the time!). ‘docker stats’ routinely shows that Mongo is using >200% CPU while the other containers are idle. I also didn’t see any perf increase from running Mongo on a separate EC2 instance (which also increases costs).

We have 4-5 servers with moderate traffic. Right now Kadira is running on a t2.large/c4.medium and after a few weeks it still crawls to a total stop and things fail to load. This is due to all the MR jobs and the CPU used by Mongo. The only fix is then to

  1. remove the containers and bring them back up (which means new app IDs)
  2. delete all current data

Neither is really ideal but I haven’t found a better way. I have to wonder what the prod Kadira and Galaxy APM run on, because I don’t see any way to scale this horizontally.

We’ve been working on a version of Meteor APM that you can self host using MUP, and recently fixed some issues with missing indexes that corrected the massive mongo CPU usage you’ve been seeing.

I have it working on a subdomain and an EC2 instance.

Firstly, I’m currently using a t.medium (4 GB) as opposed to a t.micro (1 GB). The amount of memory being used is strange.

              total        used        free      shared  buff/cache   available
Mem:           3950         736         911           6        2302        2897

Secondly, I couldn’t send data to https://ec2-0-0-0-0.ap-southeast-2.compute.amazonaws.com:22022.
I had to create a subdomain and send it to https://subdomain.domain.com:22022.

Thank you for your help, edemaine. I really appreciate it.

I will be launching a hosted Meteor APM service, so I am looking for a few people to try it during the private beta. All of the plans will be free until it is made public, and provide up to 1 month of data retention.

Email, webhook, and Slack alerts are enabled. CPU profiling will be turned on during the private beta. Support for production source maps has been added to the error tracker for client errors. The feature is limited right now, but will be greatly improved over the next couple of weeks.

After the beta, there will be a free plan with data retention of less than a day, along with paid plans starting at $5 / server for 1 week of data retention. The number of servers is calculated the same way Kadira did, by using the median number of servers for the past month.
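
As a rough illustration of the median-based count, here is a minimal sketch. How often the server count is sampled and how an even number of samples is handled are not stated here, so both are assumptions; `medianServerCount` and the sample numbers are made up.

```javascript
// Hypothetical helper: median of the server counts sampled over the month.
// Assumes one sample per interval; averages the two middle values when the
// number of samples is even.
function medianServerCount(samples) {
  if (samples.length === 0) return 0;
  var sorted = samples.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// A short burst to 8 servers does not change the result:
// the sorted samples are [2, 2, 2, 3, 8], so the median is 2.
var billedServers = medianServerCount([2, 2, 3, 8, 2]);
```

The point of a median over an average is that brief scale-ups during a traffic spike do not raise the count.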

The product will be ready for the private beta later today. Please send me a message if you are interested in joining the private beta, or have any questions.

Thank you.

I made a Docker Stack file, thanks!

Hi @zodern

Have you made this service available yet?
I would like to take part.

Thanks!

Hi @bradzo.

Thanks for your interest. It is currently in private beta. I sent you a message with the instructions to join.