Performance issues (CPU) with large data in a publication

We have a real-time chat application where an admin can see all ongoing chats. We need to publish the following data:

  • conversation
  • latest message
  • conversation user
  • conversation assignee

We use publish-composite to publish the related data. When the published list of conversations is up to 100 items, it works well. When a user selects a page size of 200 it becomes slow, and with a page size of 400-500 it is extremely slow: every request takes more than a minute to complete.
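Roughly, the publication looks like this (simplified; the collection and field names here are placeholders rather than our exact schema):

```js
import { publishComposite } from 'meteor/reywood:publish-composite';
import { Meteor } from 'meteor/meteor';
import { check } from 'meteor/check';

publishComposite('adminConversations', function (limit) {
  check(limit, Number);
  return {
    find() {
      // Top-level cursor: one page of conversations.
      return Conversations.find({}, { sort: { lastMessageAt: -1 }, limit });
    },
    children: [
      {
        find(conversation) {
          // Latest message for each conversation.
          return Messages.find(
            { conversationId: conversation._id },
            { sort: { createdAt: -1 }, limit: 1 }
          );
        },
      },
      {
        find(conversation) {
          // Conversation user and assignee.
          return Meteor.users.find({
            _id: { $in: [conversation.userId, conversation.assigneeId] },
          });
        },
      },
    ],
  };
});
```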

We use 10+ servers in production behind a load balancer, and the data is properly indexed. Only the server that receives such a query with a bigger page size becomes slow or unresponsive.

Checking in Kadira monitoring, whenever this happens the livedata section shows the number of oplog notifications jumping from 20-30k/minute to 100-150k/minute. I cannot see how changing the page size would affect the number of oplog notifications, given that not many changes are happening to the data being published.

I think Redis Oplog could help reduce CPU load, but to solve the problem at the root you will need to redesign your app.
Instead of using Meteor publications/subscriptions only, use both Meteor methods and pub/sub.
For every piece of data you fetch from the server via pub/sub, the server has to watch it for changes and push updates to clients. The more data you fetch, the more work the server does.
Methods work differently: the server sends the data to the client and then it's done. Fire and forget; there is no additional work. That's why methods are more CPU efficient, but (of course) you won't get real-time (reactive) data updates.
How should you redesign your app to use both methods and pub/sub? That's up to you. But for example, you could use pub/sub to load only the conversation and a method to load the messages. How? For every new message, you update a field on the conversation, e.g. lastMessageAt, and watch that field for changes; once it changes, you call a method to load the new messages (see the sketch below).
By the way, the server has to watch every field you fetch via pub/sub, so keep the number of published fields as low as possible. I believe you won't have any CPU load issues if you design your app right.
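A minimal sketch of that pattern, assuming hypothetical collections (Conversations, Messages) and a made-up method name (messages.fetch):

```js
import { Meteor } from 'meteor/meteor';
import { Tracker } from 'meteor/tracker';

if (Meteor.isServer) {
  // Publish only the conversation, restricted to the fields the client
  // needs, so the server watches as little data as possible.
  Meteor.publish('conversation', function (conversationId) {
    return Conversations.find(
      { _id: conversationId },
      { fields: { title: 1, lastMessageAt: 1 } }
    );
  });

  // Messages are fetched non-reactively via a method (add auth checks).
  Meteor.methods({
    'messages.fetch'(conversationId, since) {
      return Messages.find(
        { conversationId, createdAt: { $gt: since } },
        { sort: { createdAt: 1 } }
      ).fetch();
    },
  });
}

if (Meteor.isClient) {
  function watchConversation(conversationId) {
    Meteor.subscribe('conversation', conversationId);
    let lastSeenAt = new Date(0);
    Tracker.autorun(() => {
      const conversation = Conversations.findOne(conversationId);
      if (!conversation || !conversation.lastMessageAt) return;
      if (conversation.lastMessageAt <= lastSeenAt) return;
      const since = lastSeenAt;
      lastSeenAt = conversation.lastMessageAt;
      // lastMessageAt changed: pull only the messages we haven't seen yet.
      Meteor.call('messages.fetch', conversationId, since, (err, msgs) => {
        if (!err) {
          // merge msgs into local component state / a client-only collection
        }
      });
    });
  }
}
```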

Seems like there needs to be a corresponding increase in updates, or updates that are impacting more documents. There is one oplog entry for each document that is updated, not one per update operation issued by the server.
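For example (hypothetical names), a single multi-document update on the server fans out into one oplog entry per matched document:

```js
// One update statement on the server...
Conversations.update(
  { status: 'open' },
  { $set: { lastCheckedAt: new Date() } },
  { multi: true }
);
// ...but if it matches 400 conversations, Mongo writes 400 oplog entries,
// and every Meteor server observing those documents processes all 400.
```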

So I’d look for either (a) individual updates made for each document published to the client or (b) a single update that makes changes to all the published documents.

For (a), there’s a decent chance those updates are initiated by the client, which ought to show up in Kadira’s “Methods” section (or you could use the “Meteor DevTools” Chrome extension and/or the Chrome devtools to watch websocket/DDP traffic and see client->server updates). But if the updates are made by the server, they won’t be visible in Kadira or in the websocket traffic.

For (b), multi-document updates can only be initiated by the server, not the client, and debugging server-initiated updates is more difficult. You could look at the actual Mongo oplog collection, or increase the Mongo log verbosity and look at that. Atlas also has some helpful diagnostic tools, if you happen to be using it. However, you might just have to stare at the code.
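For example, from the mongo shell on a replica-set member (substitute your own database and collection names):

```js
// The oplog lives in the "local" database on replica-set members.
var oplog = db.getSiblingDB('local').oplog.rs;

// 10 most recent operations touching the conversations collection;
// ns is '<db>.<collection>'.
oplog.find({ ns: 'mydb.conversations' }).sort({ $natural: -1 }).limit(10);

// Count just the update entries for that namespace.
oplog.find({ ns: 'mydb.conversations', op: 'u' }).count();

// Or raise the log verbosity for command tracing on the server.
db.setLogLevel(1, 'command');
```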

Redis Oplog is generally helpful, but in this particular case your primary problem is more likely to be related to update volume.

Look at how Rocket.Chat does it with the Streamer package; I think that’s the way to go for apps like yours. Subscriptions will struggle to keep up with a use case like yours at scale.
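If I remember the rocketchat:streamer API correctly, the pattern looks roughly like this (the stream and event names below are made up):

```js
import { Meteor } from 'meteor/meteor';

// Shared (client and server): declare the stream once.
const chatStream = new Meteor.Streamer('chat');

if (Meteor.isServer) {
  chatStream.allowRead('all');   // tighten to admins only in practice
  chatStream.allowWrite('none'); // only the server may emit
  // After inserting a message, push it to listeners directly,
  // bypassing Mongo-backed reactivity and the oplog entirely:
  // chatStream.emit(`conversation-${conversationId}`, message);
}

if (Meteor.isClient) {
  // Listen for messages on one conversation (id assumed in scope).
  chatStream.on(`conversation-${conversationId}`, (message) => {
    // render the incoming message; no oplog observers involved
  });
}
```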

Perhaps you could also try to optimize the client requests.

Basically, request only what is visible in the window, kind of like a virtualized list. That way the amount of observed data stays constant no matter how big the list grows (see the sketch below).
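A minimal sketch of that idea, assuming a publication named adminConversations that accepts skip/limit:

```js
import { Meteor } from 'meteor/meteor';
import { Tracker } from 'meteor/tracker';
import { ReactiveVar } from 'meteor/reactive-var';

// Only the rows currently on screen (plus a small overscan) are subscribed.
const viewport = new ReactiveVar({ skip: 0, limit: 30 });

Tracker.autorun(() => {
  const { skip, limit } = viewport.get();
  // Re-running this autorun replaces the previous subscription, so the
  // server only ever observes `limit` documents, however long the list is.
  Meteor.subscribe('adminConversations', { skip, limit });
});

// Call this from the list's scroll handler.
function onScroll(firstVisibleRow, visibleRowCount) {
  viewport.set({ skip: firstVisibleRow, limit: visibleRowCount + 10 });
}
```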