Strange phenomenon: Subsequent methods or subs won't succeed

Indexes are the same, since the SimpleSchema definitions are the same. Also the stats on Atlas looked pretty much the same. In Monti APM, I couldn’t find a way to see the database query time for pubs. But for methods, they are negligible, as you can see in the first images above.

The problem seems to be caused by the immense “wait times” in the pubs.

I didn’t know that. Is there a way to automate the creation of DB indexes using SimpleSchema? We currently have to do this manually on a per-query basis.

It’s actually part of Collection2, which extends SimpleSchema (both by aldeed):

Thanks. Learning a number of things from this thread :grin:

I unblocked my pubs now, but this resulted in a noticeable performance degradation.

For a test case, I zoomed into my map, waited until it had been rendered, and then zoomed out again. Without the unblock, rendering the larger viewport takes about 2.3s, but with the unblock, it takes 2.8s, sometimes even over 3s.

I also deployed this version to the prod system. But nothing changed. The problem still persists, the app is barely usable.

CORRECTION: It’s in fact a bit better now on the prod system. The initial wait times are gone:

But still it takes much, much longer than on the staging system until the actual content is being rendered (this is what made me believe the sub did not return). The same operation, which takes 2s on the staging system, takes 12s - 16s on the prod system. Same code base. And it doesn’t really explain why the staging system did not show these problems in the first place, even without unblocks.

The same query on the staging system looks like this:

image

That’s roughly a factor of 10 in oplog wait time (17.1s vs. 1.5s), which explains the huge delay, for the same amount of data. The staging database is a 1:1 copy of the prod database, copied with Studio 3T.

OK. I temporarily connected my staging system to the MongoDB cluster of the prod system. And voilà, it showed the same problems. It seems the Atlas cluster is the actual culprit. Now I have to find out how to restart that thing.

Update: Moved the database to a new cluster now, and everything is back to normal. This wasn’t really related to the original question, but at least the server is up and running again. :sweat_smile:

Yay :slight_smile: That’s quite something, we’ve been chasing a ghost!

Yeah, thanks again for your support!

PS: I still don’t understand why it made a difference when I started Chrome from scratch, if the database was the culprit. But well…

Hi @filipenevola I just wanted to report back that adding this.unblock() to a publication has a serious side-effect I was not aware of. If you do that and change the sub parameters, the pub will send all documents again, even if they already had been part of the scope of the original sub. You can read more about that in this thread:

I have been chasing this problem for days now, and finally remembered that I had added this.unblock() to all pubs. And as soon as I removed this, the pub behaved as I expected.

For a geospatial query like in my case, this side-effect is fatal, because thousands of results might be removed and re-added, which causes significant delays. It also explains why my pub got noticeably slower after adding this.unblock().
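
To make the effect easier to reason about, here is a toy model of a per-client "merge box" in plain JavaScript. It is only a sketch and not Meteor's actual code: it assumes the server counts how many active subscriptions contain each document and only sends `added` / `removed` when that count crosses zero. All names (`createMergeBox`, the sub ids, the doc ids) are made up for illustration. Under that assumption, the order in which the old sub is stopped and the new one starts decides whether overlapping documents are re-sent.

```javascript
// Toy model of a per-client "merge box"; NOT Meteor's actual implementation.
// It counts how many active subscriptions currently include each document and
// only emits a DDP-like message when that count crosses zero.
function createMergeBox() {
  const refCounts = new Map(); // docId -> number of subs that include it
  const subs = new Map();      // subId -> array of docIds
  const messages = [];         // messages that would go to the client

  return {
    messages,
    subscribe(subId, docIds) {
      subs.set(subId, docIds);
      for (const id of docIds) {
        const count = refCounts.get(id) || 0;
        if (count === 0) messages.push(`added ${id}`);
        refCounts.set(id, count + 1);
      }
    },
    unsubscribe(subId) {
      for (const id of subs.get(subId) || []) {
        const count = refCounts.get(id) - 1;
        if (count === 0) {
          refCounts.delete(id);
          messages.push(`removed ${id}`);
        } else {
          refCounts.set(id, count);
        }
      }
      subs.delete(subId);
    },
  };
}

// Zooming out: the new viewport contains everything the old one did, plus "c".
const oldView = ['a', 'b'];
const newView = ['a', 'b', 'c'];

// Order 1: the new sub becomes active before the old one is stopped.
const overlapping = createMergeBox();
overlapping.subscribe('old', oldView);
overlapping.messages.length = 0; // ignore the initial load
overlapping.subscribe('new', newView);
overlapping.unsubscribe('old');
console.log(overlapping.messages); // [ 'added c' ]

// Order 2: the old sub is stopped before the new one has published anything.
const stopFirst = createMergeBox();
stopFirst.subscribe('old', oldView);
stopFirst.messages.length = 0; // ignore the initial load
stopFirst.unsubscribe('old');
stopFirst.subscribe('new', newView);
console.log(stopFirst.messages);
// [ 'removed a', 'removed b', 'added a', 'added b', 'added c' ]
```

If unblocking lets the stop of the old sub be processed before the new sub has published its documents, the second ordering would match what I am seeing. Whether that is really what happens inside Meteor is an assumption on my part.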

Is there any documentation anywhere that explains this pub behavior in more detail? Even the DDP docs don’t describe this. I even tried to analyze the DDP code, but this magic seems to be implemented in the server-side Minimongo oplog merge mechanism, which I learned about just recently.

The question is: How can I now work around the original problem described in this thread?

If your subs transfer a lot of documents under high load, you should replace them with methods and a custom cache layer (on top of Minimongo or another state management library like MobX/Redux).

Then you can add debounce/throttling to send requests as you need.
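
As a rough sketch of the throttling part: the helper below runs a function at most once per interval. The `now` parameter is only there so the behaviour can be checked without real timers, and `loadViewport` is a hypothetical stand-in for whatever method call loads the data (for example something like `Meteor.call('points.inViewport', ...)`, a method name invented for this example).

```javascript
// Leading-edge throttle: run `fn` at most once per `intervalMs`.
// `now` is injectable so the behaviour can be checked without real timers.
function throttle(fn, intervalMs, now = Date.now) {
  let lastRun = -Infinity;
  return (...args) => {
    const t = now();
    if (t - lastRun < intervalMs) return false; // call is dropped
    lastRun = t;
    fn(...args);
    return true;
  };
}

// Hypothetical stand-in for a method call that loads data for a viewport.
const requested = [];
const loadViewport = (viewport) => requested.push(viewport);

let fakeTime = 0;
const throttledLoad = throttle(loadViewport, 500, () => fakeTime);

throttledLoad('zoom 10'); // t = 0: sent
fakeTime = 100;
throttledLoad('zoom 9');  // t = 100: dropped
fakeTime = 600;
throttledLoad('zoom 8');  // t = 600: sent
console.log(requested); // [ 'zoom 10', 'zoom 8' ]
```

For a map, a trailing debounce that only sends the last viewport once the user stops zooming may fit better; the throttle is just the shorter of the two to show.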

Also, you can reuse docs from the cache and avoid receiving them again (if your geo-points are immutable for some time).
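
A minimal sketch of such a cache, assuming documents are keyed by `_id`: only ids that are not cached yet are requested. `createDocCache` and `fetchByIds` are hypothetical names; in a Meteor app the loader would wrap an asynchronous method call, here it is synchronous to keep the example short.

```javascript
// Client-side cache sketch: only ask the server for ids we don't have yet.
// `fetchByIds` is a hypothetical loader; in a Meteor app it would wrap an
// asynchronous method call, it is synchronous here to keep the sketch short.
function createDocCache(fetchByIds) {
  const docs = new Map(); // _id -> document

  return {
    getMany(ids) {
      const missing = ids.filter((id) => !docs.has(id));
      if (missing.length > 0) {
        for (const doc of fetchByIds(missing)) docs.set(doc._id, doc);
      }
      return ids.map((id) => docs.get(id)).filter(Boolean);
    },
  };
}

// Fake server that records which ids were actually requested.
const serverCalls = [];
const fakeServer = (ids) => {
  serverCalls.push(ids);
  return ids.map((_id) => ({ _id, title: `Point ${_id}` }));
};

const cache = createDocCache(fakeServer);
cache.getMany(['p1', 'p2']); // fetches p1 and p2
cache.getMany(['p2', 'p3']); // fetches only p3
console.log(serverCalls); // [ [ 'p1', 'p2' ], [ 'p3' ] ]
```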

We use this for user info such as names, birthdays, and avatars.

Another case is a chat application: we load channels and messages at startup, then subscribe to Redis events to update the channel and message data.

Yep, it needs more work, but it behaves more predictably and performs better than raw pub/sub.

PS: Pub/sub works great in some cases. But for high load it should be avoided.

I had the same problem two or three years ago when using this.unblock(), and in the end it caused even more serious side effects. I remember searching the forum and the docs, but I couldn’t find a solution, so I gave up.

I dropped all pub/sub in favor of Apollo, because we generally don’t need real-time responses or a permanently open connection.

I’m just sharing my experience; my skills and experience are limited. I hope you can find a solution and tell us about it, so that I can use pub/sub again.

:grin:

Well, I think it’s the beauty of pub/sub that you only get the deltas you actually need, if you have a large data set to display. Especially for mobile devices, this is a killer feature. I’m already using methods wherever I can.

In my use-case, I have to deal with real-time information on a world map. How should this work efficiently with methods? At least, it would require some sort of server-side caching, which is exactly what pubs do.

If the data was static, this would be a smaller issue, because I could split the world up in quadrants and load them as needed. But even then, once you zoom into the map, you would have to transfer a lot of data again and again, because you won’t know the deltas unless you calculate them on server-side.
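
To illustrate the quadrant idea, here is a rough sketch that splits the world into a fixed grid and computes which tiles have to be loaded or dropped when the viewport changes. The tile size, the key format and the function names are arbitrary assumptions, and viewports crossing the antimeridian are not handled.

```javascript
// Split the world into a fixed grid and work out which tiles a viewport covers.
// TILE_DEG is an arbitrary assumption; a real map would vary it with zoom.
const TILE_DEG = 10;

function tilesForViewport({ west, south, east, north }) {
  const keys = new Set();
  const x0 = Math.floor(west / TILE_DEG);
  const x1 = Math.floor(east / TILE_DEG);
  const y0 = Math.floor(south / TILE_DEG);
  const y1 = Math.floor(north / TILE_DEG);
  for (let x = x0; x <= x1; x++) {
    for (let y = y0; y <= y1; y++) keys.add(`${x}:${y}`);
  }
  return keys;
}

// Tiles to load and tiles to drop when the viewport changes.
function tileDelta(oldTiles, newTiles) {
  return {
    load: [...newTiles].filter((k) => !oldTiles.has(k)),
    drop: [...oldTiles].filter((k) => !newTiles.has(k)),
  };
}

const before = tilesForViewport({ west: 0, south: 40, east: 9, north: 49 });
const after = tilesForViewport({ west: 0, south: 40, east: 19, north: 49 });
console.log(tileDelta(before, after)); // { load: [ '1:4' ], drop: [] }
```

This only gives deltas at tile granularity. Changes to documents inside an already loaded tile would still have to come from somewhere else, which is exactly the part pubs handle for me today.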

Do you still remember which side-effects you were facing? For now, I noticed that documents are deleted and re-added unnecessarily. But it made me suspicious that there might be some other traps I don’t know about yet.

You receive deltas on updates, or when data is reused from other subscriptions. But when zooming in/out you’ll receive new data, right?

We take an event-driven approach without the Mongo oplog, using Redis pub/sub instead.

I can’t solve your issue, but if you want an efficient way to handle a large data set, then for me that’s not pub/sub.

The problem here was: this is only true if you don’t unblock them. Which was a recommendation to solve the original problem mentioned here (methods and pubs hanging up after one user interaction, but only after a couple of weeks).

Do you happen to know if Redis Oplog supports geo-spatial queries? I wasn’t aware that it is not supported by the normal oplog mechanism.

We’re not using Redis Oplog, because we use transactions with custom MongoDB methods, which don’t work with the redis-oplog package.

Instead, we use a more low-level and efficient approach: Redis pub/sub (Vent).

For example, you can dispatch any updates for points via Vent:

Points.update({ _id: pointId }, { $set: { title: newTitle } });
// publish the event to all listeners of pointId
Vent.emit(`points:${pointId}:updateTitle`, { title: newTitle });
// publish the event to all listeners of all points
Vent.emit(`points:updateTitle`, { title: newTitle });

Also, you can split points by region (US, EU), by country (UK, DE, BY, RU), or by a smaller unit such as an area.

Vent.emit(`points:${point.region}:updateTitle`, { title: newTitle });
Vent.emit(`points:${point.country}:updateTitle`, { title: newTitle });
// areaKey: any key derived from lat + lon + radius or something similar
Vent.emit(`points:${areaKey}:updateTitle`, { title: newTitle });

When you change the viewport, just load the new data, subscribe to the channels you need, and mutate your cache on new events from those channels.
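
The flow on the client could look roughly like the sketch below. It uses a tiny in-memory event bus as a stand-in and is not the redis-oplog Vent API; `createBus`, `pointCache` and `applyTitleUpdate` are hypothetical names, and the event payload carrying a `pointId` is an assumption made so the cache knows which document to update.

```javascript
// Minimal in-memory stand-in for a channel-based event bus. It is NOT the
// redis-oplog Vent API, only an illustration of the flow described above.
function createBus() {
  const listeners = new Map(); // channel -> Set of callbacks
  return {
    subscribe(channel, cb) {
      if (!listeners.has(channel)) listeners.set(channel, new Set());
      listeners.get(channel).add(cb);
      return () => listeners.get(channel).delete(cb); // unsubscribe handle
    },
    emit(channel, event) {
      for (const cb of listeners.get(channel) || []) cb(event);
    },
  };
}

const bus = createBus();
const pointCache = new Map([
  ['p1', { _id: 'p1', region: 'EU', title: 'Old title' }],
]);

// Apply incoming events to the local cache.
const applyTitleUpdate = ({ pointId, title }) => {
  const doc = pointCache.get(pointId);
  if (doc) pointCache.set(pointId, { ...doc, title });
};

// Viewport shows the EU: listen to that region's channel.
const stopEU = bus.subscribe('points:EU:updateTitle', applyTitleUpdate);
bus.emit('points:EU:updateTitle', { pointId: 'p1', title: 'New title' });
console.log(pointCache.get('p1').title); // New title

// Viewport moves away: stop listening, later events are ignored.
stopEU();
bus.emit('points:EU:updateTitle', { pointId: 'p1', title: 'Ignored' });
console.log(pointCache.get('p1').title); // New title
```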

Could you please point us to which package this Vent stuff comes from? Or is Vent a custom MongoDB collection? I am unable to connect the dots based on your example, but I’m very keen to learn more about your implementation of redis-based pubsub as an alternative to regular Meteor publications.

RedisOplog’s Vent: see vent.md in the cult-of-coders/redis-oplog repository on GitHub.
