Yes, good point. What I mean is that it's nice to have and neat, but it's not as big a selling point as we Meteor users sometimes think it is (otherwise Meteor would have been huge). EDIT: also, mergebox != publication of one cursor, which potentially might make it harder to reuse. If two clients are subscribed to the same cursor but have a different set of other subscriptions on the same collection, they will receive different updates. I don't know the mergebox internals, but from a general software engineering perspective, I suspect there might be many nasty edge cases that could be non-trivial to solve.
But since you mentioned it's already done, it will be very interesting to see the results!
Maybe I will learn something today. Does the Mongo driver do some kind of caching? I thought that if we have a cursor, getting the actual data should still hit the database. Is the data cached in memory for an active cursor?
From my experience, it highly depends on the business you’re in. For example, real-time is far more important in gastronomy than banking. Then you can make it into your unique selling point and some people will really like it.
Mergebox holds both each client's state (called Session) and each cursor's state (called ObserveMultiplexer). The subscription you execute is not important, since the cursor is actually the deduplicated part. DDP Router does the same.
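To make the deduplication idea concrete, here is a minimal sketch in plain JavaScript. The names (`multiplexers`, `observe`, the cursor keys) are purely illustrative, not Meteor's or DDP Router's actual internals; the point is only that subscriptions sharing the same cursor description reuse one observer:

```javascript
// Illustrative sketch of cursor-level deduplication (not real Meteor code).
// Subscriptions with an identical cursor description share one "multiplexer";
// each client session then receives only the updates relevant to it.
const multiplexers = new Map();

function observe(cursorKey, session) {
  let multiplexer = multiplexers.get(cursorKey);
  if (!multiplexer) {
    multiplexer = { cursorKey, sessions: new Set() };
    multiplexers.set(cursorKey, multiplexer);
  }
  multiplexer.sessions.add(session);
  return multiplexer;
}

// Two clients subscribing to the same cursor share one multiplexer...
const a = observe('tasks:{done:false}', 'sessionA');
const b = observe('tasks:{done:false}', 'sessionB');
// ...while a different cursor on the same collection gets its own.
const c = observe('tasks:{done:true}', 'sessionA');

console.log(a === b);           // true
console.log(a === c);           // false
console.log(multiplexers.size); // 2
```

This is also why two clients on the same cursor can still end up with different views: the multiplexer is shared per cursor, but each session merges it with its other subscriptions.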
Real time is fine; I am just more sceptical about the mergebox functionality itself. I come from a Java background, so I'm not very good with JS or Meteor internals, even though I've been using it for a while now.
What you are doing here could be called messaging middleware, so you could take a look at other such offerings; in the longer term, maybe it can become Meteor-agnostic. If you have some unique features (ironically, mergebox probably could be one), it could become a separate project altogether.
Alternatively, maybe it's possible to reuse something; for example, take a look at zeromq.org. It has pub/sub too, but is much more than that.
Except for the DDP part (duh), the DDP Router does not know anything about Meteor. It subscribes to the database, deduplicates a lot, and sends updates. In that case, it’s perfectly viable to think of it in the context of other applications, especially non-Meteor ones.
And it gets even better once we start talking about subscribing to multiple data sources, not only MongoDB.
I was wondering if this is a continuation of what you started with changestream-to-redis? Is that likely to see any updates? Or, based on the experience with this project, do you reckon it's still solid?
I'm a little concerned it's destined to be a Galaxy-only feature. If someone from Meteor could comment and say what the plan is, that would be great. If this was open source and usable by all, I think it would be a huge plus for the whole community.
The DDP Router wouldn't be possible without changestream-to-redis, because that's where I learned a lot about the mergebox and how it was meant to work. But as I mentioned already, it's perfectly valid to use both together. In the long run, I'd say that DDP Router could entirely replace Meteor's built-in mergebox, making both changestream-to-redis and cult-of-coders:redis-oplog obsolete.
Hey,
very interesting. So what will happen if we delete 200,000-300,000 (old) documents per day in a batch? Is there any way to set some kind of ignore flag so that the DDP Router will skip these changes, or will we get CPU spikes and crashes like in the current core implementation? Currently, only redis-oplog seems to solve this issue.
It depends. If there are no active subscriptions to the collection you're removing documents from, DDP Router will not even know about it. If there are, it'll need to check whether anyone subscribes to the affected documents.
But the change stream events for deletion are fairly small, so I don't think it'd pose any problems. I've checked it with changestream-to-redis, and it can process thousands of events per second per core, so even if the overhead were 10x as high (I doubt it'll be that big), it's still not a problem.
Having said that, of course, real-life tests are needed.
(Also, the DDP Router supports polling, just like Meteor does, so it won't be an issue then. You can configure it on every cursor separately.)
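For context, per-cursor polling is something Meteor itself already exposes through documented cursor options (`disableOplog`, `pollingIntervalMs`, `pollingThrottleMs`); how exactly the DDP Router configures it is not specified here, but the Meteor side looks like this (collection and publication names are made up):

```javascript
// Server-side publication forcing polling for one specific cursor.
Meteor.publish('archivedTasks', function () {
  return Tasks.find(
    { archived: true },
    {
      disableOplog: true,       // skip oplog tailing for this cursor
      pollingIntervalMs: 10000, // re-run the query every 10 seconds
      pollingThrottleMs: 1000,  // wait at least 1s between re-runs
    }
  );
});
```

With a setup like this, a large batch delete only costs one query re-run per interval instead of one event per deleted document.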
Hi @radekmie! Could you explain why a “multiplexer” is still necessary when relying on change streams instead of the oplog? My understanding was that there was a lot of work involved in tailing and parsing the oplog, so that work should not be duplicated for each subscription, but is that still relevant when using change streams? Can’t you leverage the natively supported aggregation operator $match and let Mongo do the multiplexing?
Without multiplexing, not only would the Change Streams be duplicated, but so would all of the server-database communication. The former is already an issue on its own, since MongoDB will struggle with hundreds of thousands of active Change Streams. The latter is even worse: if multiple people subscribe to the same collection with slightly different queries, every document would be sent multiple times (unnecessarily).
That's what I wanted to do at first, but it doesn't work. Let's say you start a Change Stream with a $match stage. When a new document is added, it works. When one is removed, it also works. But what about an update? The event won't be delivered when the document no longer matches after the operation. It can be worked around using fullDocumentBeforeChange, but that, again, puts more load on the database.
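The pitfall can be simulated without a database. In the sketch below, `matches` stands in for a $match stage applied to the event's full document; the document shape and field names are made up for illustration:

```javascript
// Simulation of why a $match stage on a change stream misses some updates.
// A subscription wants documents where `done: false`; `matches` stands in
// for $match applied to the change event's full document.
const matches = (doc) => doc !== null && doc.done === false;

// Insert: the new document matches, so the event is delivered. Good.
const inserted = { _id: 1, done: false };
console.log(matches(inserted)); // true

// Update that flips `done` to true: the post-image no longer matches, so
// $match drops the event -- but the client still holds the stale document
// and needs a "removed" message it will never receive.
const updated = { _id: 1, done: true };
console.log(matches(updated)); // false

// Workaround: also test the pre-image (fullDocumentBeforeChange), at the
// cost of extra load on the database.
const movedOutOfView = (before, after) => matches(before) && !matches(after);
console.log(movedOutOfView(inserted, updated)); // true
```

In other words, $match can filter what still matches, but it cannot tell a client that a document it already has just stopped matching.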
This is a level of innovation I haven't seen for a while in the Meteor ecosystem, perhaps even in the real-time ecosystem in general. I smell a great product brewing in there. Congrats!
Hi @radekmie - this sounds so great. I also have a large prod meteor app on 2.8 (trying to migrate to 3) and would love to test this out. Also not on Galaxy.
Now that I am a bit more educated on the question, it seems to me Rust is as capable in terms of async IO as Golang is. A bit more low-level, perhaps, but it gives you more control if you need it. The border between CPU parallelism and IO parallelism is a bit blurred in Golang.
I know I said "promise, last question" before, but here is another one. Does relying on change streams instead of the oplog mean no more polling? Does that mean we always get immediate feedback from subscriptions, even when using "raw" collection methods (transactions)?
As far as I know, change streams include transaction events just like oplog, so I wouldn’t expect any changes here. Have you had any issues with that in the past?
And if I recall correctly, there's no polling in Meteor's MongoDB oplog driver, except for tailing the oplog itself, of course. But that's about the same with change streams, as they operate on cursors under the hood, which are still pull-based.
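"Pull-based" here just means nothing is delivered until the consumer asks for the next event. A minimal synchronous sketch (the event shapes mimic change stream events, but no driver is involved):

```javascript
// Minimal illustration of pull-based consumption: the consumer explicitly
// asks for each event via next(). MongoDB change streams behave similarly
// under the hood -- they are cursors, and the driver fetches more events
// only when you pull.
function* changeStream(events) {
  for (const event of events) {
    yield event; // in a real driver, a pull may trigger a getMore round-trip
  }
}

const stream = changeStream([
  { operationType: 'insert', documentKey: { _id: 1 } },
  { operationType: 'delete', documentKey: { _id: 1 } },
]);

const seen = [];
let result = stream.next();
while (!result.done) {       // each next() call is an explicit pull
  seen.push(result.value.operationType);
  result = stream.next();
}

console.log(seen); // [ 'insert', 'delete' ]
```

So even though change streams feel "push-like" compared to polling a query, the server-to-driver delivery is still driven by the consumer pulling from a cursor.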