Introduction of DDP Router

radekmie · June 3, 2024, 3:00pm

Intro

I’ve been thinking about Meteor performance a lot recently and I have one crazy idea… Would any of you be up for a talk about it? I can write it down later, but it’d take me longer than that.

The codename would be “DDP Router”.

@radekmie, 25th January, 2024

It’s a genius idea! I really liked it, and I think it would be extremely beneficial for Meteor.

@denyhs, 26th January, 2024

It is an insane idea and I loved it.

@grubba, 26th January, 2024

At the end of last year, I went on a longer-than-usual vacation. I had a couple of PhD topics to think through, but after a couple of days, I started wondering: where is Meteor actually lacking performance? Is it the MongoDB integration? Is it the DDP protocol? Is it the merge box? Or is it an inevitable result of using Node.js or, more generally, a garbage-collected language?

A few days later, I realized there’s not a single piece that is slow or bad – it’s just a result of how we want it to work. We want real-time communication and as little work as possible on both the server and client. Ideally an infinitely scalable solution, right?

So… What if we reworked the publications entirely? But we have to do it in a backward-compatible way, as otherwise, it’ll lead to a huge community split (like Python 2 vs 3). It’s a tall order already, but there’s more to it – we’d need to make it worth the additional effort.

Here’s an idea: implement a specialized service to handle the publications. The application remains unchanged, the database load is the same, and the change is transparent to the users. But how do we do that? And where’s the gain?

We let this service ask the server, “What would the publication do?” That makes it entirely backward-compatible and transparent for both the client and the server.
So, every request goes to this new service; it asks the server and then observes the database itself. That means the server only has to start the publications, and then no server resources are needed. (The load is now on the new service.)

DDP Router

That’s a new service hosted separately from the Meteor server. When the client connects to the DDP Router, the DDP Router connects to the Meteor server and forwards all DDP messages both ways.

However, when the client subscribes to a publication, the DDP Router executes a method instead. The Meteor server responds with a serialized cursor (or a list of them). Then, the DDP Router starts listening to the database on its own. Yes, that means it reimplements the entire merge box logic, including all MongoDB operators, sorting, limit/skip, etc. Yes, that’s a lot of logic and tests.
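
To make the subscription flow above concrete, here is a minimal sketch in TypeScript. The `sub` and `method` message fields follow the DDP specification; everything else is an assumption for illustration only: the method name `__ddpRouter/resolveSubscription`, the `SerializedCursor` shape, and the helper names are hypothetical and are not taken from the actual DDP Router (which is written in Rust).

```typescript
// DDP messages are JSON objects with a "msg" discriminator (per the DDP spec).
type DdpMessage = { msg: string } & Record<string, unknown>;

// HYPOTHETICAL: the name of the method the router calls on the Meteor server.
const RESOLVE_METHOD = "__ddpRouter/resolveSubscription";

// HYPOTHETICAL: one possible shape for a serialized cursor.
interface SerializedCursor {
  collection: string;
  selector: Record<string, unknown>;
  options: { sort?: Record<string, 1 | -1>; limit?: number; skip?: number };
}

// Decide what the router sends to the Meteor server for a client message.
// Everything except "sub" is forwarded unchanged; "sub" becomes a method call
// that asks the server which cursor(s) the publication would return.
function toServerMessage(message: DdpMessage): DdpMessage {
  if (message.msg !== "sub") {
    return message;
  }
  return {
    msg: "method",
    id: `sub-${String(message.id)}`,
    method: RESOLVE_METHOD,
    params: [message.name, message.params ?? []],
  };
}

// The server may answer with a single cursor or a list of them; normalize
// to a list so the router can start one database observer per cursor.
function toCursorList(
  result: SerializedCursor | SerializedCursor[],
): SerializedCursor[] {
  return Array.isArray(result) ? result : [result];
}
```

With this sketch, `toServerMessage({ msg: "sub", id: "1", name: "tasks", params: [] })` yields a `method` message, while a `ping` or a regular `method` message passes through untouched. The database observing and merge box parts are deliberately left out here.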

In our initial tests, we see CPU gains of 10-20% and RAM gains of 15-25%. (The latter, of course, includes the RAM used by the DDP Router itself.) The gains are there because the server is not tailing the oplog, and the DDP Router is highly optimized to do only that. And yes, it’s written in Rust.

One more upside – it uses Change Streams instead of Oplog!

Also, it’s perfectly possible to extend it further, making it a true router. For example, we could route certain DDP methods to a different Meteor server, some GraphQL/gRPC/REST/whatever API, or even support non-MongoDB publications easily!
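
As a rough illustration of what such routing could look like, here is a small TypeScript sketch of a prefix-based routing table for DDP methods. This is only an assumption about one possible design: the `Backend` kinds, the `Route` shape, the `pickBackend` helper, and the longest-prefix rule are all hypothetical and not part of the DDP Router as described.

```typescript
// HYPOTHETICAL backend kinds; the post only names them as possibilities.
type Backend =
  | { kind: "meteor"; url: string }
  | { kind: "graphql"; url: string }
  | { kind: "rest"; url: string };

// HYPOTHETICAL route: every method whose name starts with `prefix`
// is sent to `backend`.
interface Route {
  prefix: string;
  backend: Backend;
}

// Pick the backend for a DDP method name. The longest matching prefix wins;
// methods without a matching route go to the fallback (the Meteor server).
function pickBackend(method: string, routes: Route[], fallback: Backend): Backend {
  let best: Route | undefined;
  for (const route of routes) {
    const matches = method.startsWith(route.prefix);
    if (matches && (best === undefined || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best === undefined ? fallback : best.backend;
}
```

For example, with a route for the prefix `reports.` pointing at a GraphQL backend, a call to `reports.monthly` would skip the Meteor server, while `tasks.insert` would still reach it through the fallback.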

Who wants to give it a try?

Yes, you heard it right! We tested it with Simple Todos, Atmosphere, and aleno. We haven’t had a chance to go live with it yet, but we’d like to test it with more apps. If you’re interested, please do let us know here!

dallman · June 3, 2024, 5:21pm

I have two questions/clarifications. First, at a high level, the client connects to this new service, then the new service connects to the DB, right? Second, does this new service need to be hosted under the Meteor binary? It would be very useful to have this publish/subscribe behavior encapsulated as an independent Node module that could be composed into other applications…

pmogollon · June 3, 2024, 5:43pm

It sounds really interesting. Can this service be compatible with redis-oplog?

vooteles · June 3, 2024, 6:20pm

Wow. Really interesting indeed. Having the possibility to just take a specific publication and move it onto a separate instance sounds like a great solution for many scenarios. Especially as a way to alleviate the issue of horizontal scaling leading to many instances hammering the oplog.

storyteller · June 3, 2024, 6:47pm

I’ll have to wrap my head around this, so I would be interested in hearing more. I potentially have a production application that could utilize this.

radekmie · June 3, 2024, 8:25pm

Yes, DDP Router connects to your database. But it reuses the connections, so it’s not 1 database connection for each connected client. In practice it has a connection pool, just like Meteor does, so you can configure the maximum number of connections.
No, it’s a separate service, deployed separately from your app. What’s important is that the app does not require it to work, i.e., you deploy your app, then the DDP Router, and then decide where the users should connect to. That means you can have it running in parallel to your app and try it out safely.

radekmie · June 3, 2024, 8:26pm

I think it could, but there’s no point in that – that’s a replacement for the whole database observing part. Or maybe you have something in particular in mind?

radekmie · June 3, 2024, 8:26pm

It’s more like a dedicated service for all of your publications. But yes, it’s also possible to limit it to only a few.

radekmie · June 3, 2024, 8:27pm

I’ll reach out to you and other interested people later this week.

harry97 · June 3, 2024, 8:59pm

Why didn’t you open-source it from the get-go? Are you planning on making it some sort of SaaS? And why did you choose Rust?

rjdavid · June 3, 2024, 11:57pm

So there are 3 main changes:

1. Oplog to change stream
2. Change stream publications managed by DDP Router
3. DDP Router written in Rust

Can you estimate the performance gains contributed by each one?

radekmie · June 4, 2024, 6:34am

My time spent on it was sponsored entirely by Meteor Software. The idea is to make it a Galaxy feature, but it may change in the future. We also discussed open sourcing some parts of it, e.g., the mergebox. (Maybe we could even use it as a WASM extension to replace Meteor’s one, but I haven’t benchmarked it yet.)

radekmie · June 4, 2024, 6:41am

It’s really hard to tell, but let’s go one by one.

1. This is neither always better nor always worse. The way DDP Router uses it is almost equivalent to Oplog tailing, but leverages only well-documented (and typed!) APIs.
2. This is probably not a win on its own, but it allows you to have fewer (e.g., 2 instead of 20) Meteor instances and one beefy DDP Router instance. That’s because the latter scales vertically easily (i.e., it can utilize multiple threads and thus deduplicate even more publication observers).
3. That’s a lot. The biggest gain is that we operate on almost binary data and not JavaScript objects. As such, the memory usage is significantly lower, and that positively impacts the CPU as well.

ignl · June 4, 2024, 8:48am

Let’s say I manage to split my app into App1 that handles only subscriptions and App2 that does everything else. Basically that would be exactly the same idea, right? Just without rewriting merge box etc?

So basically this idea is just a reimplementation of Meteor’s server-side reactivity in a more efficient manner? It’s not done in Meteor right away because there are hopes to evaluate it as a value-add product for the Galaxy service. Do I understand it correctly?

rjdavid · June 4, 2024, 9:32am

Seems like that can be a big win, considering that oplog tailing was known to eat resources proportional to the number of server instances.

radekmie · June 4, 2024, 9:37am

Well, mostly. It would be, except that it’s rather hard to separate methods and publications in Meteor, since both go through the same WebSocket.

That’s the initial state, yes. But the idea behind the DDP Router goes further, e.g., supporting multiple databases (it’s easier with this architecture) or multiple backends (imagine a DDP method that skips the Meteor server entirely and executes a GraphQL request instead).

Exactly. It’s also easier to try it out this way.

radekmie · June 4, 2024, 9:38am

That’s exactly where the idea came from. The more the application relies on subscriptions, the bigger the benefit is.

denyhs · June 4, 2024, 12:50pm

The first time @radekmie told me about this, my first question was why we didn’t have it on Meteor yet! I still think it’s a fantastic idea and I believe it’ll be super beneficial for performance in large applications.

ignl · June 4, 2024, 1:41pm

Coincidentally, I thought a bit about Meteor this morning, so I can share a few thoughts if you are interested.

Maybe not everyone will accept this, but I actually think that merge box functionality is very niche. I don’t think there is that much value there to begin with, and the complexity of having it work both correctly and with good performance is enormous. Yes, you win some bandwidth, and that might be important in some cases (so it’s niche), but in a lot of cases you can live without the mergebox. We have seen how many people came here and got the advice to use methods instead of subscriptions, and in some cases they reported that with 90% of the code converted the app is fine and now performant. On top of that, we have different subscription strategies now, and I myself converted my chat app to NO_MERGE_NO_HISTORY and use it with redis-oplog with good results.

Focusing on reactivity (not that hard to do with WebSockets in any framework), the mergebox (saving bandwidth is low value, IMHO), and client-side inserts with allow/deny (just a horrible idea in general, for many reasons) didn’t really work for the original MDG. That’s why they abandoned the project and started from scratch: those selling points were not selling that well (at least not well enough for investment).

Whether it is worth doing depends, I think, on the ROI: reimplementing the mergebox to a production level could be very expensive. It would also be interesting to see how much you win compared to using redis-oplog instead of oplog tailing. The complexity of deployment also seems to increase, and it will be difficult to test on a dev machine (as I understand it, it won’t be available locally, only on Galaxy).

Anyway, the idea is interesting, and please don’t think I am trying to discourage it; I welcome all innovation for Meteor. These are just some thoughts I wanted to get out, and maybe something to think about.

radekmie · June 4, 2024, 2:40pm

While I agree it’s rather a niche, I worked on multiple huge Meteor apps using publications for >90% of their data. Even in the project I work on now (aleno), NO_MERGE_NO_HISTORY is not feasible as it moves the computation to the end-user devices and in our case that is some old, already struggling tablet.

I could also argue that it’s a niche because of the performance issues and not the other way round. Imagine how much easier it would be to achieve “out-of-the-box full-stack reactivity” (that’s a mouthful) if it stayed reasonably performant with no problems or drawbacks.

On the other hand, mergebox is not only reducing the bandwidth, but can also skip an entire database query when another client subscribes to the same cursor. (Observer deduplication; I wrote about it in https://forums.meteor.com/t/meteor-scaling-performance-best-practices/52886/3.) With this in mind, we’re not only reducing the database pressure but also reducing the latency (the data’s ready immediately).
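
Here is a minimal TypeScript sketch of the observer deduplication idea, under stated assumptions. The `ObserverRegistry` class, the `CursorDescription` shape, and the `runInitialQuery` callback (a stand-in for the real database query) are hypothetical names for illustration. The key is a plain `JSON.stringify` of the cursor description, which is deliberately conservative: two descriptions that differ only in key order get separate observers.

```typescript
type Doc = { _id: string } & Record<string, unknown>;

// HYPOTHETICAL description of a cursor: collection, selector, and options.
interface CursorDescription {
  collection: string;
  selector: Record<string, unknown>;
  options?: { sort?: Record<string, 1 | -1>; limit?: number; skip?: number };
}

interface SharedObserver {
  documents: Doc[]; // current result set, shared by all clients
  clients: Set<string>; // ids of the clients using this observer
}

class ObserverRegistry {
  private readonly observers = new Map<string, SharedObserver>();
  private readonly runInitialQuery: (cursor: CursorDescription) => Doc[];

  // `runInitialQuery` is a stand-in for the real database query.
  constructor(runInitialQuery: (cursor: CursorDescription) => Doc[]) {
    this.runInitialQuery = runInitialQuery;
  }

  // Returns the current documents. If another client already observes the
  // same cursor, no query is executed and the data is available immediately.
  subscribe(clientId: string, cursor: CursorDescription): { documents: Doc[]; reused: boolean } {
    const key = JSON.stringify(cursor);
    const existing = this.observers.get(key);
    if (existing !== undefined) {
      existing.clients.add(clientId);
      return { documents: existing.documents, reused: true };
    }
    const observer: SharedObserver = {
      documents: this.runInitialQuery(cursor),
      clients: new Set<string>([clientId]),
    };
    this.observers.set(key, observer);
    return { documents: observer.documents, reused: false };
  }

  // Drops the observer once its last client is gone.
  unsubscribe(clientId: string, cursor: CursorDescription): void {
    const key = JSON.stringify(cursor);
    const observer = this.observers.get(key);
    if (observer === undefined) {
      return;
    }
    observer.clients.delete(clientId);
    if (observer.clients.size === 0) {
      this.observers.delete(key);
    }
  }

  get observerCount(): number {
    return this.observers.size;
  }
}
```

In this sketch, two clients subscribing to the same cursor trigger a single call to `runInitialQuery`; the second one gets `reused: true` and the cached documents. Keeping `documents` up to date from database changes is left out on purpose.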

Technically you can still have both. It’s just that publications going through the DDP Router won’t use (or need) it. When I tested it on aleno, I actually had both running in parallel.

And as for the ROI of reimplementing it… It’s already reimplemented. Like, 95% of it (there are some limitations, e.g., $where would require an entire JavaScript runtime). I even copied all of the relevant tests from Meteor.

I don’t know what the final release type will be, but practically speaking it’s just a single binary you provide a config to, so it’s really easy to run it locally. Or even run it locally against a remote server!