Connecting to MongoDB directly from web browser + Change Streams

mitar · March 12, 2018, 7:13pm

I experimented with connecting to a MongoDB server directly from a web browser, using Change Streams for reactive updates.

The idea is that by allowing web clients to connect directly to a MongoDB database server, we remove all the overhead and latency introduced by intermediary code: multiple serializations and deserializations, memory buffers, etc. Because web browsers cannot open raw TCP connections, the web app exposes a thin WebSocket-to-TCP proxy which does not process packets but just passes them back and forth.
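Below is a minimal Node.js sketch of such a pass-through relay. To keep it testable, the client side is a plain TCP listener. In the browser setup, that side would be a WebSocket. Assuming the `ws` package, it could be wrapped into a duplex stream with that package's `createWebSocketStream` helper and handed to the same `relay` function. `relay` and `createRelayServer` are hypothetical names, not taken from web-mongo.

```javascript
const net = require('node:net');

// Forward bytes in both directions between two duplex streams without
// parsing them: the relay never looks inside MongoDB wire-protocol messages.
function relay(a, b) {
  a.pipe(b);
  b.pipe(a);
  // If either side errors or closes, tear down the other side too.
  const teardown = () => {
    a.destroy();
    b.destroy();
  };
  for (const s of [a, b]) {
    s.on('error', teardown);
    s.on('close', teardown);
  }
}

// Accept client connections and relay each one to an upstream TCP server
// (for example a mongod listening on port 27017).
function createRelayServer(upstreamPort, upstreamHost = '127.0.0.1') {
  return net.createServer((client) => {
    relay(client, net.connect(upstreamPort, upstreamHost));
  });
}
```

For example, `createRelayServer(27017).listen(8080)` would forward every connection on port 8080 to a local mongod. Because nothing is decoded or re-encoded, whatever bytes arrive are passed through unchanged.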

Change Streams are an official MongoDB API, available since MongoDB 3.6, for hooking into the oplog and receiving a stream of notifications as documents in a collection are modified.
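To give a feel for what those notifications contain, here is a sketch that mirrors a watched collection in a local `Map` using change events. The event fields used (`operationType`, `documentKey`, `fullDocument`, `updateDescription.updatedFields` / `removedFields`) follow the documented change event format. `applyChange` is a hypothetical helper, and the sketch assumes primitive (e.g. string) `_id` values and top-level field paths only.

```javascript
// Apply one change event (as emitted by collection.watch()) to a Map that
// mirrors the watched collection, keyed by _id.
// Assumption: _id values are primitives so Map keys compare by value;
// ObjectId keys would need converting first (e.g. with toHexString()).
function applyChange(docs, change) {
  const id = change.documentKey && change.documentKey._id;
  switch (change.operationType) {
    case 'insert':
    case 'replace':
      // Both event types carry the complete new document.
      docs.set(id, change.fullDocument);
      break;
    case 'update': {
      const current = docs.get(id);
      if (!current) break; // not cached locally, nothing to patch
      const { updatedFields = {}, removedFields = [] } = change.updateDescription || {};
      // Assumption: top-level field names only (no dotted paths like "a.b").
      const next = { ...current, ...updatedFields };
      for (const field of removedFields) delete next[field];
      docs.set(id, next);
      break;
    }
    case 'delete':
      // Delete events identify the document only through documentKey.
      docs.delete(id);
      break;
    default:
      // Other event types (drop, rename, invalidate, ...) are ignored here.
      break;
  }
  return docs;
}
```

Only insert and replace events always carry `fullDocument`. Update events include it only when the stream is opened with `fullDocument: 'updateLookup'`, which is why the update branch patches the cached copy instead.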

This web app then uses Vue to render the example reactive collection.

See it here: https://github.com/mitar/web-mongo

Feedback & discussion welcome.

Related: https://github.com/meteor/meteor-feature-requests/issues/158

robfallows · March 13, 2018, 9:58am

An interesting approach, and one I hadn’t considered.

Thanks for sharing

seba · March 13, 2018, 10:10am

Cool, it’s obviously not something you’d want to do for a real/production app, but it’s useful for starting to play around with change streams and getting a feel for how/if Meteor could benefit from them.

msavin · March 13, 2018, 10:10am

This is nice - I’m excited to see the use of Change Streams in Meteor.

What kind of scaling implications might this have on the database side? AFAIK, MongoDB can only support 1000 connections at the same time, at least when it comes to Change Streams.

mitar · March 13, 2018, 10:12am

It depends. If you have public data, you could put it into a read-only collection and have clients connect to it directly.

Yes. So you would probably need multiple replica sets and shards.

Also, the issue is that MongoDB opens one connection for each change stream, so even fewer clients can be served.

I wrote down some of my thoughts about both of those issues in the README.

msavin · March 13, 2018, 10:23am

@mitar is this where you are defining how Change Streams should work? Is it subscribing to changes for the entire collection?

https://github.com/mitar/web-mongo/blob/master/src/client/collections.js#L82

```javascript
// Excerpt from an async method (mongoClientPromise and dbName are defined elsewhere in the file).
let initializing = true;
const pendingChanges = [];

const client = await mongoClientPromise;
const db = client.db(dbName);

// To make sure collection exists before we start watching it.
await db.createCollection(this.collectionName);

const collection = db.collection(this.collectionName);
const changeStream = collection.watch([
  {
    // TODO: We do not use it.
    $project: {
      ns: 0,
    },
  },
]);

changeStream.on('change', (change) => {
  if (initializing) {
    // … (the linked excerpt is cut off here)
  }
});
```
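The `initializing` flag and `pendingChanges` array in the excerpt point to a common pattern: open the change stream first, buffer events while the initial query runs, then replay them. The sketch below shows that general pattern on its own. It is not the web-mongo code, since the excerpt is cut off, and `InitialLoadBuffer` is a hypothetical name.

```javascript
// Buffers change events that arrive while the initial query is still
// running, then replays them in arrival order once the snapshot is loaded.
class InitialLoadBuffer {
  constructor(onChange) {
    this.onChange = onChange; // called for every change, buffered or live
    this.initializing = true;
    this.pendingChanges = [];
  }

  // Wire this to changeStream.on('change', (change) => buffer.push(change)).
  push(change) {
    if (this.initializing) {
      this.pendingChanges.push(change);
    } else {
      this.onChange(change);
    }
  }

  // Call after the initial documents have been loaded.
  finishInitializing() {
    this.initializing = false;
    const pending = this.pendingChanges;
    this.pendingChanges = [];
    for (const change of pending) this.onChange(change);
  }
}
```

Some buffered events may already be reflected in the initial snapshot. Replaying them is generally harmless when the handler is idempotent, for example when it sets or deletes documents by `_id`, which is one reason this pattern pairs well with per-document change events.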

Regarding your README: security is the top concern, especially if/when vulnerabilities in MongoDB are discovered. Scaling also looks like it would be limited. I suspect Change Streams, when used appropriately, should help you scale to 1M concurrent users and beyond.

Either way, the code looks fairly simple. Wouldn’t it be reasonable to integrate this with Meteor’s pub/sub?
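One possible shape for such an integration, offered as a hedged sketch: inside a Meteor publication, forward each change event to the subscriber through the low-level publication API (`this.added`, `this.changed`, `this.removed`). `forwardChange` is a hypothetical helper. It assumes string `_id`s and top-level field paths in `updatedFields`/`removedFields`, and it is exercised with a stub `sub` object so it runs outside Meteor.

```javascript
// Translate one change event into calls on a Meteor publication handle
// (`sub` is the `this` of a Meteor.publish handler).
function forwardChange(sub, collectionName, change) {
  const id = change.documentKey._id;
  switch (change.operationType) {
    case 'insert': {
      // Publish every field except _id, which is passed separately.
      const { _id, ...fields } = change.fullDocument;
      sub.added(collectionName, id, fields);
      break;
    }
    case 'update': {
      const { updatedFields = {}, removedFields = [] } = change.updateDescription || {};
      const fields = { ...updatedFields };
      // In `changed`, Meteor treats a field set to undefined as removed.
      for (const field of removedFields) fields[field] = undefined;
      sub.changed(collectionName, id, fields);
      break;
    }
    case 'delete':
      sub.removed(collectionName, id);
      break;
    default:
      // 'replace' would need a diff against the previously published
      // document to detect removed fields; it is left out of this sketch.
      break;
  }
}
```

Inside `Meteor.publish`, this might be wired as `stream.on('change', (change) => forwardChange(this, 'items', change))` on a stream from `Items.rawCollection().watch()`, after sending the initial documents with `this.added` and calling `this.ready()`, with `this.onStop(() => stream.close())` for cleanup. The `Items` collection and `'items'` name are made up for illustration.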

msavin · March 13, 2018, 10:27am

FYI: it looks like the documentation for Change Streams is finally live:

https://docs.mongodb.com/manual/reference/method/db.collection.watch/

mitar · March 13, 2018, 10:45am

Yes. And yes.

Sure. This is why I would probably have a dedicated MongoDB instance just for this public data, if I went this way.

My main motivation was that I am working on a dashboard for measurements. A lot of data is coming in and I would like to visualize it. Because the whole tool is used inside a secured network, I do not really worry about MongoDB security. But I do worry that I have to transport a lot of data from server to client, and I do not want to spend time serializing/deserializing data unnecessarily.

I am not sure why you think it would help with scaling. How did you arrive at this number?

I wrote my thoughts about this here: https://github.com/meteor/meteor-feature-requests/issues/158#issuecomment-372426480

In short: I do not see the benefit.

msavin · March 13, 2018, 10:53am

The benefit would be that someone can create static-first applications and then sprinkle in real-time magic where it makes sense, and it would be super scalable.

It’s arbitrary. I’m just trying to say it should get you to 7 figures and up, not just 4-5 figures.

On a side note: I think this public change streams approach might make sense if your application has various third-party partners or services. It could make it really easy to create a real-time API or integration point while keeping the number of connections reasonable (especially if IP whitelisting were implemented).
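If such an endpoint were exposed to a handful of partners, the whitelisting could live in the same thin proxy, which would check the peer address before relaying anything. Here is a sketch with a hypothetical `isAllowed` helper and an exact-match allowlist (no CIDR ranges). Node reports IPv4 peers on dual-stack listeners as IPv4-mapped IPv6 addresses such as `::ffff:203.0.113.7`, so that prefix is stripped before comparing.

```javascript
const net = require('node:net');

// Strip the IPv4-mapped IPv6 prefix Node uses on dual-stack listeners.
function normalizeAddress(address) {
  return address && address.startsWith('::ffff:') ? address.slice(7) : address;
}

// Exact-match allowlist check (no subnet/CIDR support in this sketch).
function isAllowed(address, allowlist) {
  return allowlist.has(normalizeAddress(address));
}

// Hypothetical gate: drop connections from unknown peers before any bytes
// are forwarded, and hand allowed sockets on (e.g. to a relay).
function createGatedServer(allowlist, onAllowed) {
  return net.createServer((socket) => {
    if (!isAllowed(socket.remoteAddress, allowlist)) {
      socket.destroy();
      return;
    }
    onAllowed(socket);
  });
}
```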

mitar · March 13, 2018, 6:33pm

That would be ideal, but sadly I do not see change streams providing this scalability. I also hoped for that.

Oh yes, it would be great if that were so.

msavin · March 13, 2018, 7:35pm

What makes you say this?

Change Streams provide, or can provide, updates on a per-document basis, right? Meteor’s pub/sub is most efficient when subscribing to a document by _id. It looks like a perfect combination.

mitar · March 13, 2018, 8:00pm

Sure, but you can only have 1000 change streams open at the same time.

Also, you do not get a notification when a document is removed.

msavin · March 14, 2018, 11:23am

It’s possible that the concept of having 1000 Change Streams connections has been misunderstood.

Looking at the documentation, Change Streams support $match, which means that you can specify multiple queries within a single Change Stream. It looks to be as flexible as any MongoDB query.
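Here is a sketch of how several ordinary queries could be folded into the `$match` stage of a single change stream. Change events wrap the document in `fullDocument`, so each field path needs a `fullDocument.` prefix. `toChangeStreamMatch` is a hypothetical helper that only handles plain field conditions (no top-level `$and`/`$or`/`$nor`).

```javascript
// Combine plain collection queries into one change-stream $match stage,
// matching an event if its fullDocument satisfies any of the queries.
function toChangeStreamMatch(queries) {
  if (queries.length === 0) {
    throw new Error('at least one query is required ($or cannot be empty)');
  }
  const branches = queries.map((query) => {
    const prefixed = {};
    for (const [field, condition] of Object.entries(query)) {
      if (field.startsWith('$')) {
        throw new Error(`top-level operator ${field} is not supported in this sketch`);
      }
      // Field conditions (including operators like $gt) are kept as-is.
      prefixed[`fullDocument.${field}`] = condition;
    }
    return prefixed;
  });
  return { $match: { $or: branches } };
}
```

The stage could then be passed as `collection.watch([stage], { fullDocument: 'updateLookup' })`. Without `updateLookup`, update events carry no `fullDocument` and would never match. Delete events never carry `fullDocument`, so a filter like this does not see removals at all.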

It also looks like Change Streams support operationType, which lets you watch just about any operation, including insert, update, delete, etc.

I would assume that if someone wanted super scalability, it could be set up so that each server watches only the documents it needs, by _id. One could set up a second Change Stream to look for relevant inserts/removes/etc.
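The two-stream setup described above might look roughly like this. One pipeline follows the documents a server already has by `documentKey._id`, which is present on update, replace, and delete events alike. The other picks up inserts matching a query. Both helper names are hypothetical.

```javascript
// Stream 1: follow updates, replaces, and deletes of documents this server
// already publishes, matched by documentKey._id (deletes carry no fullDocument).
function knownDocumentsPipeline(ids) {
  return [
    {
      $match: {
        'documentKey._id': { $in: ids },
        operationType: { $in: ['update', 'replace', 'delete'] },
      },
    },
  ];
}

// Stream 2: catch newly inserted documents that match a plain field query.
function newDocumentsPipeline(query) {
  const match = { operationType: 'insert' };
  for (const [field, condition] of Object.entries(query)) {
    match[`fullDocument.${field}`] = condition;
  }
  return [{ $match: match }];
}
```

One design caveat: a pipeline is fixed when the stream is opened, so the `_id` list cannot grow in place. The first stream would have to be reopened (for example, resuming from the last resume token) whenever the set of watched documents changes. A document that starts matching a query through an update rather than an insert is also missed by both streams.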

If this is correct, wouldn’t it be more scalable than redis-oplog, since it can deliver updates specifically to the servers that require them? Or does redis-oplog somehow know which servers need to be updated with which data?