So what happens if we delete temporary data (100,000+ docs) per day? Can we "opt out" so that Meteor ignores the changes from the remove operation?
That is insane!! Great job @jam. I'll ping the team so we can start testing it right away; it's awesome to have this feature in a package.
I did a quick review of your code regarding the change streams and here’s a few points:
- I see you added a comment about the connection pool, but from my experiments (*), having a change stream per publication is simply not going to work.
- If you use an M30 instance on MongoDB Atlas (already quite a large tier), you get a 3k connection limit. Yes, that means fewer than 3k different change streams available in total.
- If you think that's a lot – it may be, for some apps. At aleno we have more than 160k active publications at peak, and some of them publish more than one collection (hence would require more than one change stream).
- The `updateDescription` is not that simple: there are both `disambiguatedPaths` and `truncatedArrays`, which require additional handling.
- (About the YMMV part.) If you have multiple change streams affected by the same operation, that one operation will be sent to your app multiple times. The oplog, on the other hand, receives it only once.
- In an extreme case, you could have a multiplier of a hundred or even more, leading to worse performance than oplog tailing.
(*) I hope I’ll be able to share what I’m working on soon… We’ll see
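The extra handling needed for `updateDescription` could be sketched roughly like this. This is a hypothetical helper, not code from the package; the field names (`updatedFields`, `removedFields`, `truncatedArrays`, `disambiguatedPaths`) follow MongoDB's change event format, and the overall approach is an assumption for illustration:

```javascript
// Resolve a dotted key to path components; disambiguatedPaths is present
// when a field name itself contains '.' or '$' and maps the ambiguous
// dotted key to its real components.
function resolvePath(key, disambiguatedPaths) {
  return disambiguatedPaths[key] || key.split('.');
}

function setAt(doc, parts, value) {
  let obj = doc;
  for (let i = 0; i < parts.length - 1; i++) {
    if (obj[parts[i]] == null) obj[parts[i]] = {};
    obj = obj[parts[i]];
  }
  obj[parts[parts.length - 1]] = value;
}

function unsetAt(doc, parts) {
  let obj = doc;
  for (let i = 0; i < parts.length - 1; i++) {
    obj = obj == null ? undefined : obj[parts[i]];
  }
  if (obj != null) delete obj[parts[parts.length - 1]];
}

// Apply a change event's updateDescription to a locally cached document.
function applyUpdateDescription(doc, ud) {
  const dp = ud.disambiguatedPaths || {};
  // Truncate arrays first: each entry is { field, newSize }.
  for (const { field, newSize } of ud.truncatedArrays || []) {
    let arr = doc;
    for (const p of resolvePath(field, dp)) arr = arr == null ? undefined : arr[p];
    if (Array.isArray(arr)) arr.length = newSize;
  }
  for (const [key, value] of Object.entries(ud.updatedFields || {})) {
    setAt(doc, resolvePath(key, dp), value);
  }
  for (const key of ud.removedFields || []) {
    unsetAt(doc, resolvePath(key, dp));
  }
  return doc;
}
```

Even in this simplified form, it's clear this is more involved than just merging `updatedFields` into the document.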
Thanks for taking a look and sharing your thoughts! I was inspired by the Change Streams performance data you posted previously and look forward to seeing what you have cooking.
- Technically this solution isn’t using a change stream per publication. It’s a change stream per unique match filter for a collection. So if your app has a lot of users subscribing with the same match filter it would reuse the same change stream. That being said, maybe that still won’t get the job done for some apps and if you’re scaling horizontally it won’t be as effective in reducing change stream creation (though currently scaling oplog horizontally won’t work as I understand it). Curious if you or anyone else has a better solution here.
- Thanks I’ll take a closer look at this.
- This is a good point. I welcome ideas on how to solve it.
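The change-stream sharing described above (one stream per unique match filter per collection) could be sketched with a refcounted registry keyed on a stable serialization of the filter. This is a hypothetical illustration, not the package's actual code; `acquireStream` / `releaseStream` are made-up names:

```javascript
// Key-order-independent serialization so { a: 1, b: 2 } and { b: 2, a: 1 }
// map to the same registry key.
function stableStringify(value) {
  if (Array.isArray(value)) return '[' + value.map(stableStringify).join(',') + ']';
  if (value && typeof value === 'object') {
    return '{' + Object.keys(value).sort()
      .map(k => JSON.stringify(k) + ':' + stableStringify(value[k]))
      .join(',') + '}';
  }
  return JSON.stringify(value);
}

const streams = new Map(); // key -> { stream, refCount }

// Reuse an existing change stream for an identical (collection, filter) pair.
function acquireStream(collectionName, filter, createStream) {
  const key = collectionName + '|' + stableStringify(filter);
  let entry = streams.get(key);
  if (!entry) {
    entry = { stream: createStream(), refCount: 0 };
    streams.set(key, entry);
  }
  entry.refCount++;
  return entry.stream;
}

// Close the underlying stream only when the last subscriber goes away.
function releaseStream(collectionName, filter) {
  const key = collectionName + '|' + stableStringify(filter);
  const entry = streams.get(key);
  if (entry && --entry.refCount === 0) {
    entry.stream.close();
    streams.delete(key);
  }
}
```

As noted in the thread, this helps most when many users subscribe with identical filters on one server; it doesn't reduce stream counts across horizontally scaled instances.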
Yeah, I saw this deduplication and it’s very good that it’s there.
I have some “middleground” solution, but more on that later.
That’s basically the same problem as 1, but the other way round – the fewer change streams you have, the more processing the listeners have to do.
I’ve just published v0.3.0. It includes some big improvements for using Change Streams with `Meteor.publish.stream`. There are also under-the-hood performance optimizations and improvements if you’d just like to use the package to enable `Meteor.publish.once` or subscription caching. Check the changelog for more details. If you give it a go, let me know.
@jam We’re doing an episode on Change Streams tomorrow. Ping me on Slack if you’d like to hop on, chat about your projects that use CS. TWIM #59 is Saturday 10/19 @ 09:30 EST. lmk
Hey @alimgafar, thanks for the invite! It’d be fun to come on the show but unfortunately I can’t make it this time. Hopefully sometime down the line.
I just released a new update, v0.4.0, and it’s a doozy. Here’s a quick summary; see the Readme for more details:
1) Data caching for `Meteor.publish.stream`
By default, the initial set of documents requested by a `.stream` will now be cached and kept in sync as data changes. For example, let’s say you set up a stream for a chat room. When the first person connects, they’ll fetch the data for the room, e.g. the most recent 20 messages based on your `sort` and `limit`, and establish a change stream for the room.
```javascript
Meteor.publish.stream('messages.room', function({ roomId, sort, limit }) {
  // check the data
  return Messages.find({ roomId }, { sort, limit });
});
```
As new messages are inserted / updated / deleted, that set of the 20 most recent messages will be kept in sync so that when others join the room, they’ll pull directly from the cache instead of hitting the database. As users scroll back through the message history, the historical documents will not be cached to avoid expending server resources.
So instead of the traditional Meteor behavior of keeping all state for every client, plus one copy for the observer, you’ll only be keeping the 20 most recent documents. This should be a big help in freeing up server and db resources and reducing latency.
2) Reduce server resource usage with the `serverState` config
You can configure the amount of state you want to keep on the server by setting `serverState`. It can be one of `auto | standard | minimal | none`. By default, it’s set to `auto`. Basically, if you’re using `.stream` exclusively for a collection, it seeks to minimize server resource usage, with the trade-off being some likely additional bandwidth. See the Readme for more details about the other config options.
3) Efficient pagination / infinite scroll for `.stream` and `.once`
These support range-based pagination using a timestamp, so you can avoid using the slower `skip` + `limit` if you’d like. Check out the examples.
4) Cache a subscription for the user’s session with `cacheDuration: Infinity`
I imagine this could come in handy to preserve data fetched via `Meteor.publish.once` so that it doesn’t get automatically wiped from Minimongo.
If you have any thoughts or ideas for improvements, let me know!
This certainly is a doozy!! Great work.
I’m wondering if you can share any of your performance comparisons?
Also, is there any way to see stats on the cache (size, etc.) and how well it’s performing? Or maybe even just a verbose mode so it prints things to the console.
And more generally with the package, if we hooked this up with Monti, would it still show these like regular observers?
I’m wondering how easy it would be to re-write tabular with this
I think this will vary quite a bit from app to app.
What types of things would be helpful? I could add some logs when `PubSub.debug` is set to `true`.
I’d expect it to play nicely with Monti, but I haven’t double-checked. `.stream` publications should appear in Monti’s publications and `.once` publications should appear in Monti’s methods.
A brand new version of jam:pub-sub is out.
The latest version now optimizes data sent over the wire when using `.updateAsync` and includes a slew of fixes. Check the Changelog for more info.