Meteor Scaling - Redis Oplog [Status: Prod ready]

I have been running it in production with overridePublish: true. Most everything seems to be working fine, but I am getting spikes in method response time that I wasn’t getting before using redis:


It seems like it is not running the update queries in parallel.
Any ideas?

1 Like

@jamesgibson14 I need more details about that method and about the spikes. How long does it usually take to run? How are these “spikes” affecting it, and how much bigger is the response time compared to before?

@jamesgibson14 it is most likely related to the way we treat optimistic-ui. It’s the only thing that could cause those spikes, and it makes sense, since it requires additional processing. I currently have some ideas on how to fix it; however, I posted a question on the Meteor GitHub, maybe someone with better knowledge than me can guide me to a better approach.

1 Like

Does this package essentially store all of the session state in redis? If I use it, can I remove the requirement for sticky sessions / session affinity on my servers?

@clayne it’s not storing anything in redis; it only uses redis’ pub/sub system.

I don’t understand the optimistic UI handling well enough yet to give better feedback. But I will keep testing.

@jamesgibson14 https://github.com/cult-of-coders/redis-oplog/blob/master/lib/mongo/lib/dispatchers.js

Check the part that compensates for latency. If you clone the repo into your packages folder and simply comment out those lines, you should not have any more spikes. Are you able to test this?

But isn’t the reason Meteor needs session affinity because Meteor stores all of a user’s subscription state on the server?

I like that this removes a lot of load on the application server, but I’d love to see it go one step further and push all the session state into Redis so that we can remove the requirement for session affinity. Is that beyond the scope of this package?

Pushing the SessionCollectionView to redis will require fetching it on every diff sent.

@diaconutheodor Do you mean comment out just line 15? I will see if I can test it out today.

How about using redis as an async cache? The strategy would look something like this:

  1. We first hit the server’s SessionCollectionView to look for the user’s session.
  2. If it doesn’t exist, then we go out to redis and fetch the session. If it exists there, we merge it into the local cache.
  3. After the operation, we update the local cache and also update the redis cache.

This way, if a user is using a websocket they’ll tend to hit the same server. If they don’t, though, for whatever reason, they can reconnect to a different server and have their session restored.

When there’s a local cache miss it would take a bit more time but we would end up with a much more robust / horizontally scalable solution.

It would also help avoid issues when deploying: when we deploy a new version and the user gets connected to a new server, that server will be able to restore their session.
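Roughly, in JavaScript, the flow I have in mind looks like this; every helper name below (localSessionCache, redisClient, serializeSessionView and friends) is a hypothetical placeholder, not an existing Meteor or redis-oplog API:

```js
// Sketch of the read-through / write-through idea described above.
// All names are placeholders; nothing here is real Meteor or redis-oplog API.
async function getSessionView(connectionId) {
  // 1. Look for the session in this server's in-memory cache first.
  let view = localSessionCache.get(connectionId);

  if (!view) {
    // 2. Local miss: fall back to redis and, if found, merge into the local cache.
    const serialized = await redisClient.get(`session:${connectionId}`);
    if (serialized) {
      view = deserializeSessionView(serialized);
      localSessionCache.set(connectionId, view);
    }
  }

  return view;
}

async function saveSessionView(connectionId, view) {
  // 3. After every operation, update both the local cache and the redis copy.
  localSessionCache.set(connectionId, view);
  await redisClient.set(`session:${connectionId}`, serializeSessionView(view));
}
```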

What do you think?

2 Likes

The package has been causing an issue for a Meteor Toys customer. I’m going to look into it, but I figured you might want to know about it and/or have a solution, since it could probably happen in other cases too.

hi folks… any updates on who is using this in production? Benchmarks, pros, cons, please?

cheers
raskal

2 Likes

Is anyone using this in production?

@msavin I looked at it. Does it happen only with the pro version? If not, I may need the full version to test. Ping me in private; I think I may know what’s up, and I need to ask you some questions about Meteor Toys internals.

@raskal @bmustata yes. I recently spoke with @nadeemjq; he had some issues understanding synthetic mutations, and he told me that just by switching to redis-oplog he went from 1s load times to near instant. I don’t have all the details, but there are people who had the courage to go to production with it. (You the real MVP)

Sorry about my lower level of involvement in the past months, but I had to put my priorities in order. By the end of March I will clean up the issue board :slight_smile:

Critical items:

  • Fallback to long-polling when redis server dies
  • Full compatibility with reactive publish package
  • Some small issues with $unset and $addToSet

@clayne some good ideas there. However, I think we’re gonna open a can of worms with this. First of all, SessionCollectionView is per connection, not per user.

Imagine this scenario:
I logged in as the same user in 2 tabs. In one tab I’m hitting Server 1, in the other I’m hitting Server 2. Properly updating the SessionCollectionView per user is going to be very costly time-wise and CPU-wise, because there are many details involved.

@jamesgibson14

I still don’t know how this works, sadly; I couldn’t get anyone to give me some pointers. I will look into it in depth; the code is open source, so it shouldn’t be hard to reverse-engineer it. I did the same for most parts of redis-oplog.

I have some crazy ideas for solving this issue elegantly; I need to check them.

Thanks @diaconutheodor! I am moving my code to production in a week and would love to use redis-oplog.

The main question I have is about $addToSet. There are a few key places where I use it in my code. Is it safe to use redis-oplog in this case?

thanks
raskal

$addToSet and $pull are properly tested and they work; there were some issues with publishComposite, which have been fixed.
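For anyone reading along, the mutations in question are just the standard Meteor/Mongo modifier syntax; a minimal example (collection and field names invented for illustration):

```js
import { Mongo } from 'meteor/mongo';

// Hypothetical collection and _id, purely for illustration.
const Tasks = new Mongo.Collection('tasks');
const taskId = 'someExistingTaskId';

// Add 'urgent' to the tags array only if it is not already present.
Tasks.update(taskId, { $addToSet: { tags: 'urgent' } });

// Remove every occurrence of 'urgent' from the tags array.
Tasks.update(taskId, { $pull: { tags: 'urgent' } });
```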

It seems that I can no longer change the title :frowning:. Edit: thanks @tcastelli!

Release 1.1.8

Lots of fixes:

  • Redis retry strategy
  • Fixes for publishComposite and nested fields
  • Found a way to solve the optimistic-ui spikes, which will land in the next version.
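Regarding the retry strategy: it builds on the standard node_redis retry_strategy option; assuming the package forwards its redis connection settings to node_redis (that pass-through is an assumption on my part, not confirmed package API), the reconnection back-off would be controlled with something like:

```js
const redis = require('redis');

// Standard node_redis retry_strategy: return a delay in ms to retry,
// or an Error to stop retrying altogether.
const client = redis.createClient({
  host: '127.0.0.1',
  port: 6379,
  retry_strategy(options) {
    if (options.attempt > 10) {
      // Give up after 10 attempts; pending commands receive this error.
      return new Error('Redis unreachable, giving up');
    }
    // Back off linearly, capped at 3 seconds between attempts.
    return Math.min(options.attempt * 100, 3000);
  }
});
```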
5 Likes