This is a more general follow-up to my post “ObserveChange very slow to trigger - #4 by ivo”.
We have a specific problem that I first thought would be pretty easy to handle, but because of performance issues we have not yet managed to deliver a proper UX. I’d like other people’s opinions on how to approach it, because I feel my approach is limited by my own knowledge and could probably be done more simply and/or more efficiently.
Here is the “simple” flow of our app for this specific reactive activity. I’ll use an example with 100 users.
- 100 users connect to a unique URL (a session) and are split into teams of 5 players (20 teams).
- The game consists of 10 challenges (questions); each challenge has a timer of 30 seconds max.
- The team answer (the answer most voted for by the team’s players) is submitted as soon as either 1/ the timer has expired or 2/ all the team members have replied.
- After each question, a good / bad answer screen summarizes the result of the question; each player can then set themselves as ready and go to the next question.
- Throughout the game you need to know whether your teammates have already replied to a question, whether they are set as ready to go to the next one, and the scores of the other teams (hence the need for reactivity). The tracker also keeps track of the current question and the current screen for each team.
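To make the submission rule above concrete, here is a minimal sketch of the two checks involved: picking the most voted answer and deciding when the team answer should be submitted. The `Vote` shape and the function names are hypothetical (our real schema is not shown here), and the tie-breaking rule (the answer that reaches the top count first wins) is an assumption.

```typescript
// Hypothetical shape: the real schema is not shown in this post.
type Vote = { playerId: string; answer: string };

// Returns the most voted answer, or null when nobody voted.
// Assumption: on a tie, the answer that reached the top count first wins.
function teamAnswer(votes: Vote[]): string | null {
  const counts: Record<string, number> = {};
  let best: string | null = null;
  let bestCount = 0;
  for (const vote of votes) {
    const n = (counts[vote.answer] || 0) + 1;
    counts[vote.answer] = n;
    if (n > bestCount) {
      best = vote.answer;
      bestCount = n;
    }
  }
  return best;
}

// The team answer is submitted when the timer has expired
// or when every team member has replied.
function shouldSubmit(
  votes: Vote[],
  teamSize: number,
  dueAt: number, // timestamp in ms at which the 30-second timer expires
  now: number
): boolean {
  // Count distinct players, in case a player appears more than once.
  const seen: Record<string, boolean> = {};
  let distinctPlayers = 0;
  for (const vote of votes) {
    if (!seen[vote.playerId]) {
      seen[vote.playerId] = true;
      distinctPlayers += 1;
    }
  }
  return now >= dueAt || distinctPlayers >= teamSize;
}
```

For example, with three votes in a team of 5 and a timer that has not expired yet, `shouldSubmit` returns `false`; once the timer expires, it returns `true` regardless of how many players replied.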
Right now my approach is the following:
- The admin creates a unique session document. It includes general info about the session plus an array with all the teams’ info. For each challenge, this teams’ info array holds every team member’s answer and whether they are ready to go to the next question.
- When players connect to the unique link, they subscribe to the publication for this session, so only one (big) item from the collection is published through pub/sub.
- When a player submits an answer or ticks that they’re ready for the next question, this data is pushed to session.team.answers (only push and addToSet, each on that one user’s data, to avoid concurrency issues). It means that for each challenge there are 100 pushes of players’ answers and 100 pushes of “player is ready”, and every one of these updates goes through the subscription and is reflected back to the users. When a team arrives at a new challenge we create a Job in a separate collection, called Jobs, that is checked later (see below).
- Because of concurrency (players answering at more or less the same time), it didn’t seem like a good approach to check whether all players have answered after each individual reply. Instead, the idea was to run a worker on a separate CPU, thanks to nschwarz:cluster, that checks the Jobs collection every 250ms. Each Job covers one question for one team and is deleted after the team has replied. The worker checks whether the due time for the question has passed or whether all players have replied. If either of these is true, it automatically submits the team’s answer. The goal was to have only one place, on the server side, that can submit the team’s answer.
It means that with 20 teams there are 20 open jobs; every 250ms we loop through these jobs and check whether each team is ready to move forward (because the timer has expired, everyone has replied, or everyone is ready to go to the next question).
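As a rough illustration of one pass of that worker, here is a sketch that works on in-memory data instead of the database. The `Job` and `TeamProgress` shapes and the name `jobsToSubmit` are hypothetical, not our actual schema, and the sketch only covers the "timer or everyone replied" check (the "everyone is ready" check would follow the same pattern). It assumes `repliedPlayerIds` contains no duplicates, as would be the case with addToSet.

```typescript
// Hypothetical shapes; the real Jobs and Session schemas are not shown in this post.
type Job = { teamId: string; challengeIndex: number; dueAt: number };
type TeamProgress = { teamId: string; size: number; repliedPlayerIds: string[] };

// One pass of the worker: returns the jobs whose team answer should be submitted now.
function jobsToSubmit(jobs: Job[], teams: TeamProgress[], now: number): Job[] {
  return jobs.filter((job) => {
    const team = teams.filter((t) => t.teamId === job.teamId)[0];
    if (!team) return false; // no matching team: leave the job alone
    const timerPassed = now >= job.dueAt;
    // Assumes repliedPlayerIds has no duplicates (e.g. filled with addToSet).
    const everyoneReplied = team.repliedPlayerIds.length >= team.size;
    return timerPassed || everyoneReplied;
  });
}

// In the real app this pass would run every 250 ms against the database, roughly:
// setInterval(() => { /* load jobs and teams, call jobsToSubmit, submit and delete each */ }, 250);
```

With 20 teams this is 20 checks per pass, so the loop itself is cheap; the cost, if any, would come from the database reads and writes around it.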
While the flow seems to work very well (no problem in the logic), it leads to plenty of reactivity and performance issues. It could be because the cluster worker queries the database every 250ms, or because there are too many updates to the Session item for reactivity to keep up: each time one player replies to one question it triggers the reactivity, plus there are the changes made by the cluster worker. Altogether that is around 200 updates to the Session item within a 1-minute time frame, and each update is sent back to all 100 users. Anyway, it works, but the user experience is chaotic (it can take up to 5 seconds to go to the next challenge).
If some of you are still here after this long read, the questions are:
- Is the approach OK? If so, what could be improved?
- Is there a totally different approach you would recommend? How would you have approached this problem?
Some ideas I had:
- Split the subscription into smaller subscriptions: one per team for its own answers and progress, and one general subscription just to see all the teams’ progress. Teams would have different observers triggered, but in the end I guess the amount of refreshing is similar. At least one team member’s reply would only be sent to their team and not to everyone. But I don’t know what actually affects performance: detecting the changes (in which case it might not matter) or sending them back to the users.
- Is the cluster worker making too many queries to the DB and causing the latency? We don’t see anything in the metrics, but we found that it can take up to 5 seconds for the observeChanges callback to trigger.
- Installing redis-oplog? We don’t have any Redis set up yet and don’t know much about it, but everyone seems to say it’s a life changer. Would just setting up a Redis instance with a basic redis-oplog configuration already speed things up? Is there a risk of worse performance?
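For the first idea, a back-of-the-envelope count of messages sent to clients can help frame the question. This is only a rough model, not a measurement: it counts one message per update per subscribed user, ignores payload size and the cost of detecting changes, and the figure of 20 general progress updates per challenge (one per team) is an assumption. The function names are hypothetical.

```typescript
// Single big subscription: every update is sent to every connected user.
function messagesSingleSession(updates: number, users: number): number {
  return updates * users;
}

// Split subscriptions: team updates only reach teammates,
// general progress updates still reach everyone.
function messagesSplit(
  teamUpdates: number,
  teamSize: number,
  progressUpdates: number, // assumption: one per team per challenge
  users: number
): number {
  return teamUpdates * teamSize + progressUpdates * users;
}

// Numbers from the example: about 200 updates per challenge, 100 users, teams of 5.
const single = messagesSingleSession(200, 100); // 20000
const split = messagesSplit(200, 5, 20, 100); // 1000 + 2000 = 3000
console.log({ single, split });
```

Under these assumptions the split sends far fewer messages, but that only matters if sending, rather than detecting the changes, turns out to be the bottleneck.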