connection.onClose concurrent calls fail sometimes

I run a chat application similar to chatroulette built using Meteor. the algorithms behind matching users is all done on the server by utilizing js arrays and objects as such data is not very sensitive to me and don’t feel like bloating the MongoDB with unnecessary transactions.

I use arrays for queuing the users, normally users are matched by popping them from the queue. however in some cases users are matched based on certain properties that are separated in different queues for which I implemented a seek search algorithm for faster search by constructing an index array for each type of queue.

Each match generates a pair ID which is used to identify each conversation/chat room.

I rely on connection ID to identify each connected user because I want to allow each user to possibly have multiple conversation using the same user ID.

when users match in a chat room, they are removed from their designated queue and are added to an associative object called “noQueue” which helps me track users that are in conversations, another records is also added to “currentChats” object which helps me also keep track of the current conversations and their duration so far.

when users are disconnected the connection.onClose() handler function looks for the connection ID and removes it from it’s current position whether it’s in a conversation or still in one of the queues.

The problem is: on an average of twice a day, records in the “currentChats” associative object are not removed on user disconnection using the connection.onClose() handler function. which leaves me with faulty records of 1 or 2 conversations with durations for more than 18 hours. To validate the information I check to find out that those users are not even online.

the following code snippet contains the connection.onClose() handler function:

Meteor.onConnection(function(connection){

connection.onClose(function(){//fires on user disconnection
    const uniQConnIndex = uniQConn.indexOf(connection.id);
    const QConnIndex = QConn.indexOf(connection.id);
    if(uniQConnIndex > -1){//removes user from the university queue
        console.log('User disconnected: '+ uniQueue[uniQConnIndex].email);
        uniQueue.splice(uniQConnIndex,1);
        indexUniQueue.splice(uniQConnIndex,1);
        uniQConn.splice(uniQConnIndex,1);
    }else if(QConnIndex > -1){ //removes user from the all queue
        console.log('User disconnected: '+ queue[QConnIndex].email);
        queue.splice(QConnIndex,1);
        indexQueue.splice(QConnIndex,1);
        QConn.splice(QConnIndex,1);
    }else if(noQueue[connection.id]){
        const targetNoQueue = noQueue[connection.id]; /*saves object*/

        //emits to all chat room users forcing them to discconect
        Meteor.setTimeout(function () {
            streamer.emit('matchDisconnected', targetNoQueue.pairId);
        },500);

        delete noQueue[targetNoQueue.matchedUser.connectionId]; /*deletes matched user fron no Queue*/
        delete noQueue[connection.id]; /*deletes user from no queue*/
//THIS IS THE SECTION THAT FAILS TO RUN SOMETIMES
        if(currentChats[targetNoQueue.pairId]){
            delete currentChats[targetNoQueue.pairId];/*removes the current chat from the array*/
            console.log("Chat ended between: "
                +targetNoQueue.email
                +" & "
                +targetNoQueue.matchedUser.email);
        }
      }
  });   

This snippet shows how the users are matched into conversations:

Meteor.methods({
'match': function (settings) { //finding a matchgit
    user = {};
    /*user.connection = this.connection;*/
    user.queueTime = undefined;
    user.queueTime = moment().toDate();
    user.connectionId = this.connection.id;
    /*user.ip = this.connection.clientAddress;*/
    user.id = Meteor.userId();
    user.uniName = Meteor.user().profile.uniName;
    user.gender = Meteor.user().profile.gender;
    user.email = Meteor.user().emails[0].address;
    user.rate = Meteor.user().profile.rate;
    user.matchedUser = null;
    user.pairId = null;
    console.log("User connected: " + Meteor.user().emails[0].address);
    // console.log("ConnectionId: " + user.connectionId);
    // console.log("Id: " + user.id);

    if (settings == 'all') {
        if (queue.length > 0) {
            if (queue[0].id !== user.id) {
                matchedUser = queue.pop();
                QConn.pop();
                indexQueue.pop();
            } else {
                queue.push(user);
                indexQueue.push(user.uniName);
                QConn.push(user.connectionId);
                // console.log("Queued user");
                // Meteor.call('count');
                return;
            }
        } else if (uniQueue.length > 0) {
            const matchedUserUniQueueIndex = indexUniQueue.indexOf(user.uniName);
            if (matchedUserUniQueueIndex > -1) {
                if (uniQueue[matchedUserUniQueueIndex].id !== user.id) {
                    //match the user from the uni queue
                    matchedUser = uniQueue[matchedUserUniQueueIndex];
                    uniQueue.splice(matchedUserUniQueueIndex, 1);
                    indexUniQueue.splice(matchedUserUniQueueIndex, 1);
                    uniQConn.splice(matchedUserUniQueueIndex, 1);
                } else {
                    queue.push(user);
                    indexQueue.push(user.uniName);
                    QConn.push(user.connectionId);
                    // console.log("Queued user");
                    // Meteor.call('count');
                    return;
                }
            } else {
                queue.push(user);
                indexQueue.push(user.uniName);
                QConn.push(user.connectionId);
                // console.log("Queued user");
                // Meteor.call('count');
                return;
            }
        }
        else {
            queue.push(user);
            indexQueue.push(user.uniName);
            QConn.push(user.connectionId);
            // console.log("Queued user");
            // Meteor.call('count');
            return;
        }
    } else if (settings == 'uni') {
        if (uniQueue.length > 0) {
            //implementing seek search to find a match of same uni
            const matchedUserUniQueueIndex = indexUniQueue.indexOf(user.uniName);

            //if the user of the same uni is found
            if (matchedUserUniQueueIndex > -1) {
                if (uniQueue[matchedUserUniQueueIndex].id !== user.id) {
                    //match the user from the uni queue
                    matchedUser = uniQueue[matchedUserUniQueueIndex];
                    uniQueue.splice(matchedUserUniQueueIndex, 1);
                    indexUniQueue.splice(matchedUserUniQueueIndex, 1);
                    uniQConn.splice(matchedUserUniQueueIndex, 1);
                } else {
                    //queue the user if no one from the same university is found
                    uniQueue.push(user);
                    indexUniQueue.push(user.uniName);
                    uniQConn.push(user.connectionId);
                    // console.log("Queued user");
                    // Meteor.call('count');
                    return; //exit when no university match in both
                }
            } else {
                //queue the user if no one from the same university is found
                uniQueue.push(user);
                indexUniQueue.push(user.uniName);
                uniQConn.push(user.connectionId);
                // console.log("Queued user");
                // Meteor.call('count');
                return; //exit when no university match in both
            }
        } else if (queue.length > 0) {
            const matchedUserQueueIndex = indexQueue.indexOf(user.uniName);

            if (matchedUserQueueIndex > -1) {
                if (queue[matchedUserQueueIndex].id !== user.id) {
                    //match the user from the regular queue
                    matchedUser = queue[matchedUserQueueIndex];
                    queue.splice(matchedUserQueueIndex, 1);
                    indexQueue.splice(matchedUserQueueIndex, 1);
                    QConn.splice(matchedUserQueueIndex, 1);
                } else {
                    //queue the user if no one from the same university is found
                    uniQueue.push(user);
                    indexUniQueue.push(user.uniName);
                    uniQConn.push(user.connectionId);
                    // console.log("Queued user");
                    // Meteor.call('count');
                    return; //exit when no university match in both
                }
            }else {
                //queue the user if no one from the same university is found
                uniQueue.push(user);
                indexUniQueue.push(user.uniName);
                uniQConn.push(user.connectionId);
                // console.log("Queued user");
                // Meteor.call('count');
                return; //exit when no university match in both
            }
        } else {
        //queue the user if there is no one in the queues
        uniQueue.push(user);
        indexUniQueue.push(user.uniName);
        uniQConn.push(user.connectionId);
        // console.log("Queued user");
        // Meteor.call('count');
        return;
        }
    }
    //pairing the user after successful match finding
    pairId = user.connectionId + "#" + matchedUser.connectionId;
    matchedUser.pairId = pairId;
    user.pairId = pairId;

    streamer.emit('match',{
        to: user.connectionId,
        pairId: pairId,
        match: matchedUser
    });
    streamer.emit('match',{
        to: matchedUser.connectionId,
        pairId: pairId,
        match: user
    });

    console.log('Matched between: '+ user.email + " & " + matchedUser.email);

    user.matchedUser = matchedUser;
    matchedUser.matchedUser = user;

    noQueue[user.connectionId] = user;
    noQueue[matchedUser.connectionId] = matchedUser;

    currentChats[pairId] = {user1: user.email, user2: matchedUser.email, start: moment()};

    Meteor.call('currentChats');

    Meteor.call('count');

},

the problem leaves me with server logs like this one having conversations that lasted for 2 days which is faulty info of course

app.web.1:  ===Current Chats===
app.web.1:  Users: giovanni.louis@student.guc.edu.eg & rehan.khattab@student.guc.edu.eg
app.web.1:  Started: 2 days ago
app.web.1:  Users: ibrahim.mohamad@student.guc.edu.eg & amr148209@bue.edu.eg
app.web.1:  Started: 2 days ago
app.web.1:  Users: abdel-rahman.ali@student.guc.edu.eg & rawan.abdo@student.guc.edu.eg
app.web.1:  Started: 2 days ago
app.web.1:  Users: hesham154970@bue.edu.eg & youssef.ahmed7@msa.edu.eg
app.web.1:  Started: 17 hours ago
app.web.1:  Users: mahmoud.nasser@student.guc.edu.eg & amr.mostafa4@msa.edu.eg
app.web.1:  Started: a few seconds ago

I have had some suspicions in the fact that the users may disconnect at the same time which will call the function at the same time deleting the records at the same time which might lead to call failing which I believe is highly unlikely, although I precariously added a delay between match disconnections to make sure this doesn’t happen. but this doesn’t solve it if both users disconnect at the same time, is this even possible?

There’s a lot of code to understand there, so I skimmed and stopped at the first issue (there may be others).

Your match method uses a global user object, which will be destroyed in “mid-flight” if another client makes that method call while the first is still running. You have a number of asynchronous calls in the logic which allow the node thread to yield (Meteor.user(), Meteor.call() for example) and expose that.

The simplest fix for this is to declare const user = {} at the start of the method.

There is another more general problem with your design as it stands, which is that it is not multi-server safe (you will not be able to scale out), because your server arrays will then be different in each server process.

1 Like

Thanks for your reply Mr.Rob, Luckily I changed the part of the user global object like 15 minutes ago, However you made me understand what kind of issue it was causing.
In regards to the scaling I fully understand that, I’m not planning to run multiple instances of the server probably increase the resources as my audience is fairly narrow, probably won’t exceed 10K users as a maximum potential.
Also I would like to know what you think about the main issue which sometimes doesn’t remove the array elements on user disconnection in this section:

else if(noQueue[connection.id]){
        const targetNoQueue = noQueue[connection.id]; /*saves object*/

        //emits to all chat room users forcing them to discconect
        Meteor.setTimeout(function () {
            streamer.emit('matchDisconnected', targetNoQueue.pairId);
        },500);

        delete noQueue[targetNoQueue.matchedUser.connectionId]; /*deletes matched user fron no Queue*/
        delete noQueue[connection.id]; /*deletes user from no queue*/
//THIS IS THE SECTION THAT FAILS TO RUN SOMETIMES
        if(currentChats[targetNoQueue.pairId]){
            delete currentChats[targetNoQueue.pairId];/*removes the current chat from the array*/
            console.log("Chat ended between: "
                +targetNoQueue.email
                +" & "
                +targetNoQueue.matchedUser.email);
        }
      }
  });

You have the same problem with matchUser in your method.

As for your onClose, without understanding how you are managing your arrays it’s almost impossible to give a definitive answer. However, are you certain that your logic is correct in your exclusion code?

if (uniQConnIndex > -1) { // removes user from the university queue
...
} else if (QConnIndex > -1) { // removes user from the all queue
  ...
} else if (noQueue[connection.id]) {
  ...
  delete noQueue[targetNoQueue.matchedUser.connectionId]; /* deletes matched user fron no Queue */
  delete noQueue[connection.id]; /* deletes user from no queue */

So it does correctly delete the noQueue elements, but not currentChats[targetNoQueue.pairId]?

yes, I fixed matchedUser as well, thanks for reminding me.

In regards to the onClose, yes it correctly deletes the noQueue elements but sometimes not always but once every couple of hours of users using the app currentChats[targetNoQueue.pairId] fails to delete.

let me explain a small part of how I manage it, when two users match, a property is created in the currentChats object as I use it as an associative array.
However, when those two users disconnect they both fire the onClose event thus reaching this section of my code:

delete noQueue[targetNoQueue.matchedUser.connectionId]; /* deletes matched user fron no Queue */
delete noQueue[connection.id]; /* deletes user from no queue */

What I’m assuming is that sometimes both users disconnect at the same instant causing them to execute the same line at the same time which leads to failing the execution. I though about the probability that this happens and unfortunately I don’t really understand whether it’s possible or not. so what do you think?

Not possible in that each user’s onClose is a separate closure, so the “same line” is not literally the same line. However, the use of Meteor.setTimeout() does introduce the possibilty of interleaving of two separate onClose events. As long as it’s safe to interleave (no global mutations) then it should be OK. However, your design relies on global mutations to work, so I would look very closely at what might happen if interleaving occurs. Incidentally, if that Meteor.setTimeout is removed I see no yielding methods - unless streamer.emit is asynchronous.

I understand that they are both separate closures however they both delete the same element/property in the global object, my approach of the Meteor.setTimeout() was based on my assumption that this element in the global object is being deleted twice at the same time by two separate closures initiated by both users leaving the same room at the same instant. which is a global mutation if that’s what you meant by the expression?

That is a global mutation. Deleting a non-existent property is harmless in itself - until you really on it being there. The node event loop will run a non-yielding method to completion and hold off other competing methods.The use of setTimeout won’t prevent that - in fact it encourages interleaving and so will increase the risk.

1 Like

Okay I understood what you meant, however I still don’t understand why sometimes the currentChat property isn’t deleted even though I’m technically running the delete code twice, once on each user connection closure.

It must be because at the point of deletion (assuming it’s got there in the code), targetNoQueue.pairId is not valid. Have you tried wrapping that section in a try/catch?

I will do that, and see what happens. thanks for your time Mr.Rob

1 Like

I think that the pairId was declared outside the method ass well which was globally mutating the delete currentChats[targetNoQueue.pairId] call at some point due to mid-flight changes you mentioned, I will test it on the production server and post my feedback, thanks again for clarifying these aspects Mr.Rob, you’ve been very helpful.

1 Like