In the past I had good results with something inspired by `socialize:server-presence`, which goes like this:
```js
// THE PING PART
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { Random } from 'meteor/random';

// unique ID per server per restart
export const serverId = Random.id();

// collection to hold all instances
const Instances = new Mongo.Collection('GatewayInstances');

(async () => {
  await Instances.createIndexAsync({ ping: 1 });
  await Instances.createIndexAsync({ serverId: 1 });
})();

// keep track of which servers are online
const serverPing = () => {
  // there may be other data which you'd like to monitor, so you may choose to add
  // it here (alternatively, query it elsewhere live based on which servers are alive)
  Instances.upsertAsync({ serverId }, { $set: { ping: new Date() } })
    .catch(console.error);
};

// each server must ping regularly
Meteor.setInterval(serverPing, 1000 * 60);
Meteor.startup(serverPing);
// THE ACTION PART
// remove old servers and their sessions, alert about servers being down,
// do something on the servers that are still live, etc.
const checkInstancesAtInterval = async () => {
  const cutoff = new Date();
  cutoff.setMinutes(cutoff.getMinutes() - 2);
  const instancesToRemove = await Instances.find({ ping: { $lt: cutoff } }).fetchAsync();
  const removePromises = instancesToRemove.map((srv) => {
    const removeInstance = Instances.removeAsync({ _id: srv._id });
    // placeholder for your own per-instance cleanup logic
    const doSomethingElse = someOtherAsyncJobBasedOnThisInstance(srv);
    return Promise.all([removeInstance, doSomethingElse]);
  });
  await Promise.all(removePromises);
  // do something else with your live instances here
};

Meteor.setInterval(() => {
  checkInstancesAtInterval().catch(console.error);
}, 1000 * 90); // every 90 seconds

Meteor.startup(() => {
  checkInstancesAtInterval().catch(console.error);
});
```
Of course, adjust the polling window to suit your needs, but if the number of instances is low and the collection is indexed, polling MongoDB even every 10 seconds should be fine if this is a mission-critical service.
Note that here we use a new ID for the server each time it comes online, which may or may not work for you, especially since you envisage needing to identify the actual servers that went offline. You should be able to replace the random ID with a string passed at runtime through an environment variable or similar.
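For example, a minimal sketch of that swap, assuming a hypothetical `SERVER_ID` environment variable set by your deployment:

```js
import { Random } from 'meteor/random';

// SERVER_ID is a hypothetical variable name; use whatever your deployment sets.
// Fall back to a random ID so the heartbeat still works without configuration.
export const serverId = process.env.SERVER_ID || Random.id();
```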
This approach is quite resilient and makes you depend only on MongoDB always being up, which, for obvious reasons, should be the case anyway.
[Addendum 1] A more robust approach would be to use `setTimeout` instead of `setInterval`. This would also remove the need to execute the code once at startup.
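A minimal sketch of that pattern: a single self-rescheduling function, so one call covers both the startup run and the recurring one:

```js
const checkAndReschedule = async () => {
  try {
    await checkInstancesAtInterval();
  } catch (error) {
    console.error(error);
  }
  // reschedule only after the current run has fully completed,
  // so slow runs can never overlap
  Meteor.setTimeout(checkAndReschedule, 1000 * 90);
};

Meteor.startup(checkAndReschedule);
```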
[Addendum 2] It depends on your tolerance for duplicate execution, but if you run more than one Meteor worker on the main server, I suggest either spacing the polling timeouts, perhaps by using prime numbers, or using MongoDB for deduplication through what I call a unique constraint with TTL (Time-To-Live) indexing. The latter is off-topic, but in short it means attempting to insert a uniquely hashed document of the function name and arguments into a collection with a TTL index for auto-expiration, before executing the function.
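A minimal sketch of that deduplication idea, with assumed names (`FunctionLocks`, `runOnce`) and an assumed 60-second TTL:

```js
import crypto from 'crypto';
import { Mongo } from 'meteor/mongo';

// documents expire automatically ~60 seconds after createdAt
const Locks = new Mongo.Collection('FunctionLocks');
(async () => {
  await Locks.createIndexAsync({ createdAt: 1 }, { expireAfterSeconds: 60 });
})();

// execute fn only if no other worker has claimed the same (name, args) recently
const runOnce = async (name, args, fn) => {
  const hash = crypto.createHash('sha256')
    .update(JSON.stringify([name, args]))
    .digest('hex');
  try {
    // _id is unique, so a second worker's insert throws and we skip execution
    await Locks.insertAsync({ _id: hash, createdAt: new Date() });
  } catch (error) {
    return;
  }
  await fn(...args);
};
```

With something like that in place, the interval callback becomes e.g. `runOnce('checkInstances', [], checkInstancesAtInterval)`.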
[Addendum 3] I realised I haven't answered this question: simply subscribe to a publication that returns a cursor on the `Instances` collection.
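A minimal sketch, with an assumed publication name:

```js
// server
Meteor.publish('instances', function () {
  return Instances.find({}, { fields: { serverId: 1, ping: 1 } });
});

// client
Meteor.subscribe('instances');
// Instances.find() on the client now reactively reflects which servers are live
```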