Server Crash When Calling Too Many Twilio API Calls

evolross · October 19, 2018, 1:31am

I recently did a project using Twilio that sent SMS text notifications to about 1000 users as fast as possible. I sent the messages in a loop to each user and used a Twilio Messaging Service with about 30 toll-free numbers that can each send three SMS notifications per second. Twilio recommends using multiple numbers for 1) speed - you can distribute the notifications over all the numbers and 2) sending the same notification to a lot of devices on the same number can cause the carriers to flag it as spam and deny future notifications. Using more numbers reduces the likelihood of getting flagged as spam.

The simplified code looks like this:

_.each(users, function(user, index) {
   try {
      TwilioClient.messages
         .create({
            body: notificationBody,
            messagingServiceSid: process.env.TWILIO_MSG_SERVICE_SID,
            to: user.phoneNumber
         })
         .then(function(message) {
            console.log("Successfully sent SMS notification " + (index + 1) + " -", message.sid);
         })
         .done();
   }
   catch(error) {
      console.log("There was an error trying to send an SMS text message. ", error.reason);
   }
});

It worked well in testing on a few users, but in production with hundreds of users I kept having server crashes with the error message Too Many Requests. After some thought and trials, I figured maybe it was creating too many requests too quickly, so I added…

Meteor._sleepForMs(250);

…just above the try-catch and this fixed the issue and allowed the server to send out all the messages, but slower. I’m assuming this gave the server time to get responses from Twilio before piling on more requests. Without the sleep call, I’m pretty sure it was loading up all 1000+ requests almost instantly, which crashed the server.

My question and reason for posting is to ask whether what I’m assuming above is correct. Do you have to add some time for the requests to third-party APIs to return before firing new ones? I’m kind of surprised because it didn’t look like my server was running out of memory. Besides, 1000 third-party API requests can’t take up that much memory? Is this more of a Node limitation? A config var that can be tweaked? Is adding a sleep like this the correct approach? It seems like a low-level band-aid instead of doing something more correct or best-practice.

The goal was to get the notifications out as quickly as possible… all 1000 in under one minute if possible. With my sleep call of 250ms, it raised the time to about 4.17 minutes (1000 × 250ms) to get all 1000 sent. This was still acceptable, but under one minute would have been nice. I didn’t do any tests to see if I could have gone with a shorter sleep time.

Also Twilio recommends using a loop like this. They don’t have an API function exposed that could send 1000 of the same message to a giant array of numbers all in one shot. You have to loop one-by-one. I wrote Twilio support about this and they confirmed.

coagmano · October 19, 2018, 2:20am

My first thought is that this looks like a promise chain, which won’t throw a regular error, rendering the try/catch block useless. You should attach a .catch() handler to the promise instead to handle errors.

In newish versions of Node, unhandled promise rejections will crash the parent process, which matches the behaviour that you’re seeing.

The weird part is the use of a done call…
So I went digging and found that Twilio uses Q as its promise library. Q recommends using the done() call to terminate promise chains; any promise rejection along the way is then thrown as an error in a new turn of the event loop (which bypasses any try/catch blocks as well).

promise.done(onFulfilled, onRejected, onProgress)

This method should be used to terminate chains of promises that will not be passed elsewhere. Since exceptions thrown in then callbacks are consumed and transformed into rejections, exceptions at the end of the chain are easy to accidentally, silently ignore. By arranging for the exception to be thrown in a future turn of the event loop, so that it won’t be caught, it causes an onerror event on the browser window, or an uncaughtException event on Node.js’s process object.

So my solution for you is to use a catch block for the error case like so:

_.each(users, function(user, index) {
    TwilioClient.messages
       .create({
          body: notificationBody,
          messagingServiceSid: process.env.TWILIO_MSG_SERVICE_SID,
          to: user.phoneNumber
       })
       .then(function(message) {
          console.log("Successfully sent SMS notification " + (index + 1) + " -", message.sid);
       })
       .catch(error => {
          console.log("There was an error trying to send an SMS text message. ", error.reason);
       })
       .done();
});

You will probably want to add better error handling which pauses / throttles the queue and re-schedules the failed request to try it again later.
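The throttle-and-retry idea above could be sketched roughly as follows. This is a minimal illustration, not Twilio's recommended approach: `sendWithRetry` and `sendFn` are hypothetical names, `sendFn` stands in for a call such as `TwilioClient.messages.create(...)`, and the check on `error.status === 429` assumes the rejected error exposes the HTTP status that way.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `sendFn` with exponential backoff when it is rate limited.
// `sendFn` is a hypothetical function returning a promise, standing in for
// a call such as TwilioClient.messages.create(...).
// Assumption: a rate-limit rejection carries `status === 429`.
async function sendWithRetry(sendFn, { maxAttempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await sendFn();
    } catch (error) {
      const rateLimited = Boolean(error) && error.status === 429;
      // Give up on other errors, or once the attempts are used up.
      if (!rateLimited || attempt >= maxAttempts) throw error;
      // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Each caller still needs its own `.catch()` (or try/await/catch) around `sendWithRetry`, since the function rethrows once it gives up.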

serkandurusoy · October 19, 2018, 7:54am

This is not about promises or unhandled rejections, although I agree that the mix of try/catch and then/catch/done makes it harder to identify the real problem, which is API rate limiting.

TOO MANY REQUESTS (HTTP status 429) is a typical API error response, and with Twilio it means the same thing: you have made too many concurrent requests in a short amount of time.

In a typical API call scenario, there are ways to deal with this, such as splitting and batching, and some APIs send extra response headers that make this easier.
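As a rough illustration of splitting and batching, the sketch below sends items in fixed-size batches with a pause between batches. The names `chunk`, `sendInBatches`, and `sendOne` are hypothetical, and the batch size and pause are placeholder values, not limits taken from Twilio's documentation.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send one batch at a time, so at most `batchSize` requests are in flight.
// `sendOne` is a hypothetical function returning a promise per item.
// Every outcome is recorded, so a single failure never rejects the whole run.
async function sendInBatches(items, sendOne, { batchSize = 50, pauseMs = 1000 } = {}) {
  const results = [];
  const batches = chunk(items, batchSize);
  for (let i = 0; i < batches.length; i++) {
    const settled = await Promise.all(
      batches[i].map((item) =>
        Promise.resolve()
          .then(() => sendOne(item))
          .then(
            (value) => ({ item, ok: true, value }),
            (error) => ({ item, ok: false, error })
          )
      )
    );
    results.push(...settled);
    // Pause between batches, but not after the last one.
    if (i < batches.length - 1) await sleep(pauseMs);
  }
  return results;
}
```

The returned array keeps the input order, so failed items can be filtered out with `results.filter((r) => !r.ok)` and retried later.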

In twilio’s case, they provide you a set of utilities/services that you can use for your various messaging volumes. They are called messaging and notify services which are explained clearly in a somewhat recent blog post from twilio.

coagmano · October 21, 2018, 11:45pm

Yes the cause of the error is definitely because of rate limiting, and the solution needs to address this.

However, the fact that a simple TOO MANY REQUESTS error is crashing the entire process (especially inside a try/catch) is worth addressing as well.
That matters especially when you want to use the presence of an error as a signal to scale back the speed of requests to an endpoint.

evolross · October 22, 2018, 6:36pm

Thanks for the responses.

Yeah, I added the try/catch as a first attempt at handling/figuring out the error. And it didn’t help at all - but I left it in there. I’m not the savviest with promises. Thanks for the tip.

Yeah, I read about the Messaging service in the Twilio docs, and used one above with multiple numbers. I had actually thought about Twilio rate limiting and found this help article on Twilio Rate Limiting which has no mention of the Notify service - so that’s a bit of a bummer. But it does mention a 100 call limit to the API - so I should have caught that. I actually thought sending the same/similar notification in a big loop one-thousand times might be a little suspect and I emailed their support to confirm this is the way to do this and they confirmed I indeed needed to use a Messaging Service with multiple numbers and some certain settings and a massive loop. Maybe they didn’t understand what I was trying to do.

I noticed in the above blog post they’re using a Promise.all call… would that also stop the server from crashing if I didn’t use a .catch? Just curious.

coagmano · October 22, 2018, 11:59pm

As long as there is a catch somewhere down the promise chain from where an error can be generated, it won’t crash.

Because Promise.all also returns a promise, you can whack a .catch on it to prevent crashing.
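A minimal sketch of that pattern, with a single `.catch` on the promise returned by `Promise.all`. Here `fakeSend` is a hypothetical stand-in for `TwilioClient.messages.create`, and the phone numbers and SIDs are made up for illustration.

```javascript
// Hypothetical stand-in for TwilioClient.messages.create:
// rejects for one made-up number, resolves with a fake sid otherwise.
function fakeSend(to) {
  return to === "+15550000002"
    ? Promise.reject(new Error("Too Many Requests"))
    : Promise.resolve({ sid: "SM-" + to });
}

// One .catch at the end of the chain handles a rejection from any send,
// so no rejection is left unhandled.
function sendAll(numbers, send) {
  return Promise.all(numbers.map((to) => send(to)))
    .then((messages) => ({ ok: true, sids: messages.map((m) => m.sid) }))
    .catch((error) => ({ ok: false, reason: error.message }));
}

sendAll(["+15550000001", "+15550000002"], fakeSend).then((result) => {
  console.log(result);
});
```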

serkandurusoy · October 23, 2018, 9:52am

@evolross take a look at the docs for notify for the details.

I understand your frustration with the rate limit docs not pointing in that direction, but twilio is a large bundle of services and sometimes some of those services feel like they overlap in doing similar things.

In this case, notify indeed is the best tool because you are sending bulk messages. The other ones are more suited for customized messages.

Regarding async/promises, the following cheatsheet can help you as a quick reference:

One thing to note about Promise.all is that it will fail with the first error you receive, making it hard for you to track actual message sending progress.
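One way around the fail-fast behaviour, sketched here as an alternative rather than something from the Twilio docs, is `Promise.allSettled`, which waits for every promise and reports each outcome. It is a standard built-in from Node 12.9 onward, so it is newer than this thread. `sendAndSummarize` and `send` are hypothetical names.

```javascript
// Wait for every send to finish, then split the outcomes into
// successes and failures. `send` is a hypothetical function that
// returns a promise for one phone number.
async function sendAndSummarize(numbers, send) {
  const outcomes = await Promise.allSettled(numbers.map((to) => send(to)));
  const sent = [];
  const failed = [];
  outcomes.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      sent.push(numbers[i]);
    } else {
      failed.push({ to: numbers[i], reason: outcome.reason });
    }
  });
  return { sent, failed };
}
```

The `failed` list is what a later retry pass would work through.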

In any case, catching the error response will save you from a server crash (usually in the form of an unhandled promise rejection) but it will not help you fulfill the requirement to send those messages.

If you are intent on using the messaging api and not the notify api, you will have to catch all the errors, track the successful ones, and then retry the failed ones at a later time.

Another approach to coordinating async API calls in general would be to use a queue where you:

create a separate “job” per each message
create a “job runner” that processes each job individually and retries them based on timing/error rules
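The two steps above can be sketched with a small in-memory queue. This is only an illustration of the idea: `createJobs`, `runJobs`, and `send` are hypothetical names, the retry rule is a fixed delay, and nothing is persisted, so queued jobs would be lost on a server restart.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One job per message; `attempts` counts how often the job has been tried.
function createJobs(users, body) {
  return users.map((user) => ({ to: user.phoneNumber, body, attempts: 0 }));
}

// Job runner: processes jobs one at a time. A failed job goes back to the
// end of the queue until it has been tried `maxAttempts` times.
// `send` is a hypothetical function that takes a job and returns a promise.
// Note: the job objects are updated in place (their `attempts` counter).
async function runJobs(jobs, send, { maxAttempts = 3, retryDelayMs = 1000 } = {}) {
  const queue = jobs.slice();
  const done = [];
  const dead = [];
  while (queue.length > 0) {
    const job = queue.shift();
    job.attempts += 1;
    try {
      await send(job);
      done.push(job);
    } catch (error) {
      if (job.attempts < maxAttempts) {
        queue.push(job);
        // Back off before picking up the next job.
        await sleep(retryDelayMs);
      } else {
        dead.push({ job, error });
      }
    }
  }
  return { done, dead };
}
```

Running one job at a time keeps the sketch simple; a real runner would likely process several jobs concurrently and pick its delay based on the error it received.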

You can take a look at https://github.com/vsivsi/meteor-job-collection for a Meteor-friendly solution, but note that it is no longer maintained. That being said, it will help you understand the ideas behind jobs, schedules, and workers.

You can then maybe invest some time in exploring alternatives that fellow meteorites have used.