Apologies if this post gets a little too detailed…
Our predictive model was initially a fairly simple online linear regression, using video format, input resolution, output resolution and duration to predict transcode times on lambda, and from that whether lambda can even process a video within the time limits. At the time, lambda had a 5 minute execution limit; that's now 15 minutes, so the biggest factor in determining whether a video CAN be processed by lambda is size - there is a hard 500MB file system limit for lambda functions. While you can get up to 3GB of RAM, I've yet to dedicate the time to find out whether that can be used instead (you can't use RAM disks).
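For anyone curious what that looks like in practice, here's a minimal sketch in Python of that kind of online model plus feasibility check - the feature encoding, learning rate and safety margin are placeholder assumptions, not our production values:

```python
LAMBDA_TIMEOUT_S = 15 * 60                     # current lambda execution limit
LAMBDA_DISK_LIMIT_BYTES = 500 * 1024 * 1024    # hard file system limit mentioned above

class OnlineTranscodeTimeModel:
    """Online linear regression: predict lambda transcode time from video features."""

    def __init__(self, n_features, learning_rate=1e-6):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, features):
        return sum(w * x for w, x in zip(self.w, features)) + self.b

    def update(self, features, actual_seconds):
        # One SGD step on squared error once the real transcode time is known.
        err = self.predict(features) - actual_seconds
        self.w = [w - self.lr * err * x for w, x in zip(self.w, features)]
        self.b -= self.lr * err

def lambda_can_process(model, features, file_size_bytes, safety_margin=0.8):
    # The file has to fit on lambda's file system at all...
    if file_size_bytes > LAMBDA_DISK_LIMIT_BYTES:
        return False
    # ...and the predicted transcode time has to fit inside the timeout, with headroom.
    return model.predict(features) < LAMBDA_TIMEOUT_S * safety_margin
```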
So, our new model is a little more complicated: it tries to predict whether we SHOULD transcode a video with lambda (assuming we can), or whether we should let a real server (spot instance) handle the job. This is based on many other factors, such as the priority of the client requesting the transcode, the preferences of the client, the number of servers we have running (and when their current jobs will end), as well as how many other jobs are in the queue. This piece is still a bit of a work in progress, but in general we're trading off cost for throughput/latency on a per-client basis.
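To make that concrete, a very stripped-down version of the "should we use lambda?" decision might look something like this - the thresholds and the `ClusterState` fields are illustrative, not our actual model:

```python
from dataclasses import dataclass

@dataclass
class ClusterState:
    running_servers: int
    next_server_free_in_s: float    # when the soonest busy server frees up
    queued_jobs: int

def should_use_lambda(client_priority, prefers_low_cost, state, lambda_feasible):
    if not lambda_feasible:
        return False                # lambda can't take the job at all
    if (prefers_low_cost and state.queued_jobs == 0
            and state.running_servers > 0 and state.next_server_free_in_s < 60):
        return False                # a cheaper server will be free soon enough
    if client_priority > 0 and state.queued_jobs >= state.running_servers:
        return True                 # high-priority client and the servers are backed up
    return state.queued_jobs > 0    # otherwise, pay for lambda when there's a queue
```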
In regards to the spot instances, yes - we literally request and spin up new spot instances as required - so if we decide a job needs to be done by a server, and we don't have one available (and predict one won't be available within a reasonable amount of time), we request a new one. If we see a client is uploading a new clip that CAN be transcoded by lambda, but we think they may be uploading more clips in the very near future (our workload is very spiky), we may choose to transcode this clip with a new spot instance, even though lambda could do the work.
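As a rough illustration of that "more clips incoming" check (the window and threshold here are made up - the real predictor looks at per-client upload history), it's conceptually something like:

```python
import time

def expect_upload_burst(client_upload_timestamps, window_s=300, threshold=3):
    # Placeholder heuristic: several uploads from the same client in the last few
    # minutes suggests more are on the way, so warming a spot instance may pay off.
    now = time.time()
    recent = [t for t in client_upload_timestamps if now - t <= window_s]
    return len(recent) >= threshold
```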
If we request a spot instance and don't get it (not super easy to determine), after a certain point we request an on-demand instance of the same type (it's easier to detect whether we got that). If we still don't, we start requesting on-demand instances of different types until we do. This ensures that all jobs get completed in a reasonable time, even if there are no spot instances available.
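In boto3 terms, that request-and-fall-back chain is roughly the following (these are real API calls, but the AMI id, instance types and timeouts are placeholders, and error handling is stripped down):

```python
import time
import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2")
AMI_ID = "ami-xxxxxxxx"                                # placeholder transcoder AMI
FALLBACK_TYPES = ["c5d.9xlarge", "c5.9xlarge", "m5.4xlarge"]

def request_spot(instance_type, wait_s=120):
    resp = ec2.request_spot_instances(
        InstanceCount=1,
        LaunchSpecification={"ImageId": AMI_ID, "InstanceType": instance_type},
    )
    request_id = resp["SpotInstanceRequests"][0]["SpotInstanceRequestId"]
    deadline = time.time() + wait_s
    while time.time() < deadline:
        req = ec2.describe_spot_instance_requests(
            SpotInstanceRequestIds=[request_id]
        )["SpotInstanceRequests"][0]
        if req.get("InstanceId"):                      # fulfilled
            return req["InstanceId"]
        time.sleep(10)
    ec2.cancel_spot_instance_requests(SpotInstanceRequestIds=[request_id])
    return None                                        # gave up waiting

def request_on_demand(instance_type):
    try:
        resp = ec2.run_instances(ImageId=AMI_ID, InstanceType=instance_type,
                                 MinCount=1, MaxCount=1)
        return resp["Instances"][0]["InstanceId"]
    except ClientError:                                # e.g. InsufficientInstanceCapacity
        return None

def provision_server():
    instance_id = request_spot(FALLBACK_TYPES[0])      # try spot first
    if instance_id:
        return instance_id
    for instance_type in FALLBACK_TYPES:               # then on-demand, same type first
        instance_id = request_on_demand(instance_type)
        if instance_id:
            return instance_id
    return None
```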
We’ve considered switching from individual spot instance requests to changing the required size of a spot fleet. This would allow us to specify not a concrete instance type + AZ, but a pool of instance types across many AZs - the upside being that if a c5d.9xl (our “sweet spot”) isn’t available but a c5d.18xl is, or an m5.4xl is, etc., one of those can process the job instead. Unfortunately, there is no way to specify a priority here (e.g., give me a c5d.9xl if you can, otherwise…).
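For reference, the spot fleet version would look something like this in boto3 (the AMI, IAM role and instance-type pool are placeholders, and subnets/AZs are omitted for brevity) - note there's still no field to say "prefer a c5d.9xl over the rest":

```python
import boto3

ec2 = boto3.client("ec2")
AMI_ID = "ami-xxxxxxxx"                                          # placeholder
FLEET_ROLE = "arn:aws:iam::123456789012:role/spot-fleet-role"    # placeholder

def create_transcode_fleet(initial_capacity=0):
    resp = ec2.request_spot_fleet(SpotFleetRequestConfig={
        "IamFleetRole": FLEET_ROLE,
        "TargetCapacity": initial_capacity,
        "AllocationStrategy": "lowestPrice",
        # Any of these instance types can satisfy a unit of capacity; there is
        # no way to rank them in order of preference.
        "LaunchSpecifications": [
            {"ImageId": AMI_ID, "InstanceType": t}
            for t in ("c5d.9xlarge", "c5d.18xlarge", "m5.4xlarge")
        ],
    })
    return resp["SpotFleetRequestId"]

def scale_fleet(fleet_id, target_capacity):
    # Adding/removing servers becomes "change the fleet's target size".
    ec2.modify_spot_fleet_request(SpotFleetRequestId=fleet_id,
                                  TargetCapacity=target_capacity)
```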
I’d say we use this fairly complex system because of where we are with system load: it’s large enough that AWS’s Elastic Transcoder and Elemental Media are really expensive for us (> $3000 per month, compared to somewhere in the region of $150 per month for our setup), but not large enough that we can just purchase some reserved instances for a year (at a discount) and let them process jobs as they come in. I’d recommend people stick with Elemental Media unless the cost will be prohibitive.