Architecture suggestions for a Meteor app?

Hi all.

I’m building something similar to SoundCloud – users can upload uncompressed media, and a server somewhere will process the audio into streamable m3u8/ts files and upload them to S3 to make them available for user consumption.

Some requirements for the media processing servers:

  • Scale based on demand
  • Need sufficient disk space & throughput to handle swaths of large uncompressed files and read/write to S3

The big thing I want to avoid is diverging codebases. I don’t want to write a separate Express app, although maybe this is fine? Ideally I’d share code with my Meteor app and the master server can talk to the media servers via DDP and get all the other benefits of Meteor.

My app is running on Galaxy. My current thinking is that I will need to deploy a cluster of EC2 servers with this FFmpeg config and deploy with MUP. Does this seem like the obvious route, or does someone know of another option that might be preferable? Does it seem reasonable to run Meteor processes as lightweight job servers? Have people done this successfully?

Thanks in advance for any discussion!


We’ve had great success using AWS Lambda functions for our transcoding jobs. About 90% of our jobs are handled by Lambda, the other 10% by EC2 spot instances (roughly 75% cheaper than on-demand). If you have unpredictable, variable load, this works nicely. If you have predictable load, EC2 spot instances are much cheaper per second than Lambda – provided you can get enough of them to process your load spikes, and provided you can utilise the servers enough to make up for the 2-3 minutes of spin-up time you pay for but can’t use.

It’s also worth using the slightly more expensive c5d AWS instances (vs m4 or even c5): the inclusion of SSD instance storage makes transcoding cheaper (less EBS required) and faster.

We use Slingshot to upload the files direct to S3, then a simple predictive model to determine whether to run a job on Lambda or spin up a spot instance.
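For anyone unfamiliar with Slingshot (the edgee:slingshot package), the server-side directive that authorises direct-to-S3 uploads looks roughly like this – the bucket name and key scheme here are hypothetical:

```javascript
// Meteor server: let clients upload straight to S3, bypassing the app server.
Slingshot.createDirective("rawAudioUploads", Slingshot.S3Storage, {
  bucket: "my-uncompressed-uploads", // hypothetical bucket name
  acl: "private",
  authorize() {
    // Only allow logged-in users to upload.
    return !!this.userId;
  },
  key(file) {
    // Namespace each upload by user to avoid key collisions.
    return this.userId + "/" + Date.now() + "-" + file.name;
  }
});
```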

Thanks!

How do you deal with your Meteor server talking to AWS? Amazon API Gateway?

Are you actually running Meteor on these servers, or are they something else?

We run our Meteor servers on EC2, so there’s no problem talking to AWS. You won’t need API Gateway – just talk to the relevant API endpoints directly; it doesn’t matter where your server resides. Lambda and EC2 spot instances run raw Node scripts that download the video from S3, then spawn ffmpeg to transcode. The Node script tracks progress and, when complete, uploads the result to S3 and deletes the original.

Would you mind explaining in a bit more detail how this works, especially spinning up a spot instance? Do you just mean you distribute the job to one of these, or do you really launch one when needed?

And thanks @drone1 for raising this question. We need something similar for a social platform.


Apologies if this post gets a little too detailed…

Our predictive model was initially a fairly simple online linear regression, using video format, input resolution, output resolution and duration to predict transcode times on Lambda, and from that whether Lambda can even process a video within the time limits. At the time, Lambda had a 5 minute execution limit; this is now 15 minutes, so the biggest factor in determining whether a video CAN be processed by Lambda is size – there is a hard 512MB file system limit for Lambda functions. While you can get up to 3GB of RAM, I’ve yet to dedicate the time to find out if this can be used instead (you can’t use RAM disks).

So, our new model is a little more complicated: it tries to predict whether we SHOULD transcode a video with Lambda (assuming we can), or whether we should let a real server (spot instance) handle the job. This is based on many other factors, such as the priority of the client requesting the transcode, the preferences of the client, the number of servers we have running (and when their current jobs will end), as well as how many other jobs are in the queue. This piece is still a bit of a work in progress, but in general we’re trading off cost for throughput/latency on a per-client basis.
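A toy version of the first model could look like this: an online linear regression over a few job features, updated after each completed Lambda job, plus the hard CAN-run checks. The features, learning rate, and constants are invented for illustration – the real model is the poster’s own:

```javascript
// Online linear regression: predict Lambda transcode seconds from job
// features, then gate on Lambda's hard limits. Constants are illustrative.
const LAMBDA_TIME_LIMIT_S = 15 * 60; // 15 minute execution limit
const LAMBDA_DISK_LIMIT_MB = 512;    // /tmp file system limit

class TranscodePredictor {
  constructor(nFeatures, learningRate = 1e-4) {
    this.w = new Array(nFeatures + 1).fill(0); // +1 for the bias term
    this.lr = learningRate;
  }
  // features: e.g. [durationSeconds, inputMegapixels, outputMegapixels]
  predict(features) {
    const x = [1, ...features];
    return x.reduce((sum, xi, i) => sum + xi * this.w[i], 0);
  }
  // One stochastic-gradient step after observing a real transcode time.
  update(features, actualSeconds) {
    const x = [1, ...features];
    const err = this.predict(features) - actualSeconds;
    x.forEach((xi, i) => { this.w[i] -= this.lr * err * xi; });
  }
}

// A job CAN run on Lambda only if it fits on disk and in the time limit.
function canUseLambda(predictor, job) {
  return job.sizeMB < LAMBDA_DISK_LIMIT_MB &&
         predictor.predict(job.features) < LAMBDA_TIME_LIMIT_S;
}
```

The SHOULD-run decision would then layer client priority, queue depth, and running-server state on top of `canUseLambda`.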

In regards to the spot instances, yes - we literally request and spin up new spot instances as required. So if we decide a job needs to be done by a server, and we don’t have one available (and predict one won’t be available within a reasonable amount of time), we request a new one. If we see a client is uploading a new clip that CAN be transcoded by Lambda, but we think they may be uploading more clips in the very near future (our workload is very spiky), we may choose to transcode this clip with a new spot instance, even though Lambda could do the work.

If we request a spot instance and don’t get it (not super easy to determine), after a certain point we request an on-demand instance of the same type (easier to detect whether we got it). If we don’t, we start requesting on-demand instances of a different type until we do. This ensures that all jobs get completed in a reasonable time, even if there are no spot instances available.
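That escalation can be sketched as a small, pure fallback policy. The instance types and ordering below are illustrative, and the actual EC2 calls (in the aws-sdk, `requestSpotInstances` for the spot rung and `runInstances` for the on-demand rungs) are left out:

```javascript
// Escalation ladder: spot -> on-demand (same type) -> on-demand (other types).
// Instance types and the escalation order are illustrative only.
const LADDER = [
  { market: "spot",      type: "c5d.9xlarge" },
  { market: "on-demand", type: "c5d.9xlarge" },
  { market: "on-demand", type: "c5.9xlarge"  },
  { market: "on-demand", type: "m5.4xlarge"  },
];

// Given how many requests have already gone unfulfilled within the wait
// window, return the next request to make, or null if the ladder is done.
function nextRequest(failedAttempts) {
  return LADDER[failedAttempts] || null;
}

module.exports = { nextRequest };
```

Keeping the policy pure like this makes the “did we get the instance?” detection problem separate from the decision of what to ask for next.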

We’ve considered switching from spot instance requests to changing the required size of a spot fleet. This would allow us to specify not a concrete instance type + AZ, but a pool of instance types across many AZs, the upside being that if a c5d.9xlarge (our “sweet” spot) isn’t available, but an 18xlarge is, or an m5.4xlarge is, etc., one of those can process the job instead. Unfortunately, there is no way to specify a priority here (e.g., give me a c5d.9xlarge if you can, otherwise…).

I’d say that we use this fairly complex system because of where we are with system load: it’s large enough that AWS’s Elastic Transcoder and Elemental Media are really expensive for us (> $3000 per month, compared to in the region of $150 per month for our setup), but not large enough that we can just purchase some reserved instances for a year (at a discount) and let them process jobs as they come in. I’d recommend people stick with Elemental Media unless the cost will be prohibitive.


Are you running a specific “preset” to run Meteor with ffmpeg on Lambda/EC2?

Any other tips on MUP deployment to save time would be appreciated, if any come to mind.

Thanks for the detailed info btw!

No, we run custom arguments to generate an M3U8 with 360p and 720p resolutions (two separate jobs). MUP is an entirely different beast! We use https://bitbucket.org/znewsham/mup-aws-elb/src to work with load balancers. It’s a little rough around the edges still (we had fairly specific requirements); I’d love to dedicate some more time to make it a really “one click” solution, but it’s nice once you get things set up.
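The exact arguments aren’t shown above, but as a starting point, one rendition of that shape might be built like this – the codec choices and segment length are my guesses, not the poster’s actual configuration:

```javascript
// Build ffmpeg arguments for one HLS rendition (e.g. 360 or 720 vertical
// resolution). Run once per rendition, matching the "two separate jobs".
function renditionArgs(input, height, outPlaylist) {
  return [
    "-i", input,
    "-vf", "scale=-2:" + height,     // keep aspect ratio, force even width
    "-c:v", "libx264", "-c:a", "aac",
    "-hls_time", "6",                // 6-second .ts segments
    "-hls_playlist_type", "vod",
    outPlaylist,
  ];
}

// Example: the 720p job would spawn
// ffmpeg with renditionArgs("in.mov", 720, "out_720.m3u8")

module.exports = { renditionArgs };
```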

Any thoughts on
https://waveshosting.com/ ?

I’ve never used it. I can’t find any details on their per-server cost, so I’m not the best person to answer this.

Thanks a lot for this comprehensive answer, highly appreciated!

You could also build on Google Cloud Platform using their microservices, Cloud Storage, and Pub/Sub queues. Spotify migrated to Google Cloud Platform, and they’ve written a lot about their architecture if you search around for it.

Mattias Petter Johansson, creator of Fun Fun Function, a YouTube show about programming, used to work at Spotify. He might be able to share some specifics if you reach out to him.


I believe we bypassed this limit with streams from and to S3.

Interesting. Would you elaborate on this please?

Can you stream virtual disk I/O to CLI tools like ffmpeg?

This AWS Lambda function seems to do pretty much what you want, using streams.

Interesting. How do you stream from S3 to ffmpeg, then stream out an m3u8 file plus the video chunks? This would be amazing – we’d be able to process virtually all our video with Lambda functions!

Just take a look at the source code I linked :point_up_2:

Unfortunately not :frowning: it creates the files on lambda, so the storage limit is still there. :frowning:

I’m using Waves for our production application. So far, so good.
Waves is basically an AWS Elastic Beanstalk wrapper, with a much better interface and UX for creating and managing the environments.

They are not a hosting provider, just a wrapper. What that means is that you create an IAM account for Waves and let them provision and manage some Beanstalk resources.

I’ve been super happy with it. On the plus side, it also has a really good free tier (I believe they charge based on the number of projects and collaborators you add).

The down side is that it works only with Elastic Beanstalk infrastructure. That means AWS only.

As a previous user of MUP and MUP Beanstalk Plugin, I think Waves is more robust for a mission critical application.


This is a slight digression from the main topic, but I’m curious if people have thoughts on using a Meteor app on EC2 (via Waves, for example) specifically for job processing, etc. One of my main goals is to share code between my main app and my ‘worker’ app, but I’m wondering if this sounds completely silly to people – if a Meteor app could not possibly be ‘lightweight’ enough.

Essentially I’d like my client app (running on Galaxy or EC2 via Waves, for example) to be able to talk to a cluster of EC2 nodes running a Meteor app and processing jobs (in my case, running shell commands to process media using ffmpeg and imagemagick). Am I sounding crazy or…?

Great discussion by the way. Thanks for all the thoughtful replies.