Serverless Video Rendering

Stink Studios
Jul 25, 2019

Making an economic case for on-demand, cloud-based solutions.

Arpad Ray, Executive Technical Director

At Stink Studios, one of our most popular products is RITA, our cloud-based video rendering platform.

RITA stands for ‘render in the air,’ and was designed from the outset to live in the cloud. We’ve used it on dozens of projects running on Amazon Web Services (AWS), Google Cloud Platform, and Alibaba Cloud, where it can automatically scale itself between one server and hundreds of servers depending on customer demand.

In computing terms, on-demand video rendering is a relatively expensive operation. You typically want to use the fastest computers with the most processing cores so that videos can be generated and served to users as quickly as possible. This translates directly into financial expense: faster servers (even cloud-based ones) cost proportionately more per hour than slower ones. For example, one server that would be appropriate for one of our larger projects costs around $500 per month on AWS. We also usually run at least one spare server so there’s extra capacity available in case of a sudden spike in demand. But even that spare server usually needs a minute or two to get started once it’s called into duty. Therefore, there’s always been a certain minimum project size and budget at which on-demand video rendering becomes practical.

That’s where “serverless” computing comes in. As the name suggests, “serverless” bypasses the need to have servers in the conventional sense. Of course, we still need a server to actually do the video rendering work when required; it’s just not sitting there idling for the rest of the month. In serverless computing, the servers only start when a request arrives and might be shut down again immediately afterwards. Naturally, this wouldn’t be very useful if it still took a minute or two to start up, but the other important aspect of a serverless system is that it can start up in a second or two. We’re only charged by the cloud provider for the time (measured in milliseconds!) during which the server is actually doing our work.

The cost benefit of such a computing model is immediately obvious. If we only need a few thousand video renders per month, and each render takes one second to complete, we might be talking about one hour total billable time, rather than the whole month in the conventional model.
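As a rough illustration of this arithmetic, the sketch below totals the billable time for a month of renders. The figure of 3,600 renders is an assumed value, picked so that the total lands on exactly one hour; the one second per render comes from the paragraph above, and the 30-day month is also an assumption.

```python
# Billable time under a pay-per-use model: only seconds spent rendering count.
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def billable_hours(renders_per_month: int, seconds_per_render: float) -> float:
    """Total hours of compute actually billed in a month."""
    return renders_per_month * seconds_per_render / SECONDS_PER_HOUR


# Assumed workload: 3,600 renders per month at one second each.
hours = billable_hours(renders_per_month=3600, seconds_per_render=1.0)
print(f"{hours:.1f} billable hour(s) out of {HOURS_PER_MONTH} hours in the month")
```

In the conventional model, all 720 hours would be paid for, whether or not anything was being rendered.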

You might wonder why the conventional model continues to exist at all if serverless is so much more economical. One reason is that if you keep a “serverless” server running continuously for a whole month, it works out to be significantly more expensive than renting a comparable server for the month in the conventional model. This is entirely understandable — there would be no incentive for the cloud providers to provide this service if it always meant less income for them. Therefore, there’s a break-even point at a certain level of sustained usage where it becomes more economical to use the conventional model instead.
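To make the break-even idea concrete, here is a minimal sketch. The $500 monthly figure is the server cost quoted earlier in this post; the serverless hourly rate is a purely hypothetical number chosen for illustration and is not real pricing from any cloud provider.

```python
CONVENTIONAL_MONTHLY_COST = 500.0  # USD per month, figure quoted in this post
SERVERLESS_COST_PER_HOUR = 2.0     # USD per hour, hypothetical rate (not real pricing)


def break_even_hours(monthly_cost: float, serverless_hourly_rate: float) -> float:
    """Billable hours per month at which both models cost the same."""
    return monthly_cost / serverless_hourly_rate


def cheaper_model(hours_used: float,
                  monthly_cost: float = CONVENTIONAL_MONTHLY_COST,
                  serverless_hourly_rate: float = SERVERLESS_COST_PER_HOUR) -> str:
    """Name the cheaper model for a given number of billable hours per month."""
    serverless_cost = hours_used * serverless_hourly_rate
    return "serverless" if serverless_cost < monthly_cost else "conventional"


threshold = break_even_hours(CONVENTIONAL_MONTHLY_COST, SERVERLESS_COST_PER_HOUR)
print(f"Break-even at {threshold:.0f} billable hours per month")
print("1 hour/month:", cheaper_model(1))
print("400 hours/month:", cheaper_model(400))
```

With these assumed numbers, serverless wins comfortably at low usage and loses once sustained usage passes the threshold. The real threshold depends entirely on actual provider pricing.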

There are some constraints to starting a server in a second or two. One is that each server is very limited in computing capacity. The server I mentioned above, costing about $500 per month, has 16 vCPUs (a measure of processing power) and 32 GB of memory. On the other hand, the highest performance option in Lambda, AWS’s serverless service, has 1.6 vCPUs and 3 GB of memory.

Another constraint is on the amount of code and assets that you can put on the server. For Lambda, you’re allowed to provide a zip file up to 50MB. Google’s equivalent service Cloud Functions allows up to 100MB.

These constraints easily accommodate many workloads. We’ve been keen users of the serverless model for many tasks including serving websites, running Slack bots, and resizing uploaded images.

For video rendering, these constraints are more challenging. 50MB might only be enough for one second of one layer of 1080p video assets. Even in the case of a purely synthetic video with no assets, the level of processing power is an issue. We’re usually aiming to render (and encode) videos about 20–30x realtime. In other words, if we’re creating a 30 second long video, we should be able to return it to the user in about one second. With such limited processing power, we might only be able to render two seconds of video in that time.
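The gap between those two figures can be turned into a quick estimate of how many functions would have to work at once. This is only a back-of-the-envelope sketch using the numbers in the paragraph above (a 30-second video, a one-second target, and roughly two seconds of video rendered per second on one function); the function name is made up for illustration.

```python
import math


def functions_needed(video_seconds: float,
                     target_latency_seconds: float,
                     video_seconds_per_second: float) -> int:
    """Concurrent functions required to render the whole video within the target."""
    # How much video a single function can finish before the deadline.
    per_function = video_seconds_per_second * target_latency_seconds
    return math.ceil(video_seconds / per_function)


# 30-second video, delivered in about one second, at ~2x realtime per function.
print(functions_needed(video_seconds=30.0,
                       target_latency_seconds=1.0,
                       video_seconds_per_second=2.0))
```

This ignores start-up time and the cost of joining the pieces back together, so a real system would need some headroom on top.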

The solution to both of these constraints is concurrency — using many servers at the same time. Rather than a single expensive server using 1.5GB of assets, we can use 50 servers at the same time, each with 30MB of assets. Naturally, this is another factor in cost. In the above scenario where we’re doing one hour’s worth of rendering per month, now we’re using 50 hours. It still works out a lot cheaper in this case, but it might tip the scales towards a conventional model in another case.
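The fan-out pattern itself can be sketched in a few lines. This is not RITA's actual implementation: local threads stand in for serverless invocations, and `render_segment` is a hypothetical placeholder that returns a label instead of rendering anything. The 50 workers, 30MB per worker, and one hour of rendering come from the paragraph above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s: float, workers: int) -> list[tuple[float, float]]:
    """Divide the video's timeline into equal (start, end) ranges, one per worker."""
    step = duration_s / workers
    return [(i * step, (i + 1) * step) for i in range(workers)]


def render_segment(segment: tuple[float, float]) -> str:
    """Hypothetical stand-in for one serverless invocation rendering one range."""
    start, end = segment
    return f"segment {start:.1f}-{end:.1f}s"


def render_concurrently(duration_s: float, workers: int) -> list[str]:
    """Render all segments at once and return the pieces in timeline order."""
    segments = split_into_segments(duration_s, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the pieces come back ready to join.
        return list(pool.map(render_segment, segments))


parts = render_concurrently(duration_s=30.0, workers=50)
print(len(parts), "pieces, first:", parts[0])

# The trade-off in numbers: same total assets, but 50x the billable time.
total_assets_mb = 50 * 30   # 1500 MB, i.e. the 1.5GB a single server would hold
billed_hours = 50 * 1       # one hour of rendering becomes 50 worker-hours
print(total_assets_mb, "MB of assets across workers;", billed_hours, "billable hours")
```

Each worker only needs the assets for its own slice of the timeline, which is what keeps the per-function package under the size limit.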

I’m writing about this now because we’ve recently launched our first public-facing project that uses RITA on a completely serverless infrastructure. The project uses RITA to generate a dozen small videos at the same time. Although we of course hope that people like it, it’s unlikely to get anywhere near the scale of traffic that would warrant running conventional servers continuously, and by using serverless the running cost will be pretty negligible.

I should also mention we’ve recently provided RITA serverless solutions for a couple of prominent tech companies. In both cases the running cost wasn’t the primary factor; it was the fact that serverless systems are pretty much maintenance-free. In a conventional system there’s an operational overhead of monitoring servers, responding to hardware failures, running software updates, etc. In a serverless system, this is all taken care of by the cloud provider. For the first time we can hand over a production-ready RITA system to a completely non-technical team.

Interested in using RITA, or migrating your own infrastructure to serverless platforms? Feel free to get in touch.