Cold Boot – The Dirty Secret of Serverless


OK, now that I have your attention, let’s talk about how to mitigate cold boots and unlock the full potential of serverless. Within this article, I will be discussing AWS Lambda, but the same concepts apply to Azure and Google Cloud. Full disclosure: I am a fan of serverless. It offers many benefits, including lower cost and faster development. But as with any technology, there are tradeoffs, and this article is intended to highlight the cold boot pain point and how to mitigate it.

What is a Cold Boot?

Let’s break down a lambda execution into 4 steps:

  1. Starts a container that will be used to execute the code
  2. Downloads your code
  3. Initializes your application
  4. Executes the target function in your application

When your lambda is invoked and no warm container is available, AWS must first perform steps 1-3 to provision one. Note that AWS is free to tear down that container whenever it wants; you cannot predict when it will happen, only that it eventually will. Also note that concurrent executions of your lambda may force multiple cold boots, followed by a period with more warm containers available. The duration of a cold boot depends on a few factors, including the size of the application and the programming language, although recent benchmarking shows that the language doesn’t play as significant a role as it used to, with most languages starting up in under 0.5 seconds.

We will discuss how to mitigate cold boots later. It’s a best practice to design for cold boots so your system is resilient, especially for lambdas fronted by API Gateway, where the delay is very visible to end users. A good first step is simply knowing when cold boots happen, as in the sketch below.
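Here’s a minimal sketch of cold boot detection in Node.js (the flag and log message are my own, not an AWS API): code at module scope runs once per container, so a module-level flag distinguishes the first invocation from all subsequent ones.

let cold_boot = true

export async function handler(event, ctx) {
    if (cold_boot) {
        // This branch runs only on the first invocation in a container
        console.log('cold boot detected', ctx.awsRequestId)
        cold_boot = false
    }
    // … normal handler logic
}

Feeding these logs into CloudWatch gives you a baseline for how often your users actually hit a cold boot.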

Extra Cost for VPC Cold Boot

If your lambda runs within a VPC, there are additional cold boot costs related to the creation of Elastic Network Interfaces (ENIs). Each ENI can take 1-3 seconds to establish, which can feel like an eternity in the lambda world, where durations are measured in 100 ms increments. AWS released an architectural improvement in 2019 that effectively eliminated the ENI cost for the lambda itself:
https://aws.amazon.com/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/

That was a huge win, but you still pay the cost to create an ENI to each private resource – such as a database – within the VPC. For RDS specifically, you can mitigate this by leveraging the AWS RDS Proxy service. However, we will discuss a more comprehensive solution below that will support any resource type.
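Switching to RDS Proxy is typically just a connection-string change. Here’s a minimal sketch; the endpoint, credentials, and mysql2 client are illustrative assumptions, not part of any particular setup:

import mysql from 'mysql2/promise'

// Point the client at the RDS Proxy endpoint (hypothetical values) instead of
// the database instance; the proxy maintains a warm pool of connections
const conn = await mysql.createConnection({
    host: 'my-app.proxy-abc123.us-east-1.rds.amazonaws.com',
    user: 'app_user',
    password: process.env.DB_PASSWORD,
    database: 'app_db',
})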

Enter Provisioned Concurrency

AWS released Provisioned Concurrency (PC) in 2019 to much fanfare. It offers the best of both worlds: you keep your hands off the infrastructure while still mitigating the performance cost of not controlling it. With PC, you pay to ensure that warm containers are always available. Billing is based on GB-seconds; for example, PC=5 on a 1 GB function kept warm for one hour is 5 × 1 GB × 3,600 s = 18,000 GB-seconds.

With PC, you tell AWS to provision enough containers to guarantee that no cold boots occur as long as you don’t exceed the specified concurrency setting. For example, if you configure PC=5, then you are guaranteed that your lambda can be executed concurrently 5 times without incurring a cold boot. AWS might still tear down warm containers, but it must create replacements before tearing down the old ones. However, if you get a 6th concurrent execution and no warm container is available, then a cold boot will occur as usual.
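As a sketch, PC can be configured with the AWS SDK for JavaScript (v3); the function name and alias below are hypothetical, and note that PC must target a published version or alias rather than $LATEST:

import { LambdaClient, PutProvisionedConcurrencyConfigCommand } from '@aws-sdk/client-lambda'

const lambda = new LambdaClient({})

// Keep 5 warm containers ready for the 'prod' alias of my-function
await lambda.send(new PutProvisionedConcurrencyConfigCommand({
    FunctionName: 'my-function',
    Qualifier: 'prod',
    ProvisionedConcurrentExecutions: 5,
}))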

Reserved Concurrency

Reserved Concurrency (RC) is optional and can be used with or without PC, as desired. RC is a way to both limit the number of concurrent executions and ensure that your lambda can grow to that concurrency level if needed. Without RC, your lambda runs in unreserved mode, which means it competes with all other lambdas in the account (default limit is 1000) and can consume the entire unreserved pool if needed.

If using RC, you must set RC to at least PC. Note that if you set RC = PC, you guarantee no cold boots, but at the cost of throttling executions: callers are throttled until a warm container frees up instead of incurring a cold boot. Generally speaking, I wouldn’t recommend doing this, but it is possible if desired.
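RC is set with a separate call; again a sketch with a hypothetical function name:

import { LambdaClient, PutFunctionConcurrencyCommand } from '@aws-sdk/client-lambda'

const lambda = new LambdaClient({})

// Cap my-function at 10 concurrent executions (and reserve that capacity);
// if PC is also configured, this must be at least the PC value
await lambda.send(new PutFunctionConcurrencyCommand({
    FunctionName: 'my-function',
    ReservedConcurrentExecutions: 10,
}))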

Note that PC does not bill during the initialization step, but initialization is limited to 10 seconds. After that, initialization is paused and resumes on the first execution within that container. If you have a lot of connections to establish, try to parallelize them to ensure they can all complete within the 10-second limit.

Initialize During Cold Boot

This last bit is very important: to maximize the benefit of Provisioned Concurrency, you want to incur those ENI costs during the initialization step. This means that network resources such as database connections must be initialized outside of the function handler.

Here’s what NOT to do:

import …

export async function my_function(event, ctx) {
    const conn = await establish_db_connection() // DO NOT DO THIS
    return do_query(conn, …)
}

You can see here that we’re not establishing the database connection until my_function is executed, which means the first execution in that container incurs the 1-3 second cost of creating the ENI. If we move the connection outside the function, it is established during initialization instead:

import …

// Runs once per container, during initialization
const conn = await establish_db_connection()

export async function my_function(event, ctx) {
    return do_query(conn, …)
}

Keep in mind that you only have 10 seconds for the whole initialization phase to complete. If you have multiple connections to establish, you should consider opening them in parallel, as sketched below.
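A minimal sketch of parallel initialization, assuming a hypothetical second resource (a cache) alongside the database; Promise.all makes the total init time the slower of the two connections rather than their sum:

import …

// Open both connections concurrently during initialization
// (establish_cache_connection is a hypothetical second resource)
const [db_conn, cache_conn] = await Promise.all([
    establish_db_connection(),
    establish_cache_connection(),
])

export async function my_function(event, ctx) {
    return do_query(db_conn, …)
}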

Conclusion

The serverless architecture is very powerful, but it’s still important to understand what’s happening under the hood so you can configure your application for optimal behavior. In this article, we discussed cold boots and how to mitigate them with Provisioned Concurrency and by initializing connections during the cold boot. This will have your lambdas performing at their peak from the very first call.
