Over the past couple of weeks, I’ve done some analysis on how you can start moving toward Step Functions as part of your standard development processes. You have the option to go “storage first” with completely asynchronous workflows or go completely synchronous with express state machines.
There are use cases for both, but the consensus for production development favors a hybrid approach: perform a base set of actions synchronously, like validation and ID creation, and kick off the rest of the processing asynchronously. You’d then use a WebSocket to inform the user when the workflow is complete.
Developers are curious in nature. I had a slew of people respond to my posts on LinkedIn and Twitter asking what the cost difference was between Lambda and Step Functions. After all, I was making a pitch to ditch Lambda completely. Seems like a justifiable question.
To figure out the impact on the wallet, we need to determine how each of these services is billed. AWS has a number of ways to charge you for its serverless services, but it is great about transparency and the whole “no hidden costs” thing.
We are going to be comparing three different approaches today: Lambda, express workflows, and standard workflows with Step Functions. Below is a table of how each service bills for usage.
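| Service | Requests | Duration (GB-seconds) | State transitions |
|---|---|---|---|
| Lambda | $0.20 per 1M | $0.0167 per 1K | N/A |
| Express workflows | $1.00 per 1M | $0.0167 per 1K | N/A |
| Standard workflows | N/A | N/A | $0.025 per 1K |

Prices are based on the us-east-1 (N. Virginia) region.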
You can tune the amount of memory allocated to a Lambda function, and because Lambda bills duration in GB-seconds of that configured memory, your cost varies with what you configure. With express workflows, the amount of memory an execution consumes is calculated with the following formula:
$$\text{memory} = 50\,\text{MB} + \text{state machine definition size} + \text{execution data size} \times \text{number of Parallel or Map steps}$$
Round the result up to the next 64MB, convert it to GB (divide by 1024), and multiply it by the workflow’s average duration, rounded up to the nearest 100ms, to get the GB-seconds consumed per express workflow execution.
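The two steps above can be sketched in Python. This is a minimal estimate built on the formula in this post; the function names and the sample sizes in the usage line are hypothetical, and durations are taken in whole milliseconds so the 100ms rounding stays exact.

```python
import math

BASE_MEMORY_MB = 50          # base memory term from the formula above
MEMORY_INCREMENT_MB = 64     # express memory is rounded up to 64MB chunks
DURATION_INCREMENT_MS = 100  # express duration is rounded up to 100ms


def express_memory_mb(definition_mb: float, execution_data_mb: float,
                      parallel_or_map_steps: int) -> int:
    """Estimate memory for one express execution, rounded up to the next 64MB."""
    raw_mb = BASE_MEMORY_MB + definition_mb + execution_data_mb * parallel_or_map_steps
    return math.ceil(raw_mb / MEMORY_INCREMENT_MB) * MEMORY_INCREMENT_MB


def express_gb_seconds(memory_mb: int, duration_ms: int) -> float:
    """GB-seconds for one execution: memory in GB multiplied by the rounded-up duration."""
    billed_ms = math.ceil(duration_ms / DURATION_INCREMENT_MS) * DURATION_INCREMENT_MS
    return (memory_mb / 1024) * (billed_ms / 1000)


# Hypothetical workflow: ~10KB definition, ~5KB payload, no Parallel or Map steps.
memory = express_memory_mb(definition_mb=0.01, execution_data_mb=0.005, parallel_or_map_steps=0)
print(memory, express_gb_seconds(memory, duration_ms=1750))  # prints: 64 0.1125
```

Note that the GB-seconds figure is a product of memory and duration; a longer workflow consumes more, never less.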
As you can see, express workflows are billed much like Lambda functions: a charge per request plus a charge per GB-second. Standard workflows are completely different; they are billed solely on the number of state transitions.
To get an idea of how Lambda compares to express workflows, we must get a gauge on performance.
The alternative to building out an express workflow is to have a giant Lambda that performs the same operations. You could alternatively use Lambda destinations, but those are asynchronous and do not directly apply to the comparison we are making.
For a benchmark, I will be using the API key workflow from my post on express state machines.
API key creation workflow
To test, each endpoint is run 1000 times. The results are averaged to get the mean execution time. Once we get the average, we can perform our cost analysis.
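A harness for this kind of test might look like the sketch below. It assumes each endpoint is wrapped in a zero-argument callable (for example, a function that sends one HTTP request to the API); `mean_latency_ms` is a hypothetical helper, not the tooling used for the original test.

```python
import statistics
import time
from typing import Callable


def mean_latency_ms(invoke: Callable[[], object], runs: int = 1000) -> float:
    """Call `invoke` `runs` times and return the mean wall-clock time in milliseconds."""
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()  # e.g. one request to the endpoint under test
        samples_ms.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples_ms)


# Stand-in for an endpoint call: sleep 2ms instead of making a network request.
print(f"{mean_latency_ms(lambda: time.sleep(0.002), runs=20):.1f} ms")
```

Averaging client-side wall-clock time includes network latency and any cold starts, so it measures what the caller experiences rather than billed duration.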
[Table: Results from performance tests for Lambda (AWS SDK v2), Lambda (AWS SDK v3), and the express workflow]
As you can see, our winner across the board is the express workflow with Step Functions! With Lambda, the slowest times can be attributed to cold starts. However, you are not billed for cold start time; you pay only for the execution time of your function.
A nice benefit of going the Step Functions route is that you don’t have cold starts (unless you use Lambda functions as states). If you use the direct SDK integrations, your state machine will run at blazing speeds.
Again, these Lambdas were tuned for optimal performance, so they were designed to run as fast as possible. Compared to Step Functions, the performance difference is almost negligible.
The math for the direct cost comparison for the express workflow per 1 million invocations uses 64MB of memory (consumed memory rounded up to the next 64MB) and a 1.8s average duration (rounded up to the next 100ms):

$$\$1.00 + \frac{64\,\text{MB}}{1024\,\text{MB}} \times 1{,}000{,}000 \times 1.8\,\text{s} \times \$0.00001667 \approx \$2.88$$
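The same arithmetic as a small Python function, assuming the us-east-1 prices from the table above (the first express duration tier) and ignoring the free tier; `express_cost` is a hypothetical helper name.

```python
import math

EXPRESS_PRICE_PER_REQUEST = 1.00 / 1_000_000  # $1.00 per 1M requests
EXPRESS_PRICE_PER_GB_SECOND = 0.00001667      # first duration tier, us-east-1


def express_cost(invocations: int, memory_mb: float, duration_ms: int) -> float:
    """Request charge plus duration charge for `invocations` express executions."""
    billed_memory_mb = math.ceil(memory_mb / 64) * 64               # round up to 64MB
    billed_duration_s = math.ceil(duration_ms / 100) * 100 / 1000   # round up to 100ms
    gb_seconds = invocations * (billed_memory_mb / 1024) * billed_duration_s
    return invocations * EXPRESS_PRICE_PER_REQUEST + gb_seconds * EXPRESS_PRICE_PER_GB_SECOND


print(f"${express_cost(1_000_000, 64, 1800):.2f}")  # prints $2.88
```

The unrounded result is $2.875375, which is why it lands on $2.88 when rounded to cents.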
For Lambda, we can take the faster execution time of the AWS SDK v3 function (1.881s on average) and its average consumed memory of 88MB, and calculate the cost per 1 million invocations like this:

$$\$0.20 + \frac{88\,\text{MB}}{1024\,\text{MB}} \times 1{,}000{,}000 \times 1.881\,\text{s} \times \$0.00001667 \approx \$2.89$$
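A matching sketch for Lambda, again assuming us-east-1 x86 pricing and no free tier (`lambda_cost` is a hypothetical helper). The figure above plugs in the function’s average consumed memory; AWS bills Lambda duration on the memory configured for the function (128MB minimum), so pass the configured value if you want a number that matches your bill.

```python
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000   # $0.20 per 1M requests
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667     # x86, us-east-1


def lambda_cost(invocations: int, memory_mb: float, avg_duration_s: float) -> float:
    """Request charge plus GB-second charge for `invocations` Lambda calls."""
    gb_seconds = invocations * (memory_mb / 1024) * avg_duration_s
    return invocations * LAMBDA_PRICE_PER_REQUEST + gb_seconds * LAMBDA_PRICE_PER_GB_SECOND


# Figures from this post: 88MB average consumed memory, 1.881s average duration.
print(f"${lambda_cost(1_000_000, 88, 1.881):.2f}")  # prints $2.89
```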
Compiling the results, we have some surprising figures:
[Table: Cost per million invocations for Lambda (AWS SDK v2), Lambda (AWS SDK v3), and the express workflow]
Oddly enough, the AWS SDK v2 function consumed the lowest memory, which drove down the cost per million invocations. Either way though, the costs per million executions only vary by a few cents. Very interesting!
If you factor in standard workflows, you can slice the cost a different way. If you have an asynchronous process running in a Lambda function, you must either guarantee the processing will finish within 15 minutes (Lambda’s maximum timeout) or kick off other Lambdas to continue processing (effectively building your own state machine via destinations).
Alternatives to this approach would be to provision an EC2 instance and pay for the machine’s uptime, or to create a standard workflow with Step Functions and pay per state transition.
Take, for example, an image processing job that runs an image through Textract to extract any text and drops the results in S3, then runs a Rekognition job to identify objects contained in the image, and finally saves the consolidated results to DynamoDB.
With Lambda, this would be rather expensive. Assume we could complete the job and finalize any processing in about 2 minutes. If we provisioned 1024MB of memory and on average the function used 256MB of it (these are large images), we’re looking at the following formula per 1 million invocations:
$$\$0.20 + (1{,}000{,}000 \times 120\,\text{s}) \times \frac{256\,\text{MB}}{1024\,\text{MB}} \times \$0.0000166667 \approx \$500$$
Let’s assume the state machine that performs the same action takes 15 state transitions to complete. It kicks off the jobs and checks their status in a loop until they complete. The formula to calculate the cost of 1 million executions is:
$$1{,}000{,}000 \times 15 \times \$0.000025 = \$375$$
In this case, assuming 1 million executions a month, it would be about 25% cheaper per month to run the workload as a standard state machine.
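Both sides of the image processing comparison can be checked with one sketch, under the assumptions stated above (1 million executions, 2 minutes at 256MB for Lambda, 15 state transitions for the standard workflow) and ignoring the free tier. The helper names are hypothetical.

```python
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000    # $0.20 per 1M requests
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667      # x86, us-east-1
STANDARD_PRICE_PER_TRANSITION = 0.025 / 1_000  # $0.025 per 1K state transitions


def lambda_cost(invocations: int, memory_mb: float, duration_s: float) -> float:
    """Request charge plus GB-second charge for `invocations` Lambda calls."""
    gb_seconds = invocations * (memory_mb / 1024) * duration_s
    return invocations * LAMBDA_PRICE_PER_REQUEST + gb_seconds * LAMBDA_PRICE_PER_GB_SECOND


def standard_workflow_cost(executions: int, transitions_per_execution: int) -> float:
    """Standard workflows bill only for state transitions."""
    return executions * transitions_per_execution * STANDARD_PRICE_PER_TRANSITION


lambda_total = lambda_cost(1_000_000, 256, 120)         # about $500.20
workflow_total = standard_workflow_cost(1_000_000, 15)  # $375.00
print(f"Lambda ${lambda_total:,.2f} vs standard workflow ${workflow_total:,.2f}")
```

Because Lambda bills on configured memory, passing the provisioned 1024MB instead of 256MB raises the Lambda side to roughly $2,000.20.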
In terms of straight cost, the difference between Step Functions and Lambda appears to be negligible. But when you factor in total cost of ownership, the story changes a bit. As discussed in a previous post about refactoring serverless applications, the literal amount something costs is just one factor when calculating the cost to the business.
If it costs $1000/month to run a piece of code in the cloud but takes 20 hours of developer time to fix a bug because of poor maintainability, the cost to the business goes way up.
On the flip side, if it costs $2000/month to run the same code in the cloud but only takes 4 hours of developer time to fix a bug because of easy maintainability, the overall cost to the business is lower.
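To make that trade-off concrete, here is the arithmetic with a hypothetical fully loaded developer rate of $100/hour and one such bug fix per month; both numbers are assumptions for illustration, not figures from this post.

```python
DEV_HOURLY_RATE = 100  # hypothetical fully loaded cost of one developer hour, in $


def monthly_cost_to_business(cloud_bill: float, dev_hours: float,
                             hourly_rate: float = DEV_HOURLY_RATE) -> float:
    """Cloud spend plus the cost of developer time spent on fixes that month."""
    return cloud_bill + dev_hours * hourly_rate


hard_to_maintain = monthly_cost_to_business(cloud_bill=1000, dev_hours=20)  # 3000
easy_to_maintain = monthly_cost_to_business(cloud_bill=2000, dev_hours=4)   # 2400
print(hard_to_maintain, easy_to_maintain)
```

The break-even point depends on the rate: the two options cost the same at $62.50/hour, and below that the cheaper cloud bill wins.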
Cost is so much more than dollars and cents.
There is no “one size fits all” solution. What works for one team might not work for others. Take this into consideration as you decide which way to take your serverless adventure.
By switching to a configuration over code model with Step Functions, you put the responsibility of code execution on the cloud vendor. The fewer responsibilities you have as a software company, the higher the return on investment you will receive.
For my use case, Step Functions is the better fit for both synchronous and asynchronous executions.
This will not always be true, as some state machines can get massive and drive costs well beyond those of well-designed Lambda functions. Find a balance. It’s worth spending the time on cost estimation before diving into your next major project. Would you benefit from having your complex workflows visualized in a Step Functions diagram? Or does your team work better walking through code?
Step Functions will be a game-changing service in the future (heck, it already is!). As adoption rates increase and features continue to grow, it will keep getting better and leave traditional “Lambda development” in the dust.
You get better traceability, lower responsibility, and a super cool designer through Step Functions. In some scenarios, it appears to be a lower cost service than Lambda as well.
I encourage you to give it a try if you haven’t already. You might like what you see.