What Is "Production-Grade" Software?

One of my favorite things to do is build “proofs of concept”. They are quick little apps that show a business problem can be solved in a certain way. They’re quick and dirty and throw caution to the wind. One thing we all know about POCs is that you should never use them in production.

But why is that? Why do we think that a proof of concept isn’t “production-grade”? What does production-grade even mean?

Production-grade vs Production-ready

Before we get into the details of production-grade software, we need to understand the difference between it and production-ready software. When we talk about software being production-ready, we imply a bunch of operational considerations. These are generally non-technical in nature and support the success of the product. Things that make software production-ready are:

Having an on-call program implemented
Deploying changes through a CI pipeline
Having a shared understanding of your deployment strategy (continuous deployment, regular cadence, etc.)
Knowing who to route issues to and maintaining a defect SLA

There are other components to being production-ready, and AWS does a wonderful job working through them in their Operational Readiness Reviews. Before you go live, be sure to work through the review to make sure you’re covered.

When talking about production-grade software, we refer to technical implementation details. These details describe characteristics of the software itself, and while they aren’t required to push your app to production, they are critical in determining its success.

Characteristics of Production-grade Software

Let’s compare production-grade software to POCs with an analogy of a bridge. Imagine you were going on a hike in the woods and you stumbled upon a creek. You built a little bridge to get over it and it worked… for you.

You were able to continue on your way as long as it was just you and nobody else. This is your proof of concept. It works for the happy path, meaning it worked for you this one time and probably would have collapsed under your friend. But rarely does a bridge only serve its builder. It’s meant for many people, it needs to last, and it must be safe to boot.

So what are some of these characteristics that would take our proof of concept bridge to a full-scale work of engineered art?

Stability and Robustness

Production-grade software must be able to handle not only the primary use cases but the edge cases as well. It needs to have input schema validation, compensating actions, dead letter queues, and effective error routing (among other things). A proof of concept generally satisfies only the happy path, because the only user it has to work for is us. We need to handle edge cases before we make it to production.
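To make a couple of those terms concrete, here is a minimal sketch of input validation paired with a dead letter queue. Everything in it is an assumption for illustration: the “order” message shape, the `validate_order` and `Processor` names, and the in-memory list standing in for a real queue. In a real system the dead letters would go to a durable queue, not a Python list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ValidationError(Exception):
    """Raised when a message does not match the expected shape."""


def validate_order(message: Any) -> dict:
    """Check a hypothetical 'order' message before any work is done on it."""
    if not isinstance(message, dict):
        raise ValidationError("message must be an object")
    order_id = message.get("order_id")
    quantity = message.get("quantity")
    if not isinstance(order_id, str) or not order_id:
        raise ValidationError("order_id must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        raise ValidationError("quantity must be a positive integer")
    return {"order_id": order_id, "quantity": quantity}


@dataclass
class Processor:
    """Runs a handler on valid messages and parks failures for later review."""

    handler: Callable[[dict], None]
    # In-memory stand-in for a real dead letter queue (an assumption).
    dead_letters: list = field(default_factory=list)

    def process(self, message: Any) -> bool:
        try:
            self.handler(validate_order(message))
            return True
        except Exception as error:
            # Keep the original message and the reason instead of dropping it,
            # so it can be inspected and replayed once the cause is fixed.
            self.dead_letters.append({"message": message, "reason": str(error)})
            return False


if __name__ == "__main__":
    handled = []
    processor = Processor(handler=handled.append)
    processor.process({"order_id": "A-1", "quantity": 2})  # happy path
    processor.process({"order_id": "", "quantity": 2})     # edge case
    print(len(handled), len(processor.dead_letters))
```

The point of the design is that a bad message never disappears silently: it either gets handled or ends up somewhere a human can find it.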

Our bridge is the same. It needs to not only support our weight but the weight of others as well. It needs to be effective in the rain and snow, and must stand strong whether someone runs, jumps, or walks across it. It’s unrealistic to expect everyone will always weigh the same amount we do and walk with the same speed and gait as us in nice, sunny weather. So we have to be ready for anything.

Performance

Performance is always a hot topic in software development, and production-grade software must be fast. It can’t keep users waiting, and it certainly shouldn’t buckle under pressure when you have lots of concurrent users. It needs to maintain consistent performance whether there is one user or one million in the system. Architectural aspects like serverless back-ends and asynchronous workflows make a world of difference when maintaining high-quality performance in production.
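As a small illustration of the asynchronous idea, here is a sketch using Python’s `asyncio`. The `fetch_report` function is hypothetical and only simulates an I/O wait with `asyncio.sleep`; the `max_in_flight` limit is likewise an assumption, included to show that concurrency should be bounded so one burst of users can’t overwhelm a downstream dependency.

```python
import asyncio
import time


async def fetch_report(user_id: int, limiter: asyncio.Semaphore) -> str:
    """Stand-in for an I/O-bound call such as a database or HTTP request."""
    async with limiter:
        await asyncio.sleep(0.05)  # simulated network wait
        return f"report-{user_id}"


async def fetch_all(user_ids: list[int], max_in_flight: int = 5) -> list[str]:
    # The semaphore caps how many requests run at the same time.
    limiter = asyncio.Semaphore(max_in_flight)
    # gather runs the calls concurrently and returns results in input order.
    return await asyncio.gather(*(fetch_report(u, limiter) for u in user_ids))


if __name__ == "__main__":
    started = time.perf_counter()
    reports = asyncio.run(fetch_all(list(range(10))))
    elapsed = time.perf_counter() - started
    print(f"{len(reports)} reports in {elapsed:.2f}s")
```

With ten requests and five allowed in flight, the waits overlap instead of stacking up one after another, which is the whole appeal for I/O-heavy workloads.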

With our bridge example, we want more than one person to use it at any given time. If we put up a sign that says “one at a time”, eventually we’ll get a line so long people try to find some other route. Or worse, people would try to exceed capacity and break it.

Security

Software needs to be secure, no question about it. When building a proof of concept, it’s easy to bypass auth mechanisms and make your API open and “undiscoverable” by using random strings in your base URL. But that’s not production-grade. You need to put proper authentication and authorization on your app so data is secure and only visible to those who are supposed to see it. In addition, it’s on you as the software vendor to make sure your customers don’t shoot themselves in the foot. Don’t give them easy ways to open doors in your app. Require some form of login and block common privilege escalation patterns to keep your users and data secure.
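Here is a rough sketch of the two halves, authentication and authorization, using only Python’s standard library. The role names, the `ROLE_RANK` table, and the function names are all made up for the example, and in practice you would lean on a vetted identity provider or auth library instead of rolling your own. Treat it as an illustration of the checks, not a drop-in implementation.

```python
import hashlib
import hmac

# Hypothetical roles for the example, ordered from least to most privileged.
ROLE_RANK = {"viewer": 1, "editor": 2, "admin": 3}


def sign(user_id: str, secret: bytes) -> str:
    """Create an HMAC-SHA256 signature that proves who issued the token."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_authentic(user_id: str, signature: str, secret: bytes) -> bool:
    """Authentication: does the signature match the claimed user?"""
    expected = sign(user_id, secret)
    # compare_digest takes constant time, which avoids leaking how many
    # leading characters of a guessed signature were correct.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


def can_assign_role(actor_role: str, new_role: str) -> bool:
    """Authorization: an actor may only grant roles at or below their own."""
    actor_rank = ROLE_RANK.get(actor_role)
    new_rank = ROLE_RANK.get(new_role)
    if actor_rank is None or new_rank is None:
        return False  # deny by default when a role is unknown
    return new_rank <= actor_rank


if __name__ == "__main__":
    secret = b"example-only-secret"
    token = sign("user-1", secret)
    print(is_authentic("user-1", token, secret))
    print(can_assign_role("editor", "admin"))
```

The second check is the privilege escalation guard: an editor asking to become an admin is refused, and so is anything involving a role the system doesn’t recognize.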

When building bridges, this would equate to putting up handrails. Not everyone is as coordinated as you, and they might need some extra help balancing across the bridge as they cross.

Maintainability

This is both a technical and non-technical characteristic. The ability to maintain your code over time is probably the most important factor in your long-term success. If you throw together something fast but have no idea how it works a month down the road, it will be next to impossible to fix defects or add something new without the risk of breaking existing functionality. Write clear, concise code that makes it easy to see what is going on. Implement linting rules for added consistency. Your goal as a software developer is to work yourself out of a job, and you won’t be able to do that if your code is not maintainable by anyone other than you.

Of course, with our bridge, pieces will need to be replaced eventually. If you threw something together to see it “just work” without maintenance in mind, it might be impossible to replace a part without taking the entire thing down. It’s 2023, and we’re slowly moving past the whole scheduled downtime thing. You have to keep the bridge functional as you do repairs and improvements. Not only that, if you randomly threw things together in your build, how would you know what to replace? Someone doing maintenance could accidentally remove a load-bearing support because a piece never meant to carry weight ended up holding the bridge together.

Observability

How do you know when your application is performing as expected? How do you know when it’s not? One word: observability. Production-grade software is fully instrumented with tools that enable structured logging, workflow tracing, and alarms when things go south. Tracking data as it progresses through your workflows is vital to understanding any issues that arise in your app. You must know how data goes from point A to point B, both for troubleshooting and for maintainability. Chances are high you will not discover problems on your own. They will either be reported to you by a customer or raised by your observability tools.
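Structured logging is the easiest of these to show in a few lines. Below is a minimal sketch with Python’s built-in `logging` module that writes each log entry as JSON and carries a correlation ID so one request can be followed across steps. The `orders` logger name, the `correlation_id` field, and the `build_logger` helper are assumptions for the example.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Attached through `extra=`; None when the caller did not set it.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate output if called twice
    logger.propagate = False  # keep entries out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    logger = build_logger(sys.stderr)
    logger.info("order received", extra={"correlation_id": "req-123"})
    logger.error("payment failed", extra={"correlation_id": "req-123"})
```

Because every entry is machine-readable and shares the same `correlation_id`, a log tool can filter down to a single request and an alarm can be keyed off the `level` field.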

For our bridge, we need to regularly inspect the integrity of the materials and watch for signs of wear. Being proactive and finding issues before they are problems is critical to prevent disaster.

Summary

When building software destined for production, there’s more to consider than you might expect. Depending on your compliance requirements, programming language, and the skill level of your team, implementation details can vary wildly. There’s not much in the way of prescriptive guidance on implementation details.

That said, the concepts will ring true no matter what you build. Be sure to keep security in the front of your mind while instrumenting observability tools along the way. Avoid “cleverness” in your code. Write things so they are simple to understand, even if that means adding another few lines of code.
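To show what “clever” versus simple can look like, here is a small made-up example. The shipping rules, fees, and function names are invented purely for illustration. Both functions compute the same thing, but only one of them explains itself.

```python
# Clever: indexes tuples with a boolean and hides every business rule.
def shipping_cost_clever(weight_kg: float, express: bool) -> float:
    return (5, 12)[express] + max(0, weight_kg - 2) * (1.5, 3)[express]


# Simple: a few more lines, but each rule has a name you can search for.
BASE_FEE = {"standard": 5.0, "express": 12.0}
PER_EXTRA_KG = {"standard": 1.5, "express": 3.0}
FREE_WEIGHT_KG = 2.0


def shipping_cost(weight_kg: float, express: bool) -> float:
    service = "express" if express else "standard"
    extra_kg = max(0.0, weight_kg - FREE_WEIGHT_KG)
    return BASE_FEE[service] + extra_kg * PER_EXTRA_KG[service]


if __name__ == "__main__":
    print(shipping_cost(4.0, express=True))
```

When the express fee changes six months from now, the second version tells you exactly where to look.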

While the happy path through your software might be the majority use case, be sure to keep edge cases in mind. Throw in some fuzz tests to see how your application handles invalid inputs. Toss in some chaos testing to verify your application can bounce back from unexpected errors. How quickly can you discover and subsequently fix them? Can you prevent data loss when errors occur?
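A fuzz test doesn’t have to be fancy to be useful. Here is a bare-bones sketch that throws random strings at a function and checks that it either returns a sensible value or fails in the one way we expect. The `parse_quantity` function is a hypothetical function under test, and the random generator is deliberately simple; a property-based testing library such as Hypothesis would do this far more thoroughly.

```python
import random
import string


def parse_quantity(raw: str) -> int:
    """Hypothetical function under test: parse a positive whole number."""
    text = raw.strip()
    # isdigit() alone accepts non-ASCII digits, so require ASCII as well.
    if not text.isascii() or not text.isdigit():
        raise ValueError("quantity must contain only digits")
    value = int(text)
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


def fuzz_parse_quantity(rounds: int = 1000, seed: int = 0) -> int:
    """Feed random input to parse_quantity and return how many were rejected."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    alphabet = string.printable + "é٣"
    rejected = 0
    for _ in range(rounds):
        length = rng.randint(0, 8)
        raw = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            result = parse_quantity(raw)
        except ValueError:
            rejected += 1  # the one failure mode we allow
            continue
        # Any other exception propagates and fails the test run.
        assert isinstance(result, int) and result > 0, repr(raw)
    return rejected


if __name__ == "__main__":
    print(fuzz_parse_quantity(), "inputs rejected cleanly")
```

The check here is not that every input is accepted. It is that bad input is always rejected the same, predictable way, with no surprise crash in between.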

It might not be as fun as building a proof of concept, but production-grade development is rewarding. You start thinking about problems a little differently and become a more well-rounded developer. Plus, you get the solace of knowing that when you do it right, your chances of getting that panicked “all hands on deck” call at 2 a.m. go way down.

Happy coding!