Caching

In 2025, I did a series of videos that went through five design principles for production-ready AI agents. In my episode about guardrails, I made the statement, “If you can get away with not calling a model, do that.” In the episode about agent tools, I argued for building idempotent tools and storing responses temporarily in case of retries or duplicate requests. I didn’t know it at the time, but I was essentially making the same argument in different words.

Your agent is repeating itself

So much of agent traffic is repetitive. By adding caching layers, you can improve your user experience and lower your bills in one fell swoop.
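To make the idea concrete, here is a minimal sketch of a short-lived response cache sitting in front of an agent tool. It is an illustration under stated assumptions, not the implementation from the post: the `ToolResponseCache` class, its `call` method, and the `lookup_order` tool are hypothetical names, the cache is in-memory only, and the 60-second TTL is arbitrary.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class ToolResponseCache:
    """Hypothetical in-memory cache for tool responses with a time-to-live."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def make_key(tool_name: str, args: Dict[str, Any]) -> str:
        # Sort keys so the same arguments in a different order share one key.
        canonical = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(self, tool_name: str, tool: Callable[..., Any],
             args: Dict[str, Any]) -> Any:
        key = self.make_key(tool_name, args)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Retry or duplicate request: reuse the stored response.
            return entry[1]
        result = tool(**args)
        self._entries[key] = (now + self.ttl_seconds, result)
        return result


def lookup_order(order_id: str) -> Dict[str, str]:
    """Stand-in for a real tool; assumed to be safe to cache briefly."""
    return {"order_id": order_id, "status": "shipped"}


cache = ToolResponseCache(ttl_seconds=60.0)
first = cache.call("lookup_order", lookup_order, {"order_id": "A1"})
repeat = cache.call("lookup_order", lookup_order, {"order_id": "A1"})  # served from cache
```

The second call never reaches the tool, which is the "don't call it if you can get away with it" idea applied to tool responses. A production setup would more likely use a shared store with expiry rather than a per-process dictionary, and would only cache tools whose results are safe to reuse for the chosen TTL.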


About the Author

I’m a software engineer and writer focused on building practical cloud and AI systems with community.


Join the Ready, Set, Cloud Picks of the Week

Weekly writing and curated picks on cloud-native systems and practical AI. Browse past issues to see if it’s for you.