Zero Downtime Deployment

Maintaining service availability during upgrades is crucial for businesses that rely on continuous operations. This discussion explores zero downtime deployment: keeping a system fully operational while it transitions to a new software version. We look at deployment strategies such as blue-green and canary deployments, which let you update without disrupting users. We also look at why stateful, long-running processes make the problem considerably harder. By covering both the theory and the practice, we aim to show how to apply these techniques effectively.

I’m going to introduce a key concept in high availability: zero downtime deployments. So, the problem, Steven, really is that if you have a system that needs to be highly available, it has to stay available even while you’re upgrading it to a new version. As a simple example, let’s say you have an API. You have the old one, version one, and a new one you’re going to bring live, version two. There are a number of strategies here. Given that the API is exposed at an address your callers depend on, you can use deployment slots in Azure App Service, or you can use blue-green deployments. In other words, you bring the new version up alongside the old one, flip over, process the traffic on the new one, and only when you’re satisfied with it do you retire the old one. You can also use canary deployments, where you bring the new system up and direct a percentage of your traffic to it. For instance, you might route 10% of your traffic through the new one and 90% through the old one until you’re satisfied, and then migrate the rest of the workload onto it.
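To make these switching strategies concrete, here is a minimal Python sketch of a router sitting in front of both versions, assuming each version can be called as a plain function. `VersionRouter`, `canary_percent`, and `v2_is_warm` are hypothetical names for illustration only; this is not the Azure App Service API or any particular load balancer's configuration. A blue-green flip is the case where the percentage jumps straight from 0 to 100, and a canary release is any value in between. The sketch also models the pre-warm check and the fail-back discussed for stateless APIs.

```python
import hashlib
from typing import Callable

# A deployed version of the API, modeled as a function from request to response.
Handler = Callable[[str], str]


class VersionRouter:
    """Hypothetical router in front of an old (v1) and new (v2) deployment.

    canary_percent = 0   -> all traffic on v1
    canary_percent = 10  -> canary: roughly 10% of callers on v2
    canary_percent = 100 -> blue-green flip complete: all traffic on v2
    """

    def __init__(self, v1: Handler, v2: Handler,
                 v2_is_warm: Callable[[], bool]) -> None:
        self.v1 = v1
        self.v2 = v2
        self.v2_is_warm = v2_is_warm  # assumed readiness probe for v2
        self.canary_percent = 0

    def set_canary_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        # Refuse to send any traffic to v2 until it reports it is pre-warmed.
        if percent > 0 and not self.v2_is_warm():
            raise RuntimeError("v2 is not warmed up yet")
        self.canary_percent = percent

    def fail_back(self) -> None:
        # For stateless calls, failing back is just routing everything to v1.
        self.canary_percent = 0

    @staticmethod
    def _bucket(caller_id: str) -> int:
        # Stable hash into 0..99 so each caller sticks to one version.
        digest = hashlib.sha256(caller_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % 100

    def handle(self, caller_id: str, request: str) -> str:
        if self._bucket(caller_id) < self.canary_percent:
            return self.v2(request)
        return self.v1(request)


if __name__ == "__main__":
    router = VersionRouter(v1=lambda r: f"v1 handled {r}",
                           v2=lambda r: f"v2 handled {r}",
                           v2_is_warm=lambda: True)
    router.set_canary_percent(10)   # canary: a slice of callers on v2
    print(router.handle("customer-42", "GET /orders"))
    router.set_canary_percent(100)  # blue-green style: everyone on v2
    router.fail_back()              # not happy? everyone back on v1
```

Hashing the caller ID rather than choosing at random keeps each caller on the same version across requests. Because a caller's bucket never changes, raising the percentage from 10 to 100 only adds callers to version two; nobody already moved gets flipped back. In practice this logic usually lives in a gateway, load balancer, or slot swap rather than in application code.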

Once the new version is carrying all the traffic, you can finally take the old one down. Now, if your API is stateless (in other words, every interaction happens end to end and then it’s done), it’s relatively easy. You’ve got ways of switching over. You want to make sure the new one is pre-warmed, meaning it’s ready to take the load before you switch to it; these strategies are really just different ways of doing that switch. And you’ve got an easy way to fail back: if the new one isn’t quite right, you just switch back to the old one.

Things get much harder when they’re stateful. At 345 Technology our focus is on integration, and there you’re often calling an API that kicks off a long-running process with a lot of state: it has to remember where it is. That makes the whole thing a lot more difficult. When you switch over, do you wait until every version one has finished before you start any version twos? Or do you start version two while all the version ones are still working in parallel, and then how do you know which one is which?
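One way to answer “how do you know which one is which?” is to record, on every process instance, the version it started on, and keep running that instance on the same version until it finishes. The Python sketch below assumes an in-memory store and models each process version as nothing more than a number of steps; `WorkflowHost`, `cut_over`, and `can_retire` are hypothetical names. New instances start on whichever version is current, in-flight ones carry on with their original version, and the old version can only be retired once it has drained.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    DONE = "done"


@dataclass
class ProcessInstance:
    instance_id: str
    version: str      # version the instance started on; never changes
    step: int = 0     # "where it is" in the long-running process
    status: Status = Status.RUNNING


@dataclass
class WorkflowHost:
    # Hypothetical process definitions: version -> number of steps.
    definitions: dict[str, int]
    current_version: str
    instances: dict[str, ProcessInstance] = field(default_factory=dict)

    def start(self, instance_id: str) -> ProcessInstance:
        # New work always starts on the current version.
        inst = ProcessInstance(instance_id, self.current_version)
        self.instances[instance_id] = inst
        return inst

    def advance(self, instance_id: str) -> ProcessInstance:
        # In-flight work keeps running the definition it started with.
        inst = self.instances[instance_id]
        if inst.status is Status.RUNNING:
            inst.step += 1
            if inst.step >= self.definitions[inst.version]:
                inst.status = Status.DONE
        return inst

    def cut_over(self, new_version: str) -> None:
        # Only affects instances started from now on.
        self.current_version = new_version

    def in_flight(self, version: str) -> list[str]:
        return [i.instance_id for i in self.instances.values()
                if i.version == version and i.status is Status.RUNNING]

    def can_retire(self, version: str) -> bool:
        # The old version can be taken down only once it has drained.
        return version != self.current_version and not self.in_flight(version)
```

For example, start an instance on `v1`, call `cut_over("v2")`, and start another: `can_retire("v1")` stays `False` until the first instance completes its steps. The sketch deliberately leaves out the harder questions, such as failing back instances that have already started on version two, or data whose structure has already changed.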

That raises a whole load of issues. We’ve got strategies to unpack them, but these are the problems you encounter. If you’ve got long-running processes, is it as easy to fail back? Can you go back to the old process? Have you already changed the way your data is structured? One of the key things to look at here is how the process itself is designed. We used to talk a lot about orchestration, where a single central process drives loads of steps. Now we talk more about choreography, where you break a long-running process down into a number of really quick steps, each reacting to the one before it, that together make up the process. That way you can replace and upgrade each step without affecting the whole in quite the same way. So there’s a lot more to think about in zero downtime deployments when you’ve got state and long-running processes rather than fast, stateless ones. We’ve got strategies for that, so if this is what’s worrying you, come and talk to us.
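Here is a minimal Python sketch of choreography, assuming an in-memory event bus as a stand-in for a durable message broker. Each step is a small handler that reacts to one event type and emits the next event; there is no central coordinator deciding the order. The `EventBus` class and the order-processing steps (`validate`, `reserve_stock`, `send_invoice_v1`, `send_invoice_v2`) are hypothetical examples, not a specific product's API.

```python
from collections import deque
from typing import Callable

# Hypothetical event shape: a plain dict payload.
Event = dict
# A step consumes one event and returns the follow-on (type, event) pairs it emits.
Step = Callable[[Event], list]


class EventBus:
    """In-memory stand-in for a message broker (assumption: a real system
    would use a durable queue or topic)."""

    def __init__(self) -> None:
        self.steps: dict[str, Step] = {}
        self.log: list[tuple[str, Event]] = []  # every event seen, in order

    def register(self, event_type: str, step: Step) -> None:
        # Registering again for the same event type swaps out just that step.
        self.steps[event_type] = step

    def publish(self, event_type: str, event: Event) -> None:
        pending = deque([(event_type, event)])
        while pending:
            etype, payload = pending.popleft()
            self.log.append((etype, payload))
            step = self.steps.get(etype)
            if step is not None:
                # Each step decides what happens next by emitting new events.
                pending.extend(step(payload))


# Hypothetical order-processing steps, each small and quick.
def validate(order: Event) -> list:
    if order["qty"] > 0:
        return [("order_validated", order)]
    return [("order_rejected", order)]


def reserve_stock(order: Event) -> list:
    return [("stock_reserved", {**order, "reserved": True})]


def send_invoice_v1(order: Event) -> list:
    return [("invoice_sent", {**order, "invoice_format": "v1"})]


def send_invoice_v2(order: Event) -> list:
    return [("invoice_sent", {**order, "invoice_format": "v2"})]


if __name__ == "__main__":
    bus = EventBus()
    bus.register("order_received", validate)
    bus.register("order_validated", reserve_stock)
    bus.register("stock_reserved", send_invoice_v1)
    bus.publish("order_received", {"id": 1, "qty": 2})

    # Upgrade only the invoicing step; the other steps are untouched.
    bus.register("stock_reserved", send_invoice_v2)
    bus.publish("order_received", {"id": 2, "qty": 2})
    print(bus.log[-1])
```

Upgrading invoicing is then a matter of registering `send_invoice_v2` for `stock_reserved`: validation and stock reservation are untouched, and later `stock_reserved` events are handled by the new version. Because each step is short, there is much less in-flight state to worry about at the moment of the swap than with one long orchestrated process.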
