In this series of blog posts I will discuss aspects of Enterprise Application Configuration, and describe issues we have encountered and resolved in real-life, mission-critical systems.
By Enterprise Applications, I generally mean applications that are (a) mission-critical and (b) distributed across multiple machines, and sometimes also (c) geographically dispersed across different data centres.
Some examples of configuration problems we have come across in such systems are:
- Changing a configuration section by pasting in an XML snippet, only to discover later that the XML was invalid and the configuration could not be read. This type of error can kill an entire application in no time.
- Changing a configuration section with incorrect settings, where the change was only picked up when the machine rebooted for Windows Updates three weeks later. This made diagnosis difficult, because the “what’s changed recently?” investigation initially pointed to more recent but irrelevant updates.
- Deployments making partial changes to a system’s configuration and leaving it in an inconsistent state, with operators unable to determine accurately which updates had succeeded and which had failed.
In all of these cases, the lack of a version history that operators could readily consult made the incident harder to resolve.
Another relevant aspect is how configuration is updated. Sometimes configuration is deployed automatically as part of an application deployment, using tokenised templates populated with the correct values for each environment. This is great, and is definitely the way to go for static configuration. Even so, there are still many cases where operators update configuration manually (don’t get me started), even if only “temporarily”. These changes are much harder to track because they are often made informally and/or in response to other issues.
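As an illustration of the tokenised-template approach, here is a minimal Python sketch using the standard library’s `string.Template`. The `${...}` token syntax, the setting names and the per-environment values are all hypothetical; real deployment tools use a variety of token formats.

```python
from string import Template

# Hypothetical tokenised template. The ${...} placeholder syntax is an
# assumption; other tools use formats such as __TOKEN__ or #{Token}.
CONFIG_TEMPLATE = Template(
    '<appSettings>\n'
    '  <add key="OrderServiceUrl" value="${order_service_url}" />\n'
    '  <add key="MaxRetries" value="${max_retries}" />\n'
    '</appSettings>\n'
)

# Hypothetical per-environment values (assumed to be XML-safe, so no escaping is done).
ENVIRONMENTS = {
    "test": {"order_service_url": "https://orders.test.example.com", "max_retries": "3"},
    "production": {"order_service_url": "https://orders.example.com", "max_retries": "5"},
}


def render_config(environment: str) -> str:
    """Populate the template with the values for one environment.

    Template.substitute raises KeyError if any token has no value, so a
    deployment fails loudly instead of shipping a half-populated file.
    """
    return CONFIG_TEMPLATE.substitute(ENVIRONMENTS[environment])


if __name__ == "__main__":
    print(render_config("production"))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing value stops the deployment rather than leaving an unreplaced token in a live configuration file.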
On top of this, some configuration changes, such as modifications to IIS or BizTalk settings, are made via an administration console. These carry less risk than direct manual edits to configuration sections, but they can still bring down enterprise systems while leaving little or no trace.
At 345, when we developed cloco (Cloud Configuration), our Enterprise Application Configuration product, we recognised that version history was an essential part of the toolkit. The freeware version of cloco will ship with version history disabled, but the paid-for version already supports version history logging. Normally you won’t even know it’s there; it’s only when things go wrong that you find our configuration platform has your back, because it has saved previous versions of your configuration.
Also, when we designed cloco’s components, we wanted to embrace the fact that configuration updates are a feature of everyday system maintenance. We wanted our tooling to support and enhance good practices, not paper over the cracks. This is why we supply deployment tooling that enables application deployments to work in harmony with our configuration store, and why application deployments form part of the tracked version history just as ad-hoc configuration updates do.
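To make the idea concrete, here is a minimal Python sketch of an append-only version history that records deployments and ad-hoc edits alike. It is not cloco’s implementation or API: `ConfigHistory`, `ConfigVersion`, the `source` labels and the in-memory storage are all hypothetical, and a production store would need durable, shared storage across machines.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConfigVersion:
    number: int
    content: str
    source: str  # hypothetical labels, e.g. "deployment", "manual", "rollback"
    author: str
    recorded_at: datetime


@dataclass
class ConfigHistory:
    """Hypothetical append-only history for a single configuration section."""
    versions: list[ConfigVersion] = field(default_factory=list)

    def record(self, content: str, source: str, author: str) -> ConfigVersion:
        # Every change, whether from a deployment or an ad-hoc edit, becomes a
        # new version; earlier versions are never modified or deleted.
        version = ConfigVersion(
            number=len(self.versions) + 1,
            content=content,
            source=source,
            author=author,
            recorded_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version

    def get(self, number: int) -> ConfigVersion:
        return self.versions[number - 1]

    def diff(self, old: int, new: int) -> str:
        """Unified diff between two versions: the 'what changed?' question."""
        a, b = self.get(old), self.get(new)
        return "".join(
            difflib.unified_diff(
                a.content.splitlines(keepends=True),
                b.content.splitlines(keepends=True),
                fromfile=f"v{a.number} ({a.source}, {a.author})",
                tofile=f"v{b.number} ({b.source}, {b.author})",
            )
        )

    def changed_since(self, when: datetime) -> list[ConfigVersion]:
        """All versions recorded at or after a given time."""
        return [v for v in self.versions if v.recorded_at >= when]

    def rollback(self, number: int, author: str) -> ConfigVersion:
        # Restoring an old version is itself recorded as a new version,
        # so the bad change stays visible in the history.
        return self.record(self.get(number).content, "rollback", author)


if __name__ == "__main__":
    history = ConfigHistory()
    history.record('<add key="MaxRetries" value="5" />\n', "deployment", "release-pipeline")
    history.record('<add key="MaxRetries" value="50" />\n', "manual", "operator")
    print(history.diff(1, 2))
```

Because a rollback appends a new version instead of discarding later ones, operators can still see the faulty change, where it came from and who made it, which is the information that was missing in the incidents listed earlier.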
I still wonder how people manage to support distributed applications without a clear way of seeing the configuration changes made over the lifetime of the application.