What was the problem?
A major UK financial institution had very successfully grown its business to the point where the number of customer transactions handled by its automated business process risked overloading the IT infrastructure.
The financial institution has strict performance SLAs which demand that throughput and response times comply with industry standards. The IT systems are complex and each transaction calls on up to 10 different systems.
Known for our performance engineering expertise, 345 Technology was asked to undertake an overload test. This involved incrementally loading the IT system in a scientifically controlled manner to discover bottlenecks and stress points, so that system failure could be avoided through additional technology investment and capacity planning.
The tests had to be designed so that they placed enough load on the system to show some level of stress.
The performance test had to be able to simulate realistic load patterns so that failure points could be accurately identified.
The test that 345 Technology designed had to be repeatable, so that the same results would be seen when the load test was carried out against the same system and configuration.
345 Technology’s performance engineering experts first established a performance benchmark, which involved designing a series of tests that could be run against different system configurations.
We then designed a series of progressive overload tests which introduced gradual, carefully controlled increases in load until the system reached breaking point. This allowed us to understand which points in the system were most likely to fail under unexpected or excessive loads.
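As an illustration of the progressive overload idea, the sketch below steps the offered load up in fixed increments and reports the first level at which achieved throughput no longer keeps pace. It is a minimal sketch, not the tooling used in this engagement: `find_saturation_point`, the 95% tolerance, and `simulated_system` (a stand-in that simply caps throughput at a fixed capacity) are all assumptions made for the example.

```python
from typing import Callable, Iterator, Optional


def load_steps(start_tps: int, step_tps: int, max_tps: int) -> Iterator[int]:
    """Yield offered load levels in fixed increments, up to max_tps inclusive."""
    rate = start_tps
    while rate <= max_tps:
        yield rate
        rate += step_tps


def find_saturation_point(
    measure: Callable[[int], float],
    start_tps: int,
    step_tps: int,
    max_tps: int,
    tolerance: float = 0.95,  # assumed threshold, not taken from the case study
) -> Optional[int]:
    """Return the first offered rate where achieved throughput drops below
    tolerance * offered rate, or None if the system keeps up at every step."""
    for offered in load_steps(start_tps, step_tps, max_tps):
        achieved = measure(offered)
        if achieved < tolerance * offered:
            return offered
    return None


def simulated_system(offered_tps: int, capacity_tps: float = 60.0) -> float:
    """Hypothetical system under test: throughput tracks the offered load
    until it reaches a fixed capacity, then flattens out."""
    return min(float(offered_tps), capacity_tps)


# Ramp from 10 to 200 requests per second in steps of 10.
print(find_saturation_point(simulated_system, start_tps=10, step_tps=10, max_tps=200))
```

In a real test, `measure` would drive the system at the offered rate for a fixed interval and return the observed throughput. Keeping the step size and interval constant is what makes the ramp repeatable from run to run.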
Using our experience of developing high-performance systems, we were able to devise a series of realistic tests that modelled different load scenarios. For this client, we tested a single business process and gradually increased the rate of requests handled by the system.
Following our performance engineering principles, we tested for failure points in the central processing unit (CPU), system memory, network, and disk input/output. We were looking for indications of failure under load, along with other common software engineering issues: resource contention, which affects the performance of multiple applications competing for the same resource; co-ordination; the impact of concurrency; and capacity.
Using our test data, we were then able to diagnose which parts of the system would benefit from tuning and recommend system configuration changes that would optimise operation.
Importantly, we followed scientific principles when undertaking each test, changing one thing at a time and then running through the performance benchmark and progressive overload tests to evaluate the resulting performance.
Large financial software systems are complex and it is possible to change and tune hundreds of settings. By following a methodical performance engineering approach, we were able to isolate and examine the benefits of each and every change. This approach also ensured that the tests and the results were repeatable.
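The one-change-at-a-time approach can be sketched as follows. This is an illustrative sketch under stated assumptions: the setting names (`db_pool_size`, `worker_threads`) and the `toy_benchmark` model are hypothetical and do not describe the client's system. In practice the benchmark would be the performance benchmark and progressive overload tests run against a real configuration.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def evaluate_one_at_a_time(
    baseline: Config,
    candidates: Dict[str, int],
    benchmark: Callable[[Config], float],
) -> List[Tuple[str, float]]:
    """Apply each candidate change on its own to a copy of the baseline and
    return (setting, throughput gain over baseline), largest gain first."""
    base_score = benchmark(baseline)
    results: List[Tuple[str, float]] = []
    for name, value in candidates.items():
        trial = dict(baseline)  # copy, so the baseline itself never changes
        trial[name] = value     # exactly one setting differs from the baseline
        results.append((name, benchmark(trial) - base_score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def toy_benchmark(config: Config) -> float:
    """Hypothetical model: throughput is limited by the scarcer of two resources."""
    return float(min(config["db_pool_size"] * 2, config["worker_threads"] * 5))


baseline = {"db_pool_size": 30, "worker_threads": 20}
candidates = {"db_pool_size": 50, "worker_threads": 40}

for setting, gain in evaluate_one_at_a_time(baseline, candidates, toy_benchmark):
    print(f"{setting}: {gain:+.1f} tps")
```

Because each trial differs from the baseline in a single setting, any change in the score can be attributed to that setting alone. In this toy model the thread count shows no gain because the connection pool is the limiting resource.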
Finally, 345 Technology compiled all of the test results into a report for the client. The report specified the exact technical changes that the financial institution’s technical teams needed to implement in future releases in order to achieve the same performance gains that we discovered during our performance engineering tests.
By sharing our method and results in the report, we were also able to increase the pool of software engineering knowledge within the client’s technical team.
What was the result?
By following a methodical, repeatable testing process that identified precisely which points of its system were liable to fail under specific types of load, the client was able to implement the necessary changes and system tuning. As a result, the financial institution was able to more than double the throughput handled by its automated, complex business process, from 60 transactions per second to 140 transactions per second, representing a 133% increase in throughput and exceeding industry SLAs. In addition, the hardware changes that we identified and recommended during our performance testing are expected to achieve in excess of 200 transactions per second, providing the client with a roadmap to comfortably accommodate future business growth.
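The throughput figures above can be checked with a few lines of arithmetic. The helper name `percent_increase` is ours, introduced only for this example; the 60 and 140 transactions-per-second figures are the ones quoted above.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from before to after, expressed as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


baseline_tps = 60.0   # throughput before tuning
tuned_tps = 140.0     # throughput after the recommended changes

print(f"{percent_increase(baseline_tps, tuned_tps):.0f}% increase")  # prints "133% increase"
print(f"{tuned_tps / baseline_tps:.2f}x baseline")                   # prints "2.33x baseline"
```

The gain of 80 transactions per second is measured against the original 60, which is why the result is a 133% increase rather than the 233% obtained by dividing 140 by 60 directly.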
The final stage was to bring the learning together and write up the results so that they had maximum value for the client:
- Technical changes – specify the exact technical changes that the client’s technical teams needed to implement in future releases to achieve the performance gains discovered under test.
- Knowledge capture – write up the method and the results so that 345 Technology’s work contributed to the overall pool of knowledge within the client’s technical team.