Evanta’s 2022 CIO Leadership Perspectives Study found that data and analytics, meeting digital business priorities, and implementing application strategies, architecture and integration platforms were the top three priorities for CIOs after cybersecurity strategies.
Why enterprises are demanding higher quality data from their suppliers
Before they can generate useful business intelligence, organisations often need to aggregate and cleanse data in different formats from various sources, then integrate and enrich it to deliver insights that are based on high quality data.
As a Microsoft Gold Partner (soon to be Solution Partner), we are experts in designing and implementing integration and data platforms. We are comfortable with building data lakes, automating data pipelines and taking on high-level work that delivers business transformation. We’ve designed multiple solutions that allow large enterprises to pull data from different systems into central cloud-based data platforms.
As a prime example, we were recently asked to design an interesting data engineering solution for the aviation sector. Our customer runs a small fleet of aircraft on behalf of their customer, along with other subcontract operators. They have been asked to provide flight recording data from their fleet of forty aircraft so it can be pooled with data from other organisations. This will provide their customer with integrated data from several suppliers so that it can run analytics on the entire fleet of aircraft.
Why data management processes need to be updated to meet enterprise demand
The aircraft operator currently uses its flight data to run the engineering and maintenance schedule for its fleet of turboprop aircraft and light business jets. It has realised the value of this flight data to its own business and to its customers, and wants to create its own data lake with the capacity to store tens of terabytes of flight data over the next ten years.
Every day, employees download flight data to laptops and tablets that are specific to those aircraft. This involves a multi-step process, where ground crew connect to the aircraft, copy the data to their laptop/tablet, then upload to an on-premises computer. From there, the data can be forwarded on to where it is needed.
To meet its customer’s requirements, the aircraft operator needs to transfer data from its flight recorders into the customer’s Microsoft Azure data lake, where it can be processed. The larger organisation also wants the flight recorder data to be enriched with external metadata including the aircraft identification numbers and pilot data.
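To make the enrichment step concrete, here is a minimal Python sketch of attaching aircraft and pilot metadata to a flight record. It is an illustration under stated assumptions rather than the delivered solution: the `FlightRecord` fields, the recorder serial numbers, the lookup tables and the pilot IDs are all hypothetical.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class FlightRecord:
    recorder_serial: str  # hypothetical key identifying the flight recorder
    flight_date: str      # ISO date, e.g. "2022-11-03"
    payload_file: str     # name of the raw flight recorder file

# Hypothetical reference data; in practice this would come from external systems.
AIRCRAFT_BY_RECORDER = {"FDR-0017": "G-ABCD", "FDR-0042": "G-WXYZ"}
PILOT_ROSTER = {("G-ABCD", "2022-11-03"): "P-1001"}

def enrich(record: FlightRecord) -> dict:
    """Return the record as a dict with tail number and pilot ID attached."""
    tail = AIRCRAFT_BY_RECORDER.get(record.recorder_serial)
    pilot = PILOT_ROSTER.get((tail, record.flight_date)) if tail else None
    enriched = asdict(record)
    enriched["tail_number"] = tail
    enriched["pilot_id"] = pilot
    # Flag incomplete records instead of silently passing them downstream.
    enriched["complete"] = tail is not None and pilot is not None
    return enriched
```

Flagging a record as incomplete, rather than dropping it, keeps the raw data available while making any gap in the metadata visible to whoever consumes it.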
Why data quality depends on automation
When they approached us, the aircraft operator was combining the flight recorder data with aircraft identification by hand, with employees typing in tail numbers later in the day. This relied on employees naming the file correctly and putting that day’s flight recorder data in the right file. The company’s IT team measured a 4–15% error rate in this process, which made it difficult for companies further up the supply chain to rely on the flight recorder data for analytics. Data that is not properly tagged and categorised is meaningless!
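As an illustration of how this kind of tagging error can be caught automatically, the Python sketch below validates an uploaded file name against a naming convention and a list of known tail numbers. The convention (`<tail number>_<YYYYMMDD>.dat`), the example tail numbers and the function name are assumptions made for this example, not the operator’s actual scheme.

```python
import re
from datetime import datetime

# Hypothetical naming convention: <tail number>_<YYYYMMDD>.dat
FILENAME_PATTERN = re.compile(r"^(?P<tail>[A-Z0-9]+-[A-Z0-9]+)_(?P<date>\d{8})\.dat$")

# Hypothetical fleet register; a real one would hold all forty aircraft.
KNOWN_TAILS = {"G-ABCD", "G-WXYZ"}

def parse_upload_name(filename: str):
    """Return (tail_number, flight_date) or raise ValueError for a bad name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    tail = match.group("tail")
    if tail not in KNOWN_TAILS:
        raise ValueError(f"unknown tail number: {tail!r}")
    # strptime raises ValueError for impossible dates such as 20221341.
    flight_date = datetime.strptime(match.group("date"), "%Y%m%d").date()
    return tail, flight_date
```

Rejecting a badly named file at the point of upload means the mistake is corrected by the person who made it, on the same day, rather than being discovered later by an analyst further up the supply chain.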
In addition to the reduction in data quality, there was a risk of data loss as the flight recorder data was being dragged and dropped onto an on-premises server. There was a further risk of data loss when the flight data manager subsequently transferred the data to a computer for analysis and reporting.
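One common safeguard against silent loss or corruption during a file transfer is to compare checksums before and after the copy. The sketch below shows the idea in Python using only the standard library. It is a generic illustration, not a description of what the operator’s systems or Azure Data Factory do internally, and the function names are our own.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_verified(source: Path, destination_dir: Path) -> Path:
    """Copy a file and confirm the copy is byte-for-byte identical."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        target.unlink()  # remove the bad copy so it is never processed
        raise OSError(f"checksum mismatch copying {source.name}")
    return target
```

The source file should only be deleted from the laptop or tablet once `copy_verified` has returned successfully.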
It was clear that this process needed to be automated to ensure that high quality data was delivered to the enterprise customer to enable analytics to be performed right across its aircraft fleet, with data pooled from two other aircraft operators.
How we designed the solution
345’s software engineers are adept at tackling this type of master data management challenge. We proposed a design that gives the aircraft operator an automated data storage system and gives the end customer reliable, integrated, cloud-based data that is enriched with metadata from multiple sources.
While this automated data storage could be achieved with any cloud platform, we proposed that the aircraft operator use Microsoft Azure because it has a very strong set of services to support automation, both for data integration and application integration. In this case we proposed Azure Data Factory (ADF) because it is designed for moving and integrating hybrid data at enterprise scale. Data Factory looks after the difficult aspects of data storage, integration and processing, and includes more than 90 built-in connectors so you can read from and write to almost any type of data source. Once the data pipelines have been configured, you can leave Data Factory to carry out those processes. ADF is a managed service, so you don’t have to worry about which computer is running a particular process. It’s powerful and accessible, with a very low management overhead.
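For readers who have not seen one, a Data Factory pipeline is defined as a JSON document. The Python sketch below builds a definition that approximates the general shape of a pipeline with a single Copy activity. The pipeline, activity and dataset names are hypothetical, the datasets themselves are assumed to be defined elsewhere, and the exact properties should be checked against the current Azure documentation before use.

```python
import json

def build_copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Build a pipeline definition with one Copy activity (illustrative only)."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyFlightData",  # hypothetical activity name
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    # Binary source and sink copy files as-is, without parsing them.
                    "typeProperties": {
                        "source": {"type": "BinarySource"},
                        "sink": {"type": "BinarySink"},
                    },
                }
            ]
        },
    }

if __name__ == "__main__":
    pipeline = build_copy_pipeline(
        "IngestFlightRecorderData",  # hypothetical pipeline name
        "OnPremFlightFiles",         # hypothetical dataset on the file share
        "DataLakeRawFlightFiles",    # hypothetical dataset in the data lake
    )
    print(json.dumps(pipeline, indent=2))
```

Keeping the definition as data rather than as hand-written steps is what allows the same pipeline to run unattended and be version controlled like any other code.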
Data Factory also supports credential management using Azure Key Vault, so all those connection strings and passwords are held securely and protected against leaks. We routinely configure Azure Monitor to provide logging and alerting so there is a high level of operational support.
Finally, we use Azure Policy so that system administrators can set and enforce governance rules, so that operators can’t do anything that risks breaking the system or making it less secure.
The self-hosted integration runtime component within Azure Data Factory allows the aircraft operator to install a small agent on its on-premises system so that it can talk back to the cloud, and Data Factory can synchronise files between cloud and on-premises file shares. This means that existing processes can continue without interruption, in parallel with cloud processing.
This level of automation ensures that data processes are carried out to a consistent standard, removes the risk of human error, and provides data quality assurance to the end customer.
Why Data Factory works for every sector
While this solution was designed for the aviation supply chain, it works equally well in other sectors that need to ingest, cleanse, integrate and enrich large volumes of data from an array of sources. For example, we have used Azure Data Factory for an education trust, which is pulling in data from multiple schools so that it can analyse a range of factors including classroom behaviour, discipline and exam results, and correlate these with analysis of catchment areas and socio-economic factors that could impact pupils’ educational outcomes.
Another customer operates in the retail sector and is successfully using a similar Azure Data Factory architecture to aggregate data from points of sale, automatically store this data in the cloud, and enrich it with customer metadata such as loyalty card use and other information that allows procurement managers to accurately draw insights and plan stock levels based on customer behaviour, weather forecasts and regional trends.
These two organisations share the same data ingestion, integration and data enrichment challenges despite being in completely different industries.
The business benefits of cloud-based data management
A major benefit of implementing an automated cloud-based data management solution is that it embeds data processes into the organisation so that nothing is dependent on a particular person or siloed on a particular computer. This allows organisations to retain ownership of their data and provides the required level of operational control and quality assurance.
Implementing data management methods and structures that are suited to advanced analytics allows organisations to realise the potential of their data, both for their own operational insights and as an additional customer service and potential revenue stream further down the line.