Here are some quick thoughts about data and everything around it. These are my opinions only and might not reflect what others think in this industry. I hope there’s something interesting in here!
From aeronautical engineering to data design
I started out studying aeronautical engineering, thinking I was going to do a grown-up job and be an engineer.
Well, turned out that most of it was just computational mathematics and numerical calculus. And when I got into the real world of being an actual aeronautical engineer, it didn’t take long for the penny to drop and I knew this wasn’t the right gig for me.
I like it when things move quickly and stuff gets done.
I can’t stand it when there’s a lag between doing something and seeing the result, and yet that’s what happens in most aerospace programs!
There was one time when I was assigned to work on a missile project: the design was done in the 1970s but the project wasn’t going into service until the 1990s. Twenty-plus years! I could do the maths, the calculations, the writing of the programs for aerospace systems, the testing of the gearbox housings. But the long timeframes – that I couldn’t do. So, I had to get out.
I found my way into a manufacturing project, and all of a sudden I was able to sketch out a plan for a complete database design. That felt like proper, rewarding work. It was fun, it was fast and I didn’t have to wait two decades before seeing something I’d done actually out in the world – winning. And it was my cue for learning more about data design, coding, machine learning and AI.
What I wish I’d known sooner about data
The big thing was probably not understanding data technologies well enough or knowing how databases work. You end up playing it safe and using only the tools you know, because that’s easy and no one’s going to criticise you if you make a mess of things (not that I probably would have, but you never know).
This is probably natural to some extent – you don’t know what you don’t know – but I would have done better work sooner if I’d understood what tools were around that could have stopped me using a hammer when what I really needed was a screwdriver.
Sorting the hammers from the screwdrivers has meant knowing where I can use some battle-hardened existing tech to do my heavy lifting. There’s no point wasting my time in coding custom solutions to a tech problem when someone else has done the grunt work for me.
Turning everything into “let’s build a custom solution” takes ages and costs too much.
No client would agree to that if they knew any better, and I’m not interested in billing for time I don’t need to spend on a project.
One really cool thing about data
I love understanding and improving system performance. Traditionally, this topic is constrained to the four parameters of computing performance:

- processing (CPU)
- memory
- disk read/write (or input/output)
- network
In the old way of doing things, to improve you have to do better in one or more of those categories. Better performance meant buying bigger and better tech. It meant spending money on more processors, more memory, faster disks, etc. But there’s only so much of “more, more, more” you can do before you hit limits.
There are other ways of improving performance than just chasing bigger and better.
A solution I find elegant and wonderful is “sharding”, which we see in NoSQL databases such as Cassandra and MongoDB.
They separate the way data is stored and processed through the use of the right keys, allowing small segments (“shards”) of the data to be distributed across multiple machines.
It means that each computer on such a distributed system can be quite low powered, with the result being that problems are solved in parallel instead of serially. You bin the “single supercomputer” model and get in lots of cheaper tech that can do the job just as well if not better – and it’s more resilient, scalable and all that good stuff.
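To make the idea concrete, here’s a minimal sketch of hash-based sharding in Python. This is a toy illustration only – real systems like Cassandra and MongoDB use more sophisticated schemes (consistent hashing on a partition key, for instance, so that adding a node doesn’t force most keys to move) – and the `ShardedStore` class and its dict-backed “nodes” are my own invented stand-ins for actual machines.

```python
import hashlib

class ShardedStore:
    """Toy hash-sharded key-value store: each key is routed to one of
    several independent 'nodes' (here, plain dicts) based on a stable
    hash of the key, so work on different shards can happen in parallel."""

    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Stable hash: the same key always lands on the same shard,
        # regardless of which process or machine computes it.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        # A lookup touches exactly one shard, not the whole dataset.
        return self.shards[self._shard_for(key)].get(key)

store = ShardedStore(num_shards=4)
for user_id in ("alice", "bob", "carol", "dave"):
    store.put(user_id, {"visits": 1})

print(store.get("alice"))
```

The key design point is that the routing function depends only on the key itself, so no central index is needed: any client can compute which node holds the data and talk to it directly.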
Where I wish I could do better with data
In an interview, the politically correct answer would be that I prefer not to focus on the minutiae of day-to-day data handling. In practice, this means that I avoid being bored by repetitive tasks – and that involves writing code to stop those tasks sucking up any more of my time than they need to. It’s not that I’m not detail oriented at the right times, but I do prefer the big picture data topics.
The numerical calculus background has prepped me well for working with data, but I’m also aware that I’m still relatively early in my journey of learning how AI works. This is a massive area that’s going to form a huge part of our lives, and we’re way off mastering it. I’m not even sure we have a clue of what’s possible yet.
What worries me about big data
My big concern about big data is what used to be the lone refuge of the foil-hat brigade: that if used wrongly, our personal information could become a vehicle for manipulating us. Well, guess what: it’s already happening, sadly.
The big tech firms have so much data on us now that they’re able to use it to decide what is fake news, and that has major implications for the state of public discourse about politics and journalism. I’m sure that getting a proper handle on what big tech is allowed to do will be a major political issue throughout the 2020s.
And don’t forget health data. If that’s treated wrongly, it could be used to justify large increases in insurance for some groups of people. It doesn’t seem fair to me to penalise individuals because of their genetics (which, last time I checked, they had no way of changing).
Despite these worries, I’m actually optimistic or at least somewhat hopeful. Predictive data used well has the power to improve people’s health. That’s the kind of work I want to help our customers with.
Imagine the burden we could lift from healthcare services by using data to predict avoidable or treatable conditions before they turn into symptoms in a person. That sort of work could save lives and allow more of our resources to be used on cracking the really hard healthcare problems.
What’s the next big thing in data?
Our smartphones already gather huge amounts of data about our activity. PropTech is doing the same. We can measure everything: temperature, humidity, pollution and all sorts of other data.
This is the tip of the iceberg.
My prediction is the “instrumentation of everything” – a world where we can measure so many things that we have the potential to optimise just about every aspect of life.
This has good and bad sides (one person’s optimisation might be another person’s unfair tax), but I do think that responsible use of data is a force for positive change, and that’s what I’m most interested in doing more work on in future.
I could go on for ever about data and the development of machine learning and AI to do good in the world – but I expect you’ll thank me for stopping now, if you’ve got this far! Or if you’re like me and enjoy a really good discussion on the topic, please do drop me a line. We have regular guests on our 345 Technology podcasts – maybe you could be our next one?