Immediate context: ‘the largest IT outage in history’, the global IT outage of 18th and 19th July 2024, caused by issues arising from a cybersecurity product used by large enterprises across the world (CrowdStrike) and the Windows operating system. It disrupted normal daily activities across the world: cancelled and grounded flights, disrupted medical processes, and more.
The incident has rightly been described as “showing the fragility of the digitally connected world”.
Take the aviation sector, for instance. Aviation could be considered one of the most safety-critical engineering/technology domains. The mechanical (and electronic and electrical) engineering involved in aviation is grounded in mature safety principles such as redundancy, fail-safes, fail-overs, bypass, fault tolerance, and fault isolation. The internet-connected, software-based computer technology (“the digital world”) is still a long way from such levels of safety/security maturity.
The CrowdStrike+Windows incident juxtaposes the maturity of aviation engineering and the immaturity of digital engineering.
What’s the takeaway?
Firstly, we must acknowledge that there are fundamental weaknesses in the “digital ecosystem”. It’s not a mature field like civil, mechanical, or electrical engineering.
It’s time to stop treating the digital revolution or digital transformation like a gold rush, and instead to invest in maturing and carefully understanding the technology and systems involved.