On the morning of Friday, 19 July 2024, the world woke up ready to face a new day, but a rather tricky problem was already waiting for those who would have to deal with it. CrowdStrike, a well-known US company specialising in computer security, had distributed a configuration update for its Falcon Sensor software for Windows PCs and servers at around 04:10 UTC. Only later did the company realise that the update contained an error: a change to a configuration file caused an out-of-bounds read in the Windows sensor client, which crashed the operating system with the dreaded ‘blue screen of death’. Affected devices were left stuck in a boot loop or were forced to reboot into recovery mode.
An out-of-bounds memory read is a software fault in which a program, while processing the data it has been given, goes beyond the boundaries of the area in which that data is stored and attempts to ‘read’ adjacent memory. It is a memory-safety bug rather than an attack in itself, although flaws of this kind can be exploited as security vulnerabilities. For software such as Falcon, a platform designed to protect users' systems from potential threats and to minimise cyber security risks, such an incident seems almost ironic.
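To make the idea of an out-of-bounds read concrete, the following is a minimal sketch in Python. It is not CrowdStrike's code and does not reproduce the actual defect: the buffer `MEMORY`, the region constants, and the two reader functions are all hypothetical names invented for illustration. A flat byte buffer stands in for memory, with a "config" region placed directly before unrelated data.

```python
# Hypothetical illustration only: a flat byte buffer stands in for memory.
# The first 6 bytes are the region the parser is supposed to read;
# the bytes after it belong to something else entirely.
MEMORY = bytearray(b"CONFIG" + b"SECRET")
CONFIG_START, CONFIG_LEN = 0, 6


def read_unchecked(offset: int) -> int:
    """Read one byte relative to the config region without a bounds check."""
    return MEMORY[CONFIG_START + offset]


def read_checked(offset: int) -> int:
    """Perform the same read, but refuse offsets outside the config region."""
    if not 0 <= offset < CONFIG_LEN:
        raise IndexError(
            f"offset {offset} is outside the {CONFIG_LEN}-byte config region"
        )
    return MEMORY[CONFIG_START + offset]


if __name__ == "__main__":
    print(chr(read_unchecked(5)))  # last byte that belongs to the region
    print(chr(read_unchecked(6)))  # one past the end: adjacent data leaks in
    try:
        read_checked(6)
    except IndexError as exc:
        print("rejected:", exc)
```

In this simulation the unchecked read simply returns a neighbouring byte. In native code, and particularly in code running inside the operating system kernel, the same mistake can land on an invalid address and bring the whole system down instead.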
Realising the situation, CrowdStrike reverted the update at around 05:30 UTC. Devices that started up after the rollback were unaffected, while those already hit had to be fixed manually: they were rebooted while connected to the network so that they could download the corrected update. If that failed, certain files had to be deleted before rebooting again.
Most personal devices reported no problems, and computers running macOS and Linux were not affected, although Linux systems had experienced a similar problem on a much smaller scale in April 2024. Falcon Sensor is used mainly by large companies and corporations, and it was precisely these organisations that suffered what is now referred to as ‘the biggest IT failure in history’: the outage caused massive disruption on a global scale. More than 10,000 flights were cancelled and some transport services were forced to stop operating; Sky News and other television stations were unable to broadcast because they had no access to the network; several emergency centres were disconnected; hospitals and clinics had problems with their appointment management systems; and a wide variety of banking services were suspended. Simple and straightforward as the recovery process described above appears to be, each device had to be restored individually and manually, so large companies, corporations, and service providers had to wait until the recovery was complete before they could resume work, which increased general dissatisfaction, especially with CrowdStrike.
According to a study carried out by Microsoft in the days following the incident, the outage affected 8.5 million devices and, in the US alone, about 700 of the largest national corporate enterprises; according to an estimate by insurance experts, it is thought to have cost those enterprises around USD 5.4 billion.
The greatest concern during the outage was its cause: it was initially feared to be some kind of security breach, a perfectly executed hacking operation aimed at sowing panic and chaos, or an attack perpetrated by malicious insiders. Once the real cause of the incident was identified, the general apprehension subsided, but a lingering fear remained, reinforced by evidence of several hacking attempts in the days that followed. According to a CrowdStrike blog post, the company's customers reported phishing emails and fake phone calls in which the perpetrators pretended to be part of the company's support team. As a result, the Cybersecurity and Infrastructure Security Agency (CISA) urged both individuals and companies to exercise greater caution with regard to online communications.
From a broader perspective, the incident helped to bring some rather complex issues into focus, especially in the global IT sphere. It raised questions about oligopoly and centralisation in the IT sector and shed light on the fragility of the Internet infrastructure.