Versio.io

CrowdStrike BSOD

How Versio.io customers solve the IT outage caused by CrowdStrike more efficiently and promptly

In this blog post you will learn:
  • An overview of the global IT outage caused by CrowdStrike on 19 July 2024
  • What is needed to detect CrowdStrike with Versio.io
  • How Versio.io customers identify the affected CrowdStrike servers
  • Frequently asked questions about the CrowdStrike BSOD problem from Versio.io customers

Global IT outage due to CrowdStrike

On 19 July 2024, security provider CrowdStrike released an update to its popular platform that caused many Windows-based machines to fail with a BSOD (Blue Screen of Death). The outage affected almost every major industry worldwide, leading to bank branch closures, flight cancellations, retail point-of-sale failures and much more. Many organisations are struggling to determine the extent of the problem and which dependencies exist on the affected machines.
 

What is needed to recognise CrowdStrike with Versio.io?

The Versio.io OneImporter Agent with the modules `OS & hardware` and `Process (OS)` must be installed on the servers. It carries out a fully automated inventory at infrastructure and application level and continuously records all changes. This means that all information about your servers is available in Versio.io in near real time. We also call this the ‘digital twin’, because your entire IT landscape is available to you in a central repository.
 

How Versio.io customers identify the affected CrowdStrike servers

With the Versio.io inventory platform, you can quickly understand what is running in your environment at infrastructure and application level. Using the automatically recognised Versio.io topologies of servers, processes, application services and web applications in data centres and multicloud environments, you can quickly identify the servers affected by CrowdStrike.
 

Automatically record CrowdStrike usage in the IT landscape


The OneImporter agent of the Versio.io platform is able to record all executed processes on a server. This includes the CrowdStrike Falcon agent named ‘CSFalconService.exe’, which caused the IT outage.

The fully automated inventory ensures that Versio.io customers have accurate data on the use of the CrowdStrike Falcon Agent. In addition to the process characteristics, Versio.io automatically recognises the product and its version numbers.

In addition, the OneImporter can use the ‘File Importer’ module to inventory the problem-causing file ‘C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys’.
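To illustrate what such a file inventory looks for, here is a minimal local sketch in Python. It is not the Versio.io ‘File Importer’ module and does not use any Versio.io API; the function name `find_channel_files` and the directory parameter are assumptions made for this example. Only the directory and file pattern are taken from the text above.

```python
from pathlib import Path

# Directory and pattern as quoted in the text above.
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def find_channel_files(driver_dir: Path = DEFAULT_DRIVER_DIR) -> list[Path]:
    """Return all files in driver_dir whose name matches the pattern.

    Hypothetical helper for illustration only. Returns an empty list
    if the directory does not exist (e.g. the agent is not installed).
    """
    if not driver_dir.is_dir():
        return []
    return sorted(p for p in driver_dir.glob(CHANNEL_FILE_PATTERN) if p.is_file())


if __name__ == "__main__":
    matches = find_channel_files()
    if matches:
        for path in matches:
            print(f"found: {path}")
    else:
        print("no matching file found")
```

Run on a single Windows server, this only reports whether a matching file is present; the point of a central inventory is to have the same answer for every server at once.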

 

Determine servers on which the CrowdStrike Falcon Agent is running via topology context


The Versio.io platform is able to automatically recognise the relationships between recorded configuration items. The topology illustration shows that the CrowdStrike Falcon Agent was started by a process called ‘Services.exe’, which in turn was started by ‘Wininit.exe’. ‘Wininit.exe’ is a core start-up process of the Windows operating system and therefore has a direct relationship to the server instance “evp-node-1”.

Based on this topology, it is clear for each CrowdStrike Falcon agent which server it runs on.
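The idea of resolving a process to its server through the topology can be sketched as a simple walk along parent relationships. The data structure below is an assumption made for illustration and is not the Versio.io data model; the item names come from the example above.

```python
# Hypothetical child -> parent relationships, mirroring the example chain
# described in the text. Not the actual Versio.io topology format.
PARENT_OF = {
    "CSFalconService.exe": "Services.exe",
    "Services.exe": "Wininit.exe",
    "Wininit.exe": "evp-node-1",
}


def resolve_host(item: str, parent_of: dict[str, str]) -> str:
    """Follow parent links until an item without a parent is reached.

    In this sketch, the item at the top of the chain is treated as the host.
    """
    seen = set()
    current = item
    while current in parent_of:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        current = parent_of[current]
    return current


if __name__ == "__main__":
    print(resolve_host("CSFalconService.exe", PARENT_OF))  # prints "evp-node-1"
```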

 

Identification of all servers that use the CrowdStrike Falcon Agent


Based on the recorded process data and the topology, all processes can now be filtered by ‘CSFalconService.exe’ in Versio.io Reporting and the executing host can be displayed.

This means that Versio.io customers now have access to basic information about the scope of the problem and the servers that use the CrowdStrike Falcon Agent. In the same way, it would be possible to report on which servers the file ‘C-00000291*.sys’ is present.
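Conceptually, this report is a filter over recorded process data followed by a list of distinct hosts. The following Python sketch shows that logic on in-memory records; the `ProcessRecord` type, the function name and all host names except ‘evp-node-1’ are hypothetical and do not represent Versio.io Reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRecord:
    """Hypothetical inventory record: one process observed on one host."""
    host: str
    name: str


def hosts_running(records: list[ProcessRecord], process_name: str) -> list[str]:
    """Return the sorted, distinct hosts on which process_name was recorded.

    The comparison ignores case, because Windows file names are
    case-insensitive.
    """
    wanted = process_name.lower()
    return sorted({r.host for r in records if r.name.lower() == wanted})


if __name__ == "__main__":
    # Example data; only "evp-node-1" appears in the text above.
    records = [
        ProcessRecord("evp-node-1", "CSFalconService.exe"),
        ProcessRecord("evp-node-1", "Services.exe"),
        ProcessRecord("example-node-2", "csfalconservice.exe"),
        ProcessRecord("example-node-3", "Services.exe"),
    ]
    for host in hosts_running(records, "CSFalconService.exe"):
        print(host)
```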

 

Determine the runtime behaviour of the servers affected by CrowdStrike using OneImporter Heartbeat


Each of our customers' servers is provisioned with a Versio.io OneImporter, which sends heartbeats to the Versio.io server at regular intervals. A heartbeat is a message that tells the Versio.io server that the OneImporter is functional. In the OneImporter Dashboard, the heartbeat status shows which OneImporters running on Windows operating systems are no longer working correctly. Because the OneImporter itself is highly stable, it can be assumed that Windows systems whose heartbeat stopped during the global IT outage are part of the CrowdStrike problem.
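The heartbeat check described above boils down to comparing the time of the last heartbeat with a maximum allowed age. The sketch below shows this in Python under stated assumptions: the five-minute threshold, the input dictionaries and the function name are hypothetical and do not reflect the actual OneImporter heartbeat interval or dashboard logic.

```python
from datetime import datetime, timedelta, timezone


def silent_windows_hosts(
    last_heartbeat: dict[str, datetime],
    os_by_host: dict[str, str],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # assumed threshold
) -> list[str]:
    """Return Windows hosts whose last heartbeat is older than max_age."""
    silent = []
    for host, seen_at in last_heartbeat.items():
        is_windows = os_by_host.get(host, "").lower().startswith("windows")
        if is_windows and now - seen_at > max_age:
            silent.append(host)
    return sorted(silent)


if __name__ == "__main__":
    # Example data; host names other than "evp-node-1" are made up.
    now = datetime(2024, 7, 19, 8, 0, tzinfo=timezone.utc)
    last_heartbeat = {
        "evp-node-1": now - timedelta(hours=3),
        "example-linux-1": now - timedelta(hours=3),
        "example-win-2": now - timedelta(minutes=1),
    }
    os_by_host = {
        "evp-node-1": "Windows Server",
        "example-linux-1": "Linux",
        "example-win-2": "Windows Server",
    }
    print(silent_windows_hosts(last_heartbeat, os_by_host, now))
```

A missing heartbeat alone does not prove a BSOD; it narrows the list of candidates, which can then be cross-checked against the hosts known to run the CrowdStrike Falcon Agent.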

 

Questions & answers

Frequently asked questions about the CrowdStrike BSOD problem by Versio.io customers.
 

Were the Versio.io services affected by the outage?

The services of the Versio.io platform were not affected by the CrowdStrike issue, as the platform runs only on Linux-based computers. OneImporter and OneGate agents may be affected if they run on Windows systems. In the OneImporter and OneGate Dashboard, however, the heartbeat makes it easy to recognise when agents are no longer functional.

What was the cause of the failure?

CrowdStrike released an update for Windows PCs that contained a defect. Affected servers were forced into a boot loop that prevented them from starting up. The boot sequence is the phase after a server is switched on, during which the operating system, applications and services running on the server are brought online.

Why was the outage so severe?

If an affected server is stuck in a boot loop, it cannot establish communication or provide services, i.e. it does not respond to requests or commands. It is as if the server were switched off. To restore the services, the fix must be applied individually and manually on each server. The remediation can also be complex and time-consuming and may involve a ‘rollback’ to an earlier point in time from backups. In total, an estimated 8.5 million Windows devices are affected.

Is there a schedule for restoring the services?

As remediation is manual and time-consuming, service recovery depends on which servers support the most critical applications and are therefore prioritised over servers for less critical services. For many organisations, this can take hours or days. Versio.io customers can speed up this process by quickly finding affected hosts and prioritising the most critical ones first based on protection needs.
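Prioritising by protection need can be thought of as a simple sort of the affected hosts. The following Python sketch assumes a three-level scale (‘high’, ‘medium’, ‘low’) and made-up host names; neither is taken from Versio.io.

```python
# Assumed protection-need scale; lower rank means "restore first".
PROTECTION_RANK = {"high": 0, "medium": 1, "low": 2}


def remediation_order(affected: list[tuple[str, str]]) -> list[str]:
    """Order affected hosts by protection need, then by host name.

    `affected` holds (host, protection_need) pairs. Unknown protection
    levels are placed last.
    """
    unknown = len(PROTECTION_RANK)
    ranked = sorted(
        affected,
        key=lambda pair: (PROTECTION_RANK.get(pair[1], unknown), pair[0]),
    )
    return [host for host, _ in ranked]


if __name__ == "__main__":
    affected = [
        ("example-node-3", "low"),
        ("evp-node-1", "high"),
        ("example-node-2", "medium"),
    ]
    print(remediation_order(affected))
```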

How does Versio.io help our customers who are affected by the outage?

This problem needs to be fixed manually, but Versio.io recognises which servers and which services are affected. With this information, we simplify the process for our customers to create plans and restore servers and services associated with their most critical (high protection) applications.

Are many Versio.io customers affected by the outage?

Yes, because this outage was unavoidable after CrowdStrike released the buggy update. Many of the world's largest and most important companies use CrowdStrike for endpoint protection. Fortunately, Versio.io helps our customers quickly identify and prioritise affected servers so they can quickly restore services to their most critical business functions. By knowing exactly which offline servers are connected to specific critical business services and the exact dependency relationships, IT teams can quickly create manual remediation plans to efficiently restore business-critical functions. Versio.io customers are very familiar with this process, as they use it when zero-day runtime vulnerabilities such as log4j are discovered that pose an immediate threat to large parts of their environment. In these cases of vulnerabilities, Versio.io helps customers to immediately identify and prioritise the affected code.


Authors | July 19, 2024


Fabian Klose
Head of Software Development
P:  +49-30-221986-51
LinkedIn
Contact person
Matthias Scholze
Chief Technology Officer
P:  +49-30-221986-51
LinkedIn


Keywords

CrowdStrike, Falcon sensor, BSOD, Blue Screen of Death, Outage, MTTR, Mean Time To Recover
