There is plenty of blame to go around in the CrowdStrike fiasco. For one, the almost complete lack of accountability and consequences for everyone involved.
Nobody working at Microsoft will face any negative consequences from this - the system works as designed, and CrowdStrike messed up. Nobody in leadership positions at any of the companies affected by the outage will have their bonuses threatened either, because “force majeure” and we’re just poor victims here. Sure, some mid-level manager at CrowdStrike won’t get their annual bonus, and that will be that.
The fact is that decades of skimping on infrastructure maintenance and investment, regular layoffs to “optimise team sizes”, and outsourcing to cut costs further (while switching outsourcing partners every 3-5 years) are, in financial terms, “good for the stock price”. The stock price is tied to leadership compensation and shareholder value. All of this takes precedence over customer value and leaves us in situations like this one. The true cost is not borne by those who made the decisions; it is distributed across every affected customer and, in turn, their end-customers.
Add to that the generally low focus on Governance, Risk and Compliance (GRC) - management practices designed to identify and reduce risk, stay compliant not only with regulation but also with internal policies (like Security), and ensure sufficient capacity to operate the IT estate at secure levels. (The only part of Compliance that is actually controlled is in industries with regular external audits and the risk of penalties for non-compliance, e.g. FDA, Sarbanes-Oxley, GDPR etc.)
One major factor that allows companies to operate in this way is how the EULAs and ToS “agreements” are written. All software companies have given themselves blanket immunity against any and all negative events that might arise from using their products. They are not liable for anything and can’t be sued. Taking the stance that “our software business is more important than any business our customers use said software to conduct” aligns nicely with capitalist theory, but it does nothing to secure vital infrastructure or meet the needs of society. Events like this will continue to happen. Regulation is probably the only way to reduce the risk, but I don’t see that in our immediate future.
That said, software like CrowdStrike, ZScaler, antivirus tools and the like is being deployed as protection against cyber attacks and malware because our platforms are super-fragile. Reasonably, the OS should be able to operate securely without multiple 3rd-party kernel extensions. Apparently, this is not the case.
A user with the handle “zmmmmm” shared the following, which I believe is a great additional angle:
"So CrowdStrike is deployed as third party software into the critical path of mission critical systems and then left to update itself. It’s easy to blame CrowdStrike but that seems too easy on both the orgs that do this but also the upstream forces that compel them to do it.
My org which does mission critical healthcare just deployed ZScaler on every computer which is now in the critical path of every computer starting up and then in the critical path of every network connection the computer makes. The risk of ZScaler being a central point of failure is not considered. But - the risk of failing the compliance checkbox it satisfies is paramount.
All over the place I’m seeing checkbox compliance being prioritised above actual real risks from how the compliance is implemented. Orgs are doing this because they are more scared of failing an audit than they are of the consequences failure of the underlying systems the audits are supposed to be protecting. So we need to hold regulatory bodies accountable as well - when they frame regulation such that organisations are cornered into this they get to be part of the culpability here too."
