Our Vulnerability: CrowdStrike Outage, Infrastructure Fragility and Subscription Software

airwhale · July 20, 2024, 8:47am

There is plenty of blame to go around in the CrowdStrike fiasco. For one, there is an almost complete lack of accountability and consequences for everyone involved.

Nobody working at Microsoft will face any negative consequences from this: the system works as designed, and CrowdStrike messed up. Nobody in leadership positions at any of the companies affected by the outage will have their bonuses threatened either, because “force majeure” and we’re just poor victims here. Sure, some mid-level manager at CrowdStrike won’t get their annual bonus, and that will be that.

The fact is that decades of skimping on infrastructure maintenance and investment, regular layoffs to “optimise team sizes”, and outsourcing to further cut costs (and switching outsourcing partners every 3-5 years) are, in financial terms, “good for the stock price”. The stock price is tied to leadership compensation and shareholder value. All this takes precedence over customer value and leaves us in situations like this. Meanwhile, the true cost is distributed across every affected customer and, in turn, their end-customers.

Add to that the generally low focus on Governance, Risk and Compliance (GRC): management practices designed to identify and reduce risk, stay compliant with not only regulation but also internal policies (like Security), and ensure sufficient capacity to operate the IT estate at secure levels. (The only part of Compliance that is actually controlled is in industries where there are regular external audits with the risk of penalties for non-compliance, e.g. FDA, Sarbanes-Oxley, GDPR etc.)

One major factor that allows companies to operate in this way is how the EULAs and ToS “agreements” are written. All software companies have given themselves blanket immunity against any and all negative events that might arise from using their products. They are not liable for anything and can’t be sued. Taking the stance that “our software business is more important than any business our customers use said software to conduct” aligns nicely with capitalist theory, but it does nothing for securing vital infrastructure and the needs of society. Events like this will continue to happen. Regulation is probably the only way to reduce the risk of this, but I don’t see that in our immediate future.

That said, software like CrowdStrike, ZScaler, antivirus tools and the like is being deployed as protection against cyber attacks and malware. Our platforms are super-fragile. Reasonably, the OS should be able to operate securely without multiple 3rd-party kernel extensions. Apparently, this is not the case.

A user with the handle “zmmmmm” shared the following, which I believe is a great additional angle:

"So CrowdStrike is deployed as third party software into the critical path of mission critical systems and then left to update itself. It’s easy to blame CrowdStrike but that seems too easy on both the orgs that do this but also the upstream forces that compel them to do it.

My org which does mission critical healthcare just deployed ZScaler on every computer which is now in the critical path of every computer starting up and then in the critical path of every network connection the computer makes. The risk of ZScaler being a central point of failure is not considered. But - the risk of failing the compliance checkbox it satisfies is paramount.

All over the place I’m seeing checkbox compliance being prioritised above actual real risks from how the compliance is implemented. Orgs are doing this because they are more scared of failing an audit than they are of the consequences of failure of the underlying systems the audits are supposed to be protecting. So we need to hold regulatory bodies accountable as well - when they frame regulation such that organisations are cornered into this they get to be part of the culpability here too."

Bmosbacker · July 20, 2024, 10:16am

I’m not sure I want to read it. Ignorance is bliss.

Nick · July 20, 2024, 10:44am

I saw this - it made me laugh

nationalinterest · July 20, 2024, 11:18am

I didn’t know about that - and it’s ludicrous - although that would then rely on Microsoft’s ability to rapidly patch issues that can only be patched at kernel level. That tension between competition and protection to the core platform continues today with Apple being forced to open up areas of their ecosystem that they claim will compromise system security (sometimes debatably). It’s a difficult balance. Only allowing Safari on iOS and Mac would potentially increase security, safety and privacy… but it would also offer a massive potential single point of failure and would reduce competition and consumer choice.

There perhaps needs to be a revisiting of the core protections for an OS. The problem here is CrowdStrike has to work at a low level to offer its protections, and it’s impossible for their customers to test updates in a normal way because they have to be implemented very quickly if they’re to protect against zero-day exploits and live hacking incidents. Waiting a few hours could be too late.

A key problem is, as @sgtaylor5 identified, the lack of diversity among some of these major corporate systems: CrowdStrike had a quarter of the endpoint protection market.

TudorEynon · July 20, 2024, 11:26am

Yeah, though plenty of people have pointed it out, the whole idea that everything should be linked up as it is is crazy. One www, in other words. There should, obviously, be ring-fenced systems. I don’t know exactly how to achieve that, of course, but it should be done, and my instinct is that militaries will be doing it already.

Of course, as others are saying, we take it all for granted, even the ‘bad parts’, like our ridiculous reliance on cars and parking spaces, wanton environmental destruction and other lunacies that future generations will laugh at as we do regarding the Middle Ages.

The hold Windows still has is a bit sad too, of course. My own efforts to appleize my wife’s business were stymied by a couple of ‘windows’ friction points at a key vendor. That was all that was needed to stop it. Now it is more or less inertia, though, and blindly walking into AI, making it all way worse.

rms · July 20, 2024, 11:30am

This was figured out decades ago, but in more recent decades the new generations are forgetting (sometimes deliberately so) the learnings from the past.

Controls implemented in the past to mitigate risks are being abandoned partially because the risks those controls were intended to mitigate are forgotten, misunderstood, or ignored.

TudorEynon · July 20, 2024, 11:34am

Not even incompetence, often just path-dependent behaviors and situations that have become ludicrous: why empires fall… Perverse incentives, ‘not my job’, ‘more than my job’s worth’ all play a role, as does, in my view, though I won’t pound sand, the current economic system, geared to short-term(ish) profit.

Also, given the number of large-scale and obvious issues facing us, individuals feel overwhelmed: I for sure do.

TudorEynon · July 20, 2024, 11:38am

You ain’t jokin’. It was being said long ago, as I should have pointed out. As I said though, I do think some systems are being isolated or have an isolated parallel system. I don’t want to inflame Cold War 2.0 rhetoric here, but there is an obvious line of thinking regarding geopolitics.

geoffaire · July 20, 2024, 12:41pm

Agreed, if you didn’t buy the annual support at 40% of the license cost, you had to rebuy the software every 3-5 years. Potato, potahto.

Of course you could choose not to buy either, but then you just paid for support when you required it.

geoffaire · July 20, 2024, 12:52pm

Not just efficiency, simplification in an increasingly complex world.

No company would run 2 different AV/AM solutions on their company laptops splitting the estate 50/50. That’s even less likely with monitoring tools as the idea is to bring all logs and notifications into one place.

It’ll be interesting if companies act on this or put it down as “one of those things”/Accept the risk.

geoffaire · July 20, 2024, 12:54pm

There are ways to improve this and that’s a staggered rollout, but that causes other problems with version mismatches.
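To make the staggered rollout idea concrete, here is a minimal Python sketch. It is an illustration only, not how CrowdStrike or any other vendor actually ships updates: the ring sizes and the `deploy` and `healthy` callbacks are hypothetical placeholders. Hosts are assigned to rings by a stable hash of their ID, and the rollout stops if any host in the ring just updated reports as unhealthy.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Sequence

# Hypothetical ring sizes: cumulative share of the fleet covered after each stage.
RING_THRESHOLDS = (0.01, 0.10, 0.50, 1.00)


def ring_for(host_id: str, thresholds: Sequence[float] = RING_THRESHOLDS) -> int:
    """Map a host to a rollout ring using a stable hash of its ID."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # Turn the first 8 bytes into a roughly uniform fraction between 0 and 1.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    for ring, limit in enumerate(thresholds):
        if bucket < limit:
            return ring
    return len(thresholds) - 1  # Fallback: put the host in the last ring.


def staged_rollout(
    hosts: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    thresholds: Sequence[float] = RING_THRESHOLDS,
) -> List[str]:
    """Deploy ring by ring; halt if any host in the current ring is unhealthy."""
    rings: Dict[int, List[str]] = {}
    for host in hosts:
        rings.setdefault(ring_for(host, thresholds), []).append(host)

    updated: List[str] = []
    for ring in range(len(thresholds)):
        batch = rings.get(ring, [])
        for host in batch:
            deploy(host)
            updated.append(host)
        if not all(healthy(host) for host in batch):
            break  # Stop here so later rings keep the old version.
    return updated
```

Note that halting part-way leaves exactly the version mismatch described above: the early rings run the new version while everyone else stays on the old one until the rollout resumes or is rolled back.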

I don’t think you mean incompetence, i.e. lack of skills/experience/training. I suspect you mean human error.

Beyond anything anyone else has said, CrowdStrike and many of their customers’ IT departments are not enjoying the weekend they had planned.

Synchronicity · July 20, 2024, 1:44pm

It also needs to be remembered that even the most competent people make mistakes. It’s part of being human.

liminal · July 20, 2024, 3:00pm

More than 20 years later, I still believe that Bruce Schneier’s proposed solution of software industry liability is one of the better ideas in this area. Unlikely to happen as long as tech leviathans continue to buy politicians though.

"Today there are no real consequences for having bad security, or having low-quality software of any kind. Even worse, the marketplace often rewards low quality. More precisely, it rewards additional features and timely release dates, even if they come at the expense of quality.

If we expect software vendors to reduce features, lengthen development cycles, and invest in secure software development processes, they must be liable for security vulnerabilities in their products. If we expect CEOs to spend significant resources on their own network security—especially the security of their customers—they must be liable for mishandling their customers’ data. Basically, we have to tweak the risk equation so the CEO cares about actually fixing the problem. And putting pressure on his balance sheet is the best way to do that."

https://www.schneier.com/essays/archives/2003/11/liability_changes_ev.html

Also this 2011 essay by Poul-Henning Kamp

https://queue.acm.org/detail.cfm?id=2030258

geoffaire · July 20, 2024, 3:20pm

This is all based on a massive misconception that there is always a secure state (i.e. there will 100% be no breach). There isn’t, for many reasons, the biggest one being that people make mistakes.

Managing security is always a trade-off between what is reasonable and how much that costs.
Insurance companies will not allow companies to provide such warranties.
When something goes wrong, try and prove whose fault it was, AND get them to accept responsibility.
Lastly, but by no means least, costs for software would increase massively, and we’ve already established in this thread that cost is a factor when choosing solutions.

TudorEynon · July 20, 2024, 3:22pm

In June 2023, the Federal Trade Commission fielded a call for public comments about cloud computing business practices. Microsoft and Amazon, two companies that dominate this space, replied by insisting that competition was “thriving” and “highly dynamic and competitive”. Google, less of a player than the other two, was less demure and offered an 11-page document accusing Microsoft of stifling competition.

From a Guardian article Saturday, July 20, 2024 by Edward Ongweso Jr
The role of monopoly, which really makes a nonsense of ‘free market’ arguments in the tech field is however not something I am going to pound sand on here.
It is a kind of political-economy-free space about one of the few products (warts and all, I know) that I actually enjoy using. However, that has to run out one day. For me, not yet.

OogieM · July 20, 2024, 4:18pm

Not true. It’s possible to write code that is cross platform within a single framework. AnimalTrakker Desktop apps (Registry and Farm) can run on Mac, Linux or Windows. I’m using Python 3.8 for all development.
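As a small, generic illustration of what single-framework cross-platform code can look like in Python (this is a sketch, not AnimalTrakker's actual code; the function name and the app name used in the usage line are made up), the only OS-specific part here is choosing where per-user data lives:

```python
import os
import platform
from pathlib import Path


def app_data_dir(app_name: str, system: str = "") -> Path:
    """Return a per-user data directory for app_name on Windows, macOS or Linux."""
    # Allow the OS name to be passed in for testing; otherwise detect it.
    system = system or platform.system()
    home = Path.home()
    if system == "Windows":
        # Windows normally sets APPDATA; fall back to the usual default location.
        base = Path(os.environ.get("APPDATA", home / "AppData" / "Roaming"))
    elif system == "Darwin":  # macOS
        base = home / "Library" / "Application Support"
    else:
        # Linux and other Unix-likes: follow the XDG convention if it is set.
        base = Path(os.environ.get("XDG_DATA_HOME", home / ".local" / "share"))
    return base / app_name
```

Calling `app_data_dir("ExampleApp")` returns the appropriate location on whichever of the three systems the script runs on, and the rest of the program can stay identical because `pathlib` handles the path separators.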

Almost any farmer or person living in a moderately rural area already does all that as a standard operating procedure. I know we’re covered for every contingency you mention and some you haven’t mentioned.

WayneG · July 20, 2024, 4:40pm

You are, of course, correct. I’ve been out of the game for a few years but still see things from the view of someone supporting traditional commercial software. Or very specialized custom software.

MacSparky · July 20, 2024, 5:25pm

I’ve got a blog post about this going up next week. What stands out to me is that all of this results from negligence. How much worse could it have been if the source was malice?

95omega · July 20, 2024, 6:42pm

Many run CrowdStrike alongside Defender in the enterprise. In fact, one of their sales motions is a “better together” approach. The Falcon agent is pretty lightweight, but this can depend on service and type of host.

Bmosbacker · July 20, 2024, 6:49pm

@MacSparky Indeed, this is my point above. My hunch is that it is only a matter of time.

This would make a helpful podcast: “How to prepare for and minimize the impact of a significant cyber disruption”, perhaps with a couple of cybersecurity experts invited to join the discussion. This could go beyond the obvious backups to include a selection of apps, what security software, if any, to install, whether one should keep all essential documents local on the hard drive in addition to on the cloud, when to include analog along with digital documents, and more. For example, upon reading of the GPS disruption in Ukraine, I decided to purchase an old-fashioned US Road Atlas.