> Move fast and break things! Got to be agile!
First, Agile doesn't really mean breaking your application. Second, if you break something then you break it early, during development and testing, not in production.
> He doesn't say the updates include p-code. He just says that would be possible to do and would be bad.
Ah, on rewatch, I apparently missed him saying "let's speculate" right before that bit. Thanks for the correction.
This sounds more like a QA screwup due to inadequate processes or insufficient man-power
> It seems that he was McAfee CEO back in 2010.
John McAfee was right about McAfee software…
> I dunno, they seem to be living up to their name here already; they struck the crowd, alright, open-handed right across the boot process.
Without consent!
> Agile has definitely allowed a sloppiness to encroach into software development and the constant push for quick releases always comes at the cost of quality...ALWAYS.
That is not true. I've worked for one company where the switch to agile methods brought massive quality improvements through a combination of test automation and shorter, incremental development cycles. The focus was also on a steady release cadence rather than speed. I've done waterfall too, and that company had far more issues in their code.
> I guess some companies will pivot to an A/B strategy where half of the endpoints get CrowdStrike protection and the other half gets an alternative.
> This will of course increase the system administration bills but you get resilience in return.
This kinda defeats the purpose of XDR, though. XDR has its main power in being able to aggregate activities across all of your devices, so you can have your router go "hey, something is talking on this suspicious port", trace it back to computer X, and have it go "well, I am running program Y, which is also doing these suspicious activities" and kill the software. Or determine, "wait, that software is new, but good, so let it continue to use that port". As soon as you split your security software stack apart, you lose visibility across your entire network. So you get resilience against these very infrequent issues (and most of these companies would still be down for a while even if only half of their computers were BSOD), at the expense of actual security.
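To make the staged-rollout idea concrete: a minimal sketch of deterministic ring assignment, where a small canary slice gets content updates before the rest of the fleet. The ring names, percentages, and hostnames are invented for illustration; real EDR consoles expose this as policy groups rather than code.

```python
import hashlib

# Hypothetical rollout rings: a small canary slice gets content updates first,
# the broad ring only after the canary has stayed healthy for some window.
RINGS = [("canary", 10), ("broad", 90)]  # (name, percent of fleet)

def assign_ring(hostname: str) -> str:
    """Deterministically map a hostname to a rollout ring."""
    # Hash the hostname so the assignment is stable across runs and machines.
    bucket = int(hashlib.sha256(hostname.encode()).hexdigest(), 16) % 100
    threshold = 0
    for name, percent in RINGS:
        threshold += percent
        if bucket < threshold:
            return name
    return RINGS[-1][0]

if __name__ == "__main__":
    for host in ("web-01.example.com", "pos-17.example.com", "db-02.example.com"):
        print(host, "->", assign_ring(host))
```

In practice the broad ring would be gated on canary health, not just on a timer.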
> Dave Plummer, past Microsoft engineer, goes over the Crowdstrike incident
> [...]
> One interesting bit is he says the "content updates" include p-code that the Crowdstrike driver executes, and that the driver is set up to essentially run unsigned code from these update files at the kernel level. That's actually pretty scary stuff.
I came away from that video completely shocked. They made so many enormously risky decisions.
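On the "unsigned code at kernel level" point: the usual mitigation is that the component refuses any content file whose signature does not verify against a pinned vendor key. Below is a minimal userspace sketch of that check, assuming the third-party cryptography package; the payload and key handling are made up for illustration and say nothing about how CrowdStrike's channel files actually work.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Vendor side (illustrative only): sign the content update with a private key.
signing_key = Ed25519PrivateKey.generate()
update_blob = b"\x00\x01fake-content-update"   # made-up payload
signature = signing_key.sign(update_blob)

# Client side: the agent pins the matching public key and refuses to load any
# content file whose detached signature does not verify.
pinned_public_key = signing_key.public_key()

def load_content_update(blob: bytes, sig: bytes) -> bytes:
    try:
        pinned_public_key.verify(sig, blob)    # raises on any mismatch
    except InvalidSignature:
        raise RuntimeError("refusing to load content update: bad signature")
    return blob

load_content_update(update_blob, signature)              # accepted
# load_content_update(update_blob + b"x", signature)     # would be rejected
```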
> This is why you never deploy on Fridays.
It sounds like this is an almost-daily type of update, similar to antivirus definitions.
> What I don't get is given how broadly applicable the problems were, how in the world did this pass internal testing at CrowdStrike?
> I could understand if the problem was extremely rare, but this seems like it basically hit everyone who downloaded the update.
That assumes a lot about the quality of (or existence of) their internal testing.
> First, Agile doesn't really mean breaking your application. Second, if you break something then you break it early, during development and testing, not in production.
In theory agile doesn't mean that; in practice it often does lead to "eff it, push it to prod to get the work item out of my sprint, it passed x unit tests or worked on my dev box".
> What I don't get is given how broadly applicable the problems were, how in the world did this pass internal testing at CrowdStrike?
> I could understand if the problem was extremely rare, but this seems like it basically hit everyone who downloaded the update.
According to this article, a few months ago the CrowdStrike software for Linux also caused issues with an update. It turns out that some supposedly supported Linux configurations (Debian, Rocky Linux, and therefore likely Red Hat as well) are not part of the testing matrix at CrowdStrike. I find it hard to believe that some configurations of Windows – and apparently fairly common ones at that – would not be part of the testing, but at least there is precedent for incomplete testing.
> What I don't get is given how broadly applicable the problems were, how in the world did this pass internal testing at CrowdStrike?
I suspect the file got corrupted in transfer after it was tested.
> I suspect the file got corrupted in transfer after it was tested.
You mean between testing and deployment servers internally at Crowdstrike? Because there is no way it got corrupted upon download by thousands of systems worldwide in the same way. And even then, everything should be checked against a checksum. Even internally, so they can make sure the same file they are testing is the same file they are deploying.
> I suspect the file got corrupted in transfer after it was tested.
That would mean they have no checksumming or file integrity on their builds, which would be an even more damning implication than "they made a bad build get pushed".
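For reference, the end-to-end integrity check being described — the file we tested is the file we built, and the file the endpoint received is that same file — is a few lines with standard tooling. A minimal sketch; the expected digest below is a placeholder (it happens to be the SHA-256 of an empty file).

```python
import hashlib
from pathlib import Path

# Placeholder digest for illustration: this value is the SHA-256 of an empty file.
EXPECTED_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large updates need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_update(path: Path, expected: str = EXPECTED_SHA256) -> None:
    """Refuse to proceed unless the downloaded file matches the published digest."""
    actual = sha256_of(path)
    if actual != expected:
        raise RuntimeError(f"{path}: checksum mismatch ({actual} != {expected})")
```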
> If apple would need deep access beyond their API, then that just means their API is insufficient and needs improvement.
Apple quite frequently goes around the published APIs thus failing to dogfood the published APIs, leaving gaping holes that need improvement. They also quite frequently DO dogfood their published APIs, and those usually turn out pretty great. And then there's Swift-UI.
> Ok, I was going to be snarky (as this has been possible for decades). But: is this really true? Certainly this has been getting harder and harder over recent years. But AFAIK it's still possible to install kernel extensions in MacOS. Or at least that was still the case fairly recently.
Still possible, but it requires 3 reboots (among other things) along the way.
> Apple quite frequently goes around the published APIs thus failing to dogfood the published APIs, leaving gaping holes that need improvement. They also quite frequently DO dogfood their published APIs, and those usually turn out pretty great. And then there's Swift-UI.
Some time ago Apple managed to push out an update that killed Ethernet interfaces: https://www.digitaltrends.com/computing/mac-update-breaks-ethernet-fix/
The over-dependency on enterprise "cybersecurity" providers is inherently dangerous, exactly like this event shows. I realize it checks the liability boxes so senior execs can say they performed diligence and get their massive pay regardless of how horrific a cock-up happens on their watch, but from a system design point of view, it's questionable. At best.
Take a wild guess who was CTO when a similar incident happened at McAfee in 2010.
Yes, the formatting breaks on mobile too.
Those systems would still boot.
[mike drop]
> Microsoft should permit customers to mark a driver "boot start: false", if they choose so. Allowing manual override over (CrowdStrike etc's) "boot start: true" hard-code.
It's a universal problem, not limited to Microsoft. A bad kernel module will kill a Linux install as well. A driver, pretty much by definition, has to run with kernel-level privileges; and at that level, a mistake in the code cannot be trapped - it's going to bring the system down.
Some things that are currently kernel modules can be moved to userspace - but some things cannot. (And doing so does bring certain tradeoffs - for example, GPU drivers can be in userspace, but there is a performance hit in doing so. Given how complex GPU drivers are these days, that's a worthwhile tradeoff IMO; but it is a tradeoff.)
CrowdStrike made the choice - rightly or wrongly - to implement their code as a kernel-level driver. Their code caused these crashes. Ergo, CrowdStrike is wholly to blame for this. Microsoft might be able to implement improvements that allow more stuff that's currently in the kernel to move into userspace - but that's a separate issue.
If you write stuff that runs with kernel-level privileges, it is on you to make sure your code is robust. There is only so much that the OS vendor can do to limit the damage in that scenario.
> Microsoft should permit customers to mark a driver "boot start: false", if they choose so. Allowing manual override over (CrowdStrike etc's) "boot start: true" hard-code.
It's enterprise security software, it's deployed en masse and designed so users can't mess with it. In theory it's designed so that sysadmins can designate deployment rings so this kind of thing only hits a small test group. Instead Crowdstrike hid that they can tell a patch to deploy to all anyways. Nobody knew they might have to do something like that.
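For anyone wondering what the "boot start" knob above refers to: a Windows driver's start type is the Start value under its service key in the registry (0 = boot, 1 = system, 2 = auto, 3 = demand, 4 = disabled). A read-only sketch using Python's stdlib winreg; the service name CSAgent is an assumption about the CrowdStrike sensor and may not match a given install.

```python
import winreg  # Windows-only standard library module

START_TYPES = {0: "boot", 1: "system", 2: "auto", 3: "demand", 4: "disabled"}

def driver_start_type(service_name: str) -> str:
    """Read a driver/service Start value from the registry (read-only)."""
    key_path = rf"SYSTEM\CurrentControlSet\Services\{service_name}"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _kind = winreg.QueryValueEx(key, "Start")
    return START_TYPES.get(value, f"unknown ({value})")

if __name__ == "__main__":
    # "CSAgent" is an assumption about the CrowdStrike sensor's service name.
    print(driver_start_type("CSAgent"))
```

Whether admins should be allowed to flip that value on a managed endpoint is exactly what the surrounding replies argue about.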
".. users can mess with it .." is something you assumed and went on with. Giving an option, of course, means giving it to the authorized user.
It's hard to top this one, lol.
> Have you tried turning it off and on and off and on and off and on and off and on and off and on and off and on and off and on and off and on and off and on and off and on again?
Apparently doing the same thing over and over again and expecting different results is no longer the height of insanity.
> Is it just me, or did anyone else read this and flash back to the time when their dad kept banging on the side of the old CRT TVs to get it to show a picture?
I resemble that remark. Citation: I am old.
> I think a magical space dragon did this, and I have as much evidence as you do!
Why not both!? Putin collected the Dragon Balls and made a wish!
> May we also nominate ClownStrike?
Is that Clown-Strike or Clown's-Trike?
> Why not both!? Putin collected the Dragon Balls and made a wish!
Oh no, now I'm going to have to tune in to one of the olde TV subchannels where they play Laugh-In in the wee hours...
> Giving an option, of course, means giving it to the authorized user.
You're thinking on a small, individual user scale - not an enterprise scale. The rules and requirements change radically when there are hundreds or thousands of systems to manage. Giving the end users the ability to turn off security critical software is exactly the sort of thing that cyber security teams are dead set against. It means you end up with non-compliant systems, and in some industries, that's an absolute no. Even in industries where it's not an absolute no, it's still something that the IT team would be extremely hesitant about.