The contention was that this is a single threaded game which has not been updated in a while and is horribly CPU limited. I do not think anything in it is CPU limited on a modern computer, though, to the point that it needs modernization. It does use multiple threads where it can benefit, such as rendering the world (it is a modern DX12 graphics engine, using multiple threads).
Hmm, sounds like the theatre audience trash mobs? One of my all time favourite dungeons. The music and atmosphere were great; the architecture of the place itself told stories and lore. The NPCs tied into so many story threads from the classic questing experience. Karazhan was a lovingly crafted adventure. Not just "content" cranked out routinely by a crowd of overworked people with little creative freedom.
Anyway, I think you present a nice rationale why nobody at Blizzard would ever feel that this scenario needs any more performance tuning. But the point still stands that such crowd situations are single thread limited. And IMHO any performance tuning here is still valuable, because the stuttering is a slippery slope. Sometimes the catastrophic fail pull would have actually been survived if the players' computers hadn't slowed down to a crawl.
There are a few places you can pick up some performance with a better system, but mostly it runs well even on quite old and not so powerful hardware. I do not think complaints that they have not modernized it are fair; there are a lot of signs they do optimize and modernize the engine over time.
Let us look at those situations where the frame rate is not high on almost any computer.
Judging from his screenshot, the theater packs are what he is pulling for the combat test. He has a video, though, which I suppose I can watch in a corner as it is not too long. Looking at it, he pulls everything from the beginning of the instance (the ballroom, banquet hall, etc.) all the way up through the theater, all at once. That is a lot of mobs; he really does mean it as a worst case combat scenario.
The potato is still getting almost 60fps there, and scaling looks like it is purely based on memory generation. I do not think there is anything they can do in terms of threading it better; the hundreds of mobs all doing things and needing updates are going to keep it memory bound. The X3D chip has a tiny impact on frame rate, as does CPU frequency: what should be a slower processor is actually winning slightly in fps, with the 12400F on top by a small amount over both a 12700 at a higher frequency and a 5800X3D with more cache. I assume the working set wildly exceeds any cache size, so it comes purely down to how many memory accesses can be serviced at that point. That gets slightly faster with memory generation, or on server platforms with more channels, not with CPU speed. CPUs with more cache and more frequency are not winning this benchmark; it is purely memory.
Threading cannot solve a problem like that; it would just tie up more cores for no gain. Optimizing something like that further means not allowing independent actions for each mob, but this is not a swarm shooter, and that is likely not a good idea for the kind of game this is, unless they have something specific in mind for an event.
That has not improved massively over the years because memory latency has not improved much. Transfer rate has, and that can be seen to some degree in the benchmarks, but it is mostly a latency limited task once you produce a mob pile of that scale. More interesting would be how many mobs you can pull before the rate drops, both with and without an X3D sized cache, as that likely does matter.
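To put the latency argument into rough arithmetic, here is a back-of-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are mine, and the mob count, misses per mob, and latency figures are made-up placeholders, not measurements from the video or from the game. It also pretends every miss is serviced one after another, which real CPUs only partly do, so treat it as a toy model of the worst case rather than a description of the engine.

```python
# Toy model: frame time when every mob update waits on cache misses
# that are serviced one after another. All inputs are hypothetical.

def frame_time_ms(mobs, misses_per_mob, latency_ns):
    """Milliseconds per frame spent waiting on main memory."""
    return mobs * misses_per_mob * latency_ns / 1e6


def fps(mobs, misses_per_mob, latency_ns):
    """Frames per second implied by the wait time above."""
    return 1000.0 / frame_time_ms(mobs, misses_per_mob, latency_ns)


if __name__ == "__main__":
    # Hypothetical pull: 300 mobs, 500 dependent misses per mob update.
    for latency_ns in (90, 80, 70):  # placeholder memory latencies
        print(latency_ns, "ns ->", round(fps(300, 500, latency_ns), 1), "fps")
```

Note what is missing from the formula: core count and clock speed. In this simplified picture the only lever is the latency term, or the number of mobs you pull.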
Wading through hundreds of players to the bank is similar, although it does show a big jump in frame rate for the X3D chip (but not for CPU speed; a slower single thread CPU of a similar generation wins this again by a small margin if we discount the X3D chip, which has a massive jump in rate). The assumption there is that they can pick up some locality of reference, as many of those players will just be sitting there, so it is not updating everything all the time and it can find that data in the cache with some frequency. Loading delays as it brings in newly seen equipment and such kill the low 1% on everything they test in this scenario, and are likely a major drag on the average rate.
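The locality of reference guess can be sketched with the usual average access time formula: hits cost the cache latency, misses cost the memory latency. Again, this is a sketch under stated assumptions. The function name and the hit rates and latencies below are placeholders I picked to show the shape of the effect, not figures for any of the CPUs in the benchmark.

```python
# Average memory access time: a weighted mix of cache hits and misses.
# Hit rates and latencies here are hypothetical placeholders.

def average_access_ns(hit_rate, cache_ns, memory_ns):
    """Expected nanoseconds per access for a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns


if __name__ == "__main__":
    # Idle crowd with a big cache vs. a mob pile that blows the cache.
    for hit_rate in (0.9, 0.5, 0.05):
        print(hit_rate, "->", round(average_access_ns(hit_rate, 10, 80), 1), "ns")
```

With a high hit rate the cache latency dominates, which would fit a bigger cache helping in the bank crowd. As the hit rate falls toward zero the result approaches the raw memory latency, which would fit the combat test not caring about cache size.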
The heavy combat test in particular says that the number of cores, amount of cache (including X3D), and CPU frequency are all basically irrelevant to the results. It only cares about memory latency.