Multicore, dual-core, and the future of Intel

   by Jon "Hannibal" Stokes

 

This year's IDF will go down in Intel history as marking a major shift in strategy for the company. While the rest of the industry, which has been less obsessed than Intel with pushing the MHz envelope, has been openly moving towards parallel computing for some time now, Intel's company-wide turn in that direction has been more sudden and dramatic. Sun and IBM in particular have been beating the parallel computing drum for years, while Intel has resolutely stuck to its GHz-centric guns. To see what I mean, take a look at the following quote from my PPC 970 coverage two years ago, discussing the difference in approaches between Intel and IBM:

This difference in clock speed... reflects a fundamental difference in the approaches of the PPC 970 and the Pentium 4. As is evidenced by the enormous power requirements that go along with its high clock speed, the P4 is really aimed at single-processor desktop systems. Sure, Intel sells the P4 Xeon for use in 2-way and 4-way server setups, but the price and power consumption of such machines restrict them primarily to the server closet. The PowerPC 970, on the other hand, is designed from the ground up with multiprocessing in mind — IBM intends to see the 970 used in 4-way or higher desktop SMP systems. So instead of increasing the performance of desktop and server systems by using a few narrow, very high clock speed, high power CPUs IBM would rather see multiple, slower but wider, lower power CPUs ganged together via very high-bandwidth connections. (This, incidentally, is the philosophy behind IBM's upcoming Cell processor for the PS3. I'm not suggesting here that the two projects are directly connected, but I would say that they embody some of the same fundamental design values.)

Intel is now taking up the high-bandwidth multiprocessing paradigm with a vengeance, and they expect to move most of their product line to multicore chips in the next few years. What Intel announced at this IDF was no less than a total rethinking of their approach to microprocessors. The news that everything from servers to laptops will eventually go multicore is significant, and it shows just how serious Intel is about this new approach.

Intel's presentations delivered a coherent message and vision for the company, all built around two primary propositions:

  1. Moore's-Curves-driven performance scaling will come not from increases in MHz ratings but from increases in machine width.
  2. Datasets are growing in size, and so are the network pipes that connect those datasets.

Everything that Intel discussed in the webcast can be seen as a response to one or both of these facts. Let's take a look at each factor individually before examining how Intel's road map addresses them.

Wider, not faster

The opening to my first P4 vs. G4e article was taken up with characterizing the Pentium 4's approach to performance as "fast and narrow", versus the G4e's "slow and wide" approach. The Pentium 4's performance is dependent on its ability to complete a sequence of instructions very, very rapidly, one after the other. So the Pentium 4 pushes the code stream through its narrow execution core, a core that consists of relatively few functional units, in a rapid-fire, serial manner.

The G4e, in contrast, has a wider execution core and runs slower. It spreads the code stream's instructions out as widely as possible and pushes them through a larger number of execution units at a slower rate.

Both approaches have their advantages and disadvantages, and as I noted then and in subsequent articles on the Pentium 4, the "fast and narrow" approach works quite well as long as you can keep the clock speed up there. The main drawback to the Pentium 4's approach is that rising clock speeds translate into rising power consumption, and as my recent Prescott article discussed, power consumption is currently a big problem in the industry. Thus Prescott, and the whole "fast and narrow" approach to performance that it exemplifies, is rapidly reaching its practical performance limits.
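To put rough numbers behind that tradeoff, here's a back-of-the-envelope sketch. The issue widths, clock speeds, and voltages below are made-up illustrative values (not actual Pentium 4 or G4e figures), and dynamic power is modeled with the standard C·V²·f approximation:

```python
# Back-of-the-envelope comparison of "fast and narrow" vs. "slow and wide".
# All figures here are illustrative assumptions, not measured numbers.

def peak_throughput_gips(issue_width, clock_ghz):
    """Peak throughput in billions of instructions/sec = issue width x clock."""
    return issue_width * clock_ghz

def relative_dynamic_power(voltage, clock_ghz):
    """Dynamic power scales roughly as C * V^2 * f; capacitance held fixed."""
    return voltage ** 2 * clock_ghz

designs = {
    "fast and narrow": {"width": 3, "clock_ghz": 3.4, "voltage": 1.4},
    "slow and wide":   {"width": 5, "clock_ghz": 2.0, "voltage": 1.1},
}

for name, d in designs.items():
    gips = peak_throughput_gips(d["width"], d["clock_ghz"])
    power = relative_dynamic_power(d["voltage"], d["clock_ghz"])
    print(f"{name}: {gips:.1f} G instructions/s peak, "
          f"relative dynamic power {power:.2f}")

# The wider, slower design roughly matches the narrow design's peak
# throughput at well under half the dynamic power, because it backs off
# on both clock and voltage. Sustained throughput is another matter:
# it depends on keeping all those execution units fed.
```

With these made-up numbers, the wide design delivers about 10 billion instructions per second of peak throughput against the narrow design's 10.2, at roughly a third of the relative dynamic power.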

There's still plenty of headroom left in the "slow and wide" approach, though, so that's the direction that Intel is now turning its entire product line. Intel explicitly and repeatedly stated in their IDF keynote that the days of MHz as a (flawed) performance metric and marketing tool are at an end, and that the new buzzword is features.

Integration, not acceleration

The history of the past two decades of microprocessor evolution is the history of functionality moving from the motherboard onto the processor die. Cache memory, floating-point capabilities, SIMD, and now memory controllers are all examples of functionality that was once implemented on a separate IC but has since made its way onto the CPU die itself. Intel is looking to continue this trend by putting more CPUs on the CPU die.
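Of course, extra cores only pay off when software can spread its work across them. As a minimal, hypothetical sketch of what that looks like from the software side (the workload and its chunking here are invented purely for illustration):

```python
# Minimal sketch: splitting an embarrassingly parallel job across cores.
# The workload (summing squares over ranges) is invented for illustration;
# real parallel code also has to deal with shared data, synchronization,
# and memory bandwidth.
from multiprocessing import Pool, cpu_count

def sum_of_squares(bounds):
    start, end = bounds
    return sum(i * i for i in range(start, end))

if __name__ == "__main__":
    n = 10_000_000
    cores = cpu_count()                        # one worker process per core
    step = n // cores
    chunks = [(i * step, (i + 1) * step) for i in range(cores)]
    chunks[-1] = (chunks[-1][0], n)            # last chunk absorbs the remainder

    with Pool(processes=cores) as pool:
        partials = pool.map(sum_of_squares, chunks)

    print(f"{cores} cores, total = {sum(partials)}")
```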

From what I've heard, Intel's initial multicore designs are very hasty and hackish in nature. Intel was kind of taken by surprise with the Prescott power problems, so they had to turn the whole company on a dime. As a result, the first multicore Prescott implementation isn't as clean as what you get with the Opteron, which was initially designed from the ground up for dual-core. But I'm not going to go into any more detail about Prescott, because we at Ars aren't really in the rumor-reporting business. If and when more information on the nuts and bolts of Intel's multicore implementation becomes available, we'll be on top of it. Until then, you can take my (admittedly vague) word for it, or not.

I think we can also expect to see more integration in the form of on-die memory controllers in the future, and eventually Intel will start cramming as much as they can onto their chips. There are, however, limits to how much you can put on a CPU die. Some types of circuits just don't fit well together, like the kind of analog circuitry that you need to send, receive, and process radio signals (i.e., WiFi and Bluetooth) and digital circuitry. This likely means that higher levels of integration will primarily take the form of more cores and more cache.

Next: Data growth

 

