Multicore, dual-core, and the future of Intel

   by Jon "Hannibal" Stokes

 

This year's IDF will go down in Intel history as marking a major shift in strategy for the company. While the rest of the industry, which has been less obsessed than Intel with pushing the MHz envelope, has been openly moving towards parallel computing for some time now, Intel's company-wide turn in that direction has been more sudden and dramatic. Sun and IBM in particular have been beating the parallel computing drum for years, while Intel has resolutely stuck to its GHz-centric guns. To see what I mean, take a look at the following quote from my PPC 970 coverage two years ago, discussing the difference in approaches between Intel and IBM:

This difference in clock speed... reflects a fundamental difference in the approaches of the PPC 970 and the Pentium 4. As is evidenced by the enormous power requirements that go along with its high clock speed, the P4 is really aimed at single-processor desktop systems. Sure, Intel sells the P4 Xeon for use in 2-way and 4-way server setups, but the price and power consumption of such machines restrict them primarily to the server closet. The PowerPC 970, on the other hand, is designed from the ground up with multiprocessing in mind — IBM intends to see the 970 used in 4-way or higher desktop SMP systems. So instead of increasing the performance of desktop and server systems by using a few narrow, very high clock speed, high power CPUs IBM would rather see multiple, slower but wider, lower power CPUs ganged together via very high-bandwidth connections. (This, incidentally, is the philosophy behind IBM's upcoming Cell processor for the PS3. I'm not suggesting here that the two projects are directly connected, but I would say that they embody some of the same fundamental design values.)

Intel is now taking up the high-bandwidth multiprocessing paradigm with a vengeance, and they expect to move most of their product line to multicore chips in the next few years. What Intel announced at this IDF was no less than a total rethinking of their approach to microprocessors. The news that everything from servers to laptops will eventually go multicore is significant, and it shows just how serious Intel is about this new approach.

Intel's presentations delivered a coherent message and vision for the company, all built around two primary propositions:

  1. Moore's-Curves-driven performance scaling will come not from increases in MHz ratings but from increases in machine width.
  2. Datasets are growing in size, and so are the network pipes that connect those datasets.

Everything that Intel discussed in the webcast can be seen as a response to one or both of these facts. Let's take a look at each factor individually before examining how Intel's road map addresses them.

Wider, not faster

The opening to my first P4 vs. G4e article was taken up with characterizing the Pentium 4's approach to performance as "fast and narrow", versus the G4e's "slow and wide" approach. The Pentium 4's performance is dependent on its ability to complete a sequence of instructions very, very rapidly, one after the other. So the Pentium 4 pushes the code stream through its narrow execution core, a core that consists of relatively few functional units, in a rapid-fire, serial manner.

The G4e, in contrast, has a wider execution core and runs slower. It spreads the code stream's instructions out as widely as possible and pushes them through a larger number of execution units at a slower rate.

Both approaches have their advantages and disadvantages, and as I noted then and in subsequent articles on the Pentium 4, the "fast and narrow" approach works quite well as long as you can keep the clock speed up there. The main drawback to the Pentium 4's approach is that rising clock speeds translate into rising power consumption, and as my recent Prescott article discussed, power consumption is currently a big problem in the industry. Thus Prescott, and the whole "fast and narrow" approach to performance that it exemplifies, is rapidly reaching its practical performance limits.
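To put rough numbers behind that tradeoff, here's a back-of-the-envelope sketch. The issue widths, clock speeds, and voltages below are made-up illustrative values (not actual Pentium 4 or G4e figures), and dynamic power is modeled with the standard C·V²·f approximation:

```python
# Back-of-the-envelope comparison of "fast and narrow" vs. "slow and wide".
# All figures here are illustrative assumptions, not measured numbers.

def peak_throughput_gips(issue_width, clock_ghz):
    """Peak throughput in billions of instructions/sec = issue width x clock."""
    return issue_width * clock_ghz

def relative_dynamic_power(voltage, clock_ghz):
    """Dynamic power scales roughly as C * V^2 * f; capacitance held fixed."""
    return voltage ** 2 * clock_ghz

designs = {
    "fast and narrow": {"width": 3, "clock_ghz": 3.4, "voltage": 1.4},
    "slow and wide":   {"width": 5, "clock_ghz": 2.0, "voltage": 1.1},
}

for name, d in designs.items():
    gips = peak_throughput_gips(d["width"], d["clock_ghz"])
    power = relative_dynamic_power(d["voltage"], d["clock_ghz"])
    print(f"{name}: {gips:.1f} G instructions/s peak, "
          f"relative dynamic power {power:.2f}")

# The wider, slower design roughly matches the narrow design's peak
# throughput at well under half the dynamic power, because it backs off
# on both clock and voltage. Sustained throughput is another matter:
# it depends on keeping all those execution units fed.
```

With these made-up numbers, the wide design delivers about 10 billion instructions per second of peak throughput against the narrow design's 10.2, at roughly a third of the relative dynamic power.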

There's still plenty of headroom left in the "slow and wide" approach, though, so that's the direction that Intel is now turning its entire product line. Intel explicitly and repeatedly stated in their IDF keynote that the days of MHz as a (flawed) performance metric and marketing tool are at an end, and that the new buzzword is features.

Integration, not acceleration

The history of the past two decades of microprocessor evolution is the history of functionality moving from the motherboard onto the processor die. Cache memory, floating-point capabilities, SIMD, and now memory controllers are all examples of functionality that was once implemented on a separate IC but has since made its way onto the CPU die itself. Intel is looking to continue this trend by putting more CPUs on the CPU die.
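Of course, extra cores only pay off when software can spread its work across them. As a minimal, hypothetical sketch of what that looks like from the software side (the workload and its chunking here are invented purely for illustration):

```python
# Minimal sketch: splitting an embarrassingly parallel job across cores.
# The workload (summing squares over ranges) is invented for illustration;
# real parallel code also has to deal with shared data, synchronization,
# and memory bandwidth.
from multiprocessing import Pool, cpu_count

def sum_of_squares(bounds):
    start, end = bounds
    return sum(i * i for i in range(start, end))

if __name__ == "__main__":
    n = 10_000_000
    cores = cpu_count()                        # one worker process per core
    step = n // cores
    chunks = [(i * step, (i + 1) * step) for i in range(cores)]
    chunks[-1] = (chunks[-1][0], n)            # last chunk absorbs the remainder

    with Pool(processes=cores) as pool:
        partials = pool.map(sum_of_squares, chunks)

    print(f"{cores} cores, total = {sum(partials)}")
```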

From what I've heard, Intel's initial multicore designs are very hasty and hackish in nature. Intel was kind of taken by surprise with the Prescott power problems, so they had to turn the whole company on a dime. As a result, the first multicore Prescott implementation isn't as clean as what you get with the Opteron, which was initially designed from the ground up for dual-core. But I'm not going to go into any more detail about Prescott, because we at Ars aren't really in the rumor-reporting business. If and when more information on the nuts and bolts of Intel's multicore implementation becomes available, we'll be on top of it. Until then, you can take my (admittedly vague) word for it, or not.

I think we can also expect to see more integration in the form of on-die memory controllers in the future, and eventually Intel will start cramming as much as they can onto their chips. There are, however, limits to how much you can put on a CPU die. Some types of circuits just don't fit well together, like the kind of analog circuitry that you need to send, receive, and process radio signals (i.e., WiFi and Bluetooth) and digital circuitry. This likely means that higher levels of integration will primarily take the form of more cores and more cache.

Next: Data growth

 

