Original Link: https://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive
Intel Core i7 3960X (Sandy Bridge E) Review: Keeping the High End Alive
by Anand Lal Shimpi on November 14, 2011 3:01 AM EST
Posted in: CPUs, Intel, Core i7, Sandy Bridge, Sandy Bridge E
If you look carefully enough, you may notice that things are changing. It first became apparent shortly after the release of Nehalem. Intel bifurcated the performance desktop space by embracing a two-socket strategy, something we'd never seen from Intel and only once from AMD in the early Athlon 64 days (Socket-940 and Socket-754).
LGA-1366 came first, but by the time LGA-1156 arrived a year later it no longer made sense to recommend Intel's high-end Nehalem platform. Lynnfield was nearly as fast and the entire platform was more affordable.
When Sandy Bridge launched earlier this year, all we got was the mainstream desktop version. No one complained because it was fast enough, but we all knew an ultra high-end desktop part was in the works. A true successor to Nehalem's LGA-1366 platform for those who waited all this time.
Left to right: Sandy Bridge E, Gulftown, Sandy Bridge
After some delays, Sandy Bridge E is finally here. The platform is actually pretty simple to talk about. There's a new socket (LGA-2011), a new chipset (Intel's X79) and of course the Sandy Bridge E CPU itself. We'll start at the CPU.
For the desktop, Sandy Bridge E is only available in 6-core configurations at launch. Early next year we'll see a quad-core version. I mention the desktop qualification because Sandy Bridge E is really a die harvested Sandy Bridge EP, Intel's next generation Xeon part:
If you look carefully at the die shot above, you'll notice that there are actually eight Sandy Bridge cores. The Xeon version will have all eight enabled, but the last two are fused off for SNB-E. The 32nm die is absolutely gigantic by desktop standards. Measuring 20.8 mm x 20.9 mm (~435mm^2), Sandy Bridge E is bigger than most GPUs. It also has a ridiculous number of transistors: 2.27 billion.
Around a quarter of the die is dedicated just to the chip's massive L3 cache. Each cache slice has increased in size compared to Sandy Bridge. Instead of 2MB, Sandy Bridge E boasts 2.5MB cache slices. In its Xeon configuration that works out to 20MB of L3 cache, but for desktops it's only 15MB. That's just 1MB shy of how much system memory my old upgraded 386-SX/20 had.
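The cache and die size figures above are simple products, and a short sketch makes the arithmetic explicit. The slice sizes, core counts and die dimensions come from the text; the function name and the one-slice-per-enabled-core assumption are mine, and that assumption only describes the fully enabled configurations (it does not apply to parts that ship with less cache than their core count would suggest).

```python
# L3 capacity as (number of cache slices) x (slice size in MB).
# Assumption: one slice per enabled core, which matches the 20MB (8-core Xeon)
# and 15MB (6-core desktop) figures quoted in the article.

def l3_size_mb(slices: int, slice_mb: float) -> float:
    """Total L3 in MB for a given number of equally sized slices."""
    return slices * slice_mb

xeon_l3 = l3_size_mb(8, 2.5)      # full Sandy Bridge EP die
desktop_l3 = l3_size_mb(6, 2.5)   # Sandy Bridge E with two cores fused off
snb_l3 = l3_size_mb(4, 2.0)       # quad-core Sandy Bridge with 2MB slices

# Die area in mm^2 from the quoted 20.8 mm x 20.9 mm dimensions.
die_area_mm2 = 20.8 * 20.9

print(f"Xeon: {xeon_l3}MB, SNB-E: {desktop_l3}MB, SNB: {snb_l3}MB")
print(f"Die area: ~{round(die_area_mm2)} mm^2")
```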
CPU Specification Comparison
CPU | Manufacturing Process | Cores | Transistor Count | Die Size |
AMD Bulldozer 8C | 32nm | 8 | 1.2B* | 315mm2 |
AMD Thuban 6C | 45nm | 6 | 904M | 346mm2 |
AMD Deneb 4C | 45nm | 4 | 758M | 258mm2 |
Intel Gulftown 6C | 32nm | 6 | 1.17B | 240mm2 |
Intel Sandy Bridge E (6C) | 32nm | 6 | 2.27B | 435mm2 |
Intel Nehalem/Bloomfield 4C | 45nm | 4 | 731M | 263mm2 |
Intel Sandy Bridge 4C | 32nm | 4 | 995M | 216mm2 |
Intel Lynnfield 4C | 45nm | 4 | 774M | 296mm2 |
Intel Clarkdale 2C | 32nm | 2 | 384M | 81mm2 |
Intel Sandy Bridge 2C (GT1) | 32nm | 2 | 504M | 131mm2 |
Intel Sandy Bridge 2C (GT2) | 32nm | 2 | 624M | 149mm2 |
Update: AMD originally told us Bulldozer was a 2B transistor chip. It has since told us that the 8C Bulldozer is actually 1.2B transistors. The die size is still accurate at 315mm2.
At the core level, Sandy Bridge E is no different than Sandy Bridge. It doesn't clock any higher, L1/L2 caches remain unchanged and per-core performance is identical to what Intel launched earlier this year.
The Lineup
Processor | Core Clock | Cores / Threads | L3 Cache | Max Turbo | Max Overclock Multiplier | TDP | Price |
Intel Core i7 3960X | 3.3GHz | 6 / 12 | 15MB | 3.9GHz | 57x | 130W | $990 |
Intel Core i7 3930K | 3.2GHz | 6 / 12 | 12MB | 3.8GHz | 57x | 130W | $555 |
Intel Core i7 3820 | 3.6GHz | 4 / 8 | 10MB | 3.9GHz | 43x | 130W | TBD |
Intel Core i7 2700K | 3.5GHz | 4 / 8 | 8MB | 3.9GHz | 57x | 95W | $332 |
Intel Core i7 2600K | 3.4GHz | 4 / 8 | 8MB | 3.8GHz | 57x | 95W | $317 |
Intel Core i7 2600 | 3.4GHz | 4 / 8 | 8MB | 3.8GHz | 42x | 95W | $294 |
Intel Core i5 2500K | 3.3GHz | 4 / 4 | 6MB | 3.7GHz | 57x | 95W | $216 |
Intel Core i5 2500 | 3.3GHz | 4 / 4 | 6MB | 3.7GHz | 41x | 95W | $205 |
Those of you buying today only have two options: the Core i7-3960X and the Core i7-3930K. Both have six fully unlocked cores, but the 3960X gives you a 15MB L3 cache vs. 12MB with the 3930K. You pay handsomely for that extra 3MB of L3. The 3960X goes for $990 in 1K unit quantities, while the 3930K sells for $555.
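To put a number on "handsomely", here is a quick sketch using only the list prices and cache sizes quoted above. It deliberately ignores the 100MHz clock speed difference between the two parts, so treat the per-MB figure as an upper bound on what the cache alone costs rather than a precise valuation.

```python
# List prices (1K unit quantities) and L3 sizes quoted in the article.
price_3960x, l3_3960x_mb = 990, 15
price_3930k, l3_3930k_mb = 555, 12

premium = price_3960x - price_3930k        # extra dollars for the 3960X
extra_l3_mb = l3_3960x_mb - l3_3930k_mb    # extra MB of L3 you get for it
premium_per_mb = premium / extra_l3_mb

print(f"${premium} premium for {extra_l3_mb}MB of L3, or ${premium_per_mb:.0f} per MB")
```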
The 3960X has the same 3.9GHz max turbo frequency as the Core i7 2700K when 1 - 2 cores are active. With 5 - 6 cores active the max turbo drops to a respectable 3.6GHz. Unlike the old days of many vs. few core CPUs, there are no tradeoffs for performance when you buy a SNB-E. Thanks to power gating and turbo, you get pretty much the fastest possible clock speeds regardless of workload.
Early next year we'll see a Core i7 3820, priced around $300, with only 4 cores and a 10MB L3. The 3820 will only be partially unlocked (max OC multiplier = 4 bins above max turbo).
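The partially unlocked rule can be sketched in a few lines. The "4 bins above max turbo" rule is stated in the text for the 3820; applying it to the Core i7 2600 and Core i5 2500 is my assumption, although the results do line up with the multipliers listed in the table. The 100MHz base clock and the function name are also assumptions for illustration.

```python
BCLK_MHZ = 100  # assumed stock base clock; one "bin" is one multiplier step

def max_oc_multiplier(max_turbo_ghz: float, extra_bins: int = 4) -> int:
    """Highest multiplier on a partially unlocked part: turbo multiplier plus extra bins."""
    turbo_multiplier = round(max_turbo_ghz * 1000 / BCLK_MHZ)
    return turbo_multiplier + extra_bins

# Max turbo frequencies taken from the lineup table.
for name, turbo_ghz in [("Core i7 3820", 3.9), ("Core i7 2600", 3.8), ("Core i5 2500", 3.7)]:
    print(f"{name}: up to {max_oc_multiplier(turbo_ghz)}x")
```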
No Integrated Graphics, No Quick Sync
All of this growth in die area comes at the expense of one of Sandy Bridge's greatest assets: its integrated graphics core. SNB-E features no on-die GPU, and as a result it does not feature Quick Sync either. Remember that Quick Sync leverages the GPU's shader array to accelerate some of the transcode pipe, so without the GPU on SNB-E there's no Quick Sync.
Given the target market for SNB-E's die donor (Xeon servers), further increasing the die area by including an on-die GPU doesn't seem to make sense. Unfortunately desktop users suffer as you lose a very efficient way to transcode videos. Intel argues that you do have more cores to chew through frames with, but the fact remains that Quick Sync frees up your cores to do other things while SNB-E requires that they're all tied up in (quickly) transcoding video. If you don't run any Quick Sync enabled transcoding applications, you won't miss the feature on SNB-E. If you do however, this will be a tradeoff you'll have to come to terms with.
Tons of PCIe and Memory Bandwidth
Occupying the die area where the GPU would normally be is SNB-E's new memory controller. While its predecessor featured a fairly standard dual-channel DDR3 memory controller, SNB-E features four 64-bit DDR3 memory channels. With a single DDR3 DIMM per channel Intel officially supports speeds of up to DDR3-1600. With two DIMMs per channel the max official speed drops to DDR3-1333.
With a quad-channel memory controller you'll have to install DIMMs four at a time to take full advantage of the bandwidth. In response, memory vendors are selling 4 and 8 DIMM kits specifically for SNB-E systems. Most high-end X79 motherboards feature 8 DIMM slots (2 per channel). Just as with previous architectures, installing fewer DIMMs is possible; it simply reduces the peak available memory bandwidth.
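To see what fewer DIMMs costs you, here is a rough sketch of theoretical peak bandwidth by populated channel count. It assumes each DIMM sits in its own channel, uses the nominal DDR3 data rate in MT/s, and counts 1 GB as 10^9 bytes. The function name is illustrative, and real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_bits // 8
    return channels * bytes_per_transfer * mega_transfers / 1000

# One DIMM per channel at DDR3-1600, populating 1 to 4 channels.
for populated in (1, 2, 3, 4):
    print(f"{populated} channel(s) @ DDR3-1600: {peak_bandwidth_gbs(populated, 1600):.1f} GB/s")

# Two DIMMs per channel (8 DIMMs) at the officially supported DDR3-1333.
print(f"4 channels @ DDR3-1333: {peak_bandwidth_gbs(4, 1333):.1f} GB/s")
```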
Intel increased bandwidth on the other side of the chip as well. A single SNB-E CPU features 40 PCIe lanes that are compliant with rev 3.0 of the PCI Express Base Specification (aka PCIe 3.0). With no PCIe 3.0 GPUs available (yet) to test and validate the interface, Intel lists PCIe 3.0 support in the chip's datasheet but is publicly guaranteeing PCIe 2.0 speeds. Intel does add that some PCIe devices may be able to operate at Gen 3 speeds, but we'll have to wait and see once those devices hit the market.
The PCIe lanes off the CPU are quite configurable as you can see from the diagram above. Users running dual-GPU setups can enjoy the fact that both GPUs will have a full x16 interface to SNB-E (vs x8 in SNB). If you're looking for this to deliver a tangible performance increase, you'll be disappointed:
Multi GPU Scaling - Radeon HD 5870 CF
Max Quality, 4X AA/16X AF | Metro 2033 (19x12) | Crysis: Warhead (19x12) | Crysis: Warhead (25x16) |
Intel Core i7 3960X (2 x16) | 1.87x | 1.80x | 1.90x |
Intel Core i7 2600K (2 x8) | 1.94x | 1.80x | 1.88x |
Modern GPUs don't lose much performance in games, even at high quality settings, when going from a x16 to a x8 slot.
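For context on what a x16 vs. x8 link means on paper, here is a sketch of per-direction link bandwidth. The signalling rates and line encodings (5 GT/s with 8b/10b for Gen 2, 8 GT/s with 128b/130b for Gen 3) come from the PCI Express specifications rather than from this review, and the calculation ignores packet and protocol overhead, so these are ceilings rather than achievable throughput.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per generation.
GENERATIONS = {
    2: (5.0, 8 / 10),      # PCIe 2.0: 8b/10b encoding
    3: (8.0, 128 / 130),   # PCIe 3.0: 128b/130b encoding
}

def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Raw bandwidth per direction in GB/s, before protocol overhead."""
    gigatransfers, efficiency = GENERATIONS[gen]
    return gigatransfers * efficiency / 8 * lanes

# x8 and x16 GPU links, plus all 40 CPU lanes in aggregate.
for lanes in (8, 16, 40):
    gen2 = link_bandwidth_gbs(2, lanes)
    gen3 = link_bandwidth_gbs(3, lanes)
    print(f"x{lanes}: {gen2:.1f} GB/s (Gen 2), {gen3:.2f} GB/s (Gen 3)")
```

Even at Gen 2 speeds a x8 link has several GB/s available in each direction, which is consistent with the scaling results above showing little difference between the two configurations.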
I tested PCIe performance with an OCZ Z-Drive R4 PCIe SSD to ensure nothing was lost in the move to the new architecture. Compared to X58, I saw no real deltas in transfers to/from the Z-Drive R4:
PCI Express Performance - OCZ Z-Drive R4, Large Block Sequential Speed - ATTO
Test | Intel X58 | Intel X79 |
Read | 2.62 GB/s | 2.66 GB/s |
Write | 2.49 GB/s | 2.50 GB/s |
The Letdown: No SAS, No Native USB 3.0
Intel's current RST (Rapid Storage Technology) drivers don't support X79, but Intel's RSTe (for enterprise) 3.0 will support the platform once available. We got our hands on an engineering build of the software, which identifies the X79's SATA controller as an Intel C600:
Intel's enterprise chipsets use the Cxxx nomenclature, so this label makes sense. A quick look at Intel's RSTe readme tells us a little more about Intel's C600 controller:
SCU Controllers:
- Intel(R) C600 series chipset SAS RAID (SATA mode) Controller
- Intel C600 series chipset SAS RAID Controller
SATA RAID Controllers:
- Intel(R) C600 series chipset SATA RAID Controller
SATA AHCI Controllers:
- Intel(R) C600 series chipset SATA AHCI Controller
As was originally rumored, X79 was supposed to support both SATA and SAS. Issues with the implementation of the latter forced Intel to kill SAS support and go with the same 4+2 3Gbps/6Gbps SATA implementation 6-series chipset users get. I would've at least liked to have had more 6Gbps SATA ports. It's quite disappointing to see Intel's flagship chipset lacking feature parity with AMD's year-old 8-series chipsets.
I ran a sanity test on Intel's X79 against some of our H67 data for SATA performance with a Crucial m4 SSD. It looks like 6Gbps SATA performance is identical to the mainstream Sandy Bridge platform:
6Gbps SATA Performance - Crucial m4 256GB (FW0009)
Chipset | 4KB Random Write (8GB LBA, QD32) | 4KB Random Read (100% LBA, QD3) | 128KB Sequential Write | 128KB Sequential Read |
Intel X79 | 231.4 MB/s | 57.6 MB/s | 273.3 MB/s | 381.7 MB/s |
Intel Z68 | 234.0 MB/s | 59.0 MB/s | 269.7 MB/s | 372.1 MB/s |
Intel still hasn't delivered an integrated USB 3.0 controller in X79. Motherboard manufacturers will continue to use 3rd party solutions to enable USB 3.0 support.
Overclocking
Sandy Bridge brought the motherboard's clock generator onto the 6-series chipset die. In doing so, Intel also locked its operation to 100MHz. While there was a bit of wiggle room, when combined with a locked processor, Intel effectively killed overclocking with most lower end Sandy Bridge chips.
For its more expensive CPUs, Intel offered either partially or fully unlocked (K-series) CPUs. The bus clock was still fixed at 100MHz, but you could overclock your processor by increasing its clock multiplier just like you could in the early days of overclocking.
With Sandy Bridge E, overclocking changes a bit. The clock generator is still mostly impervious to significant bus clock changes, but you're now able to send a multiple of its frequency to the CPU if you so desire. The options available are 100MHz, 125MHz, 166MHz and 250MHz.
Once again, wiggle room at any of these frequencies is limited so don't think we've moved back to the days of bus overclocking. You do get a little more flexibility, particularly with partially unlocked CPUs, but otherwise SNB-E overclocking is hardly any different from its predecessor.
Note that even if you select any of these options, the rest of the system still operates within spec. The multiplied bus clock is only fed to the CPU.
With a bit of effort I had no problems hitting 4.6GHz on my Core i7 3960X review sample. I had to increase core voltage from 1.104V to 1.44V, but the system was stable. While I could get into Windows at 4.8GHz and run a few benchmarks, the system wasn't completely stable.
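To illustrate how the new reference clock options interact with the multiplier, here is a small sketch that lists the combinations landing near a target frequency. The reference clocks are the four options listed above, and the 57x and 43x limits come from the lineup table. The function name and the 50MHz tolerance are arbitrary choices of mine, and this is arithmetic only: it says nothing about whether a given chip is stable at any of these settings.

```python
REFERENCE_CLOCKS_MHZ = (100, 125, 166, 250)  # CPU reference clock options from the article

def settings_for(target_mhz: int, max_multiplier: int, tolerance_mhz: int = 50):
    """Return (reference clock, multiplier, resulting MHz) combinations near the target."""
    results = []
    for ref in REFERENCE_CLOCKS_MHZ:
        multiplier = round(target_mhz / ref)
        achieved = ref * multiplier
        within_range = 1 <= multiplier <= max_multiplier
        if within_range and abs(achieved - target_mhz) <= tolerance_mhz:
            results.append((ref, multiplier, achieved))
    return results

# Fully unlocked part (57x limit) vs. a partially unlocked one (43x limit).
unlocked = settings_for(4600, max_multiplier=57)
partial = settings_for(4600, max_multiplier=43)
print("57x limit:", unlocked)
print("43x limit:", partial)
```

With a 43x cap the 100MHz option can't reach 4.6GHz on its own, which is where the extra reference clock options give partially unlocked CPUs a little more room.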
No Cooler Included
None of the retail or OEM SNB-E parts include an Intel cooler in the bundle, a significant departure from previous CPUs. Presumably the cost of bundling a beefy cooler with these parts would've driven prices higher than Intel would've liked (remember you are getting a much larger die for roughly the same price as the outgoing Core i7 990X). Intel can also rationalize its decision against including any sort of cooler in the retail box by looking at the fact that many enthusiasts at this level opt for aftermarket cooling regardless.
Intel hasn't completely left SNB-E cooling up to 3rd party vendors however. There are two official Intel coolers available for use with SNB-E. The first is a < $20 heatsink that looks a lot like Intel's current coolers but with a couple of modifications (clear fan/shroud, retention screws instead of pegs). Intel states that this cooler is designed for operation within spec, meaning it could possibly limit overclocking attempts.
If you want an Intel branded overclocking solution, there's the RTS2011LC:
This is a closed loop liquid cooling solution similar to what AMD introduced alongside its Bulldozer CPU and similar to what many 3rd party cooling companies already offer. Intel expects its liquid cooling solution to be priced somewhere in the $85 - $100 range.
These closed loop liquid coolers are great primarily for getting away from the tower-of-metal heatsinks that have grown in popularity over the past several years. The radiator is too small to compete with more traditional water cooling systems, but it can be a good gateway drug for the risk averse.
The Test
To keep the review length manageable we're presenting a subset of our results here. For all benchmark results and even more comparisons be sure to use our performance comparison tool: Bench.
Motherboard: | ASUS P8Z68-V Pro (Intel Z68), ASUS Crosshair V Formula (AMD 990FX), Intel DX79SI (Intel X79) |
Hard Disk: | Intel X25-M SSD (80GB), Crucial RealSSD C300 |
Memory: | 4 x 4GB G.Skill Ripjaws X DDR3-1600 9-9-9-20 |
Video Card: | ATI Radeon HD 5870 (Windows 7) |
Video Drivers: | AMD Catalyst 11.10 Beta (Windows 7) |
Desktop Resolution: | 1920 x 1200 |
OS: | Windows 7 x64 |
Cache and Memory Bandwidth Performance
The biggest changes from the original Sandy Bridge are the increased L3 cache size and the quad-channel memory interface. We'll first look at the impact a 15MB L3 has on latency:
Cache/Memory Latency Comparison
CPU | L1 | L2 | L3 | Main Memory |
AMD FX-8150 (3.6GHz) | 4 | 21 | 65 | 195 |
AMD Phenom II X4 975 BE (3.6GHz) | 3 | 15 | 59 | 182 |
AMD Phenom II X6 1100T (3.3GHz) | 3 | 14 | 55 | 157 |
Intel Core i5 2500K (3.3GHz) | 4 | 11 | 25 | 148 |
Intel Core i7 3960X (3.3GHz) | 4 | 11 | 30 | 167 |
Cachemem shows us a 5 cycle increase in latency. Hits in L3 can take 20% longer to get to the core that requested the data, if this is correct. For small, lightly threaded applications, you may see a slight regression in performance compared to Sandy Bridge. More likely than not however, the ~2 - 2.5x increase in L3 cache size will more than make up for the added latency. Also note that despite the large cache and thanks to its ring bus, Sandy Bridge E's L3 is still lower latency than Gulftown's.
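The 20% figure and a rough time equivalent can be worked out as follows. The L3 cycle counts come from the table above. Converting to nanoseconds assumes the core is actually running at the listed 3.3GHz during the test, which turbo could easily change, so treat the nanosecond values as approximate.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz

l3_snb_cycles = 25    # Core i5 2500K L3 latency
l3_snbe_cycles = 30   # Core i7 3960X L3 latency

increase_pct = 100 * (l3_snbe_cycles - l3_snb_cycles) / l3_snb_cycles

print(f"L3 latency increase: {increase_pct:.0f}%")
print(f"2500K: ~{cycles_to_ns(l3_snb_cycles, 3.3):.1f} ns, "
      f"3960X: ~{cycles_to_ns(l3_snbe_cycles, 3.3):.1f} ns (assuming 3.3GHz)")
```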
Memory Bandwidth Comparison - Sandra 2012.01.18.10
Test | Intel Core i7 3960X (Quad Channel, DDR3-1600) | Intel Core i7 2600K (Dual Channel, DDR3-1600) | Intel Core i7 990X (Triple Channel, DDR3-1333) |
Aggregate Memory Bandwidth | 37.0 GB/s | 21.2 GB/s | 19.9 GB/s |
Memory bandwidth is also up significantly. Populating all four channels with DDR3-1600 memory, Sandy Bridge E delivered 37GB/s of bandwidth in Sandra's memory bandwidth test. Given the 51GB/s theoretical max of this configuration and a fairly standard 20% overhead, 37GB/s is just about what we want to see here.
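For anyone who wants to check measured against theoretical bandwidth, here is a rough sketch. The channel counts, memory speeds and measured results come from the table above. The theoretical peak assumes 64-bit channels, nominal DDR3 data rates and 1 GB = 10^9 bytes. The efficiency percentages are derived by this sketch and are not figures reported by Sandra.

```python
# (channels, DDR3 data rate in MT/s, measured GB/s) from the Sandra table.
RESULTS = {
    "Core i7 3960X": (4, 1600, 37.0),
    "Core i7 2600K": (2, 1600, 21.2),
    "Core i7 990X": (3, 1333, 19.9),
}

def theoretical_gbs(channels: int, mega_transfers: int) -> float:
    """Peak bandwidth in GB/s for 64-bit (8-byte) channels."""
    return channels * 8 * mega_transfers / 1000

efficiency = {}
for cpu, (channels, rate, measured) in RESULTS.items():
    peak = theoretical_gbs(channels, rate)
    efficiency[cpu] = measured / peak
    print(f"{cpu}: {measured} of {peak:.1f} GB/s ({efficiency[cpu]:.0%} of peak)")
```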
Windows 7 Application Performance
3dsmax 9
Today's desktop processors are more than fast enough to do professional level 3D rendering at home. To look at performance under 3dsmax we ran the SPECapc 3dsmax 8 benchmark (only the CPU rendering tests) under 3dsmax 9 SP1. The results reported are the rendering composite scores.
Offline 3D rendering applications make some of the best use of CPU cores. Unfortunately our test here doesn't scale all that well. We only see a 7% increase over the 2600K. If we look at a more modern 3D workload however...
Cinebench 11.5
Created by the Cinema 4D folks we have Cinebench, a popular 3D rendering benchmark that gives us both single and multi-threaded 3D rendering results.
Single threaded performance is marginally better than the 2600K thanks to the 3960X's slightly higher max turbo speed. What's more important than the performance here is the fact that the 3960X is able to properly power gate all idle cores and give a single core full rein of the chip's TDP. Turbo is alive and well in SNB-E, just as it was in Sandy Bridge.
Here the performance gains are staggering. The 3960X is 53% faster than the 2600K and 19% faster than Intel's previous 6-core flagship, the 990X. The Bulldozer comparison is almost unfair: the 3960X is 75% faster (granted it is also multiple times the price of the FX-8150).
7-Zip Benchmark
While Cinebench shows us multithreaded floating point performance, the 7-zip benchmark gives us an indication of multithreaded integer performance:
Here we see huge gains over the 2600K (58%), more than the 50% increase in core count alone would explain, which indicates that the larger cache and added memory bandwidth are helping a bit here as well. The advantage over the 990X is only 7%. This gives us a bit of a preview of what we can expect from SNB-EP Xeon server performance.
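The comparison against core count can be laid out explicitly. This sketch uses the gains over the 2600K reported in this review and compares them to the increase in core count alone. It ignores clock speed and turbo differences between the two chips, so the leftover figure is only a rough hint at what the cache and memory bandwidth might be contributing.

```python
CORES_3960X, CORES_2600K = 6, 4
core_increase_pct = 100 * (CORES_3960X - CORES_2600K) / CORES_2600K

# Gains over the 2600K reported in the article, in percent.
observed_gain_pct = {
    "Cinebench 11.5 (multi-threaded)": 53,
    "7-Zip": 58,
}

beyond_cores = {}
for test, gain in observed_gain_pct.items():
    beyond_cores[test] = gain - core_increase_pct
    print(f"{test}: +{gain}% observed vs. +{core_increase_pct:.0f}% more cores "
          f"({beyond_cores[test]:+.0f} points)")
```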
PAR2 Benchmark
Par2 is an application used for reconstructing downloaded archives. It can generate parity data from a given archive and later use it to recover the archive.
Chuchusoft took the source code of par2cmdline 0.4 and parallelized it using Intel’s Threading Building Blocks 2.1. The result is a version of par2cmdline that can spawn multiple threads to repair par2 archives. For this test we took a 708MB archive, corrupted nearly 60MB of it, and used the multithreaded par2cmdline to recover it. The scores reported are the repair and recover time in seconds.
Here we see a 40% increase in performance over the 2600K and FX-8150.
TrueCrypt Benchmark
TrueCrypt is a very popular encryption package that offers full AES-NI support. The application also features a built-in encryption benchmark that we can use to measure CPU performance with:
As both the 990X and 3960X have AES-NI support, both are equally capable at cranking through an AES workload. Per core performance doesn't appear to have changed all that much with the move to Sandy Bridge, so here we have a situation where the 3960X is much faster than the 2600K but no faster than the 990X. I suspect these types of scenarios will be fairly rare.
x264 HD 3.03 Benchmark
Graysky's x264 HD test uses x264 to encode a 4Mbps 720p MPEG-2 source. The focus here is on quality rather than speed, thus the benchmark uses a 2-pass encode and reports the average frame rate in each pass.
Single threaded performance isn't significantly faster than your run-of-the-mill Sandy Bridge, which means the first x264 HD pass doesn't look all that impressive on SNB-E.
The second pass however stresses all six cores far more readily, resulting in a 47.5% increase in performance over the 2600K. Even compared to the 990X there's a 15% increase in performance.
Adobe Photoshop CS4
To measure performance under Photoshop CS4 we turn to the Retouch Artists’ Speed Test. The test does basic photo editing; there are a couple of color space conversions, many layer creations, color curve adjustment, image and canvas size adjustment, unsharp mask, and finally a gaussian blur performed on the entire image.
The whole process is timed and thanks to the use of Intel's X25-M SSD as our test bed hard drive, performance is far more predictable than back when we used to test on mechanical disks.
Time is reported in seconds and the lower numbers mean better performance. The test is multithreaded and can hit all four cores in a quad-core machine.
Our Photoshop test is multithreaded but there are only spikes that use more than four cores. That combined with the short duration of the benchmark shows no real advantage to the 3960X over the 2600K. Sandy Bridge E is faster than Intel's old 6-core solution though.
Compile Chromium Test
You guys asked for it and finally I have something I feel is a good software build test. Using Visual Studio 2008 I'm compiling Chromium. It's a pretty huge project that takes over forty minutes to compile from the command line on the Core i3 2100. But the results are repeatable and the compile process will stress all 12 threads at 100% for almost the entire time on a 980X so it works for me.
Our compile test is extremely well threaded, so it once again does well on the 3960X. The gains aren't as big as what we saw in some of our earlier 3D/transcoding tests, but if you're looking to build the fastest development workstation you'll want a Sandy Bridge E.
Excel Monte Carlo
Multithreaded compute does well on SNB-E regardless of the type of application. Excel is multithreaded and if you have a beefy enough workload, you'll see huge gains over the 2600K.
Gaming Performance
Most games have a tough enough time stressing more than four cores, so the move to the 3960X won't do much for gaming in most cases (particularly when GPU bound). That being said, the added cache may help give SNB-E a slight bump over its quad-core brethren.
Civilization V
Civ V's lateGameView benchmark presents us with two separate scores: average frame rate for the entire test as well as a no-render score that only looks at CPU performance.
In GPU bound scenarios the 3960X is no different than the 2600K. Civ V is a unique game in that its CPU workload does scale reasonably well across multiple cores:
Here the 3960X is nearly 30% faster than the 2600K.
Crysis: Warhead
Dawn of War II
The larger cache helps give the 3960X a 9% advantage over the 2600K in Dawn of War II. At 1680 x 1050 the game isn't entirely GPU bound on our 5870.
DiRT 3
We ran two DiRT 3 benchmarks to get an idea for CPU bound and GPU bound performance. First the CPU bound settings:
DiRT 3 is an example of a CPU bound title (at lower resolutions) that doesn't scale well with core count or cache size. The 3960X is barely 2% faster than the 2600K.
Metro 2033
It is interesting to note that while SNB-E and SNB perform similarly here, both parts do offer a performance improvement over the Gulftown based 990X.
Rage vt_benchmark
While id's long awaited Rage title doesn't exactly have the best benchmarking abilities, there is one unique aspect of the game that we can test: Megatexture. Megatexture works by dynamically taking texture data from disk and constructing texture tiles for the engine to use (note that Rage doesn't store textures in a GPU-usable format). As a result whenever you load a texture, Rage is transcoding the texture on the fly. This is normally done by the CPU.
The Benchmark: vt_ are all the virtual texture commands. Vt_benchmark flushes the texture cache and then times how long it takes to transcode all the textures needed for the current scene, from 1 thread to X threads. Thus when you run vt_benchmark 8, for example, it will benchmark from 1 to 8 threads (the default appears to depend on the CPU you have). Since transcoding is done by the CPU this is a pure CPU benchmark. I present the best case transcode time at the maximum number of concurrent threads each CPU can handle:
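The general shape of a thread sweep like this can be sketched as follows. This is not id's implementation: `transcode_tile` is a hypothetical stand-in workload, the tile count is arbitrary, and because of CPython's global interpreter lock a pure Python workload won't actually speed up with more threads. The sketch only shows the pattern of timing the same batch of work at 1 through X threads and reporting the time at the maximum thread count.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def transcode_tile(tile_id: int) -> int:
    """Hypothetical stand-in for transcoding one texture tile."""
    return sum(i * i for i in range(2000)) + tile_id

def sweep(max_threads: int, tiles: int = 64) -> dict:
    """Time the same batch of tiles with 1..max_threads worker threads."""
    timings = {}
    for threads in range(1, max_threads + 1):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # Consume the iterator so every tile is processed before timing stops.
            list(pool.map(transcode_tile, range(tiles)))
        timings[threads] = time.perf_counter() - start
    return timings

max_threads = 4
timings = sweep(max_threads)
print(f"{max_threads} threads: {timings[max_threads]:.4f}s")
```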
Starcraft 2
World of Warcraft
WoW does enjoy the 3960X's larger cache. Here we see a 13% increase in performance compared to the regular Sandy Bridge parts.
Power Consumption
At idle, the 3960X's power consumption is barely discernible from the 2600K. Under load however, Sandy Bridge E can draw significantly more power. We measured 35% more power draw over a 2600K. The added power consumption makes sense. The chip has more cores and a larger cache, without introducing a more power efficient architecture or a new manufacturing process.
Overclocked Performance
I mentioned earlier that I hit 4.6GHz on my 3960X sample. If you're curious about just how fast that makes the system, have a look at this:
The 3960X at 4.6GHz is almost twice as fast as the Core i7 2600K! The added performance does come at the expense of power consumption:
Under load our overclocked testbed consumes 52% more power than a stock 3960X.
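Chaining the two power figures from this review gives a rough idea of where the overclocked system sits relative to the 2600K. This assumes both percentages describe total system power under a comparable load, which the review doesn't state explicitly, so take the result as a ballpark only.

```python
stock_vs_2600k = 1.35   # stock 3960X load power relative to the 2600K (35% more)
oc_vs_stock = 1.52      # 4.6GHz 3960X relative to the stock 3960X (52% more)

# Assumption: both ratios were measured the same way, so they can be multiplied.
oc_vs_2600k = stock_vs_2600k * oc_vs_stock

print(f"Overclocked 3960X vs. 2600K under load: ~{oc_vs_2600k:.2f}x the power")
```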
Final Words
There are two aspects of today's launch that bother me: the lack of Quick Sync and the chipset. The former is easy to understand. Sandy Bridge E is supposed to be a no-compromise, ultra high-end desktop solution. The lack of an on-die GPU with Quick Sync support means you have to inherently compromise in adopting the platform. I'm not sure what sort of a solution Intel could've come to (I wouldn't want to give up a pair of cores for a GPU+QuickSync) but I don't like performance/functionality tradeoffs with this class of product. Secondly, while I'm not a SAS user, I would've at least appreciated some more 6Gbps SATA ports on the chipset. Native USB 3.0 support would've been nice as well. Instead what we got was effectively a 6-series chipset with a new name. As Intel's flagship chipset, the X79 falls short.
From left to right: Intel Core i7 (SNB-E), Core i7 (Gulftown), Core i5 (SNB), Core i5 (Clarkdale), Core 2 Duo
LGA-2011, 1366, 1155, 1156, 775
The vast majority of desktop users, even enthusiast-class users, will likely have no need for Sandy Bridge E. The Core i7 3960X may be the world's fastest desktop CPU, but it really requires a heavily threaded workload to prove it. What the 3960X doesn't do is make your gaming experience any better or speed up the majority of desktop applications. The 3960X won't be any slower than the fastest Sandy Bridge CPUs, but it won't be tremendously faster either. The desktop market is clearly well served by Intel's LGA-1155 platform (and its lineage); LGA-2011 is simply a platform for users who need a true powerhouse.
There are no surprises there; we came to the same conclusion when we reviewed Intel's first 6-core CPU last year. If you do happen to have a heavily threaded workload that needs the absolute best performance, the Core i7 3960X can deliver. In our most thread heavy tests the 3960X had no problems outpacing the Core i7 2600K by over 50%. If your livelihood depends on it, the 3960X is worth its entry fee. I suspect for those same workloads, the 3930K will be a good balance of price/performance despite having a smaller L3 cache. I'm not terribly interested in next year's Core i7 3820. Its point is obviously for those users who need the memory bandwidth or PCIe lanes of SNB-E, but don't need more than four cores. I would've liked to have seen a value 6-core offering instead, but I guess with a 435mm2 die size it's a tough sell for Intel management.
Of course compute isn't the only advantage of the Sandy Bridge E platform. With eight DIMM slots on most high end LGA-2011 motherboards you'll be able to throw tons of memory at your system if you need it without having to shop for workstation motherboards with fewer frills.
As for the future of the platform, Intel has already begun talking about Ivy Bridge E. If it follows the pattern set for Ivy Bridge on LGA-1155, IVB-E should be a drop-in replacement on LGA-2011 motherboards. The biggest issue there is timing. Ivy will arrive for the mainstream LGA-1155 platforms around the middle of 2012. At earliest, I don't know that we'd see it for LGA-2011 until the end of next year, or perhaps even early 2013 given the late launch of SNB-E. This seems to be the long-term downside to these ultra high-end desktop platforms these days: you end up on a delayed release cadence for each tick/tock on the roadmap. If you've always got to have the latest and greatest, this may prove to be frustrating. Based on what we know of Ivy Bridge however, I suspect that if you're using all six of these cores in SNB-E you'll wish you had IVB-E sooner, but won't be tempted away from the platform by a quad-core Ivy Bridge on LGA-1155.
I do worry about the long term viability of the ultra high-end desktop platform. As we showed here, some of the gains in threaded apps exceed 50% over a standard Sandy Bridge. That's tangible performance to those who can use it. With the growth in cloud computing it's clear there's demand for these types of chips in servers. I just hope Intel continues to offer a version for desktop users as well.