High-End x86: The Nehalem EX Xeon 7500 and Dell R810
by Johan De Gelas on April 12, 2010 6:00 PM EST - Posted in
- IT Computing
- Intel
- Nehalem EX
AMD Opteron and Intel Xeon SKUs
The Intel slide below gives you an overview of the available SKUs.
Only the top Xeon X7560 gets the massive 24MB L3 cache. The two top CPUs, the X7560 and X7550, have eight cores and can raise their clock by 400MHz via Turbo Boost when several cores are idle. That is quite handy: even on an eight-socket, 64-core machine without a virtualization layer, single-threaded tasks happen quite often, and speeding them up by roughly 20% can save valuable time. When all cores are busy but not at 100% load, the CPU will probably still be able to run one speed bin higher. For example, Intel's performance engineers in Portland report that the SAP benchmark, hardly a low CPU load workload, runs about 3% faster with Turbo Boost.
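To put that 400MHz in perspective, here is a quick back-of-the-envelope calculation (a minimal Python sketch; the base clocks are taken from the SKU table below, and the full 400MHz bin assumes only a few cores are active):

```python
# Rough arithmetic: what a 400MHz Turbo Boost bin is worth on the two top SKUs.
# Base clocks come from the SKU table below; the full 400MHz uplift only applies
# when several cores are idle.
BOOST_MHZ = 400

for model, base_ghz in [("X7560", 2.26), ("X7550", 2.00)]:
    boosted_ghz = base_ghz + BOOST_MHZ / 1000
    uplift_pct = (boosted_ghz / base_ghz - 1) * 100
    print(f"{model}: {base_ghz:.2f} GHz -> {boosted_ghz:.2f} GHz (+{uplift_pct:.0f}%)")
```

An 18 to 20% higher clock on a lightly threaded job is not something you want to leave on the table.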
For Windows 2008 users, Turbo Boost will probably end up disabled. We learned through experimentation that the most likely power plan, "Balanced", does not use Turbo Boost; it is only active under the "High performance" plan. Use that power plan and Windows 2008 R2 runs at the highest clock possible, as the picture below shows.
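If you are not sure which plan a box is actually running, you can also check it from a script; a small sketch that simply wraps the built-in powercfg tool on Windows Server 2008 R2:

```python
# Print the active Windows power plan by wrapping the stock powercfg tool.
# "Balanced" here means Turbo Boost is most likely not being used; switch to the
# "High performance" plan to get the extra speed bins.
import subprocess

result = subprocess.run(
    ["powercfg", "/getactivescheme"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # e.g. "Power Scheme GUID: ...  (High performance)"
```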
Our current Linux build, SUSE SLES 11 (kernel 2.6.27, SMP x86-64), does not have that problem: even the most aggressive performance plan, "low latency computing", keeps the Nehalem EX X7560 at 2.26GHz when idle rather than at the 2.66GHz Turbo clock.
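On Linux, the quickest way to see what each core is actually running at is the cpufreq sysfs interface; a minimal sketch, assuming the cpufreq driver is loaded (scaling_cur_freq reports the frequency the governor requested, so Turbo bins do not always show up exactly):

```python
# Minimal per-core clock check on Linux via the cpufreq sysfs interface.
import glob

for path in sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq")):
    cpu = path.split("/")[-2]
    with open(path + "/scaling_cur_freq") as f:
        cur_khz = int(f.read())
    with open(path + "/cpuinfo_max_freq") as f:
        max_khz = int(f.read())
    print(f"{cpu}: {cur_khz / 1e6:.2f} GHz (max {max_khz / 1e6:.2f} GHz)")
```

With that out of the way, let us check out the pricing.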
| Intel Xeon model | Cores | TDP | Speed (GHz) | Price | AMD Opteron model | Cores | ACP/TDP | Speed (GHz) | Price |
|---|---|---|---|---|---|---|---|---|---|
| X7560 | 8 | 130W | 2.26 | $3692 | | | | | |
| X7550 | 8 | 130W | 2.00 | $2729 | 6176 SE | 12 | 105/137W | 2.3 | $1386 |
| E7540 | 6 | 105W | 2.00 | $1980 | | | | | |
| E7530 | 6 | 105W | 1.86 | $1391 | 6174 | 12 | 80/115W | 2.2 | $1165 |
| E7520 | 4 | 95W | 1.86 | $856 | 6172 | 12 | 80/115W | 2.1 | $989 |
| X7542 | 6 | 130W | 2.66 | $1663 | | | | | |
| X6550 | 8 | 130W | 2.00 | $2461 | 6176 SE | 12 | 105/137W | 2.3 | $1386 |
| E6540 | 6 | 105W | 2.00 | $1712 | 6174 | 12 | 80/115W | 2.2 | $1165 |
| E6510 | 4 | 105W | 1.73 | $744 | 6168 | 12 | 80/115W | 1.9 | $744 |
| | | | | | 6136 | 8 | 80/115W | 2.4 | $744 |
| L7555 | 8 | 95W | 1.86 | $3157 | 6164 HE | 12 | 65/85W | 1.7 | $744 |
| L7545 | 6 | 95W | 1.86 | $2087 | 6128 HE | 8 | 65/85W | 2.0 | $523 |
| | | | | | 6124 HE | 8 | 65/85W | 1.8 | $455 |
The different strategies of Intel and AMD become tangible when you look at the price list. Intel wants "RISC-like" prices for its best CPUs: filling four sockets with the higher-end SKUs costs $8000 to almost $15,000 in processors alone. The markets that demand these reliability features for running expensive applications will not worry about that. But if reliability features are not at the top of the checklist and price/performance is, AMD's aggressive pricing is very attractive. AMD's cores might be slower, but AMD offers more cores at higher clock speeds for a lower price. Four of the best Opteron 6100s will set you back only $4000 to $5500.
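As a quick sanity check on those totals, the list prices from the table above work out as follows (a simple Python sketch; CPU list prices only, so motherboard, memory, and the rest of the system are excluded):

```python
# Four-socket CPU cost, using the list prices from the SKU table above.
list_prices = {
    "Xeon X7560":      3692,
    "Xeon X7550":      2729,
    "Xeon E7540":      1980,
    "Opteron 6176 SE": 1386,
    "Opteron 6174":    1165,
    "Opteron 6172":     989,
}

for model, price in list_prices.items():
    print(f"4x {model:<16}: ${4 * price:>6,}")
```

Four of AMD's fastest parts cost little more than a third of what four X7560s do.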
The low power versions of the Xeon Nehalem EX are unattractive. For example, the TDP of the L7545 is only 10W lower than that of the similarly clocked E7530, yet Intel demands a $700 premium. The fact that the Nehalem EX is still a 45nm chip seems to have limited the options for low power parts. Of course, the demand for lower power chips in quad-CPU machines is low, albeit growing.
The Xeon 6500 series is no bargain either. Limited to two sockets, it lacks the scalability of the 7500 series, and the only thing you get in exchange is a meager $300 price cut. The 6500 series does make sense, as it can use up to 32 DIMMs and has all the reliability features of its bigger brothers. But Intel missed a chance here: many people are RAM limited, not CPU limited, when virtualizing, and not all of them are willing to pay a premium for reliability features. Those people will probably turn to AMD.
It all boils down to two questions: how much memory do you need, and how much are you willing to pay for reliability features? The more memory and the more reliability you demand, the more the 7500 series makes sense; ERP and database applications are prime candidates. For virtualization you have two options. Some of you might prefer fewer, more reliable machines. Others may leverage the fact that High Availability (HA) is easy to set up on modern virtualization platforms, choose servers with fewer RAS features, and get their availability from the HA software instead. Our bet is that the low pricing of the AMD quad-CPU servers will seduce quite a few system administrators into the latter camp. Let us know what you prefer and why.
23 Comments
dastruch - Monday, April 12, 2010
Thanks AnandTech! I've been waiting a year for this very moment, and if only those 25nm Lyndonville SSDs were here too.. :)
thunng8 - Monday, April 12, 2010
For reference, IBM just released their octal chip Power7 3.8GHz result for the SAP two-tier benchmark. The result is 202,180 SAPS, approximately 2.32x faster than the octal chip Nehalem-EX.
Jammrock - Monday, April 12, 2010
The article cover on the front page mentions a 1TB maximum on the R810 and then 512GB on page one. The R910 is the 1TB version; the R810 is "only" 512GB. You can also run a single processor in the R810, though why you would drop the cash on an R810 and a single proc I don't know.
vol7ron - Tuesday, April 13, 2010
I wish I could afford something like this! I'm also curious how good it would be at gaming :) I know in many cases these server setups under-perform high end gaming machines, but I'd settle :) Still, something like this would be nice for my side business.
whatever1951 - Tuesday, April 13, 2010
None of the Nehalem-EX numbers are accurate, because Nehalem-EX kernel optimization isn't in Windows 2008 Enterprise. There are only 3 commercial OSes right now that have Nehalem-EX optimization: Windows Server 2008 R2 with SQL Server 2008 R2, RHEL 5.5, SLES 11, and the soon to be released CentOS 5.5 based on RHEL 5.5. Windows 2008 R1 has trouble scaling to 64 threads, and SQL Server 2008 R1 absolutely hates Nehalem-EX. You are cutting the Nehalem-EX benchmarks short by 20% or so by using Windows 2008 R1.
The problem isn't as severe for Magny-Cours, because the OS sees 4 or 8 sockets of 6 cores each via the enumerator and thus treats it with the same optimizations as an 8-socket 8400 series setup.
So, please rerun all the benchmarks.
JohanAnandtech - Tuesday, April 13, 2010
It is a small mistake in our table. We have been using R2 for months now; we do use Windows 2008 R2 Enterprise.
whatever1951 - Tuesday, April 13, 2010
Ok. Change the table to reflect the Windows Server 2008 R2 and SQL Server 2008 R2 information, please.
Any explanation for such poor memory bandwidth? Damn, those SMBs must really slow things down, or there must be a software error.
whatever1951 - Tuesday, April 13, 2010
It is hard to imagine 4 channels of DDR3-1066 being 1/3 slower than even the Westmere-EPs. Can you remove half of the memory DIMMs to make sure it isn't Dell's flex memory technology that's intentionally slowing things down to push sales toward the R910?
As far as I know, when you only populate two sockets on the R810, Dell's flex memory technology routes the 16 DIMMs that would normally be connected to the two empty sockets over to the two center CPUs; that could induce significant memory bandwidth penalties.
"This should add a little bit of latency, but more importantly it means that in a four-CPU configuration, the R810 uses only one memory controller per CPU. The same is true for the M910, the blade server version. The result is that the quad-CPU configuration has only half the bandwidth of a server like the Dell R910 which gives each CPU two memory controllers."Sorry, should have read a little slower. Damn, Dell cut half the memory channels from the R810!!!! That's a retarded design, no wonder the memory bandwidth is so low!!!!!