Dynamic Power Management: A Quantitative Approach
by Johan De Gelas on January 18, 2010 2:00 AM EST
Posted in: IT Computing
Not So Fast!
Power management, especially dynamic voltage and frequency scaling (DVFS), does come with a performance cost. Since its introduction, both Intel and AMD have claimed that this cost is "negligible", but we know better by now: on the dual-core Athlon X2 and Phenom I, for example, it was impossible to enable DVFS and still get decent HD video decoding. There are three important performance problems with dynamic power management:
- Transitioning from one P-state to another takes a while, especially when scaling up (see the sketch after this list).
- Active cores will probe idle or lower P-state cores quite frequently.
- The OS power manager has to predict whether the running process will need more processing power soon. As a result, the OS transitions a lot more slowly than the hardware could.
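As a rough illustration of the first point: on Linux, the cpufreq subsystem reports how long the hardware needs for a P-state switch. Below is a minimal sketch, assuming a Linux host with a cpufreq driver loaded; the sysfs attribute names are the standard cpufreq ones, but not every driver exposes all of them.

```c
/* Minimal sketch: query the Linux cpufreq sysfs interface to see how long
 * a P-state transition takes on this machine. Assumes a Linux host with a
 * cpufreq driver loaded; the paths below are standard cpufreq attributes,
 * but not every driver exposes all of them. */
#include <stdio.h>

static long read_sysfs_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;
    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

int main(void)
{
    const char *base = "/sys/devices/system/cpu/cpu0/cpufreq";
    char path[256];

    /* Time needed to switch between two P-states, in nanoseconds. */
    snprintf(path, sizeof(path), "%s/cpuinfo_transition_latency", base);
    long latency_ns = read_sysfs_long(path);

    /* Frequency the core is currently running at, in kHz. */
    snprintf(path, sizeof(path), "%s/scaling_cur_freq", base);
    long cur_khz = read_sysfs_long(path);

    if (latency_ns >= 0)
        printf("P-state transition latency: %ld ns\n", latency_ns);
    if (cur_khz >= 0)
        printf("Current frequency: %ld MHz\n", cur_khz / 1000);
    return 0;
}
```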
Suppose the OS decides that the CPU can clock down to a lower P-state, and just a few milliseconds later a running process requires a lot more performance. The voltage must now be increased, and that takes a while. During that time the CPU wastes power: processing is suspended for a short period, and the clock speed cannot increase until the higher voltage is reached and stable. If this scenario repeats often, the small power savings of dropping to a lower P-state are overshadowed by the power lost scaling quickly back up to a higher clock and voltage. It is important to understand that each voltage increase results in a short period where power is wasted without any processing happening. The same problem applies to entering a C-state: enter it too quickly and performance suffers, as it takes time to wake that core up again.
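To make that trade-off concrete, here is a rough back-of-the-envelope model. All of the numbers (power draws, ramp time) are illustrative assumptions, not measured values:

```c
/* Back-of-the-envelope sketch of the P-state break-even point described
 * above. All numbers are made-up illustrations, not measured values. */
#include <stdio.h>

int main(void)
{
    double p_high_w = 20.0;   /* core power at the high P-state (W)    */
    double p_low_w  = 8.0;    /* core power at the low P-state (W)     */
    double p_ramp_w = 20.0;   /* power burned while the voltage ramps,
                                 with processing suspended (W)          */
    double t_ramp_s = 100e-6; /* voltage ramp / relock time (s)        */

    /* Energy wasted by one up-transition: full power, zero work done. */
    double e_ramp_j = p_ramp_w * t_ramp_s;

    /* Minimum residency in the low P-state for the trip to pay off. */
    double t_break_even_s = e_ramp_j / (p_high_w - p_low_w);

    printf("Energy wasted per up-transition: %.1f uJ\n", e_ramp_j * 1e6);
    printf("Break-even residency in low P-state: %.1f us\n",
           t_break_even_s * 1e6);
    return 0;
}
```

With these made-up numbers the core has to stay in the low P-state for a few hundred microseconds just to break even, which is exactly why a governor that guesses wrong often ends up costing power instead of saving it.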
The last problem is a bit more subtle: if you lower the P-state of one core, another core that sends a snoop to this "slow" core will get a much slower answer, and the performance of the active core suffers. According to some researchers [5], this performance decrease is about 5% at 800MHz on a "Barcelona" Opteron; if P-states could go as low as 400MHz, the impact would be 30% and more! That is the reason why lower P-states are not used: a core with P-states below 800MHz would wreak havoc on the performance/watt ratio of the CPU. It is also why "Smart Fetch" dumps the contents of the L1 and L2 caches into the L3 cache: this not only avoids waking the idle core up too soon, it also avoids the performance hit of snooping a "napping" core. Intel's CPUs do not have this problem: the L3 cache is inclusive, so if data cannot be found in the L3 cache, it will not be found in any core's L1 or L2 cache either.
The bottom line is that power management is quite complex: there is no silver bullet. Go to low/idle states too quickly and you end up burning more power while delivering less performance. At the same time, if the OS keeps the clock speed too high, the CPU might never achieve decent power savings. The OS must take into account the most likely behavior of the application and the capabilities of the hardware.
35 Comments
JohanAnandtech - Tuesday, January 19, 2010 - link
Well, Oracle has a few downsides when it comes to this kind of testing. It is not very popular in small and medium businesses AFAIK (our main target), and we still haven't worked out why it performs much worse on Linux than on Windows. So choosing Oracle is a sure way to make the project time explode... IMHO.

ChristopherRice - Thursday, January 21, 2010 - link
Works worse on Linux than Windows? You likely have a setup issue with the kernel parameters or within Oracle itself. I actually don't know of any enterprise location that uses Oracle on Windows anymore. Generally it's all RHEL4/RHEL5/Sun.

TeXWiller - Monday, January 18, 2010 - link
The 34xx series supports four quad-rank modules, giving today a maximum supported amount of 32GB per CPU (and board). The 24GB limit is that of the three-channel controller with unbuffered memory modules.

pablo906 - Monday, January 18, 2010 - link
I love Johan's articles. I think this has some implications for how virtualization solutions may be the most cost effective. When you're running at 75% capacity on every server, the AMD solution could well become more attractive. I think I'm going to have to do some independent testing in my datacenter with this.

I'd like to mention that focusing on VMware is a disservice to VT technology as a whole. It would be like not benchmarking the K6-3+ just because P2s and Celerons were mainstream and SS7 boards weren't quite up to par. There are situations, primarily virtualizing Linux, where Citrix XenServer is a better solution. Also, many people who are buying Server '08 licenses are getting Hyper-V licenses bundled in for "free."
I've known several IT directors in very large health care organizations who are deploying a mixed Hyper-V/XenServer environment because of the "integration" between the two. Many of the people I've talked to at events around the country are using this model for at least part of their virtualization deployments. I believe it would be important to publish what kind of performance the industry can expect from such deployments.
You can do some really interesting homebrew SAN deployments with OpenFiler or OpeniSCSI that can compete with the performance of EMC Clariion, NetApp, LeftHand, etc. I've found NFS deployments can bring you better performance and manageability. I would love to see some articles about the strengths and weaknesses of the storage subsystem used and how it affects each type of deployment. I would absolutely be willing to devote some datacenter time and experience to helping put something like this together.
I think this article ties in really well with the virtualization talks, and I would love to see more comments on what you think this means for someone with a small, medium, or large datacenter.
maveric7911 - Tuesday, January 19, 2010 - link
I'd personally prefer to see KVM over XenServer. Even Red Hat is ditching Xen for KVM. In the environments I work in, Xen is actually being decommissioned for VMware.

JohanAnandtech - Tuesday, January 19, 2010 - link
I can see the theoretical reasons why some people are excited about KVM, but I still don't see the practical ones. Who is using it in production? Getting Xen, VMware, or Hyper-V to do their job is pretty easy; KVM does not seem to be even close to beta. It is hard to get working, and it is nowhere near Xen when it comes to reliability. Admittedly, those are our first impressions, but we are no virtualization rookies.

Why do you prefer KVM?
VJ - Wednesday, January 20, 2010 - link
"It is hard to get working, and it nowhere near to Xen when it comes to reliabilty. "I found Xen (separate kernel boot at the time) more difficult to work with than KVM (kernel module) so I'm thinking that the particular (host) platform you're using (windows?) may be geared towards one platform.
If you had to set it up yourself, that may explain the reliability issues you've had.
On Fedora Linux, it shouldn't be more difficult than Xen.
Toadster - Monday, January 18, 2010 - link
One of the new technologies released with the Xeon 5500 (Nehalem) is Intel Intelligent Power Node Manager, which controls P/T states within the server CPU. This is a good article on existing P/C states, but will you guys be doing a review of newer control technologies as well?

http://communities.intel.com/community/openportit/...
JohanAnandtech - Tuesday, January 19, 2010 - link
I don't think it is "newer"; going to C6 for idle cores is less than a year old, remember :-).

It seems to be a sort of manager which monitors the electrical input (PDU based?) and then lowers the P-states to keep the power at a certain level. Did I miss something? (I only glanced at it quickly.)
Personally, I think HP is more onto something by capping the power inside their server management software. But I still have to evaluate both; we will look into that.
n0nsense - Monday, January 18, 2010 - link
Maybe I missed something in the article, but from what I see at home, the C2Q (and C2D) can manage frequencies per core.

I'm not sure it is possible under Windows, but in Linux it just works this way. You can actually see each core at its own frequency.
Moreover, you can select for each core at which frequency it should run.
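For what it's worth, this is observable through the Linux cpufreq sysfs interface. Here is a minimal sketch that lists each core's current frequency and governor; the paths are the standard cpufreq attributes, but whether cores can really run at independent frequencies depends on the CPU and driver (on many chips, cores in the same package share a voltage plane):

```c
/* Minimal sketch: list the current frequency and governor of each core via
 * the Linux cpufreq sysfs interface. Standard cpufreq paths; whether cores
 * can truly run at independent frequencies depends on the CPU and driver.
 * Writing scaling_setspeed (userspace governor, root required) would pin
 * a specific core to a chosen frequency. */
#include <stdio.h>

int main(void)
{
    for (int cpu = 0; ; cpu++) {
        char path[128], governor[32];
        long khz = -1;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;  /* no more cores (or no cpufreq driver loaded) */
        if (fscanf(f, "%ld", &khz) != 1)
            khz = -1;
        fclose(f);

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
        f = fopen(path, "r");
        governor[0] = '\0';
        if (f) {
            if (fscanf(f, "%31s", governor) != 1)
                governor[0] = '\0';
            fclose(f);
        }

        printf("cpu%d: %ld MHz, governor %s\n", cpu, khz / 1000, governor);
    }
    return 0;
}
```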