Thursday, May 3, 2012

VMware CPU Ready, What is it?

It seems like a lot of people out there still don't really understand, or even know about, CPU Ready in a VMware environment. CPU Ready is a metric that VMware reports either as a percentage (%) or in milliseconds (ms). Which value you get depends on where you look.

CPU Ready
CPU Ready is the time a virtual CPU is ready to run but is not being scheduled on a physical CPU. It means that when the OS wants to process something on the processor, the hypervisor is saying, ok, hang on until it is your turn. Think of the metered ramps when getting onto the highway during rush hour. This happens when you have too many vCPUs on your host.

Now a lot of new virtualization admins tend to think that just because you have available CPU resources (usage), you can throw more vCPUs or VMs on the host. This is not always the case, because even though the CPU isn't actively processing data, it is still scheduling VMs to process data if they have it. Think of it sort of like round-robin or an old token ring network. Everybody gets a turn and you will have to wait your turn.

If you add 5 VMs with 1 vCPU on each VM, then there are 5 vCPUs that need to be scheduled on the physical host. If you add 5 more VMs with 1 vCPU each, then you have 10 vCPUs that need to be scheduled. Now to make matters worse you start adding VMs with multiple vCPUs, let's say 3 VMs with 2 vCPUs each. Now not only do you have 16 vCPUs that need to be scheduled, you have 3 pairs of vCPUs that each need to be scheduled at relatively the same time. And all of this is happening even if all of the VMs are idle.
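
To make the vCPU arithmetic above concrete, here is a minimal Python sketch that just does the counting. The function names and the 8-core host are my own assumptions for illustration; nothing here talks to vSphere.

```python
# Each entry is (number of VMs, vCPUs per VM), taken from the example above.
vm_groups = [(5, 1), (5, 1), (3, 2)]

def total_vcpus(groups):
    """Total number of vCPUs the host has to schedule."""
    return sum(vm_count * vcpus for vm_count, vcpus in groups)

def multi_vcpu_vms(groups):
    """Number of VMs whose vCPUs must be scheduled at about the same time."""
    return sum(vm_count for vm_count, vcpus in groups if vcpus > 1)

def overcommit_ratio(groups, physical_cores):
    """vCPUs per physical core; higher means more waiting for a turn."""
    return total_vcpus(groups) / physical_cores

print(total_vcpus(vm_groups))          # 16
print(multi_vcpu_vms(vm_groups))       # 3
print(overcommit_ratio(vm_groups, 8))  # 2.0 (the 8-core host is a made-up figure)
```

The ratio is only a rough indicator. It says nothing about how busy the VMs are, which is exactly why you still need to look at CPU Ready itself.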

I mentioned that depending on where you look you will see different values for CPU Ready. In the vSphere client you will see milliseconds (ms). In ESXTOP you will see %RDY (percent ready). Here is a conversion chart to help show the relationship. It is based on the vSphere client's real-time chart, which samples every 20 seconds (20,000ms).

1% = 200ms (0.2 seconds)
5% = 1,000ms (1 second)
10% = 2,000ms (2 seconds)
50% = 10,000ms (10 seconds)
100% = 20,000ms (quit your job before you're fired)
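
The chart boils down to one formula: ready % = ready ms / (interval in seconds x 1000) x 100. Below is a small Python sketch of it, assuming the 20-second interval that the 100% = 20,000ms row implies. The function names are made up for illustration, and other chart views use longer intervals, so pass a different `interval_s` for those.

```python
REALTIME_INTERVAL_S = 20  # assumption: the sampling interval the chart above is based on

def ready_ms_to_percent(ready_ms, interval_s=REALTIME_INTERVAL_S):
    """Convert a CPU Ready summation in ms to a percentage of the interval."""
    # interval_s * 1000 ms per interval, times 100 for percent, simplifies to / (interval_s * 10)
    return ready_ms / (interval_s * 10)

def ready_percent_to_ms(percent, interval_s=REALTIME_INTERVAL_S):
    """Convert a CPU Ready percentage to ms of waiting per interval."""
    return percent * interval_s * 10

# Reproduce the rows of the chart above.
for pct in (1, 5, 10, 50, 100):
    print(f"{pct}% = {ready_percent_to_ms(pct)}ms")
```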

How do I know to look for CPU Ready?
First off, you should always be looking at your CPU Ready times. I'm not saying to always have your vSphere client open on the CPU Ready graph, but you should be checking it often. CPU Ready is probably the easiest thing to notice when it is high. You will start to see lagging inside the VM. The users will notice this when trying to run things within the VM, or you will be processing something in the VM and there isn't much CPU utilization when there should be.

I first got smacked with CPU Ready when I was managing a large Citrix XenApp farm running on vSphere 3.5 (it was still called Presentation Server at the time). My users were experiencing lagging within their sessions but the CPU utilization wasn't very high. I had originally given my Citrix servers 2 vCPUs because I thought that with so many users and so many threads they would utilize the 2 vCPUs. Boy was I wrong.

To start with, I was getting high CPU Ready times because I was running about five 2-vCPU Citrix Presentation Server VMs on a host with two dual-core processors. So I was running 10 vCPUs on 4 physical cores. That is not good. To make things worse, I was running Citrix on these VMs with about 10-15 users per VM, which caused a ton of context switching. Citrix servers have a lot of context switching by themselves; add the virtualization context switching on top and you multiply the problem, and then the host still has to schedule 2 vCPUs at relatively the same time. I know, it was crazy.

My %Ready was about 8%-14% (roughly 1.6s - 2.8s), which meant my users were not happy, because when they clicked nothing came up for a second or two. I ended up dropping my Citrix servers to 1 vCPU. My issues subsided and I was able to keep the user count the same with fewer latency issues.

I hope this helps you understand CPU Ready and gets you looking for it so you can head it off before it starts disrupting the end user experience. Now I'm sure I'll get asked "What is the optimal/acceptable CPU Ready time?" Well, that depends on your workload. If your workload can sustain a 5% CPU Ready time, then that is acceptable for you. I would try to keep it under 3% if possible. That is not always possible, so treat 10% as the most you should allow. Once you start getting above that, you start seeing the latency in the VMs.
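
If you want to apply those rules of thumb to numbers you have exported, a rough Python sketch could look like the one below. The thresholds are just my guidance from the paragraph above, not official VMware limits, and the function name and labels are hypothetical.

```python
def classify_cpu_ready(percent):
    """Map a CPU Ready percentage to the rule-of-thumb bands described above."""
    if percent < 3:
        return "good"        # under 3%: where I try to keep things
    if percent <= 5:
        return "acceptable"  # fine if the workload can sustain it
    if percent <= 10:
        return "at limit"    # the most I would allow
    return "too high"        # expect visible latency in the VM

# Example with made-up sample values.
for sample in (1.5, 4.0, 8.0, 14.0):
    print(sample, classify_cpu_ready(sample))
```

Where you draw the lines depends on your workload, so adjust the numbers to whatever your users can live with.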

I recommend you set up a custom alarm in your vSphere client to warn you of high CPU Ready times, or get something like Veeam Monitor, which already has a built-in alarm for CPU Ready with a warning at 10% and an error at 20%.

5 comments:

  1. This was really helpful! Thanks for sharing this Brandon!

  2. Thank you. I'm glad you found it useful.

  3. Can you base your decision on the summation Ready value, or do you base it on all of the individual cores of a VM? I.e., if the individual cores show low ready times but they all add up to a higher Ready time, do I really have a problem?

  4. CPU Ready is not cumulative. If you have 10ms CPU Ready time on 8 cores this does not mean that you have 80ms CPU Ready time on the entire CPU because the delay is still only 10ms per thread running through the CPU.
