
25/01/11

Multi-Core Scaling In A KVM Virtualized Environment

Earlier this week we published benchmarks comparing Oracle VM VirtualBox to Linux KVM and to the performance of the Linux host system. Some readers responded that it was a bad idea to use the Intel Core i7 "Gulftown" with all twelve of its CPU threads available to the hardware-virtualized guest, since virtualization technologies are bad at dealing with multiple virtual CPUs. But is this really the case? Unable to find any concrete benchmarks in that area, we carried out another set of tests to see how well the Linux Kernel-based Virtual Machine scales against the host as the number of CPU cores available to each is increased. The results are both good and bad for Linux virtualization.
This series of tests was again carried out on the Intel Core i7 970 "Gulftown" system with its six physical cores plus Hyper Threading to provide a total count of 12 threads. While Intel’s next-generation products will soon outdo this CPU, the i7 970 has a base frequency of 3.2GHz and a turbo frequency of 3.46GHz. There is 12MB of "Smart Cache" between the cores, support for SSE 4.2, and the latest Intel Virtualization Technology capabilities for providing the best Linux virtualization experience.
The motherboard was again the ASRock X58 SuperComputer, since its BIOS allows changing the number of enabled CPU cores as well as toggling Hyper Threading, which let us easily adjust the core count during testing. We previously used this board when looking at LLVMpipe scaling performance with the same Intel CPU. Other hardware included 3GB of DDR3 system memory, a 320GB Seagate ST3320620AS HDD, and an NVIDIA GeForce GTX 460 graphics card.
For the tests published earlier this week we used Ubuntu 10.10. However, at the request of Red Hat's virtualization group, we switched to Fedora 14 for this testing to represent a more recent and proper KVM virtualization experience. Fedora 14 x86_64 has the Linux 2.6.35 kernel, GNOME 2.32.0, X.Org Server 1.9.0, GCC 4.5.1, and an EXT4 file-system. Fedora 14 was used on both the host and the virtualized guest instance.
To look at multi-core virtualization performance, we tested the system host and the KVM virtualized instance with 1, 2, 4, 6, and 12 cores available. For all but the 12-core runs, we simply enabled the respective number of CPU cores on the Core i7 970; for the 12-core runs, all six physical cores were enabled and Hyper Threading was switched on, yielding 12 threads.
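One way to make "scaling" concrete for these core counts is to compute the speedup and parallel efficiency of each run relative to the single-core run. The following is only a minimal Python sketch under stated assumptions: the function name scaling_table and the sample timings are hypothetical and are not results from this article.

```python
# Core counts used in this article's test runs.
CORE_COUNTS = [1, 2, 4, 6, 12]


def scaling_table(times_by_cores):
    """Return {cores: (speedup, efficiency)} for time-based results.

    times_by_cores maps a core count to elapsed seconds (lower is better).
    Speedup is relative to the 1-core run; efficiency is speedup / cores.
    """
    base = times_by_cores[1]
    table = {}
    for cores in sorted(times_by_cores):
        speedup = base / times_by_cores[cores]
        table[cores] = (speedup, speedup / cores)
    return table


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    sample = {1: 120.0, 2: 60.0, 4: 30.0, 6: 24.0, 12: 20.0}
    for cores in CORE_COUNTS:
        speedup, eff = scaling_table(sample)[cores]
        print(f"{cores:>2} cores: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Since the 12-way configuration here is six physical cores plus Hyper Threading, dividing by 12 is a deliberately strict yardstick; dividing by the six physical cores would be an equally defensible choice for that row.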
The SMP virtual test suite available within Phoronix Test Suite 3.0 "Iveland" was used as our battery of CPU-focused benchmarks to see how well virtualized guests perform and scale across multiple cores. These tests include Apache, a timed compilation of Apache, C-Ray, CLOMP, 7-Zip compression, PBZIP2 compression, GraphicsMagick, HMMer, NASA NAS Parallel Benchmarks, Smallpt, TTSIOD Renderer, and x264. Now let us see whether "Virtualization is known to work badly with virtual CPUs" is fact or fiction!


Starting with the Apache web-server benchmark, the KVM virtualized guest performs very poorly against the host. Performance of the two was close when only one CPU core was enabled, but the host's performance rose linearly up to four cores and then began to flatten out in this web benchmark. The KVM instance, meanwhile, was flat the entire time and did not scale at all with the CPU core count.
Our Apache web-server benchmark was a bit shocking with the KVM guest not changing at all, but the other tests told a different story. With the timed Apache compilation, the guest was expectedly slower than the host, but at each step it scaled as well as the host operating system did. This is an example of virtualization scaling done well.
C-Ray is one of the test profiles where our earlier virtualization tests have shown guest performance to be nearly at the same level as the host. Today's tests show that this ray-tracing software runs the same as on the Linux host not only with one core enabled or with all cores enabled, but at each step in between too. In this test, the number of "virtual CPUs" within the KVM guest adds no overhead.
CLOMP is one of the newest test profiles in OpenBenchmarking.org / Phoronix Test Suite 3.0; it is a government test looking at OpenMP efficiency across multiple cores. The CLOMP test shows the host and guest speeding up at the same rate up to four cores. With six or twelve cores enabled, the virtualized guest was much less efficient than the host.

It is a similar story with the 7-Zip compression benchmark, where the efficiency of the guest begins to deviate from the host once there are more than four CPU cores to tap.
The Parallel BZIP2 test did not illustrate this problem and was like the C-Ray results where it scaled very well with the increasing core count.
With the OpenMP-powered GraphicsMagick resizing test, the virtualized guest performance actually dropped when six and twelve cores were enabled.
With the image sharpening operation in GraphicsMagick, at least the performance didn't degrade when going beyond four cores, but it wasn't as fast as the system host.
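Several of the results above (CLOMP, 7-Zip, GraphicsMagick) show the guest tracking the host up to four cores and then falling behind. A rough way to locate that point from per-core-count scores is sketched below in Python. The function name first_divergence, the 10% tolerance, and the sample scores are all hypothetical choices for illustration, not figures from this testing.

```python
def first_divergence(host, guest, tolerance=0.10):
    """Return the lowest core count where the guest falls behind the host.

    host and guest map core counts to scores where higher is better.
    The guest/host ratio at the lowest core count is the baseline; a
    configuration counts as diverging when its ratio drops more than
    `tolerance` (as a fraction of the baseline) below that baseline.
    Returns None if the guest tracks the host at every core count.
    """
    counts = sorted(set(host) & set(guest))
    baseline = guest[counts[0]] / host[counts[0]]
    for cores in counts[1:]:
        ratio = guest[cores] / host[cores]
        if ratio < baseline * (1.0 - tolerance):
            return cores
    return None


if __name__ == "__main__":
    # Hypothetical scores (higher is better), for illustration only.
    host = {1: 100, 2: 198, 4: 390, 6: 570, 12: 700}
    guest = {1: 95, 2: 187, 4: 365, 6: 430, 12: 450}
    print(first_divergence(host, guest))
```

Using the single-core ratio as the baseline separates the fixed cost of virtualization from the cost that grows with the number of virtual CPUs, which is the question being asked here.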

For both the system host and KVM guest, the CG.B test in NPB did not take advantage of more than four cores.
The NPB EP.B test, however, did continue scaling up to 12 threads.
With LU.A, the performance dropped off for the KVM guest after four cores.
Running in a virtualized environment, regardless of the core count, minimally affects Smallpt, like C-Ray.

The TTSIOD 3D Renderer results for the KVM Fedora 14 guest were interesting and similar to one of the GraphicsMagick results from earlier, where having six or twelve cores available to the guest instance negatively affected performance. The effect was large enough that the guest ran TTSIOD at the same speed with 12 cores as with one core, while four cores was the sweet spot, running more than twice as fast. Meanwhile, the Fedora 14 host continued to take advantage of the extra threads on the Intel Core i7.
Lastly, in the x264 media encoding benchmark, performance was close between the host and guest with one and two cores enabled, but the VT-x virtualized guest began to stray as the core count increased.
"Virtualization is known to work badly with virtual CPUs." So is that a fact? Not entirely. There are some cases in the results published today where the KVM guest didn't scale too well after a certain point, but there are also cases where the Kernel-based Virtual Machine guest running Fedora 14 had no problem keeping pace with the Fedora 14 host across all 12 available threads on the Intel Core i7 970 processor. It was really a mixed bag, but regardless, there is always room to optimize Linux virtualization performance in a multi-threaded environment. This could, though, become a greater issue as CPUs continue gaining more processing cores.
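The "sweet spot" pattern described for TTSIOD, where performance peaks at one core count and then regresses, can be picked out of a result set mechanically. The Python sketch below is only illustrative: the function name sweet_spot_and_regressions and the sample scores are hypothetical and merely shaped like the behavior described, not taken from the actual results.

```python
def sweet_spot_and_regressions(scores):
    """Find the best core count and any core counts that regressed.

    scores maps core count -> score (higher is better).
    Returns (best_cores, regressions), where regressions lists each core
    count whose score is lower than that of the next-smaller core count.
    """
    counts = sorted(scores)
    best = max(counts, key=lambda c: scores[c])
    regressions = [b for a, b in zip(counts, counts[1:]) if scores[b] < scores[a]]
    return best, regressions


if __name__ == "__main__":
    # Hypothetical guest scores: peak at four cores, back to the
    # single-core level by twelve. For illustration only.
    guest = {1: 10.0, 2: 17.0, 4: 22.0, 6: 15.0, 12: 10.0}
    print(sweet_spot_and_regressions(guest))
```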


