Hi All,

We are continuing our Xen vs. KVM benchmarking that I presented at Xen Summit.

This time we are focusing on newer versions, and also planning to include Xen HVM and KVM-with-PV-drivers results, as well as adding some more tests.

I have set up Xen 3.3 from source, and am using Linux 2.6.27-rc4 for all the guests.

Below are some raw kernbench results, which clearly show that I have a problem with Xen HVM. It may just be a configuration issue, but we have tried all that we could think of so far (e.g. file: instead of tap:aio). I have also tried xen-unstable and it doesn't seem to produce any better results. I am also in the process of trying kernbench on older versions of Xen HVM.

Here is the xm command line:

  xm create /dev/null name=benchvm0 memory=2048 \
    kernel="/usr/lib/xen/boot/hvmloader" builder="hvm" \
    device_model=/usr/lib64/xen/bin/qemu-dm \
    disk=file:/root/benchvm/bin/img-perf_xen_hvm_test1/image-0.img,hda,w \
    vnc=1 vncdisplay=0 vif=mac=AA:BB:CC:DD:EE:00,bridge=br0 \
    vif=mac=AA:BB:CC:DD:EE:7b,bridge=br1 vncviewer="yes" \
    on_poweroff=destroy on_reboot=restart on_crash=preserve

I will also consider an I/O test, such as iozone, to see whether disk I/O problems are a cause. The dom0 CPU doesn't seem to be under much load at all during the kernbench run.

System time on the Xen HVM kernbench run is about half of the total time, so does that suggest either a disk I/O or a guest scheduling problem? System time in the other cases is a quarter or less.

If anybody has any ideas or suggestions, or can even run Xen HVM kernbench vs. native on their setup to compare against, that would be very helpful.

The system setup is an Intel Core 2 Duo with 4 GB of RAM. The HVM guest does run the libata driver, similar to KVM with emulated drivers.

Thanks,
Todd

KVM PV drivers

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 527.572 (0.681337)
User Time 404.3 (0.982141)
System Time 122.552 (0.468636)
Percent CPU 99 (0)
Context Switches 116020 (180.82)
Sleeps 31307 (94.2072)

KVM Emulated drivers

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 527.968 (0.450744)
User Time 403.95 (0.342929)
System Time 122.134 (0.550709)
Percent CPU 99 (0)
Context Switches 115907 (214.3)
Sleeps 31302.4 (88.7175)

Xen PV

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 446.876 (0.130115)
User Time 392.088 (0.339367)
System Time 54.76 (0.391088)
Percent CPU 99 (0)
Context Switches 64601.4 (163.314)
Sleeps 31214.8 (183.53)

Xen HVM

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 2081.71 (34.0459)
User Time 617.36 (3.61771)
System Time 1430.36 (28.3309)
Percent CPU 98 (0)
Context Switches 331843 (5283.28)
Sleeps 37329.8 (91.538)

KVM Native (Linux)

Average Optimal load -j 8 Run (std deviation):
Elapsed Time 216.076 (0.121778)
User Time 381.122 (0.259557)
System Time 43.242 (0.278783)
Percent CPU 196 (0)
Context Switches 75483.2 (389.988)
Sleeps 38078.8 (354.267)

Xen native 2.6.18.8 dom0 kernel

Average Optimal load -j 8 Run (std deviation):
Elapsed Time 228.504 (0.0808084)
User Time 384.014 (0.657632)
System Time 64.028 (0.733669)
Percent CPU 195.8 (0.447214)
Context Switches 35270.4 (264.36)
Sleeps 39493.4 (266.222)
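For reference, the xm create invocation above expressed as a regular xm configuration file would look roughly like the following. This is a sketch only; the config file path is made up, and the image path, MAC addresses and bridges are simply the values taken from the command line above.

  name = "benchvm0"
  memory = 2048
  kernel = "/usr/lib/xen/boot/hvmloader"
  builder = "hvm"
  device_model = "/usr/lib64/xen/bin/qemu-dm"
  disk = [ "file:/root/benchvm/bin/img-perf_xen_hvm_test1/image-0.img,hda,w" ]
  vif = [ "mac=AA:BB:CC:DD:EE:00,bridge=br0",
          "mac=AA:BB:CC:DD:EE:7b,bridge=br1" ]
  vnc = 1
  vncdisplay = 0
  vncviewer = "yes"
  on_poweroff = "destroy"
  on_reboot = "restart"
  on_crash = "preserve"

  # started with:
  #   xm create /etc/xen/benchvm0.cfg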
As an update, Xen HVM on Xen 3.2 on Ubuntu 8.04 from packages:

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 954.4 (4.95457)
User Time 441.744 (2.56251)
System Time 506.018 (7.45156)
Percent CPU 99 (0)
Context Switches 160222 (1113.68)
Sleeps 37604.8 (182.796)

This is actually more what would be expected of Xen 3.2, right? Xen 3.3 should be an improvement with shadow3, right?

Do I need to adjust the shadow_memory parameter for the guest?

I'm going to try Xen 3.2.1 from source next.

Todd
Daniel Magenheimer
2008-Sep-10 21:37 UTC
RE: [Xen-devel] Poor performance on HVM (kernbench)
This doesn't answer the HVM question, but it appears that you are running guests with 1 vCPU but comparing against a dual-CPU native. True?

> From: Todd Deshane [mailto:deshantm@gmail.com]
> Sent: Wednesday, September 10, 2008 12:23 PM
> Subject: [Xen-devel] Poor performance on HVM (kernbench)
> [...]
On Wed, Sep 10, 2008 at 5:37 PM, Daniel Magenheimer <dan.magenheimer@oracle.com> wrote:
> This doesn't answer the HVM question, but it appears that
> you are running guests with 1 vCPU but comparing against
> a dual-CPU native. True?

Yes. The intuition is that we don't want to overcommit virtual CPUs, since then you are stressing the schedulers more.

I ran some tests with 2 vCPUs and all the numbers are a bit higher (as having two CPUs tackling a compile is faster).

Although overcommit (of CPUs and memory ;) is interesting, we leave a CPU dedicated to the host system (Linux/dom0) on purpose.

Todd
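For what it's worth, one common way to keep a physical core dedicated to dom0 in a setup like this is to pin the vCPUs explicitly. A sketch only, assuming the benchvm0 guest name from the config above and a two-core box:

  # give dom0 only physical CPU 0
  xm vcpu-pin Domain-0 0 0

  # pin the guest's single vCPU to physical CPU 1
  xm vcpu-pin benchvm0 0 1

  # or, equivalently, restrict the guest in its config file:
  cpus = "1"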
Daniel Magenheimer
2008-Sep-10 21:45 UTC
RE: [Xen-devel] Poor performance on HVM (kernbench)
Perhaps you should run the native with nosmp then, to ensure the comparison isn't taken out of context.

> From: Todd Deshane [mailto:deshantm@gmail.com]
> Sent: Wednesday, September 10, 2008 3:42 PM
> Subject: Re: [Xen-devel] Poor performance on HVM (kernbench)
> [...]
Todd Deshane wrote:
> On Wed, Sep 10, 2008 at 5:37 PM, Daniel Magenheimer
> <dan.magenheimer@oracle.com> wrote:
>> This doesn't answer the HVM question, but it appears that
>> you are running guests with 1 vCPU but comparing against
>> a dual-CPU native. True?
>
> Yes. The intuition is that we don't want to overcommit virtual CPUs
> since then you are stressing the schedulers more.

I think what Dan is getting at is that the native execution run should restrict its CPU and memory usage to be identical to the guest tests. So restrict the native test CPUs with "maxcpus=1" or "nosmp" on the boot line. Similarly, you can restrict memory using "mem=xxxM".
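For example, on a GRUB-legacy system the native boot entry might look roughly like the following. A sketch only: the kernel image name and root device are placeholders, and only the maxcpus/nosmp and mem= options come from the suggestion above.

  title  Native, 1 CPU, 2 GB
    root   (hd0,0)
    kernel /boot/vmlinuz-2.6.27-rc4 root=/dev/sda1 ro maxcpus=1 mem=2048M
    initrd /boot/initrd.img-2.6.27-rc4

  # use "nosmp" in place of "maxcpus=1" to disable SMP entirely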
Todd Deshane wrote:
> As an update, Xen HVM on Xen 3.2 on Ubuntu 8.04 from packages:
>
> Average Optimal load -j 4 Run (std deviation):
> Elapsed Time 954.4 (4.95457)
> [...]
>
> This is actually more what would be expected of Xen 3.2, right?

It's pretty close to what I've seen. In my experience with shadow2, Xen PV is about twice as fast with kernbench. Your results for PV were:

Elapsed Time 446.876 (0.130115)

So this result is a bit higher than what I've seen, but certainly within the realm of possibility.

> Xen 3.3 should be an improvement with shadow3, right?

I know it is for Windows, but there's always the possibility that it has caused a regression in Linux performance.

Regards,

Anthony Liguori
On Wed, Sep 10, 2008 at 5:45 PM, Daniel Magenheimer <dan.magenheimer@oracle.com> wrote:
> Perhaps you should run the native with nosmp then, to ensure
> the comparison isn't taken out of context.

Dom0, nosmp:

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 428.058 (0.18431)
User Time 371.648 (0.86358)
System Time 56.346 (0.989181)
Percent CPU 99 (0)
Context Switches 29144.6 (52.3479)
Sleeps 36239.8 (441.198)
> I think what Dan is getting at is that the native execution run should
> restrict its CPU and memory usage to be identical to the guest tests. So
> restrict the native test CPUs with "maxcpus=1" or "nosmp" on the boot line.
> Similarly, you can restrict memory using "mem=xxxM".

Native Linux kernel (I only had a spare Ubuntu 2.6.24 to work with), booted with nosmp mem=2048M:

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 403.744 (0.26857)
User Time 373.23 (0.421485)
System Time 30.32 (0.389551)
Percent CPU 99 (0)
Context Switches 90961.2 (105.838)
Sleeps 52311.8 (83.8373)
Another quick update: xen-unstable HVM guest.

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 1859.81 (30.4446)
User Time 595.756 (3.71579)
System Time 1246.4 (25.0328)
Percent CPU 98.8 (0.447214)
Context Switches 298999 (4736.48)
Sleeps 37258.2 (75.3638)
On Wed, Sep 10, 2008 at 02:23:17PM -0400, Todd Deshane wrote:
> Below are some raw kernbench results, which clearly show that I have a
> problem with Xen HVM. It may just be a configuration issue, but we have
> tried all that we could think of so far (e.g. file: instead of tap:aio).
> [...]

You could also try "phy:" and use raw devices or LVM volumes. I think this should be the best-performing method? Then again, I'm pretty sure changing that doesn't explain/change your (bad) HVM results.

-- Pasi
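In case it helps, a phy:-backed disk for the guest would look roughly like this. A sketch only: the volume group name vg0 and the 10G size are placeholders.

  # carve out an LVM volume to hold the guest image
  lvcreate -L 10G -n benchvm0-disk vg0

  # then point the guest config at the raw device instead of the file
  disk = [ 'phy:/dev/vg0/benchvm0-disk,hda,w' ]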
George Dunlap
2008-Sep-11 09:35 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
On Wed, Sep 10, 2008 at 11:42 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>> Xen 3.3 should be an improvement with shadow3, right?
>
> I know it is for Windows, but there's always the possibility that it has
> caused a regression in Linux performance.

Shadow3 was definitely developed with Windows in mind. Since it makes shadows act more like a hardware TLB, I'd expect it to perform better, or at least no worse; but since that's the biggest change with Xen HVM between 3.2 and 3.3, that's the first place I'd look.

Todd, would it be possible to send me a 30-second xentrace "sample" of kernbench running under 3.2 and 3.3? The relevant command:

xentrace -S 256 -e all /tmp/[filename].trace

Set the kernbench run going in the guest, let it get going for about 30 seconds or so, and then start xentrace. Let it run for 30 seconds, then kill it. In 3.3, you can use the -T parameter to have it stop after 30 seconds; in 3.2, you can do something like:

xentrace -S 256 -e all /tmp/[filename].trace & sleep 30 ; killall -INT xentrace

You can send me the files via something like http://yousendit.com.

If you could possibly take a trace with a recent xen-unstable build, that would be even more helpful, since there are some key xentrace changes that make the information even more useful.

Thanks,
 -George
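Putting those together, the two captures might look roughly like this. A sketch only: it assumes -T takes the duration in seconds (as described above), and the trace file names are placeholders.

  # Xen 3.3 / xen-unstable: stops by itself after 30 seconds
  xentrace -S 256 -e all -T 30 /tmp/kernbench-3.3.trace

  # Xen 3.2: no -T, so stop it by hand after 30 seconds
  xentrace -S 256 -e all /tmp/kernbench-3.2.trace & sleep 30 ; killall -INT xentrace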
Gianluca Guida
2008-Sep-11 10:00 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Todd Deshane wrote:
> Xen 3.3 should be an improvement with shadow3, right?

As other people already said, shadow3 (especially the current unsync policy) was mostly developed with Windows performance in mind.

I would be surprised if the shadow algorithm is causing such a performance loss, though; but in any case, you can disable the out-of-sync feature by removing SHOPT_OUT_OF_SYNC from SHADOW_OPTIMIZATIONS in xen/arch/x86/mm/shadow/private.h (setting it to 0xff instead of 0x1ff), reverting the shadow code back to shadow2 again.

Can you please test that and see if it makes any difference?

Thanks,
Gianluca
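Concretely, the change described above amounts to a one-line edit along these lines (a sketch; the exact formatting of the #define in any given tree may differ). In xen/arch/x86/mm/shadow/private.h, change:

  #define SHADOW_OPTIMIZATIONS     0x1ff

to:

  #define SHADOW_OPTIMIZATIONS     0x0ff   /* SHOPT_OUT_OF_SYNC disabled */

then rebuild and reinstall Xen, and reboot into the new hypervisor before re-running kernbench.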
Gianluca Guida
2008-Sep-11 15:17 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Hi,

Gianluca Guida wrote:
> Todd Deshane wrote:
>> Xen 3.3 should be an improvement with shadow3, right?

I made a few tests, with an amd64 kernel, with shadow2 and shadow3. Results attached.

What you can see is that in a 1-vcpu environment the two systems compare very well (with shadow3 being 1.5% faster than shadow2, system time being much lower). It's disturbing that with 2 vcpus, instead, shadow2 is about 11% faster. I'll try to look at that and make the shadow3 algorithm a bit more Linux-friendly but, in general, I don't think that the slowdown was due *only* to shadow3.

Was it a 32-bit guest? PAE?

Thanks,
Gianluca

1 vcpu, shadow2
Thu Sep 11 14:09:14 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 414.792 (0.4772)
User Time 303.276 (0.508409)
System Time 111.148 (0.592891)
Percent CPU 99.2 (0.447214)
Context Switches 25682.6 (146.046)
Sleeps 28827.8 (99.8884)

1 vcpu, shadow3
Thu Sep 11 13:16:23 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 408.948 (1.96799)
User Time 326.482 (0.553191)
System Time 81.184 (2.0058)
Percent CPU 99 (0)
Context Switches 25239.6 (205.305)
Sleeps 28995 (152.91)

2 vcpus, shadow2
Thu Sep 11 11:59:27 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 223.144 (0.709704)
User Time 314.844 (0.933852)
System Time 121.536 (1.46827)
Percent CPU 195.2 (0.447214)
Context Switches 27307.4 (249.907)
Sleeps 34875.6 (212.731)

2 vcpus, shadow3
Thu Sep 11 12:32:41 EDT 2008
2.6.18-4-amd64
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 251.832 (1.27152)
User Time 368.878 (0.745366)
System Time 124.472 (1.15751)
Percent CPU 195.4 (0.547723)
Context Switches 28585.2 (135.509)
Sleeps 35620.4 (447.023)
Gianluca Guida
2008-Sep-11 15:25 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Argh, sorry for including the log twice. I also forgot to mention that the test was done on an Intel Core 2 6420 @ 2.13GHz.

Gianluca
Todd Deshane
2008-Sep-11 15:30 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
On Thu, Sep 11, 2008 at 5:35 AM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> Todd, would it be possible to send me a 30-second xentrace "sample"
> of kernbench running under 3.2 and 3.3?
> [...]
> You can send me the files via something like http://yousendit.com.

George: I sent both Xen 3.2.1 and xen-unstable traces to you with the service you suggested. Let me know if you have any problems getting them.

If anyone else would like to see the traces, just let me know.

Cheers,
Todd
Todd Deshane
2008-Sep-11 15:35 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
On Thu, Sep 11, 2008 at 11:17 AM, Gianluca Guida <gianluca.guida@eu.citrix.com> wrote:
> I made a few tests, with an amd64 kernel, with shadow2 and shadow3.
> [...]
> Was it a 32-bit guest? PAE?

The guest is 64-bit.

Can you also run kernbench on native for comparison?

We have a fairly similar setup; mine is an Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz with 4 GB of RAM. How much RAM do you have (native and guest)?

Are all your tests on xen-unstable, with the code changes on and off as you suggested?

What is the backend disk type for your HVM guest?

What is the kernel in your HVM guest?

I will make the same changes to the xen-unstable code and re-run kernbench with shadow3 disabled on my system.

Thanks,
Todd
Another number: Xen 3.2.1 HVM guest, much faster than on 3.3/unstable.

Elapsed Time 834.082 (3.25046)
User Time 492.68 (1.61651)
System Time 328.778 (1.78148)
Percent CPU 98 (0)
Context Switches 146272 (437.262)
Sleeps 36858 (127.805)

On Wed, Sep 10, 2008 at 2:23 PM, Todd Deshane <deshantm@gmail.com> wrote:
> Below are some raw kernbench results, which clearly show that I have a
> problem with Xen HVM.
> [...]

--
Todd Deshane
http://todddeshane.net
check out our book: http://runningxen.com
Gianluca Guida
2008-Sep-11 16:48 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Hello,

Todd Deshane wrote:
> Can you also run kernbench on native for comparison?

I will.

> How much RAM do you have (native and guest)?

2 GB host, 512 MB guest.

> Are all your tests on xen-unstable, with the code changes on and off as you suggested?

Yes.

> What is the backend disk type for your HVM guest?

Here's my configuration for disks (no stub domains, btw):

disk = [ 'file:/root/prova,hda,w', 'file:/local-images/debian-40r0-amd64-netinst.iso,hdc:cdrom,r' ]

> What is the kernel in your HVM guest?

It's the standard Debian 4.0 one, I guess: 2.6.18-4-amd64.

> I will make the same changes to the xen-unstable code and re-run
> kernbench with shadow3 disabled on my system.

Thanks, that would be interesting to know!

Gianluca
Gianluca Guida
2008-Sep-11 17:25 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Todd Deshane wrote:
> Can you also run kernbench on native for comparison?

Here they are, with a two-CPU dom0:

Thu Sep 11 13:03:57 EDT 2008
2.6.18.8-xen
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 181.51 (0.550318)
User Time 300.494 (0.965572)
System Time 54.198 (0.784391)
Percent CPU 195 (0.707107)
Context Switches 26611.8 (205.029)
Sleeps 29778.8 (330.637)
George Dunlap
2008-Sep-11 18:07 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
So, the problem appears to be with a ton of brute-force searches to remove writable mappings, both during resync and promotion. My analysis tool is reporting that of the 30 seconds or so in the trace from xen-unstable, the guest spent a whopping 67% in the hypervisor:
* 26% doing resyncs as a result of marking another page out-of-sync
* 9% promoting pages
* 27% resyncing as a result of cr3 switches

And almost the entirety of all of those can be attributed to brute-force searches to remove writable mappings.

(Caveat emptor: My tool was designed to analyze XenServer product traces, which have a different trace file format than xen-unstable. I've just taught it to read the xen-unstable trace formats, so the exact percentages may be incorrect still. But the preponderance of brute-force searches is unmistakable.)

The good news is that if we can finger the cause of the brute-force searches, we should be able to reduce all those numbers down to respectable levels; my guess is totaling not more than 5%.

 -George

On Thu, Sep 11, 2008 at 6:25 PM, Gianluca Guida <gianluca.guida@eu.citrix.com> wrote:
> Here they are, with a two-CPU dom0:
> [...]
Gianluca Guida
2008-Sep-11 18:26 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
George Dunlap wrote:
> So, the problem appears to be with a ton of brute-force searches to
> remove writable mappings, both during resync and promotion. My
> analysis tool is reporting that of the 30 seconds or so in the trace
> from xen-unstable, the guest spent a whopping 67% in the hypervisor:
> * 26% doing resyncs as a result of marking another page out-of-sync
> * 9% promoting pages
> * 27% resyncing as a result of cr3 switches
> And almost the entirety of all of those can be attributed to
> brute-force searches to remove writable mappings.

Fantastic (well, sort of)!

If I understand it correctly, Todd is using PV drivers in Linux HVM guests, so the reason for the brute-force search is that former L1 page tables are being used as I/O pages and are not being unshadowed, because writable mappings of them can still be obtained. In short, it is an unshadowing problem. Should be `easy` to fix. I wasn't using PV drivers, so I was not experiencing this behaviour.

Or it could be a fixup table bug, but I doubt it.

George, did you see excessive fixup faults in the trace?

Todd, could you try without PV drivers (plain qemu emulation) and see if the results get better?

Thanks,
Gianluca
Todd Deshane
2008-Sep-11 19:04 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
On Thu, Sep 11, 2008 at 2:26 PM, Gianluca Guida <gianluca.guida@eu.citrix.com> wrote:
> Todd, could you try without PV drivers (plain qemu emulation) and see if
> the results get better?

To the best of my knowledge, I am not using PV-on-HVM drivers, since they are not upstream. I am using a 2.6.27-rc4 Xen domU kernel, with the normal Xen PV drivers and KVM virtio built in. Am I mistaken?

I am running the test with shadow3 disabled now. I'll report the results when they come out.

Any other suggestions or things for me to try, let me know.

Cheers,
Todd
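For reference, "normal Xen PV drivers and KVM virtio built in" corresponds roughly to the following 2.6.27 kernel options. A sketch only; the option names are from a mainline 2.6.27 config and the exact set in Todd's build may differ.

  CONFIG_PARAVIRT_GUEST=y
  CONFIG_XEN=y
  CONFIG_XEN_BLKDEV_FRONTEND=y
  CONFIG_XEN_NETDEV_FRONTEND=y
  CONFIG_KVM_GUEST=y
  CONFIG_KVM_CLOCK=y
  CONFIG_VIRTIO_PCI=y
  CONFIG_VIRTIO_BLK=y
  CONFIG_VIRTIO_NET=y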
Todd Deshane
2008-Sep-11 19:54 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
> I am running the test with shadow3 disabled now.
> I'll report the results when they come out.

So with shadow3 disabled, the kernbench time is much more reasonable. Better than 3.2.1, even.

xen-unstable HVM guest

Average Optimal load -j 4 Run (std deviation):
Elapsed Time 737.144 (5.19414)
User Time 498.508 (2.52895)
System Time 235.056 (2.71348)
Percent CPU 99 (0)
Context Switches 133127 (823.517)
Sleeps 36295.4 (124.088)
Jeremy Fitzhardinge
2008-Sep-11 20:02 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Todd Deshane wrote:
> To the best of my knowledge, I am not using PV-on-HVM drivers, since
> they are not upstream. I am using a 2.6.27-rc4 Xen domU kernel, with
> the normal Xen PV drivers and KVM virtio built in. Am I mistaken?

No, there's no pv-hvm driver support upstream yet. When I get around to it I intend adding support for the pagetable shootdown paravirtualization too, so that unshadowing shouldn't be a problem. (Is that in xen-unstable yet?)

    J
On Thu, Sep 11, 2008 at 01:02:51PM -0700, Jeremy Fitzhardinge wrote:
> No, there's no pv-hvm driver support upstream yet. When I get around to
> it I intend adding support for the pagetable shootdown
> paravirtualization too, so that unshadowing shouldn't be a problem.

Hmm, this came up recently, but I don't remember seeing this. Sounds interesting; is there somewhere we can read more?

regards
john
Gianluca Guida
2008-Sep-12 10:41 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Jeremy Fitzhardinge wrote:
> No, there's no pv-hvm driver support upstream yet. When I get around to
> it I intend adding support for the pagetable shootdown
> paravirtualization too, so that unshadowing shouldn't be a problem.

Oh, OK. I didn't know that.

The only good news is that this behavior is not expected by the shadow3 algorithm, since most of the development time has been spent on ways to prevent completely brute-force searches of writable mappings. I'll get back with a patch to fix this after I can reproduce it on my machine.

Thanks,
Gianluca
George Dunlap
2008-Sep-12 11:04 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
On Thu, Sep 11, 2008 at 7:26 PM, Gianluca Guida <gianluca.guida@eu.citrix.com> wrote:
> Or it could be a fixup table bug, but I doubt it.
>
> George, did you see excessive fixup faults in the trace?

No, nothing excessive; 273,480 over 30 seconds isn't that bad. The main thing was that out of 15024 attempts to remove writable mappings, 13775 had to fall back to a brute-force search.

Looking at the trace, I can't really tell why there should be a problem... I'm seeing tons of circumstances where there should only be one writable mapping, but it falls back to brute-force search anyway. Here's an example:

24.999159660 -x vmexit exit_reason EXCEPTION_NMI eip 2b105dcee330
24.999159660 -x wrmap-bf gfn 7453c
24.999159660 -x fixup va 2b105f000000 gl1e 800000005caf0067 flags (60c)-gp-Pw------
24.999748980 -x vmentry
[...]
24.999759577 -x vmexit exit_reason EXCEPTION_NMI eip ffffffff8022a3b0
24.999759577 -x fixup:unsync va ffff88007453c008 gl1e 7453c067 flags (c000c)-gp------ua-
24.999762562 -x vmentry
[...]
25.002946338 -x vmexit exit_reason CR_ACCESS eip ffffffff80491a63
25.002946338 -x wrmap-bf gfn 7e18c
25.002946338 -x oos resync full gfn 7e18c
25.002946338 -x wrmap-bf gfn 7453c
25.002946338 -x oos resync full gfn 7453c
25.003526640 -x vmentry

Here we see gfn 7453c:
* promoted to be a shadow (the big 'P' in the flag string); at the vmentry, there should be no writable mappings.
* marked out-of-sync (one writable mapping, with fixup table)
* re-synced because of a CR write, and a brute-force search.

Note that the times behind the "wrmap-bf" and "oos resync full" entries are not valid; but the whole vmexit->vmentry arc takes over 1.5 milliseconds.

 -George
George Dunlap
2008-Sep-12 11:19 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
Ah, that's the problem... Linux seems to have changed the location of the 1:1 map. Gianluca's using an older kernel, where it's at 0xffff810000000000, but this trace has it at 0xffff880000000000, so the "guess" heuristic is missing.

Jeremy, is this a permanent long-term move, or is it going to be something random? I.e., should we just add a new heuristic "guess" at this address, or do we need to do something more complicated?

That will solve brute-force searches for promotions, but the fixup table for out-of-sync mappings should still be fixed...

 -George
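For reference, this heuristic lives in the shadow code's remove-writable-mappings path; adding a guess for the new direct-map base might look roughly like the sketch below. It assumes the existing GUESS() helper and the grouping of the 64-bit Linux guesses; treat it as an illustration, not a tested patch.

  /* 64-bit Linux guest: guess where the kernel's 1:1 direct map would
   * hold a writable mapping of this gfn. */
  /* Older kernels: direct map at 0xffff810000000000 */
  GUESS(0xffff810000000000UL + (gfn << PAGE_SHIFT), 4);
  /* 2.6.27+ kernels: direct map moved to 0xffff880000000000 */
  GUESS(0xffff880000000000UL + (gfn << PAGE_SHIFT), 4);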
Jeremy Fitzhardinge
2008-Sep-12 14:20 UTC
Re: [Xen-devel] Re: Poor performance on HVM (kernbench)
George Dunlap wrote:
> Ah, that's the problem... Linux seems to have changed the location of
> the 1:1 map. Gianluca's using an older kernel, where it's at
> 0xffff810000000000, but this trace has it at 0xffff880000000000, so
> the "guess" heuristic is missing.
>
> Jeremy, is this a permanent long-term move, or is it going to be
> something random? I.e., should we just add a new heuristic "guess" at
> this address, or do we need to do something more complicated?

It's a permanent move. I moved it up to 0xffff880000000000 to leave space for Xen when running as a PV kernel, but there's no reasonable way to make it variable, so the linear map will be there regardless of what mode the kernel is operating in.

    J