Xiang, Kai
2008-Dec-22 10:16 UTC
[Xen-devel] [PATCH] Quick path for PIO instructions which cuts the expense by more than half
Hi all,

Happy Holidays to you :)

We found that the PIO instruction path has changed in the Xen 3.3 tree compared to the earlier Xen 3.1 tree. We suspect this puts more burden on Xen itself, which hurts performance. This patch, built against c/s 18933, addresses the issue by giving a short path for non-string PIO (a schematic sketch of the idea appears at the end of this message).

To demonstrate how much performance influence this could bring, we have the experiments/data below:

1) Direct TSC data from xentrace

We use a small piece of code, running in a RHEL5 guest, that reads a port repetitively, and collect xentrace data at the same time. The TSC count from VMEXIT to blocked_to_runnable (which can be viewed as one indicator of the VMEXIT-handling path inside Xen) is cut by ~59% (from 2616 to 1064).

2) Port IO TSC observed from the same piece of code

This includes the response from the QEMU side. Here we also see a ~18% TSC reduction for one simple PIO (from 16112 to 13296).

3) The influence on more realistic workloads

We tested on a Windows 2003 Server guest, using IOmeter to run a disk-bound test. The IO pattern is "Default": 67% random read and 33% random write with a 2K request size. To reduce the influence of the file cache, I ran the test 3 times (1 minute each) from the start of the computer (both Xen and the guest).

Compare before and after:

          IO per second (3 runs)      | average response time, ms (3 runs)
  ----------------------------------------------------------------
  Before: 100.004; 109.447; 110.801   | 9.988; 9.133; 9.022
  After:  101.951; 110.893; 114.179   | 9.806; 9.016; 8.756
  ----------------------------------------------------------------

So we get a 1%~3% IO performance gain while reducing the average response time by 2%~3% at the same time. Considering this is just an ordinary SATA disk and an IO-bound workload, we expect more with faster disks and more cached IO.

BTW: the patch also fixes one wrong comment.

Looking forward to your feedback. Thanks in advance.

Best wishes
Kai

--------------------------------------------------------------------
Backups:

The test code is attached (a minimal sketch of it is also included below). The configuration is:

Intel(r) Supermicro Tylersburg-EP server system
CPU: 2x quad-core 2.8GHz processors (Nehalem) with 8MB L3 cache
Disk: Seagate SATA 500GB
Memory: 12GB (12 x 1GB DDR3 1066MHz)

Guest configuration:
Memory: 512MB
VCPU: 1
Device model: stub domain
IOmeter setup: two virtual disks, hda for the system and hdb as the test disk for IOmeter.
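For those without the attachment, here is a minimal sketch of the kind of timing loop used in experiments 1 and 2. It assumes a Linux x86 guest run as root; the port number (0x80, the traditional "safe" port) and the iteration count are placeholders, not the actual attachment's values:

/* Minimal PIO timing loop -- a sketch only.
 * Build with:  gcc -O2 -o piotest piotest.c
 * Run as root (iopl() needs raw I/O privilege). */
#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>     /* iopl(), inb() */

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    const unsigned long iters = 100000;    /* placeholder count */

    if (iopl(3) != 0) {                    /* grant ring-3 port access */
        perror("iopl");
        return 1;
    }

    uint64_t start = rdtsc();
    for (unsigned long i = 0; i < iters; i++)
        (void)inb(0x80);                   /* one VMEXIT per IN */
    uint64_t end = rdtsc();

    printf("average TSC per PIO: %llu\n",
           (unsigned long long)((end - start) / iters));
    return 0;
}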
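Finally, for reviewers who want the shape of the change without reading the patch, a schematic sketch of the short path for non-string PIO. This is not the patch code itself: the exit-qualification bit layout is as documented in the Intel SDM, and the two handler functions are hypothetical stand-ins for Xen's real full-emulation and IO-request paths.

/* Schematic sketch of a "short path" for non-string PIO -- NOT the
 * actual patch code.  Exit-qualification layout for I/O-instruction
 * VMEXITs is per the Intel SDM. */
#include <stdio.h>

#define IO_SIZE_MASK  0x7UL         /* bits 2:0 -- access size minus 1 */
#define IO_DIR_IN     (1UL << 3)    /* bit 3    -- 1 = IN, 0 = OUT     */
#define IO_STRING     (1UL << 4)    /* bit 4    -- INS/OUTS            */

static void full_emulation_slowpath(void)            /* hypothetical */
{
    printf("string PIO: fetch, decode and emulate the instruction\n");
}

static void send_pio_request(unsigned int port,      /* hypothetical */
                             unsigned int size, int dir_in)
{
    printf("non-string PIO: port 0x%x, %u byte(s), %s\n",
           port, size, dir_in ? "IN" : "OUT");
}

static void vmx_io_exit(unsigned long exit_qualification)
{
    if (exit_qualification & IO_STRING) {
        /* INS/OUTS still need the full emulator. */
        full_emulation_slowpath();
        return;
    }

    /* Everything a plain IN/OUT needs is already in the exit
     * qualification, so no guest instruction fetch/decode is
     * required -- that is the saving the short path targets. */
    unsigned int size   = (exit_qualification & IO_SIZE_MASK) + 1;
    unsigned int port   = (exit_qualification >> 16) & 0xffff;
    int          dir_in = !!(exit_qualification & IO_DIR_IN);

    send_pio_request(port, size, dir_in);
}

int main(void)
{
    vmx_io_exit(0x00800009UL);      /* IN, 2 bytes, from port 0x80 */
    return 0;
}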
Keir Fraser
2008-Dec-22 10:43 UTC
Re: [Xen-devel] [PATCH] Quick path for PIO instructions which cuts the expense by more than half
On 22/12/2008 10:16, "Xiang, Kai" <kai.xiang@intel.com> wrote:

> 3) The influence on more realistic workloads
>
> We tested on a Windows 2003 Server guest, using IOmeter to run a
> disk-bound test. The IO pattern is "Default": 67% random read and 33%
> random write with a 2K request size. To reduce the influence of the
> file cache, I ran the test 3 times (1 minute each) from the start of
> the computer (both Xen and the guest).
>
> Compare before and after:
>
>           IO per second (3 runs)      | average response time, ms (3 runs)
>   ----------------------------------------------------------------
>   Before: 100.004; 109.447; 110.801   | 9.988; 9.133; 9.022
>   After:  101.951; 110.893; 114.179   | 9.806; 9.016; 8.756
>   ----------------------------------------------------------------
>
> So we get a 1%~3% IO performance gain while reducing the average
> response time by 2%~3% at the same time. Considering this is just an
> ordinary SATA disk and an IO-bound workload, we expect more with
> faster disks and more cached IO.

The difference is in the noise, almost. Not, I think, sufficient for me
ever to want to see the VMX PIO code back in Xen ever again. Those who
actually care about performance would run PV drivers anyway, and see
much greater speedups.

 -- Keir
Xiang, Kai
2008-Dec-23 14:03 UTC
RE: [Xen-devel] [PATCH] Quick path for PIO instructions which cuts the expense by more than half
Thanks, Keir, for the comments!

On the noise question: in experiments 1 and 2, the thousands of IOs
traced show the data is quite stable. For experiment 3, to guard
against noise, I increased the run time from 1 minute to 3 minutes each
and did another four groups of runs. I can still see the 2%~3%
performance gain clearly in every run, which makes noise a very
unlikely explanation. See the backup below if you are interested.

But I have to agree with you that those who care about performance will
probably go to PV drivers. I will just leave the open question here:
does anyone have a strong reason to use HVM for performance, or some
other PIO-sensitive situation, where this would help?

Regards
Kai

----------------------------------------------------------------
Backups

IOPS, 3 minutes per run (9 minutes per group); each group is 3 runs
from start-up:

Before:
Run group 1: 106.059, 112.147, 114.640
Run group 2: 106.340, 112.024, 114.584
Run group 3: 105.919, 111.405, 114.598
Run group 4: 106.065, 112.455, 114.930

After:
Run group 1: 109.435, 114.977, 117.662
Run group 2: 109.961, 115.395, 117.576
Run group 3: 109.410, 114.623, 118.046
Run group 4: 110.464, 116.757, 118.790