Adam Goryachev
2012-Sep-13 12:25 UTC
Poor Windows 2003 + GPLPV performance compared to VMWare
I have an application server running on MS Windows 2003, which was a clean install on VMWare last year (after a failed migration attempt to XenServer). At that point, a benchmark (actually a live process run regularly on the machine) produced a result of 7800 to 7900 transactions per second.

I've recently migrated this to Xen by:
1) Uninstall VMWare tools
2) Shutdown Windows
3) Use VMWare to download the vmdk
4) Write the VMDK file (dd) to the same LVM volume that contained the VMWare storage area (which was exported by iSCSI to VMWare)
5) Install Debian Testing with Xen 4.1
6) Using the exact same iSCSI server/LVM config etc, start the VM
7) Install the GPLPV drivers

Everything seemed to work, and all was good. Then, the user ran the above process and got, consistently, results of approx 2500 transactions per second.

I increased the vcpus from 2 to 4, but this didn't change the result at all.

I modified the disk line in the domU config from hda to xvda, and the result increased slightly to 2560/sec (these tests are just one-off tests, no verification of actual performance increase etc...).

In any case, I seem to have a significant loss of performance on the domU when compared to VMWare. The storage server/network is identical. The dom0 machine is identical to the VMWare machine.

Here is my current domU config file:

kernel = "/usr/lib/xen-4.1/boot/hvmloader"
builder = 'hvm'
memory = 4096
shadow_memory = 12
device_model = '/usr/lib/xen-default/bin/qemu-dm'
localtime = 1
name = "vm1"
cpus = "2,3,4,5"   # Which physical CPUs to allow
vcpus = 4          # How many virtual CPUs to present
viridian = 1
disk = [ 'phy:/dev/disk/by-path/ip-10.30.10.23:3260-iscsi-iqn.2012-06.domain:vm1-lun-0,xvda,w' ]
vif = ['bridge=xenbr0, mac=00:16:3e:39:10:1a']
boot = 'c'
sdl = 0
vnc = 1
vncdisplay = 10
vncviewer = 0
vncconsole = 0
vncunused = 0
stdvga = 1
usb = 1
usbdevice = 'tablet'
acpi = 1
apic = 1
on_reboot = 'restart'
on_poweroff = 'destroy'
on_crash = 'restart'
audio = 0

Any suggestions on how to improve performance would be greatly appreciated.

Thank you,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
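A minimal sketch of steps 3) and 4), assuming the exported vmdk needs converting to raw first and that /dev/vg0/vm1 stands in for the actual iSCSI-backed LV (both names are illustrative; qemu-img ships in Debian's qemu-utils package):

qemu-img convert -O raw vm1.vmdk /tmp/vm1.raw    # skip if the vmdk is already a flat/raw image
dd if=/tmp/vm1.raw of=/dev/vg0/vm1 bs=4M conv=fsync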
Adam Goryachev
2012-Sep-13 12:49 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
Additional information accidentally omitted: the dom0 machine is Debian testing running Xen 4.1.

xm info
host                   : pm08
release                : 3.2.0-3-amd64
version                : #1 SMP Mon Jul 23 02:45:17 UTC 2012
machine                : x86_64
nr_cpus                : 6
nr_nodes               : 1
cores_per_socket       : 6
threads_per_core       : 1
cpu_mhz                : 3300
hw_caps                : 178bf3ff:efd3fbff:00000000:00001310:00802001:00000000:000837ff:00000000
virt_caps              : hvm
total_memory           : 16351
free_memory            : 2
free_cpus              : 0
xen_major              : 4
xen_minor              : 1
xen_extra              : .3
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
xen_commandline        : placeholder
cc_compiler            : gcc version 4.7.1 (Debian 4.7.1-7)
cc_compile_by          : waldi
cc_compile_domain      : debian.org
cc_compile_date        : Fri Aug 17 09:41:02 UTC 2012
xend_config_format     : 4

If I should provide any other information, please let me know.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
James Harper
2012-Sep-14 07:53 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
Is it AMD or Intel architecture?

You mention iSCSI... is your benchmark measuring disk activity, or some combination of things to give a total performance figure?

If I understand correctly, you have iSCSI, then Xen, then GPLPV. Is there a way you could test iSCSI performance in dom0 without involving a domU? The problem is that there are a few layers involved here, so it's hard to know which one is letting you down.

James
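A quick way to approximate the dom0-side test James suggests is a raw sequential read from the iSCSI block device, bypassing the page cache (a sketch only; the block size and count are illustrative, and a read is non-destructive on the live LUN):

dd if=/dev/disk/by-path/ip-10.30.10.23:3260-iscsi-iqn.2012-06.domain:vm1-lun-0 of=/dev/null bs=1M count=4096 iflag=direct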
Ian Campbell
2012-Sep-14 08:04 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On Thu, 2012-09-13 at 13:25 +0100, Adam Goryachev wrote:
> Then, the user ran the above process, and got consistently, results of
> approx 2500 transactions per second

Are you certain the GPLPV drivers have taken hold and you aren't using emulated devices?

I don't know how you can tell from within Windows, but from dom0 you can look in the output of "xenstore-ls -fp" for the "state" node associated with each device frontend -- they should be in state 4 (connected).

[...]
> memory = 4096
> shadow_memory = 12

This seems low to me. The default is 1M per vCPU, plus 8K per MB of RAM, which is 4M + 8*4096K = 4M + 32M = 36M. Do you have any reason to second guess this? (Usually this option is used to increase shadow RAM where the workload demands it.)

Does your system have HAP (hardware assisted paging; EPT or NPT on Intel/AMD respectively)?

> device_model = '/usr/lib/xen-default/bin/qemu-dm'
> localtime = 1
> name = "vm1"
> cpus = "2,3,4,5" # Which physical CPUs to allow

Have you pinned dom0 to use pCPU 1 and/or pCPUs > 6? How many dom0 vcpus have you configured?

Does your system have any NUMA properties?

And as James suggests, it would also be useful to benchmark iSCSI running in dom0, and perhaps even running on the same system without Xen (just Linux) using the same kernel. I'm not sure if VMware offers something similar which could be used for comparison.

Ian.
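For reference, plugging this guest's numbers (4 vCPUs, 4096M of RAM) into the default formula Ian quotes would mean raising the config value from 12 to:

shadow_memory = 36   # 4 x 1M (per vCPU) + 4096 x 8K (per MB of RAM) = 36M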
Adam Goryachev
2012-Sep-14 13:11 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 14/09/12 18:04, Ian Campbell wrote:
> Are you certain the GPLPV drivers have taken hold and you aren't using
> emulated devices?

Within Windows, Device Manager shows the Disk Drives as "XEN PV DISK SCSI Disk Device"; this is the newest one, which it detected and installed after I changed the config from hda to xvda.

> I don't know how you can tell from within Windows but from dom0 you can
> look in the output of "xenstore-ls -fp" for the "state" node associated
> with each device frontend -- they should be in state 4 (connected).

root@pm08:~# xenstore-ls -fp | grep state | grep vbd
/local/domain/0/backend/vbd/8/51712/state = "4"   (n0,r8)
/local/domain/8/device/vbd/51712/state = "4"   (n8,r0)

I assume dom id 8 is the VM, and dom0 is the first line above.

> [...]
>> memory = 4096
>> shadow_memory = 12
>
> This seems low to me. The default is 1M per vCPU, plus 8K per MB of RAM,
> which is 4M + 8*4096K = 4M + 32M = 36M. Do you have any reason to second
> guess this? [...]

OK, I must admit I have no idea; I copied this value from an example a long time ago, and I've just copied it into each new VM as I go.

From here:
http://wiki.prgmr.com/mediawiki/index.php/Chapter_12:_HVM:_Beyond_Paravirtualization
It says: "The shadow_memory directive specifies the amount of memory to use for shadow page tables. (Shadow page tables, of course, are the aforementioned copies of the tables that map process-virtual memory to physical memory.) Xen advises allocating at least 2KB per MB of domain memory, and 'a few' MB per virtual CPU. Note that this memory is in addition to the domU's allocation specified in the memory line."

I'm not really sure where to find definitive documentation on all the config file options within Xen....

I will re-run the test with shadow_memory = 36 and let you know. I was going to run it now and advise, but some scheduled task has started, so I will wait until it is finished and re-test.

> Does your system have HAP (hardware assisted paging, EPT or NPT on
> Intel/AMD respectively)?

(XEN) HVM: ASIDs enabled.
(XEN) SVM: Supported advanced features:
(XEN)  - Nested Page Tables (NPT)
(XEN)  - Last Branch Record (LBR) Virtualisation
(XEN)  - Next-RIP Saved on #VMEXIT
(XEN)  - Pause-Intercept Filter
(XEN) HVM: SVM enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB

I'm guessing that is a yes to HAP and NPT, but no for EPT.... This is an AMD Phenom(tm) II X6 1100T Processor.

>> cpus = "2,3,4,5" # Which physical CPUs to allow
>
> Have you pinned dom0 to use pCPU 1 and/or pCPUs > 6?

No, how should I pin dom0 to cpu0?

Also, xm vcpu-list shows this:

xm vcpu-list
Name                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0             0     0     0   r--   34093.4 any cpu
Domain-0             0     1     5   -b-    1239.3 any cpu
Domain-0             0     2     1   -b-    1134.4 any cpu
Domain-0             0     3     3   -b-    1049.9 any cpu
Domain-0             0     4     0   -b-    1340.5 any cpu
Domain-0             0     5     2   -b-    1123.2 any cpu
vm1                  9     0     2   -b-      20.5 2-5
vm1                  9     1     4   -b-      15.2 2-5
vm1                  9     2     3   -b-      14.9 2-5
vm1                  9     3     4   -b-      15.1 2-5

I've set the VM to use cpus 2,3,4,5 but how do I force it so:
vcpu 0 = 2
vcpu 1 = 3
vcpu 2 = 4
vcpu 3 = 5

without running:
xm vcpu-pin vm1 0 2
xm vcpu-pin vm1 1 3
xm vcpu-pin vm1 2 4
xm vcpu-pin vm1 3 5

> How many dom0 vcpus have you configured?

I assume by default it takes all of them...

> Does your system have any NUMA properties?

I don't really understand this question.... is there a simple method to check? It is an AMD Phenom(tm) II X6 1100T Processor on a reasonable desktop motherboard, nothing fancy....

> And as James suggests it would also be useful to benchmark iSCSI running
> in dom0 and perhaps even running on the same system without Xen (just
> Linux) using the same kernel. I'm not sure if VMware offers something
> similar which could be used for comparison.

Well, that is where things start to get complicated rather quickly... There are a lot of layers here, but I'd prefer to look at the issues closer to Xen first, since VMware was working from an identically configured SAN etc., so nothing at all has changed there. Ultimately, the SAN is using 3 x SSD in RAID5. I have done various testing in the past from plain Linux (with the older 2.6.32 kernel from Debian stable) and achieved reasonable figures (I don't recall exactly).

Thank you for your responses. If there is any further information I can provide, or additional suggestions you are able to make, I'd be really appreciative.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
Ian Campbell
2012-Sep-14 13:30 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On Fri, 2012-09-14 at 14:11 +0100, Adam Goryachev wrote:
> I'm not really sure where to find definitive documentation on all the
> config file options within Xen....

http://xenbits.xen.org/docs/4.2-testing/ has man pages for the config files. These are also installed on the host as part of the build.

If you are using xend then the xm ones are a bit lacking. However, xl is mostly compatible with xm, so the xl manpages largely apply. There's also a bunch of stuff on http://wiki.xen.org/wiki.

> (XEN) HVM: SVM enabled
> (XEN) HVM: Hardware Assisted Paging (HAP) detected
> (XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
>
> I'm guessing that is a yes to HAP and NPT, but no for EPT....
> This is an AMD Phenom(tm) II X6 1100T Processor.

EPT is the Intel equivalent of NPT, so you wouldn't have that one.

>> Have you pinned dom0 to use pCPU 1 and/or pCPUs > 6?
>
> No, how should I pin dom0 to cpu0?

dom0_vcpus_pin, as described in
http://xenbits.xen.org/docs/4.2-testing/misc/xen-command-line.html

> I've set the VM to use cpus 2,3,4,5 but how do I force it so:
> vcpu 0 = 2
> vcpu 1 = 3
> vcpu 2 = 4
> vcpu 3 = 5
>
> without running:
> xm vcpu-pin vm1 0 2
> [...]

You have:
cpus = "2,3,4,5"
which means "let all the guest's VCPUs run on any of pCPUs 2-5".

It sounds like what you are asking for above is:
cpus = [2,3,4,5]
which forces guest vcpu0=>pcpu2, 1=>3, 2=>4 and 3=>5.

Subtle, I agree.

Do you have a specific reason for pinning? I'd be tempted to just let the scheduler do its thing unless/until you determine that it is causing problems.

>> How many dom0 vcpus have you configured?
>
> I assume by default it takes all of them...

Correct. dom0_max_vcpus will adjust this for you.

> Well, that is where things start to get complicated rather quickly...
> There are a lot of layers here, but I'd prefer to look at the issues
> closer to Xen first, since VMware was working from an identically
> configured SAN etc., so nothing at all has changed there. [...]

I was worried about the Linux side rather than the SAN itself, but it sounds like you've got that covered.

Ian.
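On Debian, both of the hypervisor options Ian names go on the Xen (not the kernel) command line. A sketch, assuming GRUB 2 with the stock 20_linux_xen snippet; the variable name is Debian's convention and the values are only an example:

# /etc/default/grub
GRUB_CMDLINE_XEN="dom0_max_vcpus=1 dom0_vcpus_pin"
# then run update-grub and reboot for it to take effect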
Adam Goryachev
2012-Sep-14 14:53 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 14/09/12 23:30, Ian Campbell wrote:
> http://xenbits.xen.org/docs/4.2-testing/ has man pages for the config
> files. [...]

Thanks for the pointer. I'm using 4.1 though, but I guess most of it will still be the same.

>> No, how should I pin dom0 to cpu0?
>
> dom0_vcpus_pin, as described in
> http://xenbits.xen.org/docs/4.2-testing/misc/xen-command-line.html

Thanks. I'll need to reboot the dom0 to apply this; will do as soon as this current scheduled task is complete.

> You have:
> cpus = "2,3,4,5"
> which means "let all the guest's VCPUs run on any of pCPUs 2-5".
>
> It sounds like what you are asking for above is:
> cpus = [2,3,4,5]
> which forces guest vcpu0=>pcpu2, 1=>3, 2=>4 and 3=>5.
>
> Subtle, I agree.

Ugh... ok, I'll give that a try. BTW, it would seem this is different from Xen 4.0 (from Debian stable), where it seems to magically do what I meant to say, or I'm just lucky on those machines :)

> Do you have a specific reason for pinning? I'd be tempted to just let
> the scheduler do its thing unless/until you determine that it is causing
> problems.

The only reason for pinning is:
a) To stop the scheduler from moving the vCPUs around on the pCPUs; from my understanding this improves performance.
b) When running multiple domUs, I either want a bunch of domUs to share one CPU, or I want one or more dedicated CPUs for other domUs (ie, I use this as a type of prioritisation/performance tuning).

In this case, there is only a single VM, though if some hardware is lost (other physical machines) then this box will end up with multiple VMs...

>> How many dom0 vcpus have you configured?
>
> Correct. dom0_max_vcpus will adjust this for you.

Will adjust on the next reboot....

> I was worried about the Linux side rather than the SAN itself, but it
> sounds like you've got that covered.

At this stage, the limiting factor should be the single gig ethernet connecting the physical machine to the network (the SAN side has 4 x gig ethernet).

This is a live network/system, but it has been a work in progress for the past 12 months... I'll update further once I can get some testing and answers... Will do a test with only the shadow_memory change, and then if no big improvement, will reboot with the changes to the dom0 cpus etc, and test again.

Thank you for your advice.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
James Harper
2012-Sep-15 05:59 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
>> Are you certain the GPLPV drivers have taken hold and you aren't using
>> emulated devices?
>
> Within Windows, Device Manager shows the Disk Drives as "XEN PV DISK SCSI
> Disk Device"; this is the newest one, which it detected and installed after I
> changed the config from hda to xvda.

That's definitely using GPLPV. Changing hda to xvda shouldn't have any impact on anything that GPLPV cares about.

Can you confirm that you definitely are running Windows 2003 SP2? Anything prior will have a big impact on performance.

James
Dion Kant
2012-Sep-16 13:50 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
Adam,

Can you give it a retry with the Windows domU running from a file container?

Mount another LV somewhere on dom0 (e.g. /var/lib/xen/images/windows), then use:

qemu-img convert -O raw your.vmdk /var/lib/xen/images/windows/disk0.raw

to create the file container.

I see significant disk I/O performance improvement on all my Windows domUs when running them from file containers, as compared to running them directly from a block device.

Cheers,
Dion
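Pairing Dion's file container with the existing config would change the disk line along these lines (a sketch; the path is Dion's example location, not anything confirmed from Adam's setup):

disk = [ 'file:/var/lib/xen/images/windows/disk0.raw,xvda,w' ]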
Andrew Bobulsky
2012-Sep-16 15:26 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
Hello Adam,

On Sep 14, 2012, at 10:57 AM, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:

<snip>
> At this stage, the limiting factor should be the single gig
> ethernet connecting the physical machine to the network (the SAN
> side has 4 x gig ethernet).
</snip>

I'm finding myself fascinated with this thread you've started; lots of details going on, and I'm really hopeful you figure this out. However, in case you don't, I may have a suggestion:

Is it an option for you to connect this DomU to your iSCSI LUN directly? Bypass the initiator in Dom0, and the uncertainty of your disk assignment to the DomU? With Windows prior to NT6 you of course need to download the iSCSI initiator from Microsoft and install it, but with that, you could create a dedicated LUN on your SAN and use that device as the backing store for your application's data.

If you like the idea, try installing the initiator and connecting to a small RAM disk on your SAN (or something where you know the storage IOPS won't be a limiting factor), benchmark the disk with IOMeter or CrystalDiskMark, compare that to the performance of the Xen-mapped disk, and see if that will yield the appropriate throughput for your needs.

If you want to go deeper down the rabbit hole (so to speak), you could also try booting the DomU directly from your SAN, as Xen bundles iPXE as its HVM network boot ROM. With your DomU already existing as raw data on an iSCSI LUN, you could basically install the initiator and the sanbootconf package, configure a DHCP reservation (or the ROM, if the NVRAM storage works with the Xen NIC), and boot right up.

And finally, and also the deepest down the rabbit hole that I'd suggest going: if your host supports PCI passthrough *and* you have a "spare" NIC available, you could assign that NIC directly to your DomU and use my first suggestion. The DomU will be "tied" to the host at that point though, so if you're looking to leverage migration or failover, it's not a good idea :P

Best of luck to you, and, while I hope you don't need my suggestions, I'd be glad to be of any assistance if you have some questions!

Cheers,
Andrew Bobulsky
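For the PCI passthrough idea Andrew closes with, the domU side is a one-line config addition once the NIC has been hidden from dom0 (e.g. bound to pciback); the BDF address below is hypothetical and must match the spare NIC as reported by lspci:

pci = [ '0000:03:00.0' ]   # hypothetical bus:device.function of the spare NIC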
Ian Campbell
2012-Sep-17 08:54 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On Fri, 2012-09-14 at 14:30 +0100, Ian Campbell wrote:
>>> Does your system have any NUMA properties?
>>
>> I don't really understand this question.... is there a simple method to
>> check? It is an AMD Phenom(tm) II X6 1100T Processor on a reasonable
>> desktop motherboard, nothing fancy....

I don't have a NUMA system to hand, but on my non-NUMA system I see in the logs:

(XEN) No NUMA configuration found

You should see that or something more informative.

Also, at least in 4.2, "xl info -n" gives some details. Not sure if xm has the same option?

Ian.
Ian Campbell
2012-Sep-17 08:54 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On Fri, 2012-09-14 at 15:53 +0100, Adam Goryachev wrote:
> Thanks for the pointer. I'm using 4.1 though, but I guess most of it
> will still be the same.

Right.

> Ugh... ok, I'll give that a try. BTW, it would seem this is different
> from Xen 4.0 (from Debian stable), where it seems to magically do what I
> meant to say, or I'm just lucky on those machines :)

It's not impossible; xend is largely unmaintained but it does get occasional "obvious" fixes (which sometimes turn out not to be so obvious).

> The only reason for pinning is:
> a) To stop the scheduler from moving the vCPUs around on the pCPUs; from
> my understanding this improves performance.

It can; it can also cause the opposite if not used carefully. I'm no expert on scheduling vs. pinning, but one thing to watch for in particular is the relationship between dom0 and guest VCPUs when pinning one or both of them. Depending on the workload, either putting them on the same or distinct sets of pCPUs can be beneficial. I've also heard that mixing pinned and unpinned VCPUs on a pCPU can cause unexpected behaviours.

> b) When running multiple domUs, I either want a bunch of domUs to share
> one CPU, or I want one or more dedicated CPUs for other domUs (ie, I use
> this as a type of prioritisation/performance tuning).

You might find cpupools in 4.1+ quite handy for managing this.

> In this case, there is only a single VM, though if some hardware is lost
> (other physical machines) then this box will end up with multiple VMs...

Don't forget that dom0 counts as a VM as well.

Ian.
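For the cpupools Ian mentions, the rough xl workflow is sketched below; the subcommands exist in xl, but the pool config syntax should be checked against the local xlcpupool.cfg manpage, and all names here are illustrative:

# pool1.cfg contains something like:  name="pool1"  cpus=["2","3","4","5"]
xl cpupool-create pool1.cfg
xl cpupool-migrate vm1 pool1
xl cpupool-list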
Adam Goryachev
2012-Sep-18 12:24 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 17/09/12 18:54, Ian Campbell wrote:
> I don't have a NUMA system to hand, but on my non-NUMA system I see in
> the logs:
> (XEN) No NUMA configuration found
>
> You should see that or something more informative.

"xm dmesg | grep -i numa" produces no output, so again, I still have no definitive answer to this.

Actually, got it:

dmesg | grep -i numa
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] No NUMA configuration found

That is not Xen, that is the normal kernel... or perhaps Xen has hidden the NUMA config from Linux... I'm really not sure...

> Also, at least in 4.2, "xl info -n" gives some details. Not sure if xm has
> the same option?

This is getting better:

xm info -n
numa_info              : none

amongst lots of other interesting information. So, it seems pretty definitive that there is no NUMA support here. Is this still relevant to the diagnosis of the very slow Windows domU?

Thanks for your help and patience.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
Adam Goryachev
2012-Sep-18 12:51 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 16/09/12 23:50, Dion Kant wrote:
> Can you give it a retry with the Windows domU running from a file container?
> [...]
> I see significant disk I/O performance improvement on all my Windows
> domUs when running them from file containers, as compared to running
> them directly from a block device.

I'd really like to try this and see if it helps, but I'm not sure how I can achieve that. The dom0 machine has a 60GB SSD drive internally, and the VM is 230G, so I don't think I can put it on the local system.

Also, I don't think I would need to convert it; the disk is already in raw format... I should be able to simply do:

dd if=/dev/sdX of=disk0.raw

where sdX is whatever disk Linux has assigned to this iSCSI device.

In any case, I can't do this at this stage. About the only thing I could consider would be to make a copy of the VM onto another host (not on the SAN) and share that to the dom0 using NFS, which would then get me a file-based image (over NFS) on HDD-backed storage. Though I think this is changing far too many factors to really be useful (the SAN is SSD backed, btw).

Finally, I was always of the impression that physical devices provided better performance due to lower overhead. I presume you are suggesting better performance from RAM-based caching on the dom0.

disk = [ 'phy:/dev/disk/by-path/ip-172.30.10.23:3260-iscsi-iqn.2012-06.domain:host-lun-0,xvda,w' ]

Could I simply change that line to this:

disk = [ 'file:/dev/disk/by-path/ip-172.30.10.23:3260-iscsi-iqn.2012-06.domain:host-lun-0,xvda,w' ]

Since a device is just a file... or would that not make a difference?

OK, well, it doesn't work:

Error: Disk image does not exist: /dev/sdd

(well, originally it had the above path, but I tried the direct /dev/sdd and got the same error).

Any further suggestions please?

Thanks,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
Adam Goryachev
2012-Sep-18 13:06 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 15/09/12 00:53, Adam Goryachev wrote:
> On 14/09/12 23:30, Ian Campbell wrote:
>> dom0_vcpus_pin, as described in
>> http://xenbits.xen.org/docs/4.2-testing/misc/xen-command-line.html
>
> Thanks. I'll need to reboot the dom0 to apply this; will do as soon as
> this current scheduled task is complete.

OK, I have pinned dom0 to cpu0, and this had no effect on performance.

>> You have:
>> cpus = "2,3,4,5"
>> which means "let all the guest's VCPUs run on any of pCPUs 2-5".
>>
>> It sounds like what you are asking for above is:
>> cpus = [2,3,4,5]
>> which forces guest vcpu0=>pcpu2, 1=>3, 2=>4 and 3=>5.

Actually, the above syntax doesn't work:

cpus = [2,3,4,5]   # Which physical CPUs to allow
Error: 'int' object has no attribute 'split'

Once I reverted to:

cpus = "2,3,4,5"

I can then boot again, but on reboot I get this:

xm vcpu-list
Name                ID  VCPU   CPU State   Time(s) CPU Affinity
Domain-0             0     0     0   r--     148.9 0
cobweb               6     0     5   ---       0.5 2-5
cobweb               6     1     -   --p       0.0 2-5
cobweb               6     2     -   --p       0.0 2-5
cobweb               6     3     -   --p       0.0 2-5

So it isn't pinning each vcpu to a specific cpu... but I suppose it should be smart enough to do it well anyway... Performance is still at the same level.

>>> How many dom0 vcpus have you configured?
>>
>> Correct. dom0_max_vcpus will adjust this for you.

Done, dom0 is set to 1 cpu, but it still makes no difference to performance.

I'm really at a bit of a loss on where to go from here.... The standard performance improvements don't seem to make any difference at all, and I'm running out of ideas....

Could you suggest a "standard" tool which would allow me to test disk IO performance (this is my initial suspicion for the slow performance), and also CPU performance (I'm starting to suspect this too now), in Windows (domU), Linux (a domU I can create for testing) and Linux (dom0)? Then I can see where performance is lost (CPU/disk) and at what layer (dom0/domU) etc...

Thanks,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
Adam Goryachev
2012-Sep-18 13:14 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 15/09/12 15:59, James Harper wrote:
>> Within Windows, Device Manager shows the Disk Drives as "XEN PV DISK SCSI
>> Disk Device" [...]
>
> That's definitely using GPLPV. [...]
>
> Can you confirm that you definitely are running Windows 2003 SP2? Anything
> prior will have a big impact on performance.

Hmmm, ooops. To be honest, I almost didn't bother checking this to confirm, as I assumed it would obviously be the latest service pack. However, it would seem that I am very wrong: it is Windows 2003 Service Pack 1.

I'll see if I can get all the latest service packs installed and updated, and then re-test and see how it goes.

Thanks so much for your suggestion.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
Fajar A. Nugraha
2012-Sep-18 13:20 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On Tue, Sep 18, 2012 at 8:06 PM, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
> Could you suggest a "standard" tool which would allow me to test disk IO
> performance (this is my initial suspicion for the slow performance), and

I'd suggest you try fio, with a random read/write load, and a data size at least twice the amount of memory.

> also CPU performance (I'm starting to suspect this too now), in

Not sure about that one. And I'm not sure it would be useful either.

What is your application, and what kind of load was it? If it was a synthetic load, then it's already a good benchmark tool. However, if it's a live load that you can't control, I suggest you try something like sysbench to test SQL performance. I usually use sysbench with MySQL and 1 million rows.

--
Fajar
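A fio job along the lines Fajar suggests might look like the following (a sketch; 8g is twice this domU's 4G of RAM, and the target file path, queue depth, and runtime are illustrative):

fio --name=randrw --rw=randrw --bs=4k --size=8g --direct=1 \
    --ioengine=libaio --iodepth=32 --runtime=60 --time_based \
    --filename=/mnt/test/fio.dat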
Adam Goryachev
2012-Sep-18 13:49 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 18/09/12 23:20, Fajar A. Nugraha wrote:
> I'd suggest you try fio, with a random read/write load, and a data size at
> least twice the amount of memory.

Thanks, will give this a go tomorrow night...

> What is your application, and what kind of load was it? [...]

The load is a "live" load; it uses some sort of database (proprietary, non-standard), and I am running a "maintenance" task of some sort which is used regularly to do backups of the system. A very similar task is also run when doing an "upgrade" to the program, etc. As far as I can tell, it is creating a new "database/table" and copying the current database/table into this new one. I assume there is the standard transaction/locking/etc overhead associated.

The values obtained from these processes vary significantly (as per normal) for each data type or table. My measurements are based on the speed of the first table in the pre-backup process. Normal backup time for this table would be about 10 minutes; I'm usually just letting it run for about 30 seconds to a minute, as the speed remains fairly constant throughout the process.

So, I'm currently seeing speeds of approx 2500 transactions/sec, and "normal" should be around 7000/sec (where normal is the speed of the same domU under VMware ESXi, and also comparable on a standalone physical machine (the original, older machine)).

For now, I'm going to try and get Service Pack 2 installed before I proceed; if things are still looking bad after that, I'll proceed with doing some performance testing to see if I can pinpoint what is causing the slowdown (disk, CPU, RAM, etc...).

Thanks again for your help and suggestions.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
Ian Campbell
2012-Sep-19 08:24 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On Tue, 2012-09-18 at 13:24 +0100, Adam Goryachev wrote:
> This is getting better:
>
> xm info -n
> numa_info              : none
>
> [...] So, it seems pretty definitive that there is no NUMA support here.
>
> Is this still relevant to the diagnosis of the very slow Windows domU?

No, I was thinking of issues like front/backend on different NUMA nodes and the like.

Ian.
Ian Campbell
2012-Sep-19 08:41 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On Tue, 2012-09-18 at 14:06 +0100, Adam Goryachev wrote:
> Actually, the above syntax doesn't work:
>
> cpus = [2,3,4,5]   # Which physical CPUs to allow
> Error: 'int' object has no attribute 'split'

Is it ["2","3",...] then, I wonder?

> Once I reverted to:
> cpus = "2,3,4,5"
> I can then boot again, but on reboot I get this:
>
> xm vcpu-list
> Name                ID  VCPU   CPU State   Time(s) CPU Affinity
> Domain-0             0     0     0   r--     148.9 0
> cobweb               6     0     5   ---       0.5 2-5
> cobweb               6     1     -   --p       0.0 2-5
> cobweb               6     2     -   --p       0.0 2-5
> cobweb               6     3     -   --p       0.0 2-5
>
> So it isn't pinning each vcpu to a specific cpu... but I suppose it
> should be smart enough to do it well anyway...
> Performance is still at the same level.

To be honest, I wouldn't expect it to make much difference at this stage.

I notice the state is "--p" for all but VCPU0 -- which means they are paused.

It's probably just that you ran xm vcpu-list before Windows had booted as far as bringing up the secondary CPUs, but it would be worth checking that Windows is actually bringing up / using all the CPUs!

Once the VM is fully booted, the state for each vcpu should be either "r--" (running) or "-b-" (currently blocked). I presume something like Windows Task Manager will also confirm that all the CPUs are in use.

> Could you suggest a "standard" tool which would allow me to test disk IO
> performance [...]

Fajar's suggestion of fio is a good one.

Ian.
Adam Goryachev
2012-Sep-19 09:43 UTC
Re: Poor Windows 2003 + GPLPV performance compared to VMWare
On 19/09/12 18:41, Ian Campbell wrote:
> I notice the state is "--p" for all but VCPU0 -- which means they are
> paused.
>
> It's probably just that you ran xm vcpu-list before Windows had booted
> as far as bringing up the secondary CPUs, but it would be worth checking
> that Windows is actually bringing up / using all the CPUs!
>
> Once the VM is fully booted, the state for each vcpu should be either
> "r--" (running) or "-b-" (currently blocked). I presume something
> like Windows Task Manager will also confirm that all the CPUs are in
> use.

That's correct: the CPU time was only 0.5s, so very likely I ran it pretty much immediately after the create. Once the machine is up and running, I see all 4 vcpus in the b or r state, and the CPU time column incrementing.

Within Windows Task Manager I also see all 4 vcpus being used during the process I am running. Windows reports approx 50% CPU utilisation, and that actually equates fairly well to 50% of each vcpu, ie, the workload is well balanced across all 4 cores (I don't know if that is by chance, etc... but it is good).

I've updated to SP2 and will re-test in the next few hours; failing that, I'll move back to performance measuring with fio/etc.

Thanks for all the suggestions and help so far, I really appreciate it.

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
Adam Goryachev
2012-Sep-19 12:19 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
On 19/09/12 19:43, Adam Goryachev wrote:
> On 19/09/12 18:41, Ian Campbell wrote:
>> Is it ["2","3",...] then, I wonder?

I'm not sure; I won't play with it now, and I expect that it will not make a lot of difference. In fact, I might remove all those lines completely and let Xen organise who gets to run on which CPU. I've now got a dedicated CPU (number 0) for the dom0, and the domUs can all share the remaining CPUs. Of course, 99.9% of the time there will only be a single domU on any physical box; the other 0.1% of the time, Xen can deal with the squabbling...

> I've updated to SP2 and will re-test in the next few hours;

Excellent news: the issue is completely resolved, and in fact performance is now better than VMware, and better than the old original physical box pre-virtualisation. The entire task now completed in 30 minutes; the previous best was 40 minutes (ie, prior to Xen).

I made note before that I was getting around 2500 transactions/sec, and this was too slow; just now, I'm getting around 7900 transactions/sec. The only change was the installation of Windows 2003 Service Pack 2.

So, thank you so much to everyone that made suggestions or discussed different aspects of the system. Hopefully this will assist someone else, now or in the future, to ensure they have SP2 installed or suffer the horrible performance.

Regards,
Adam

--
Adam Goryachev
Website Managers
Ph: +61 2 8304 0000    adam@websitemanagers.com.au
Fax: +61 2 8304 0001   www.websitemanagers.com.au
Ian Campbell
2012-Sep-19 12:27 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
On Wed, 2012-09-19 at 13:19 +0100, Adam Goryachev wrote:
> On 19/09/12 19:43, Adam Goryachev wrote:
> > I''ve updated to SP2, and will re-test in the next few hours,
>
> Excellent news: the issue is completely resolved, and in fact performance
> is now better than VMWare, and better than the original physical box
> pre-virtualisation. The entire task now completes in 30 minutes; the
> previous best was 40 minutes (ie, prior to Xen).

Excellent news!

James, since you suggested it, do you happen to know what it is about
pre-SP2 W2K3 that is so bad? Do those versions beat on the TPR or
something else?

(Just curious)

Ian.
James Harper
2012-Sep-19 12:55 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
>
> Excellent news: the issue is completely resolved, and in fact performance
> is now better than VMWare, and better than the original physical box
> pre-virtualisation. The entire task now completes in 30 minutes; the
> previous best was 40 minutes (ie, prior to Xen).
>

I wasn''t expecting it to be faster. Are you comparing 2003sp1 under VMWare
with 2003sp2 under Xen?

> I noted before that I was getting around 2500 transactions/sec, which was
> too slow; just now, I''m getting around 7900 transactions/sec. The only
> change was the installation of Windows 2003 Service Pack 2.

Microsoft updated Windows 2003 with SP2 to remove the use of the TPR
register, which was a big performance hit on a virtual system. You can add
/PATCHTPR to boot.ini to patch the Windows kernel to use an alternate
method of TPR access under AMD systems, and to cache reads of the TPR under
Intel systems, but I thought the Xen guys had implemented their own
acceleration of this in Xen itself?

SP2 is definitely the best option for 2003, but XP and 2000 don''t have this
performance enhancement, so I thought I''d mention it here too.

James
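For reference, a minimal sketch of what the boot.ini change looks like. The
/PATCHTPR switch is the one described above; the ARC path and the other
switches are illustrative only and should match whatever the existing entry
on the system already says:

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /PATCHTPR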
Adam Goryachev
2012-Sep-19 13:00 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
On 19/09/12 22:55, James Harper wrote:
>> Excellent news: the issue is completely resolved, and in fact performance
>> is now better than VMWare, and better than the original physical box
>> pre-virtualisation. The entire task now completes in 30 minutes; the
>> previous best was 40 minutes (ie, prior to Xen).
>>
> I wasn''t expecting it to be faster. Are you comparing 2003sp1 under VMWare
> with 2003sp2 under Xen?

Yes, comparing SP1 under VMWare to SP2 under Xen. (So not entirely fair,
but at the end of the day it looks good for me and for Xen :)

>> I noted before that I was getting around 2500 transactions/sec, which was
>> too slow; just now, I''m getting around 7900 transactions/sec. The only
>> change was the installation of Windows 2003 Service Pack 2.
> Microsoft updated Windows 2003 with SP2 to remove the use of the TPR
> register, which was a big performance hit on a virtual system. You can add
> /PATCHTPR to boot.ini to patch the Windows kernel to use an alternate
> method of TPR access under AMD systems, and to cache reads of the TPR
> under Intel systems, but I thought the Xen guys had implemented their own
> acceleration of this in Xen itself?

Are you saying that SP1 with /PATCHTPR will work well (not that I''m
interested in this, just curious), or are you saying that with SP2 you can
use /PATCHTPR to improve performance even further (this would be
interesting)?

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
James Harper
2012-Sep-19 13:00 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
>
> On Wed, 2012-09-19 at 13:19 +0100, Adam Goryachev wrote:
> > On 19/09/12 19:43, Adam Goryachev wrote:
> > > I''ve updated to SP2, and will re-test in the next few hours,
> >
> > Excellent news: the issue is completely resolved, and in fact
> > performance is now better than VMWare, and better than the original
> > physical box pre-virtualisation. The entire task now completes in 30
> > minutes; the previous best was 40 minutes (ie, prior to Xen).
>
> Excellent news!
>
> James, since you suggested it, do you happen to know what it is about
> pre-SP2 W2K3 that is so bad? Do those versions beat on the TPR or
> something else?
>
> (Just curious)
>

Yes, it will almost certainly be TPR access. I actually thought MS made the
change in SP1, but maybe it was SP2 (I don''t know if the OP upgraded from
RTM to SP2 or from SP1 to SP2, but quite possibly I just don''t remember :)

The change they made means that the TPR doesn''t get touched at all anymore,
so it is much faster.

For prior versions (and XP and 2000), GPLPV can patch AMD systems to use
the CR8 (I think) register for TPR access, which is much faster. For Intel,
the best I could do was cache the TPR so that reads were fast... it''s still
quite a speedup. I thought Xen optimised this for Intel though, and my
patching wasn''t necessary anymore? Or maybe the OP is running a version of
Xen that doesn''t have that feature??

James
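For reference, one way to see whether a given Xen build advertises any of
this TPR acceleration is to grep the hypervisor boot log. A rough sketch
only: the exact wording of the capability messages varies between Xen
versions, so treat the pattern as an assumption to adjust by eye:

xm dmesg | grep -i ''tpr''   # on newer toolstacks: xl dmesg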
James Harper
2012-Sep-19 13:08 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
> > Microsoft updated Windows 2003 with SP2 to remove the use of the TPR
> > register, which was a big performance hit on a virtual system. You can
> > add /PATCHTPR to boot.ini to patch the Windows kernel to use an
> > alternate method of TPR access under AMD systems, and to cache reads of
> > the TPR under Intel systems, but I thought the Xen guys had implemented
> > their own acceleration of this in Xen itself?
>
> Are you saying that SP1 with /PATCHTPR will work well (not that I''m
> interested in this, just curious), or are you saying that with SP2 you
> can use /PATCHTPR to improve performance even further (this would be
> interesting)?
>

The former. The TPR register is what Windows uses to manage interrupt
priorities, and it is accessed many, many times per second. Every access
causes a VMEXIT, so a whole lot of work gets done each time it is touched.
/PATCHTPR modifies the Windows kernel slightly to change the way the TPR is
accessed. With SP2, Microsoft changed the kernel to manage interrupt
priorities a different way, so the TPR register is not used at all, which
virtualises much better.

/PATCHTPR is still useful for XP and 2000, which don''t have these
optimisations, though.

James
Adam Goryachev
2012-Sep-19 13:12 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
On 19/09/12 23:00, James Harper wrote:
>> On Wed, 2012-09-19 at 13:19 +0100, Adam Goryachev wrote:
>>> On 19/09/12 19:43, Adam Goryachev wrote:
>>>> I''ve updated to SP2, and will re-test in the next few hours,
>>> Excellent news: the issue is completely resolved, and in fact
>>> performance is now better than VMWare, and better than the original
>>> physical box pre-virtualisation. The entire task now completes in 30
>>> minutes; the previous best was 40 minutes (ie, prior to Xen).
>>
>> Excellent news!
>>
>> James, since you suggested it, do you happen to know what it is about
>> pre-SP2 W2K3 that is so bad? Do those versions beat on the TPR or
>> something else?
>>
>> (Just curious)
>>
> Yes, it will almost certainly be TPR access. I actually thought MS made
> the change in SP1, but maybe it was SP2 (I don''t know if the OP upgraded
> from RTM to SP2 or from SP1 to SP2, but quite possibly I just don''t
> remember :)

Nope, I definitely had SP1, and installed SP2 to get the (massive)
performance boost.

> The change they made means that the TPR doesn''t get touched at all
> anymore, so it is much faster.
>
> For prior versions (and XP and 2000), GPLPV can patch AMD systems to use
> the CR8 (I think) register for TPR access, which is much faster. For
> Intel, the best I could do was cache the TPR so that reads were fast...
> it''s still quite a speedup.

I am running this on AMD, with the GPLPV drivers, so I guess something
isn''t working right with that optimisation... Specifically, I am using an
AMD Phenom(tm) II X6 1100T processor. Perhaps this processor doesn''t have
the CR8 register; maybe an Opteron is needed or something?

> I thought Xen optimised this for Intel though, and my patching wasn''t
> necessary anymore? Or maybe the OP is running a version of Xen that
> doesn''t have that feature??

So either Xen is patched only for Intel, and I missed out again since I am
on AMD, or my Xen doesn''t have this feature:
xen_major              : 4
xen_minor              : 1
xen_extra              : .3

All software is current Debian Testing.

Sure, I''d be happy to get additional performance from further tuning, but
I''m seriously happy at this stage of things...

Now all I need to do is get a modem working with a serial-port-over-ethernet
device... That I''ll leave to another day/thread :)

Regards,
Adam

--
Adam Goryachev
Website Managers                   Ph:  +61 2 8304 0000
adam@websitemanagers.com.au        Fax: +61 2 8304 0001
www.websitemanagers.com.au
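For reference, a quick way to see which TPR-related acceleration features
the host CPU advertises is to look at the flags in /proc/cpuinfo on the
dom0. A minimal sketch; the flag names (cr8_legacy on AMD, tpr_shadow and
flexpriority on Intel) are the ones Linux uses, and their absence in the
dom0 kernel''s view is suggestive rather than conclusive:

egrep -o ''cr8_legacy|tpr_shadow|flexpriority'' /proc/cpuinfo | sort -u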
Ian Campbell
2012-Sep-19 13:12 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
On Wed, 2012-09-19 at 14:00 +0100, James Harper wrote:
> >
> > On Wed, 2012-09-19 at 13:19 +0100, Adam Goryachev wrote:
> > > On 19/09/12 19:43, Adam Goryachev wrote:
> > > > I''ve updated to SP2, and will re-test in the next few hours,
> > >
> > > Excellent news: the issue is completely resolved, and in fact
> > > performance is now better than VMWare, and better than the original
> > > physical box pre-virtualisation. The entire task now completes in 30
> > > minutes; the previous best was 40 minutes (ie, prior to Xen).
> >
> > Excellent news!
> >
> > James, since you suggested it, do you happen to know what it is about
> > pre-SP2 W2K3 that is so bad? Do those versions beat on the TPR or
> > something else?
> >
> > (Just curious)
> >
> Yes, it will almost certainly be TPR access. I actually thought MS made
> the change in SP1, but maybe it was SP2 (I don''t know if the OP upgraded
> from RTM to SP2 or from SP1 to SP2, but quite possibly I just don''t
> remember :)
>
> The change they made means that the TPR doesn''t get touched at all
> anymore, so it is much faster.
>
> For prior versions (and XP and 2000), GPLPV can patch AMD systems to use
> the CR8 (I think) register for TPR access, which is much faster. For
> Intel, the best I could do was cache the TPR so that reads were fast...
> it''s still quite a speedup. I thought Xen optimised this for Intel
> though, and my patching wasn''t necessary anymore? Or maybe the OP is
> running a version of Xen that doesn''t have that feature??

I think there was a h/w feature introduced at some point on both AMD and
Intel which also optimised these vmexits away. I''m not 100% sure of the
specifics though.

Ian
James Harper
2012-Sep-20 12:42 UTC
Re: SOLVED - Poor Windows 2003 + GPLPV performance compared to VMWare
>
> > I thought Xen optimised this for Intel though, and my patching wasn''t
> > necessary anymore? Or maybe the OP is running a version of Xen that
> > doesn''t have that feature??
>
> So either Xen is patched only for Intel, and I missed out again since I
> am on AMD, or my Xen doesn''t have this feature:
> xen_major              : 4
> xen_minor              : 1
> xen_extra              : .3
>

With 2003 SP2, none of this optimisation is necessary, as 2003 SP2 doesn''t
use the TPR at all.

James