Peter Maloney
2012-Oct-20 17:21 UTC
xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
And I found the changeset that caused it.

=========
The problem:
=========

Windows 8 64 bit and 32 bit run fast and fine in the newest xen versions.

Windows XP 32 bit runs unusably slow in anything new that I built from
xen-unstable, but runs fast in 4.1.2 and 4.1.3 stable. While it is
running slow, "xm top" or "xl top" show cpu usage around 650% for the domu.

The bug might be AMD specific. I'm running an AMD FX-8150.

=========
The result:
=========

good: 24769:730f6ed72d70
bad: 24770:7f79475d3de7

The change was 8 months ago

changeset: 24770:7f79475d3de7
user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
date: Fri Feb 10 16:07:07 2012 +0000
summary: x86/mm: Make p2m lookups fully synchronized wrt modifications

=========
My hardware:
=========

AMD FX-8150
990 FX chipset

Here's a dmidecode: http://pastebin.com/XUZjmiVz

=========
My kernel:
=========

I compiled the for-linus branch of cmason's linux-btrfs git repo, around
August 11th (
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
for-linus )

peter:~/xen # uname -a
Linux peter 3.5.0-1-default+ #3 SMP Sat Aug 11 21:30:44 CEST 2012 x86_64 x86_64 x86_64 GNU/Linux

Here's the kernel config: http://pastebin.com/1GQbiFZE (only weird thing
I set was CONFIG_NR_CPUS=16 for no particular reason; default was 512 or
256)

=========
My Windows XP VM config:
=========

# grep -vE "^#|^$" windowsxp2
name="windowsxp2"
description="None"
uuid="292b0651-9913-2459-5cfa-fb828f9c4314"
memory=4096
maxmem=4096
vcpus=7
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=1
keymap="en-us"
builder="hvm"
device_model="/usr/lib/xen/bin/qemu-dm"
kernel="/usr/lib/xen/boot/hvmloader"
boot="c"
disk=[ 'phy:/dev/data/winxp1_disk1,hda,w',
       'file:/var/lib/xen/winxp1_disk2.raw,hdb,w', ]
vif=[ 'mac=00:16:3e:4e:c5:0c,bridge=br0,model=e1000', ]
sdl=0
vnc=1
vncunused=1
audio=0
soundhw='es1370'
viridian=1
usb=1
acpi=1
apic=0
pae=1
usbdevice='tablet'
serial="pty"
stdvga=1
gfx_passthru=0
# this is an AMD Radeon HD 6770 and its HDMI audio, and 2 USB ports
pci = [ '04:00.0' , '04:00.1' , '00:12.0' , '00:12.2' ]
xen_platform_pci=1
pci_msitranslate=1


The Windows 8 32 and 64 bit configs I used are the same except changed
mac address, and different disk.

Whether or not I use sound or PCI passthrough doesn't (significantly)
affect performance.


=========
my build process, including how to hack the build so it actually compiles:
=========

# Install older libyajl-devel

On openSUSE, this would be:

zypper install libyajl1-devel

# Delete everything (except .hg)... prevents unclean builds from
# breaking things. make distclean is not enough for very many builds.
cd xen-unstable.hg
rm -rf *
# If you have permission denied errors (caused by running make install
# as root earlier), make sure to use chown and run rm again, or builds
# will fail.

# Check out the revision
hg update --clean "${build}"

# hack up a troublesome Makefile that prevents builds
vim tools/libxl/Makefile
add "-lyajl":
    at the end of all 4 "$(CC) ..." lines
    to LIBXL_LIBS
    to LIBXLU_LIBS
    to LIBUUID_LIBS

(don't know which ones are important... but it works with all of it)

make distclean >/tmp/xen.distclean.log 2>&1 ; status=$? ; echo $status

if [ -e configure ]; then
    ./configure
else
    touch .config
fi

make dist >/tmp/xen.dist.log 2>&1 ; status=$? ; echo $status


=========
my install process
=========

To install the build, it's important to clean out old lib files...
uninstall doesn't get them all. If you miss these, xm, xl, etc. may fail
due to shared library issues.

Also, "make uninstall" deletes important system files it should not
(kernel, kernel modules, vm disks).

As it says in the "make help":
    uninstall - attempt to remove installed Xen tools
                (use with extreme care!)

Here is my process to solve the uninstall issues:
http://pastebin.com/nXCavFTp
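The post reports the bisect result but not the commands used to drive it. As a minimal sketch, assuming Mercurial's built-in `hg bisect` was the tool (the function name `start_bisect` and its two arguments are illustrative, not from the post), a bisection between a known good and a known bad revision could be started like this:

```shell
# Sketch only: start_bisect and its arguments are illustrative names,
# not commands taken from the original post.
start_bisect() {
    # $1 = last revision known to be good, $2 = first revision known to be bad
    hg bisect --reset        # forget any earlier bisect state
    hg bisect --good "$1"    # mark the known good end of the range
    hg bisect --bad "$2"     # mark the known bad end; hg then picks a midpoint
}

# After building and boot-testing each revision that hg selects, report it:
#   hg bisect --good    (XP guest ran at normal speed)
#   hg bisect --bad     (XP guest was unusably slow)
# and repeat until hg names the first bad changeset.
```

Each round halves the remaining range, so even a range of several thousand changesets needs only a dozen or so build-and-boot cycles.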
Peter Maloney
2012-Oct-20 18:40 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
And so Pasi suggested on IRC that I try with 2 vcpus, and in this
situation it still runs slow, but it's usably slow. When it first boots,
"xl top" shows pretty high cpu usage, and then it goes down eventually
and it's harder to notice it is slower than usual.

By comparison, with 4-8 vcpus, it is unbearably slow, and would probably
take you 10 minutes to even log in, but also the cpu would go down over
time.

And also with 2 vcpus, in task manager, you can see the processes using
CPU seem to be using much more than they should. So when the cpu usage
is lower later on, while the system is still idle, a bunch of processes
are using 2 or 3% each, adding up to 20-50% (fluctuating).

And with 2 vcpus, the mouse seems faster.

And then I tested minecraft, which runs at only 5-20 fps. So it's
definitely still slow, but usable for non-3d stuff.

On 10/20/2012 07:21 PM, Peter Maloney wrote:
> I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
> And I found the changeset that caused it.
> [...]
Andres Lagar-Cavilla
2012-Oct-22 13:56 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On Oct 20, 2012, at 1:21 PM, Peter Maloney <peter.maloney@brockmann-consult.de> wrote:

> I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
> And I found the changeset that caused it.
>
> =========
> The problem:
> =========
>
> Windows 8 64 bit and 32 bit run fast and fine in the newest xen versions.
>
> Windows XP 32 bit runs unusably slow in anything new that I built from
> xen-unstable, but runs fast in 4.1.2 and 4.1.3 stable. While it is
> running slow, "xm top" or "xl top" show cpu usage around 650% for the domu.
>
> The bug might be AMD specific. I'm running an AMD FX-8150.
>
> =========
> The result:
> =========
>
> good: 24769:730f6ed72d70
> bad: 24770:7f79475d3de7
>
> The change was 8 months ago
>
> changeset: 24770:7f79475d3de7
> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> date: Fri Feb 10 16:07:07 2012 +0000
> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications

Peter, thanks for the bug report and the bisection.

vcpus (and guest processes) look like they're chewing CPU because
they're spending cycles within the hypervisor contending for spin locks.
In the 4.2 time frame we had a similar report and we partially reverted
the change set you mention to use read/write locks, ameliorating
contention.

It's obviously critical to figure out which code path win xp is
exercising wrt the p2m lock. There are a number of profiling tools out
there, so please go ahead with your favorite one to figure out what the
vcpus are doing in hypervisor context. If unsure, my advice, in terms of
quick initial turnaround, would be to

xl dmesg -c
for i in a_number_of_times; do xl debug-keys d; xl dmesg -c; done;

This is gonna dump stack traces for all scheduled vcpus. We should be
able to see the stack traces for your domU vcpus, and through sampling
quickly infer where they are spending most of their time.

Let us know what you find.

Thanks
Andres

> =========
> My hardware:
> =========
> [...]
Tim Deegan
2012-Oct-22 13:59 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
> I ran a bisect to find out when Windows XP 32 bit becomes unusably slow.
> And I found the changeset that caused it.
>
> =========
> The problem:
> =========
>
> Windows 8 64 bit and 32 bit run fast and fine in the newest xen versions.
>
> Windows XP 32 bit runs unusably slow in anything new that I built from
> xen-unstable, but runs fast in 4.1.2 and 4.1.3 stable. While it is
> running slow, "xm top" or "xl top" show cpu usage around 650% for the domu.
>
> The bug might be AMD specific. I'm running an AMD FX-8150.

The bug does seem to be AMD-specific, and NPT-specific; with
'hap=0' it goes much faster.

> =========
> The result:
> =========
>
> good: 24769:730f6ed72d70
> bad: 24770:7f79475d3de7
>
> The change was 8 months ago
>
> changeset: 24770:7f79475d3de7
> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> date: Fri Feb 10 16:07:07 2012 +0000
> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications

This change was bad for performance across the board and most of it has
since been either reverted or amended, but clearly we missed something
here.

It's interesting that Win8 isn't slowed down. I wonder whether that's to
do with the way it drives the VGA card -- IIRC it uses a generic VESA
driver rather than a Cirrus one.

Tim.
Peter Maloney
2012-Oct-23 22:17 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On 10/22/2012 03:59 PM, Tim Deegan wrote:
> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>> [...]
>> The bug might be AMD specific. I'm running an AMD FX-8150.
> The bug does seem to be AMD-specific, and NPT-specific; with
> 'hap=0' it goes much faster.

K. glad to hear it ;) I just guessed it was AMD specific since not so
many of us seem to run AMDs and it seemed to be only me with the problem.

>> =========
>> The result:
>> =========
>>
>> good: 24769:730f6ed72d70
>> bad: 24770:7f79475d3de7
>>
>> The change was 8 months ago
>>
>> changeset: 24770:7f79475d3de7
>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>> date: Fri Feb 10 16:07:07 2012 +0000
>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
> This change was bad for performance across the board and most of it has
> since been either reverted or amended, but clearly we missed something
> here.
>
> It's interesting that Win8 isn't slowed down. I wonder whether that's to
> do with the way it drives the VGA card -- IIRC it uses a generic VESA
> driver rather than a Cirrus one.

My tests included
- passthrough and "stdvga=1", which replaces the cirrus one with some
  other card.
- no passthrough, and did not set stdvga (used vnc and cirrus)

Both were slow.

>
> Tim.
Tim Deegan
2012-Nov-01 17:00 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
Hi,

At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
> > The change was 8 months ago
> >
> > changeset: 24770:7f79475d3de7
> > user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
> > date: Fri Feb 10 16:07:07 2012 +0000
> > summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>
> This change was bad for performance across the board and most of it has
> since been either reverted or amended, but clearly we missed something
> here.
>
> It's interesting that Win8 isn't slowed down. I wonder whether that's to
> do with the way it drives the VGA card -- IIRC it uses a generic VESA
> driver rather than a Cirrus one.

In fact this is to do with the APIC. On my test system, a busy 2-vcpu
VM is making about 300k/s accesses to the APIC TPR. These accesses are
all trapped and emulated by Xen, and that emulation has got more
expensive as part of this change.

Later Windows OSes have a feature called 'lazy IRQL' which makes those
accesses go away, but sadly that's not been done for WinXP. On modern
Intel CPUs, the hardware acceleration for TPR accesses works for XP; on
AMD it requires the OS to use 'MOV reg32, CR8' to access the TPR instead
of MMIO, which XP is clearly not doing. :(

Peter: if you have the option, you might find that installing the PV
drivers that ship with Citrix XenServer 6.0 makes things work better.

Andres: even though this load of APIC emulations is pretty extreme, it's
surprising that the VM runs faster on shadow pagetables! Any ideas for
where this slowdown is coming from?

Cheers,

Tim.
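To get a feel for why 300k trapped TPR accesses per second matters, here is a rough back-of-the-envelope sketch. The access rate comes from Tim's message; the per-access emulation cost is a made-up figure for illustration only, since the thread gives no measurement, and `tpr_cpu_percent` is a hypothetical helper name.

```shell
# Sketch only: the per-access cost passed in below is an invented figure;
# the thread reports the access rate but no emulation cost.
tpr_cpu_percent() {
    # $1 = trapped TPR accesses per second
    # $2 = assumed emulation cost per access, in nanoseconds
    # accesses/s * ns/access = ns spent emulating per second; 1e9 ns = 100%.
    echo $(( $1 * $2 / 10000000 ))
}

tpr_cpu_percent 300000 2000   # at an assumed 2 us per access: 60 (% of one core)
```

Under that assumption, any extra lock traffic added to each emulated access is multiplied by the same 300k/s rate, which is consistent with the slowdown getting worse as vcpus are added.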
Andres Lagar-Cavilla
2012-Nov-01 17:28 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:

> [...]
> In fact this is to do with the APIC. On my test system, a busy 2-vcpu
> VM is making about 300k/s accesses to the APIC TPR. These accesses are
> all trapped and emulated by Xen, and that emulation has got more
> expensive as part of this change.
> [...]
> Andres: even though this load of APIC emulations is pretty extreme, it's
> surprising that the VM runs faster on shadow pagetables! Any ideas for
> where this slowdown is coming from?

Not any immediate ideas without profiling.

However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.

How about the following patch, Peter, Tim?

diff -r 5171750d133e xen/arch/x86/hvm/emulate.c
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -60,24 +60,28 @@ static int hvmemul_do_io(
     ioreq_t *p = get_ioreq(curr);
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
-    struct page_info *ram_page;
+    struct page_info *ram_page = NULL;
     int rc;
 
     /* Check for paged out page */
-    ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
-    if ( p2m_is_paging(p2mt) )
+    if ( ram_gpa != INVALID_MFN )
     {
-        if ( ram_page )
-            put_page(ram_page);
-        p2m_mem_paging_populate(curr->domain, ram_gfn);
-        return X86EMUL_RETRY;
-    }
-    if ( p2m_is_shared(p2mt) )
-    {
-        if ( ram_page )
-            put_page(ram_page);
-        return X86EMUL_RETRY;
-    }
+        ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
+        if ( p2m_is_paging(p2mt) )
+        {
+            if ( ram_page )
+                put_page(ram_page);
+            p2m_mem_paging_populate(curr->domain, ram_gfn);
+            return X86EMUL_RETRY;
+        }
+        if ( p2m_is_shared(p2mt) )
+        {
+            if ( ram_page )
+                put_page(ram_page);
+            return X86EMUL_RETRY;
+        }
+    } else
+        value = 0; /* for pvalue */
 
     /*
      * Weird-sized accesses have undefined behaviour: we discard writes
@@ -455,7 +459,7 @@ static int __hvmemul_read(
             return X86EMUL_UNHANDLEABLE;
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         if ( (off + bytes) <= PAGE_SIZE )
-            return hvmemul_do_mmio(gpa, &reps, bytes, 0,
+            return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
                                    IOREQ_READ, 0, p_data);
     }
 
@@ -480,7 +484,8 @@ static int __hvmemul_read(
             addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
         if ( rc != X86EMUL_OKAY )
             return rc;
-        return hvmemul_do_mmio(gpa, &reps, bytes, 0, IOREQ_READ, 0, p_data);
+        return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
+                               IOREQ_READ, 0, p_data);
     case HVMCOPY_gfn_paged_out:
         return X86EMUL_RETRY;
     case HVMCOPY_gfn_shared:
@@ -552,7 +557,7 @@ static int hvmemul_write(
         unsigned int off = addr & (PAGE_SIZE - 1);
         gpa = (((paddr_t)vio->mmio_gpfn << PAGE_SHIFT) | off);
         if ( (off + bytes) <= PAGE_SIZE )
-            return hvmemul_do_mmio(gpa, &reps, bytes, 0,
+            return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
                                    IOREQ_WRITE, 0, p_data);
     }
 
@@ -573,7 +578,7 @@ static int hvmemul_write(
             addr, &gpa, bytes, &reps, pfec, hvmemul_ctxt);
         if ( rc != X86EMUL_OKAY )
             return rc;
-        return hvmemul_do_mmio(gpa, &reps, bytes, 0,
+        return hvmemul_do_mmio(gpa, &reps, bytes, INVALID_MFN,
                                IOREQ_WRITE, 0, p_data);
     case HVMCOPY_gfn_paged_out:
         return X86EMUL_RETRY;
@@ -804,7 +809,7 @@ static int hvmemul_read_io(
 {
     unsigned long reps = 1;
     *val = 0;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_READ, 0, val);
+    return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_READ, 0, val);
 }
 
 static int hvmemul_write_io(
@@ -814,7 +819,7 @@ static int hvmemul_write_io(
     struct x86_emulate_ctxt *ctxt)
 {
     unsigned long reps = 1;
-    return hvmemul_do_pio(port, &reps, bytes, 0, IOREQ_WRITE, 0, &val);
+    return hvmemul_do_pio(port, &reps, bytes, INVALID_MFN, IOREQ_WRITE, 0, &val);
 }
 
 static int hvmemul_read_cr(
diff -r 5171750d133e xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -231,7 +231,7 @@ int handle_pio(uint16_t port, int size,
     if ( dir == IOREQ_WRITE )
         data = guest_cpu_user_regs()->eax;
 
-    rc = hvmemul_do_pio(port, &reps, size, 0, dir, 0, &data);
+    rc = hvmemul_do_pio(port, &reps, size, INVALID_MFN, dir, 0, &data);
 
     switch ( rc )
     {

>
> Cheers,
>
> Tim.
Peter Maloney
2012-Nov-13 13:17 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>
>> [...]
>> Andres: even though this load of APIC emulations is pretty extreme, it's
>> surprising that the VM runs faster on shadow pagetables! Any ideas for
>> where this slowdown is coming from?
> Not any immediate ideas without profiling.
>
> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>
> How about the following patch, Peter, Tim?

Thanks, I'll give it a try sometime this week I guess.

> diff -r 5171750d133e xen/arch/x86/hvm/emulate.c
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> [...]
Peter Maloney
2012-Nov-22 18:54 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On 11/13/2012 02:17 PM, Peter Maloney wrote:
> On 2012-11-01 18:28, Andres Lagar-Cavilla wrote:
>> On Nov 1, 2012, at 1:00 PM, Tim Deegan <tim@xen.org> wrote:
>>
>>> Hi,
>>>
>>> At 14:59 +0100 on 22 Oct (1350917960), Tim Deegan wrote:
>>>> At 19:21 +0200 on 20 Oct (1350760876), Peter Maloney wrote:
>>>>> The change was 8 months ago
>>>>>
>>>>> changeset: 24770:7f79475d3de7
>>>>> user: Andres Lagar-Cavilla <andres@lagarcavilla.org>
>>>>> date: Fri Feb 10 16:07:07 2012 +0000
>>>>> summary: x86/mm: Make p2m lookups fully synchronized wrt modifications
>>> [...]
>> Not any immediate ideas without profiling.
>>
>> However, most callers of hvmemul_do_io pass a stub zero ram_gpa address. We might be madly hitting the p2m locks for no reason there.
>>
>> How about the following patch, Peter, Tim?
>

I tried the patch applied to xen-unstable 4.2.0-branched
528f0708b6db+ 4.2.0-branched

It seemed the same. It was extremely slow with 7 vcpus, and with 2 vcpus
it was slow, but fast enough that I could bother to log in and out
during the test.

Attached are logs generated with this command (using xm instead of xl):
for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog

xenxp_xm_dmesg_-c_7cpus_idle.log
xenxp_xm_dmesg_-c_7cpus_logintooslow.log
xenxp_xm_dmesg_-c_7cpus_shutdown.log
xenxp_xm_dmesg_-c_duringlogin.log
xenxp_xm_dmesg_-c_idling_login_screen.log

Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
and p in case it's relevant.

BTW this time I am testing with kernel 3.6.7

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
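Logs collected with a sampling loop like the one above can be summarized by counting how often each hypervisor symbol shows up across the samples. The following is a minimal sketch, not something posted in the thread: `tally_symbols` is a hypothetical helper, and it assumes call-trace frames in the log look like `(XEN)    [<ffff82c480123456>] symbol+0x1a/0x40`.

```shell
# Sketch only: tally_symbols is an illustrative helper, and the frame format
# "[<address>] symbol+0xoffset/0xsize" is an assumption about the log contents.
tally_symbols() {
    # Reads log text on stdin, prints "count symbol", most frequent first.
    grep -oE '\] [A-Za-z_][A-Za-z0-9_.]*\+0x' \
        | sed -E 's/^\] //; s/\+0x$//' \
        | sort | uniq -c | sort -k1,1nr -k2,2 \
        | awk '{ print $1, $2 }'
}
```

For example, `tally_symbols < xenxp_xm_dmesg_-c_7cpus_idle.log | head -n 20` would list the twenty most frequently sampled symbols; lock-related functions dominating that list would support the contention theory.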
Peter Maloney
2013-Jan-12 15:25 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On 11/22/2012 07:54 PM, Peter Maloney wrote:
> [...]
> I tried the patch applied to xen-unstable 4.2.0-branched
> 528f0708b6db+ 4.2.0-branched
>
> It seemed the same. It was extremely slow with 7 vcpus, and with 2 vcpus
> it was slow, but fast enough that I could bother to log in and out
> during the test.
>
> Attached are logs generated with this command (using xm instead of xl):
> for i in {1..30}; do xm debug-keys d; xm dmesg -c; done >> nameoflog
>
> xenxp_xm_dmesg_-c_7cpus_idle.log
> xenxp_xm_dmesg_-c_7cpus_logintooslow.log
> xenxp_xm_dmesg_-c_7cpus_shutdown.log
> xenxp_xm_dmesg_-c_duringlogin.log
> xenxp_xm_dmesg_-c_idling_login_screen.log
>
> Also there is xenxp_dmesg.log which is output from hitting alt+sysrq+w
> and p in case it's relevant.
>
> BTW this time I am testing with kernel 3.6.7
>

I also tested 4.2.1 now, and it has the same problem. And after using it for a while with windows 8 (playing games), I get the general feel that it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast like 4.1.3.

So any ideas on how to fix this or gather more useful information?
Pasi Kärkkäinen
2013-Jan-17 20:57 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
Hello,

George: What do you think, should we add this bug to the Xen 4.3 status email for tracking it?
It's a serious HVM/winxp performance regression on AMD..

-- Pasi

On Sat, Jan 12, 2013 at 04:25:47PM +0100, Peter Maloney wrote:
> [...]
> I also tested 4.2.1 now, and it has the same problem. And after using it
> for a while with windows 8 (playing games), I get the general feel that
> it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast
> like 4.1.3.
>
> So any ideas on how to fix this or gather more useful information?
George Dunlap
2013-Jan-18 14:22 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On 17/01/13 20:57, Pasi Kärkkäinen wrote:
> Hello,
>
> George: What do you think, should we add this bug to the Xen 4.3 status email for tracking it?
> It's a serious HVM/winxp performance regression on AMD..

Hey Pasi -- thanks for bringing this thread to my attention. I had noticed a performance impact on AMD boxen myself, but investigating it had kind of gotten buried in more urgent tasks. Yes, I think we should track it. I'll put it on the list.

-George

> -- Pasi
>
> [...]
George Dunlap
2013-Jan-18 14:30 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On Sat, Jan 12, 2013 at 3:25 PM, Peter Maloney <peter.maloney@brockmann-consult.de> wrote:
> [...]
> I also tested 4.2.1 now, and it has the same problem. And after using it
> for a while with windows 8 (playing games), I get the general feel that
> it is laggier than with 4.1.3. And now I'm using 4.1.4 which is fast
> like 4.1.3.
>
> So any ideas on how to fix this or gather more useful information?
>

Pete,

One thing that would be helpful is if we could have a quantifiable difference, other than "feels laggier". If this is related to the problem I saw a few months ago, running winXP and looking at "top" in qemu is pretty clear. If you have a bit of time, do you suppose you could try to look around for a freely-available benchmark that would give us some numbers for Windows 8? That might help us track down the problem better as well.

I've put this on my 4.3 release tracking list, so it should get attention.

-George
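Once some benchmark has been run several times on each Xen build, turning "feels laggier" into a number is mostly arithmetic. The following is a minimal sketch of that step only, assuming run times in seconds have already been collected by whatever benchmark is chosen; `sketch_median` and `sketch_slowdown` are hypothetical helper names, not part of any Xen tool, and no measurements from this thread are implied.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending order. */
static int sketch_cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n run times in seconds. Sorts the array in place.
 * Returns 0.0 for an empty input. */
static double sketch_median(double *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    if (n == 0)
        return 0.0;
    qsort(samples, n, sizeof *samples, sketch_cmp_double);
    if (n % 2 == 1)
        return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

/* Slowdown of a candidate build relative to a baseline build:
 * 1.0 means no change, 2.0 means twice as slow.
 * Returns -1.0 if the baseline time is not positive. */
static double sketch_slowdown(double baseline_s, double candidate_s)
{
    if (baseline_s <= 0.0)
        return -1.0;
    return candidate_s / baseline_s;
}
```

Using the median rather than the mean keeps one unusually slow run (for example, a run that overlapped with other dom0 activity) from dominating the comparison between, say, a 4.1.x build and a 4.2.x build.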
Andres Lagar-Cavilla
2013-Jan-18 14:40 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
I unfortunately don't have AMD hardware to test with. On our regular win7 EPT-based workloads we've seen no noticeable degradations.

The answer to this one obviously lies in profiling the right sw/hw combo.

Andres

On Jan 18, 2013, at 9:22 AM, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 17/01/13 20:57, Pasi Kärkkäinen wrote:
>> Hello,
>>
>> George: What do you think, should we add this bug to the Xen 4.3 status email for tracking it?
>> It's a serious HVM/winxp performance regression on AMD..
>
> Hey Pasi -- thanks for bringing this thread to my attention. I had noticed a performance impact on AMD boxen myself, but investigating it had kind of gotten buried in more urgent tasks. Yes, I think we should track it. I'll put it on the list.
>
> -George
>
> [...]
George Dunlap
2013-Jan-21 12:07 UTC
Re: xen-unstable, winxp32 very poor performance on AMD FX-8150, I bisected and changeset is 24770:7f79475d3de7
On Fri, Jan 18, 2013 at 2:40 PM, Andres Lagar-Cavilla <andreslc@gridcentric.ca> wrote:
> I unfortunately don't have AMD hardware to test with. On our regular win7
> EPT-based workloads we've seen no noticeable degradations.
>
> The answer to this one obviously lies in profiling the right sw/hw combo.
>

It seems like given the nature of the work you're doing, having at least a couple of AMD boxes to test on would make sense. It would be a shame for your company to lose a big deal because (for example) a potential customer had just bought 100's of AMD boxes, and your extensions just had terrible performance on their already-paid-for-and-installed hardware.

I've got a box here that exhibits the behavior (or something like it anyway), but I probably wouldn't have a chance to look at it until the end of February at the earliest.

-George