Fabio Fantoni
2013-Nov-21 10:52 UTC
Test if on newer xen all SSE2 and SSE3 instructions are effectively working
I'm trying to test whether, on newer Xen, all SSE2 and SSE3 instructions are effectively working.
I tried this simple program to test SSE2: http://forum.nasm.us/index.php?topic=1605.0
But it probably uses only instructions with short operands, because the SSE2 code in this program also works on old Xen 4.0, where Jan Beulich's patches to support long operands are missing.
Is there a minimal program to test whether SSE instructions with MMIO operands larger than 8 bytes are working?

Thanks for any reply.
George Dunlap
2013-Nov-21 15:12 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
On Thu, Nov 21, 2013 at 10:52 AM, Fabio Fantoni <fabio.fantoni@m2r.biz> wrote:
> I tried this simple program to test SSE2: http://forum.nasm.us/index.php?topic=1605.0
> Is there a minimal program to test whether SSE instructions with MMIO operands larger than 8 bytes are working?

I don't see the code there doing MMIO -- it's just doing operations on normal RAM, which is not emulated by Xen at all, but executed natively by the processor.

What you need is a program that will do this to an MMIO region -- that will be a much trickier thing to set up, I think.

-George
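A minimal sketch of such a program for a Linux HVM guest, assuming the guest can mmap a PCI BAR through its sysfs `resourceN` file. The device path (for example `/sys/bus/pci/devices/0000:00:02.0/resource0`) is hypothetical and has to name a region that qemu actually emulates rather than one mapped directly into the guest; `map_bar()` and `sse2_roundtrip()` are illustrative names, not existing tools. The 128-bit store and load are written as GCC/Clang inline assembly for x86 so the compiler cannot merge, split, or drop them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of a PCI BAR exposed via sysfs, e.g. the hypothetical
 * "/sys/bus/pci/devices/0000:00:02.0/resource0". Returns NULL on failure. */
void *map_bar(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Store a 16-byte pattern to `target` with one MOVDQU, load it back with
 * one MOVDQU, and return 1 if the bytes match. If `target` is emulated
 * MMIO, both accesses are trapped by Xen and forwarded to qemu. */
int sse2_roundtrip(void *target)
{
    static const unsigned char pattern[16] = {
        0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
        0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff };
    unsigned char readback[16];

    __asm__ volatile(
        "movdqu (%[src]), %%xmm0\n\t"  /* load pattern from normal RAM */
        "movdqu %%xmm0, (%[dst])\n\t"  /* 128-bit store to the target   */
        "movdqu (%[dst]), %%xmm1\n\t"  /* 128-bit load from the target  */
        "movdqu %%xmm1, (%[out])\n\t"  /* save the result in normal RAM */
        :
        : [src] "r"(pattern), [dst] "r"(target), [out] "r"(readback)
        : "xmm0", "xmm1", "memory");

    return memcmp(pattern, readback, sizeof(readback)) == 0;
}
```

Inside the guest (as root), one would call `sse2_roundtrip()` on the pointer returned by `map_bar()` after checking it for NULL, and compare the outcome on the old and the new hypervisor. This only makes sense for a BAR that behaves like memory, such as a framebuffer; a register BAR may legitimately return different data on read-back.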
Andrew Cooper
2013-Nov-21 15:22 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
On 21/11/13 15:12, George Dunlap wrote:
> I don't see the code there doing MMIO -- it's just doing operations on
> normal RAM, which is not emulated by Xen at all, but executed natively
> by the processor.
>
> What you need is a program that will do this to an MMIO region -- that
> will be a much trickier thing to set up, I think.

The problem with SSE is only when the guest performs an SSE (or larger) operation on a piece of memory which ends up being emulated and handed to qemu. The ioreq protocol doesn't have a way of signalling an operand width greater than 64 bits.

All operations on regular RAM should be fine, and should have no Xen interception.

~Andrew
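To make the limitation concrete, here is a deliberately simplified model of a request record with a single 64-bit data slot. The `struct simple_ioreq` type and `fill_request()` are hypothetical and leave out most fields of Xen's real `ioreq_t`; the point is only that a 16-byte SSE operand has nowhere to go.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified request record: NOT Xen's real ioreq_t layout. */
struct simple_ioreq {
    uint64_t addr;  /* guest-physical address of the access */
    uint64_t data;  /* the single 64-bit payload slot       */
    uint32_t size;  /* operand width in bytes               */
};

/* Fill `req` for a write of `size` bytes taken from `val`.
 * Returns 0 on success, -1 if the operand cannot fit in one request. */
int fill_request(struct simple_ioreq *req, uint64_t addr,
                 const void *val, size_t size)
{
    if (size == 0 || size > sizeof(req->data))
        return -1;                 /* e.g. a 16-byte MOVDQU store */
    req->addr = addr;
    req->size = (uint32_t)size;
    req->data = 0;
    memcpy(&req->data, val, size); /* low bytes first on little-endian x86 */
    return 0;
}
```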
George Dunlap
2013-Nov-21 15:32 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
On 21/11/13 15:22, Andrew Cooper wrote:
> The problem with SSE is only when the guest performs an SSE (or larger)
> operation on a piece of memory which ends up being emulated and handed
> to qemu. The ioreq protocol doesn't have a way of signalling an operand
> width greater than 64 bits.

I'd like to emphasize the "and" in the first sentence. You might be able to trigger a Xen emulation in any number of ways (disabling HAP and then doing an SSE instruction on an in-use PT might do it). But Xen allegedly already does the actual emulation correctly -- as Andy said, it's only the path to qemu that wasn't working before.

-George
Andrew Cooper
2013-Nov-21 15:38 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
On 21/11/13 15:32, George Dunlap wrote:
> I'd like to emphasize the "and" in the first sentence. You might be
> able to trigger a Xen emulation in any number of ways (disabling HAP
> and then doing an SSE instruction on an in-use PT might do it). But
> Xen allegedly already does the actual emulation correctly -- as Andy
> said, it's only the path to qemu that wasn't working before.

Oops yes - I should have emphasised that a bit more. I believe Jan submitted a hacked-fix for the qemu path which fixes the immediate issue (for 128bit emulation) but is in need of a redesign for wider emulation; 256bit is available with AVX, and 512bit is on its way with AVX2.
As for testing individual instructions, there is tools/tests/x86_emulator/test_x86_emulator.c which tests a token few instructions against Xen's emulation code, but it is far from comprehensive.

~Andrew
Jan Beulich
2013-Nov-22 11:39 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
>>> On 21.11.13 at 16:38, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Oops yes - I should have emphasised that a bit more. I believe Jan
> submitted a hacked-fix for the qemu path which fixes the immediate issue
> (for 128bit emulation) but is in need of a redesign for wider emulation;

No, the fix was generic (breaking up things into at most 64-bit chunks, no matter what their original size - just like would happen at the bus level on real hardware).

> 256bit is available with AVX, and 512bit is on its way with AVX2.

That last one is AVX-512 (formerly AVX3); AVX2 doesn't widen registers.

Jan
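A minimal sketch of the splitting Jan describes, assuming each piece is a 1, 2, 4 or 8-byte access at consecutive addresses, largest power of two first. `struct mmio_chunk` and `split_access()` are illustrative names, not Xen's code; a real implementation may also need to care about alignment, page boundaries and reads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHUNK 8 /* widest piece a single request can carry: 64 bits */

/* Hypothetical record for one piece of a split access. */
struct mmio_chunk {
    uint64_t addr;
    uint32_t size;  /* 1, 2, 4 or 8 bytes        */
    uint64_t data;  /* payload, host byte order  */
};

/* Split a write of `len` bytes at `addr` into pieces of at most 64 bits.
 * Returns the number of chunks written to `out`, or 0 if `max_out` is
 * too small to hold them all. */
size_t split_access(uint64_t addr, const void *src, size_t len,
                    struct mmio_chunk *out, size_t max_out)
{
    const unsigned char *p = src;
    size_t n = 0;

    while (len > 0) {
        size_t sz = MAX_CHUNK;
        while (sz > len)
            sz >>= 1;              /* 8 -> 4 -> 2 -> 1 */
        if (n == max_out)
            return 0;
        out[n].addr = addr;
        out[n].size = (uint32_t)sz;
        out[n].data = 0;
        memcpy(&out[n].data, p, sz);
        addr += sz;
        p += sz;
        len -= sz;
        n++;
    }
    return n;
}
```

With this scheme a 16-byte SSE store becomes two 8-byte requests and a 32-byte AVX store becomes four, so the request format itself does not have to grow with the vector width.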
Fabio Fantoni
2013-Nov-22 12:14 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
On 21/11/2013 16:38, Andrew Cooper wrote:
> As for testing individual instructions, there is
> tools/tests/x86_emulator/test_x86_emulator.c which tests a token few
> instructions against Xen's emulation code, but it is far from comprehensive.

Thanks for all the replies.
I tried x86_emulator on dom0 and SSE2 instructions seem OK:
> ./test_x86_emulator
> Testing addl %%ecx,(%%eax)... okay
> Testing addl %%ecx,%%eax... okay
> Testing xorl (%%eax),%%ecx... okay
> Testing movl (%%eax),%%ecx... okay
> Testing lock cmpxchgb %%cl,(%%ebx)... okay
> Testing lock cmpxchgb %%cl,(%%ebx)... okay
> Testing xchgl %%ecx,(%%eax)... okay
> Testing lock cmpxchgl %%ecx,(%%ebx)... okay
> Testing rep movsw... okay
> Testing btrl $0x1,(%edi)... okay
> Testing btrl %eax,(%edi)... okay
> Testing cmpxchg8b (%edi) [succeeding]...okay
> Testing cmpxchg8b (%edi) [failing]... okay
> Testing movsxbd (%%eax),%%ecx... okay
> Testing movzxwd (%%eax),%%ecx... okay
> Testing movsxd (%%rax),%%rcx... okay
> Testing xadd %%ax,(%%ecx)... okay
> Testing dec %%ax... okay
> Testing lea 8(%%ebp),%%eax... okay
> Testing daa/das (all inputs)... skipped
> Testing movq %mm3,(%ecx)... okay
> Testing movq (%edx),%mm5... okay
> Testing movdqu %xmm2,(%ecx)... okay
> Testing movdqu (%edx),%xmm4... okay
> Testing vmovdqu %ymm2,(%ecx)... skipped
> Testing vmovdqu (%edx),%ymm4... skipped
> Testing movsd %xmm5,(%ecx)... okay
> Testing movaps (%edx),%xmm7... okay
> Testing vmovsd %xmm5,(%ecx)... skipped
> Testing vmovaps (%edx),%ymm7... skipped
> Testing blowfish 32-bit code sequence..................................okay
> Testing blowfish 64-bit code sequence.................................okay
> Testing blowfish native execution... okay

Same result on Linux HVM domUs.
I'm trying to verify whether SSE2 is fully working because Anthony's earlier debugging of the qxl problem on Linux HVM domUs showed an error on SSE2 instructions. After Jan Beulich's patches these errors went away.
Qxl on Windows 7 Pro 64-bit domUs with the qxl driver installed is working, but it has a big performance problem with screen refresh, the same as before Jan Beulich's patches.
The Windows qxl driver code seems to use SSE2:
> void CheckAndSetSSE2()
> {
>     _asm
>     {
>         mov eax, 0x0000001
>         cpuid
>         and edx, 0x4000000
>         mov have_sse2, edx
>     }
>
>     if (have_sse2) {
>         have_sse2 = TRUE;
>     }
> }

Some time ago I tried to disable SSE through the cpuid setting in the xl config, but if I remember correctly Windows did not start.
I don't know much about SSE, but a quick search of the qxl driver code turned up other SSE instructions that the Xen x86_emulator test does not cover. Here is a copy of two parts of the qxl driver code:
> static _inline void fast_memcpy_unaligment(void *dest, const void *src, size_t len)
> {
>     _asm
>     {
>         mov ecx, len
>         mov esi, src
>         mov edi, dest
>
>         cmp ecx, 128
>         jb try_to_copy64
>
>         prefetchnta [esi]
>     copy_128:
>         prefetchnta [esi + 64]
>
>         movdqu xmm0, [esi]
>         movdqu xmm1, [esi + 16]
>         movdqu xmm2, [esi + 32]
>         movdqu xmm3, [esi + 48]
>
>         prefetchnta [esi + 128]
>
>         movntdq [edi], xmm0
>         movntdq [edi + 16], xmm1
>         movntdq [edi + 32], xmm2
>         movntdq [edi + 48], xmm3
>
>         movdqu xmm0, [esi + 64]
>         movdqu xmm1, [esi + 80]
>         movdqu xmm2, [esi + 96]
>         movdqu xmm3, [esi + 112]
>
>         movntdq [edi + 64], xmm0
>         movntdq [edi + 80], xmm1
>         movntdq [edi + 96], xmm2
>         movntdq [edi + 112], xmm3
>
>         add edi, 128
>         add esi, 128
>         sub ecx, 128
>         cmp ecx, 128
>         jae copy_128
>
>     try_to_copy64:
>         cmp ecx, 64
>         jb try_to_copy32
>
>         movdqu xmm0, [esi]
>         movdqu xmm1, [esi + 16]
>         movdqu xmm2, [esi + 32]
>         movdqu xmm3, [esi + 48]
>
>         movntdq [edi], xmm0
>         movntdq [edi + 16], xmm1
>         movntdq [edi + 32], xmm2
>         movntdq [edi + 48], xmm3
>
>         add edi, 64
>         add esi, 64
>         sub ecx, 64
>         prefetchnta [esi]
>
>     try_to_copy32:
>         cmp ecx, 32
>         jb try_to_copy16
>
>         movdqu xmm0, [esi]
>         movdqu xmm1, [esi + 16]
>         movntdq [edi], xmm0
>         movntdq [edi + 16], xmm1
>
>         add edi, 32
>         add esi, 32
>         sub ecx, 32
>
>     try_to_copy16:
>         cmp ecx, 16
>         jb try_to_copy4
>
>         movdqu xmm0, [esi]
>         movntdq [edi], xmm0
>
>         add edi, 16
>         add esi, 16
>         sub ecx, 16
>
>
>     try_to_copy4:
>         cmp ecx, 4
>         jb try_to_copy_1
>         movsd
>         sub ecx, 4
>         jmp try_to_copy4
>
>     try_to_copy_1:
>         rep movsb
>
>         sfence
>     }
> }

> static _inline void SaveFPU(PDev *pdev, UINT8 FPUSave[])
> {
>     void *align_addr = (void *)ALIGN((size_t)(FPUSave), SSE_ALIGN);
>
>     _asm
>     {
>         mov edi, align_addr
>
>         movdqa [edi], xmm0
>         movdqa [edi + 16], xmm1
>         movdqa [edi + 32], xmm2
>         movdqa [edi + 48], xmm3
>     }
> }

Is it possible to know whether these instructions are currently fully working on HVM domUs?
Details of the latest debugging of qxl on Linux domUs are here, if you want to look at them too: http://lists.xen.org/archives/html/xen-devel/2013-10/msg00016.html
It produced this error, but I don't know whether it is related:
> ioremap error for 0xfc001000-0xfc002000, requested 0x10, got 0x0

Thanks for any reply.
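A sketch of a check for the instructions the qxl driver uses that the test_x86_emulator output quoted above does not list (MOVNTDQ, MOVDQA, PREFETCHNTA, SFENCE). It assumes `target` points at 32 bytes of a 16-byte-aligned, memory-like region that qemu emulates, for instance one obtained by mapping a BAR inside the guest. `qxl_style_roundtrip()` is a hypothetical name; the assembly only mirrors one 16-byte step of `fast_memcpy_unaligment` and one store of `SaveFPU`, written for GCC/Clang on x86.

```c
#include <stdint.h>
#include <string.h>

/* Exercise the SSE2 forms used by the qxl driver against `target`
 * (32 bytes, 16-byte aligned). Returns 1 if both 16-byte blocks read
 * back correctly, 0 if not, and -1 if `target` is misaligned. */
int qxl_style_roundtrip(void *target)
{
    static const unsigned char src[16] = {
        0xde, 0xad, 0xbe, 0xef, 0x01, 0x23, 0x45, 0x67,
        0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc, 0xba, 0x98 };
    unsigned char out[32];
    unsigned char *t = target;

    if ((uintptr_t)target % 16 != 0)
        return -1; /* MOVNTDQ and MOVDQA would fault on this address */

    __asm__ volatile(
        "prefetchnta (%[src])\n\t"
        "movdqu (%[src]), %%xmm0\n\t"  /* unaligned load, as in fast_memcpy */
        "movntdq %%xmm0, (%[t0])\n\t"  /* non-temporal 128-bit store        */
        "sfence\n\t"                   /* order the streaming store         */
        "movdqa %%xmm0, (%[t1])\n\t"   /* aligned 128-bit store, as SaveFPU */
        "movdqu (%[t0]), %%xmm1\n\t"   /* read both blocks back             */
        "movdqu (%[t1]), %%xmm2\n\t"
        "movdqu %%xmm1, (%[out])\n\t"
        "movdqu %%xmm2, 16(%[out])\n\t"
        :
        : [src] "r"(src), [t0] "r"(t), [t1] "r"(t + 16), [out] "r"(out)
        : "xmm0", "xmm1", "xmm2", "memory");

    return memcmp(out, src, 16) == 0 && memcmp(out + 16, src, 16) == 0;
}
```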
Jan Beulich
2013-Nov-22 12:16 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
>>> On 22.11.13 at 13:14, Fabio Fantoni <fabio.fantoni@m2r.biz> wrote:
> Thanks for all the replies.
> I tried x86_emulator on dom0 and SSE2 instructions seem OK:

Sure. Which means you still didn't understand that you need to try with SSE instructions accessing MMIO where the emulation happens in qemu.

Jan
Fabio Fantoni
2013-Nov-22 13:51 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
On 22/11/2013 13:16, Jan Beulich wrote:
>>>> On 22.11.13 at 13:14, Fabio Fantoni <fabio.fantoni@m2r.biz> wrote:
>> I tried x86_emulator on dom0 and SSE2 instructions seem OK:
> Sure. Which means you still didn't understand that you need to
> try with SSE instructions accessing MMIO where the emulation
> happens in qemu.
>
> Jan

I understood that this test needs to be run inside HVM domUs (as I posted in my previous mail):
> Same result on Linux HVM domUs.

What I do not understand is whether x86_emulator has a test that accesses MMIO with SSE2 and, if not, whether there is an alternative way to make this check.

Thanks for any reply.
Jan Beulich
2013-Nov-22 14:23 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
>>> On 22.11.13 at 14:51, Fabio Fantoni <fabio.fantoni@m2r.biz> wrote:
> What I do not understand is whether x86_emulator has a test that accesses
> MMIO with SSE2 and, if not, whether there is an alternative way to make this check.

You'd need to write something like this, I'm afraid.

Jan
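One way such a test could be extended, sketched under the same assumptions (a guest mapping of a memory-like region that qemu emulates; the function name is hypothetical), is to sweep all 16 starting offsets so that 128-bit accesses beginning at every position within a 16-byte window are covered, including ones that straddle 8-byte boundaries. The assembly is GCC/Clang inline asm for x86.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: for each offset 0..15 inside `region` (at least
 * 32 bytes long), store a distinct 16-byte pattern with MOVDQU, load it
 * back with MOVDQU, and count the offsets whose data differ. */
int count_misaligned_failures(void *region)
{
    unsigned char *base = region;
    int failures = 0;

    for (size_t off = 0; off < 16; off++) {
        unsigned char pattern[16], readback[16];
        for (size_t i = 0; i < 16; i++)
            pattern[i] = (unsigned char)(off * 16 + i);

        __asm__ volatile(
            "movdqu (%[src]), %%xmm0\n\t"  /* pattern from normal RAM   */
            "movdqu %%xmm0, (%[dst])\n\t"  /* 128-bit store at `off`    */
            "movdqu (%[dst]), %%xmm1\n\t"  /* 128-bit load at `off`     */
            "movdqu %%xmm1, (%[out])\n\t"  /* result back to normal RAM */
            :
            : [src] "r"(pattern), [dst] "r"(base + off), [out] "r"(readback)
            : "xmm0", "xmm1", "memory");

        if (memcmp(pattern, readback, 16) != 0)
            failures++;
    }
    return failures;
}
```

On normal RAM the count should be zero; a non-zero count on an emulated mapping would point at the Xen/qemu path rather than at the test itself.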
George Dunlap
2013-Nov-22 14:37 UTC
Re: Test if on newer xen all SSE2 and SSE3 instructions are effectively working
On 22/11/13 14:23, Jan Beulich wrote:
>>>> On 22.11.13 at 14:51, Fabio Fantoni <fabio.fantoni@m2r.biz> wrote:
>> What I do not understand is whether x86_emulator has a test that accesses
>> MMIO with SSE2 and, if not, whether there is an alternative way to make this check.
> You'd need to write something like this, I'm afraid.

I guess the question is, why do you think it may not work?

-George