search for: mxcsr

Displaying 20 results from an estimated 51 matches for "mxcsr".

2014 Jan 28
2
[LLVMdev] ldmxcsr reordering issue
Hi, I ran into trouble JIT-compiling x86 code when using Intrinsic::x86_sse_ldmxcsr. The target code must execute some SSE2 instructions with DAZ/FTZ modes enabled and others with DAZ/FTZ disabled. I'm trying to achieve this by emitting LDMXCSR instructions with the proper flag words. It appears, however, that the execution engine sometimes reorders these instructions with computational one...
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
The actual purpose that I wanted such an intrinsic is to solve a problem similar to this one in X86. Say I wanted to read the "mxcsr" register (which is the status register for SSE instructions) after a particular instruction, then I need a kind of barrier intrinsic which will not allow the arithmetic instructions to move around it. Or else I will be reading the status of some other instruction. Thanks...
2017 Feb 14
2
Adding FP environment register modeling for constrained FP nodes
...g to piece together how to get the set of nodes to be updated from the SelectionDAG to the InstrEmitter. I’m still learning my way around this code. In any event, I can confirm that for X86 targets the control register uses are not currently modeled. I just committed a patch yesterday adding the MXCSR register and updating the instructions that directly read and write it (but still implicitly so). I suppose you are correct that there is no reason not to add uses of that register to the instructions that derive their rounding behavior from it and then the constrained FP intrinsics will just need...
2007 Dec 17
5
[PATCH 0/21] Integrate processor.h
Hi, This series integrates the processor.h header. There are a lot of deep architectural differences between the architectures, but I've done my best to come to a settlement. With this series, I am very close to having selectable paravirt for x86_64. It applies on top of today's x86 git, mm branch.
2015 Aug 21
2
The semantics of the fptrunc instruction with an example of incorrect optimisation
...double 3.000000e-01, double* %x, align 8 %call = call i32 @fesetround(i32 0) #3 %0 = load double, double* %x, align 8 %conv = fptrunc double %0 to float .... ``` If I look at the codegened assembly I see that the ``cvtsd2ss`` x86 instruction is used (how rounding is done is controlled by the MXCSR register apparently). So this instruction might not "truncate" depending on how MXCSR is set. If I run the program ``` $ clang -O0 float.c -lm -o float.clang.o0 $ ./float.clang.o0 y (nearest):0x1.333334p-2 y (upward):0x1.333334p-2 y (downward):0x1.333332p-2 ``` I can see that the last...
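The three values printed in the snippet above can be reproduced without touching MXCSR at all. The following is a rough Python emulation, not the poster's program: it narrows the double 0.3 to single precision and then steps one unit in the last place to imitate the upward and downward modes. The helper names (`float32_bits`, `bits_to_float`, `narrow`) are hypothetical, and the sketch assumes a positive, finite, normal input and a host running in the default round-to-nearest mode.

```python
import struct


def float32_bits(x):
    """Bit pattern of x after a round-to-nearest conversion to float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Exact double value of the float32 with the given bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def narrow(x, mode):
    """Emulate double -> float32 narrowing for a positive, finite, normal x."""
    bits = float32_bits(x)
    nearest = bits_to_float(bits)
    if mode == "upward" and nearest < x:
        # Nearest rounded down, so the upward result is the next float32.
        return bits_to_float(bits + 1)
    if mode == "downward" and nearest > x:
        # Nearest rounded up, so the downward result is the previous float32.
        return bits_to_float(bits - 1)
    return nearest


for mode in ("nearest", "upward", "downward"):
    print(mode, narrow(0.3, mode).hex())
```

For 0.3 the nearest float32 lies above the double value, which is why the nearest and upward results coincide and only the downward result differs.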
2006 Apr 19
0
[LLVMdev] floating point exception and SSE2 instructions
...showed up on numpy-discussion: http://sources.redhat.com/bugzilla/show_bug.cgi?id=10 """ #include <fenv.h> void feclearexcept(int ex) This function should clear the specified exception status bits in the FPU status register. For CPUs with SSE support it should also clear the MXCSR status register bits. The problem is that feclearexcept() clears the status control bits also, causing future floating-point errors to generate interrupts which will lead to a SIGFPE signal which terminates the program (unless caught by a SIGFPE handler). """ Is there a way I can d...
2011 Jul 09
1
[LLVMdev] LLVM floating point rounding modes
Hi, I am not sure if this is the right mailing list to ask my question, if not, please refer me to the proper one. Is there any support for rounding modes in LLVM floating point? I looked in the assembler reference manual, and it doesn't seem so. I am thinking about choosing LLVM as one of the backends for my programming language Babel-17 (www.babel-17.com). Babel-17 features interval
2006 Apr 19
2
[LLVMdev] floating point exception and SSE2 instructions
On Tue, 18 Apr 2006 23:27:39 -0700 Evan Cheng <evan.cheng at apple.com> wrote: > Hi Simon, > > The x86 backend does generate scalar SSE2 instructions. For your > example, it should emit something like: Oh, how did you get this ? [...] > > There is nothing here that should cause an exception. Are you using a > release or cvs? CVS. From what I remember,
2011 Apr 06
1
[LLVMdev] Adding scheduling constraints to intrinsics
Hi, I am working on fixing a bug in the x86 codegen and I need help in adding a new type of scheduling constraints. The bug I am fixing is related to SSE instruction scheduling. SSE instructions use the "mxcsr" register for selecting the desired rounding mode. This control register is set/read by an intrinsic. Currently, this intrinsic has no scheduling deps and SSE instructions are scheduled freely before and after calls to this register. When working on this I noticed a case where an SSE instru...
2008 Feb 29
10
[PATCH] [RFC] More fp instructions for realmode emulation (Enables booting OS/2 as a HVM guest on Intel/VT hardware)
This patch adds a number of fp instructions needed for OS/2 to boot as a HVM guest on Intel/VT hardware. It appears to work fine, and OS/2 is now finally working on Intel/VT as well as AMD/SVM. I'm a little concerned about the "correctness" of the FSTSW emulation and the use of inline assembly directly using the corresponding ops for emulation. Wrt FSTSW, it is really two ops
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...t* %1, align 4 >> %3 = fptoui float %2 to i64 >> ret i64 %3 >> } >> >> GCC performs a comparison with ucomiss and branches whereas Clang computes both forms and predicates the result using a conditional move. One of the conversions obviously is setting the INEXACT MXCSR flag. >> >> Clang lowering (inexact set when result is exact): >> >> fcvt_lu(float): >> movss xmm1, dword ptr [rip + .LCPI1_0] # xmm1 = mem[0],zero,zero,zero >> movaps xmm2, xmm0 >> subss xmm2, xmm1 >> cvttss2si...
2017 Apr 20
4
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...ormations that violate the needs of FENV_ACCESS when doing so can improve the performance of generated code. Basically, we more or less pretend that floating point status bits don’t exist (at least before you get to the target-specific backend). You’ll find that the X86 backend doesn’t even model MXCSR at the moment. I tried to add it recently and it kind of blew up before I had even modeled it for anything other than LDMXCSR and STMXCSR. We may want to address that at some point, but right now it just isn’t there. When we discussed how FENV_ACCESS support should be implemented, Chandler propo...
2009 Sep 23
1
High CPU usage
...locate the exceptions and added VERY_SMALLs where they seem to fit well. Although I got CPU usage as low as 10%, I seriously lack knowledge of how things work inside speex. So just changing some code is not the best idea for me. My second attempt was to follow Jeff's suggestion to modify the MXCSR register and recompile with _USE_SSE. This works very well (CPU < 3%). However, I would still prefer the first method (VERY_SMALL) because not all CPUs my app is going to run on have the SSE instruction set available. Hopefully someone with more insight is able to fix this some day :-) Thanks...
2009 Dec 22
1
fsc: pxelinux.0 not working
...=0008 00000580 00000067 00008900 DPL=0 TSS32-avl GDT= 0000a500 0000002f IDT= 00002808 000007ff CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 DR6=ffff0ff0 DR7=00000400 CCS=00000010 CCD=0011ffec CCO=ADDL FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000...
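The `MXCSR=00001f80` field in a register dump like the one above can be decoded by hand. The sketch below splits the value into the register's architectural fields: exception flags in bits 0-5, DAZ in bit 6, exception masks in bits 7-12, rounding control in bits 13-14, and FTZ in bit 15. The function name `decode_mxcsr` and the field labels are this sketch's own choices, not names taken from any of the quoted threads.

```python
# Labels for the six SSE exceptions, in bit order (flag bits 0-5, mask bits 7-12).
EXCEPTION_NAMES = [
    "invalid", "denormal", "divide_by_zero", "overflow", "underflow", "precision",
]
# Rounding-control encodings held in bits 13-14.
ROUNDING_MODES = {0: "nearest", 1: "down", 2: "up", 3: "toward_zero"}


def decode_mxcsr(value):
    """Split an MXCSR value into its flag, mask, and control fields."""
    return {
        "flags": [n for i, n in enumerate(EXCEPTION_NAMES) if value & (1 << i)],
        "daz": bool(value & (1 << 6)),
        "masked": [n for i, n in enumerate(EXCEPTION_NAMES) if value & (1 << (7 + i))],
        "rounding": ROUNDING_MODES[(value >> 13) & 0b11],
        "ftz": bool(value & (1 << 15)),
    }


fields = decode_mxcsr(int("00001f80", 16))
for name, setting in fields.items():
    print(name, setting)
```

Read this way, 0x1f80 is the reset state: no exception flags raised, all six exceptions masked, round to nearest, and both DAZ and FTZ off.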
2006 Apr 19
2
[LLVMdev] floating point exception and SSE2 instructions
...ources.redhat.com/bugzilla/show_bug.cgi?id=10 > > """ > #include <fenv.h> > void feclearexcept(int ex) > > This function should clear the specified exception status bits in the > FPU status register. For CPUs with SSE support it should also clear the > MXCSR status register bits. > > The problem is that feclearexcept() clears the status control bits also, > causing future floating-point errors to generate interrupts which will > lead to a SIGFPE signal which terminates the program (unless caught by a > SIGFPE handler). > """...
2007 Oct 25
2
linux.c32 doesn't work
Hi! The linux.c32 module doesn't boot my kernel, while pxelinux itself does: kernel vmlinuz-2.6.23 append initrd=initrd.img-2.6.23 root=/dev/mapper/root works, while kernel linux.c32 append vmlinuz-2.6.23 initrd=initrd.img-2.6.23 root=/dev/mapper/root does not: it stays sitting after loading the initrd without any output. Version 3.50-pre2 worked, current git does not work.
2014 Sep 30
2
[LLVMdev] Behaviour of NVPTX intrinsic
...solve your code motion issue. > > Jingyue > > On Tue Sep 30 2014 at 11:03:45 AM RAVI KORSA <ravi.korsa at gmail.com> wrote: > >> The actual purpose that I wanted such an intrinsic is to solve a problem >> similar to this one in X86. Say I wanted to read the "mxcsr" register (which >> is the status register for SSE instructions) after a particular >> instruction, then I need a kind of barrier intrinsic which will not allow >> the arithmetic instructions to move around it. Or else I will be reading >> the status of some other instru...
2019 Sep 16
3
Handling of FP denormal values
...tain whether it is intended to control the target hardware or just the optimizer. In addition, when either -Ofast or -ffast-math is used, we attempt to link 'crtfastmath.o' if it can be found. For X86 targets, this object file adds a static constructor that sets the DAZ and FTZ bits of the MXCSR register. I expect that it has analogous behavior for other architectures when it is available. This object file is typically available on Linux systems, possibly also with things like MinGW. If it isn't found, the denormal control flags will be left in their default state. There is also a CUD...
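To make "sets the DAZ and FTZ bits" concrete, here is a small sketch of the bit arithmetic involved. It only computes the control word and does not write the hardware register, and it is not the code in crtfastmath.o. The constant and function names are hypothetical. The bit positions (DAZ in bit 6, FTZ in bit 15) and the reset value 0x1F80 are the architectural ones.

```python
MXCSR_DAZ = 1 << 6       # denormals-are-zero: denormal inputs are treated as zero
MXCSR_FTZ = 1 << 15      # flush-to-zero: denormal results are replaced by zero
MXCSR_DEFAULT = 0x1F80   # reset value: all exceptions masked, round to nearest


def with_denormal_flushing(mxcsr):
    """Return the control word with both DAZ and FTZ switched on."""
    return mxcsr | MXCSR_DAZ | MXCSR_FTZ


def without_denormal_flushing(mxcsr):
    """Return the control word with both DAZ and FTZ switched off."""
    return mxcsr & ~(MXCSR_DAZ | MXCSR_FTZ)


fast = with_denormal_flushing(MXCSR_DEFAULT)
print(hex(fast))
print(hex(without_denormal_flushing(fast)))
```

A native program would load such a word with LDMXCSR (for example through the `_mm_setcsr` intrinsic in C). Only the two denormal-control bits change, so the exception masks and rounding mode carry over untouched.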
2009 May 25
4
Crash with core32 (syslinux-3.81-pre12-68-g4a211f6)
...0 00008200 TR =0008 00000580 00000067 00008900 GDT= 0000b050 0000002f IDT= 00002800 000007ff CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 DR6=ffff0ff0 DR7=00000400 CCS=00000044 CCD=00000000 CCO=EFLAGS FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80 FPR0=0000000000000000 0000 FPR1=0000000000000000 0000 FPR2=0000000000000000 0000 FPR3=0000000000000000 0000 FPR4=0000000000000000 0000 FPR5=0000000000000000 0000 FPR6=0000000000000000 0000 FPR7=0000000000000000 0000 XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000...