Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] Bug in X86CompilationCallback_SSE"
2009 Mar 11
[LLVMdev] Bug in X86CompilationCallback_SSE
I don't know how to file a PR, but I have a patch (see below) that
should work regardless of ABI differences, since it relies on the
compiler to do the tough job.
void X86CompilationCallback_SSE(void) {
  char *SAVEBUF = (char *) alloca(64 + 12); // alloca is 16byte aligned
  asm volatile (
    "movl %%eax,(%0)\n"
    "movl %%edx,4(%0)\n" // Save EAX/EDX/ECX
2009 Mar 11
[LLVMdev] Bug in X86CompilationCallback_SSE
Hello, Corrado
> Before you can correctly invoke a function via the Procedure Linkage
> Table (plt), the ABI mandates that ebx is pointing to the GOT (Global
> Offset Table) (see http://www.greyhat.ch/lab/downloads/pic.html)
This is a known issue; it's just that nobody realized that we have a bunch of
non-PIC-aware assembler code. :) Fixing it would not be trivial, though,
mostly due to ABI
2009 Mar 12
[LLVMdev] Bug in X86CompilationCallback_SSE
On Mar 11, 2009, at 2:39 PM, Corrado Zoccolo wrote:
> I don't know how to file a PR, but I have a patch (see below), that
> should work regardless of abi differences, since it relies on the
> compiler to do the though job.
>
> void X86CompilationCallback_SSE(void) {
> char * SAVEBUF= (char*) alloca(64+12); // alloca is 16byte aligned
How do you ensure it's 16-byte
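One common way to guarantee the 16-byte alignment asked about here, rather than relying on whatever alignment alloca happens to return, is to over-allocate by 15 bytes and round the pointer up. This matters because aligned SSE stores such as movaps fault on a misaligned address. The sketch below is illustrative only: align_up, SAVEBUF_SIZE and SAVEBUF_RAW are hypothetical names, not part of the quoted patch, and it assumes the usual round-trip of pointers through uintptr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds p up to the next multiple of align; align must be a power of two. */
static void *align_up(void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t u = (uintptr_t)p;
    return (void *)((u + (align - 1)) & ~(uintptr_t)(align - 1));
}

/* Hypothetical sizes: 64 + 12 bytes as requested by the quoted patch, plus
   (16 - 1) bytes of slack so an aligned 76-byte region always fits inside
   the raw buffer, wherever that buffer starts. */
#define SAVEBUF_SIZE (64 + 12)
#define SAVEBUF_RAW  (SAVEBUF_SIZE + 16 - 1)
```

Usage would be `char raw[SAVEBUF_RAW]; char *savebuf = align_up(raw, 16);`, and the same works with `alloca(SAVEBUF_RAW)`. For a fixed-size local buffer, C11's `_Alignas(16)` is a simpler alternative.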
2009 Mar 12
[LLVMdev] Bug in X86CompilationCallback_SSE
This looks like an interesting idea. As written, though, the inline asms
aren't safe: they reference %eax, %edx, etc. without declaring them in
constraints, so the compiler wouldn't know that it can't clobber those
registers.
Dan
On Mar 11, 2009, at 2:39 PM, Corrado Zoccolo wrote:
> I don't know how to file a PR, but I have a patch (see below), that
> should work
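Dan's point can be illustrated with GCC-style extended asm: any register the asm body writes that is not one of its operands has to appear in the clobber list, otherwise the compiler may keep a live value there. A minimal sketch follows; add_via_eax is a hypothetical function, and the block falls back to plain C on non-x86 targets or non-GNU compilers.

```c
#include <assert.h>

/* Adds b to a, using EAX as a scratch register inside the asm body.
   EAX is not an operand, so it is listed as clobbered; "cc" declares that
   the flags are modified by addl. */
static int add_via_eax(int a, int b) {
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __asm__ volatile (
        "movl %1, %%eax\n\t"
        "addl %%eax, %0\n\t"
        : "+r"(a)        /* operand 0: read and written, compiler picks reg */
        : "r"(b)         /* operand 1: input only */
        : "eax", "cc");  /* written by the body but not declared as operands */
#else
    a += b;              /* portable fallback */
#endif
    return a;
}
```

Because EAX is declared as clobbered, the compiler will never assign `a` or `b` to it, so the `movl` cannot overwrite an operand.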
2009 Aug 18
[LLVMdev] Build issues on Solaris
Hello, Nathan
> or if it should be a configure test, which might be safer. Are there
> any x86 platforms (other than apple) that don't need PLT-indirect calls?
Yes, MinGW. However, just tweaking the define is not enough: we're not
loading the address of the GOT into EBX before the call (on 32-bit ABIs),
so the call will go nowhere.
--
With best regards, Anton Korobeynikov
Faculty of
2009 Aug 11
[LLVMdev] Build issues on Solaris
Hi all,
I've encountered a couple of minor build issues on Solaris that
have crept in since 2.5, fixes below:
1. In lib/Target/X86/X86JITInfo.cpp, there is:
// Check if building with -fPIC
#if defined(__PIC__) && __PIC__ && defined(__linux__)
#define ASMCALLSUFFIX "@PLT"
#else
#define ASMCALLSUFFIX
#endif
Which causes a link failure due to the non-PLT
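One hypothetical way to widen the check quoted above beyond `__linux__` is to key on the object format instead: GCC predefines `__ELF__` for ELF targets (such as Linux and Solaris) but not for Mach-O (Apple) or PE (MinGW), the platforms mentioned in this thread as not needing PLT-indirect calls. This is a sketch under that assumption, not the fix that was committed; the callee name in `call_insn` is illustrative only. As Anton notes, on 32-bit ABIs changing the suffix alone is not sufficient, because EBX must also hold the GOT address before the call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of the ASMCALLSUFFIX check: use "@PLT" whenever
   compiling position-independent code for any ELF target. */
#if defined(__PIC__) && __PIC__ && defined(__ELF__)
#define ASMCALLSUFFIX "@PLT"
#else
#define ASMCALLSUFFIX
#endif

/* Adjacent string literals are concatenated, so the (possibly empty)
   suffix is spliced directly into the text of the call instruction. */
static const char call_insn[] = "call X86CompilationCallback2" ASMCALLSUFFIX;

/* Returns nonzero if the generated call goes through the PLT. */
static int call_uses_plt(void) {
    size_t n = strlen(call_insn);
    assert(n >= 4);
    return strcmp(call_insn + n - 4, "@PLT") == 0;
}
```

A configure-time test, as suggested later in the thread, would replace the `#if` with a macro set by the build system.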
2009 Aug 25
[LLVMdev] Build issues on Solaris
On 19/08/2009, at 4:00 AM, Anton Korobeynikov wrote:
> Hello, Nathan
>
>> or if it should be a configure test, which might be safer. Are there
>> any x86 platforms (other than apple) that don't need PLT-indirect
>> calls?
> Yes, mingw. However just tweaking the define is not enough - we're not
Ok, so configure might be the way to go then, maybe something
2012 Jul 06
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Sat, Jul 7, 2012 at 12:25 AM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>>> [...]
>>> movaps 32(%rdi), %xmm3
>>> movaps 48(%rdi), %xmm2
>>>
2004 Oct 18
[LLVMdev] Fix for non-standard variable length array + Visual C X86 specific code
Paolo Invernizzi wrote:
> There was a similar problem some time ago, and was resolved with alloca.
> I think it's a better solution to use the stack instead of the heap...
I tend to agree, but the constructors won't get called if it's an array of
objects. Anyway, in this particular case there were no objects, just
pointers and bools, so alloca should be fine. I'll leave it to
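To make the alloca suggestion concrete, here is a sketch of replacing non-standard variable-length arrays of pointers and bools with alloca so that Visual C accepts the code. The function compact_pointers is hypothetical; the platform assumptions are that non-Microsoft compilers provide `<alloca.h>` and that Visual C spells the function `_alloca` in `<malloc.h>`. As noted above, alloca only returns raw, uninitialized storage, so it suits plain pointers and bools but would not run constructors for an array of C++ objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#ifdef _MSC_VER
#include <malloc.h>
#define alloca _alloca
#else
#include <alloca.h>
#endif

/* Moves the non-NULL pointers to the front (preserving order) and returns
   how many there are.  Scratch arrays live on the stack and are released
   automatically when the function returns. */
static size_t compact_pointers(void **ptrs, size_t n) {
    assert(ptrs != NULL || n == 0);
    if (n == 0)
        return 0;                              /* avoid alloca(0) */
    void **tmp  = alloca(n * sizeof *tmp);     /* was: void *tmp[n];  */
    bool  *keep = alloca(n * sizeof *keep);    /* was: bool keep[n];  */
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        keep[i] = (ptrs[i] != NULL);
    for (size_t i = 0; i < n; i++)
        if (keep[i])
            tmp[out++] = ptrs[i];
    memcpy(ptrs, tmp, out * sizeof *tmp);
    return out;
}
```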
2012 Jul 06
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Fri, Jul 6, 2012 at 6:39 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
>
>> I've noticed that LLVM tends to generate suboptimal code and spill an
>> excessive amount of registers in large functions, such as in those
>> that are automatically generated by FFTW.
>
2012 Jul 06
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
On Jul 5, 2012, at 9:06 PM, Anthony Blake <amb33 at cs.waikato.ac.nz> wrote:
> I've noticed that LLVM tends to generate suboptimal code and spill an
> excessive amount of registers in large functions, such as in those
> that are automatically generated by FFTW.
One problem might be that we're forcing the 16 stores to the out array to happen in source order, which
2012 Jul 06
[LLVMdev] Excessive register spilling in large automatically generated functions, such as is found in FFTW
Hi,
I've noticed that LLVM tends to generate suboptimal code and spill an
excessive amount of registers in large functions, such as in those
that are automatically generated by FFTW.
LLVM generates good code for a function that computes an 8-point
complex FFT, but from 16-point upwards, icc or gcc generates much
better code. Here is an example of a sequence of instructions from a
32-point
2008 Aug 06
[LLVMdev] crash in JIT when running the inliner
Hi,
Today I've been trying to debug a weird bug that makes the JIT crash with
certain code when the inliner is used. This may sound weird, but if I
disable the inliner, it doesn't crash.
I've included an example gdb dump below. Does something look wrong? Do you
think it's a bug in the JIT, or just some other piece of code that is
writing to the JIT memory? I don't really know
2010 May 11
[LLVMdev] How does SSEDomainFix work?
Hello. This is my first post.
I have tried the SSE execution domain fixup pass,
but I don't see any improvement.
I expected the example below to use MOVDQA, PAND, etc.
(on Nehalem, ANDPS is much slower than PAND).
Please tell me if I am doing something wrong.
Thank you.
Takumi
Host: i386-mingw32
Build: trunk at 103373
foo.ll:
define <4 x i32> @foo(<4 x i32> %x,
2010 May 11
[LLVMdev] How does SSEDomainFix work?
On May 10, 2010, at 9:07 PM, NAKAMURA Takumi wrote:
> Hello. This is my 1st post.
Welcome!
> I have tried SSE execution domain fixup pass.
> But I am not able to see any improvements.
Did you actually measure runtime, or did you look at assembly?
> I expect for the example below to use MOVDQA, PAND &c.
> (On nehalem, ANDPS is extremely slower than PAND)
Are you sure? The
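At the source level, the integer-domain and floating-point-domain ANDs are separate intrinsics: `_mm_and_si128` corresponds to PAND and `_mm_and_ps` to ANDPS. The sketch below is a C-level AND of two `<4 x i32>` values written with the integer-domain intrinsic; and_v4i32 is a hypothetical name, and the truncated foo.ll above is not assumed to contain exactly this operation. The backend may still pick a different instruction, which is what the domain-fixup pass decides, so checking the generated assembly (and measuring) is still necessary.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Bitwise AND of two 4 x i32 vectors using the integer-domain intrinsic
   (PAND); plain C fallback when SSE2 is not available. */
static void and_v4i32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] & b[i];
#endif
}
```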
2010 Jan 22
[LLVMdev] Exception handling question
Interesting. Was this the reason you were getting the recursive compilation error in JIT::runJITOnFunctionUnlocked(...) (isAlreadyCodeGenerating)?
Do you have the time to try your test with 2.7?
Garrison
On Jan 22, 2010, at 17:37, James Williams wrote:
> I've worked around this issue in my test case by simply calling my personality function on program to ensure it's JIT'ed
2012 Jul 27
[LLVMdev] X86 FMA4
Hey Michael,
Thanks for the legwork!
It appears that the stats you listed are for movaps [SSE], not vmovaps
[AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256),
since they are both AVX instructions. That said, yes, I agree that this is
not clear from Agner's report. Please correct me if I am misunderstanding.
As I am sure you are aware, we cannot use SSE (movaps)
2013 Jul 19
[LLVMdev] llvm.x86.sse2.sqrt.pd not using sqrtpd, calling a function that modifies ECX
(Changing subject line as diagnosis has changed)
I'm attaching the compiled code that I've been getting with both
CodeGenOpt::Default and CodeGenOpt::None. The crash doesn't occur
with CodeGenOpt::None, but that seems to be because ECX isn't being used;
it still gets set to 0x7fffffff by one of the calls to 76719BA1.
I notice that X86::SQRTPD[m|r] appear in
2013 Jul 19
[LLVMdev] SIMD instructions and memory alignment on X86
Hmm, I'm not able to get those .ll files to compile if I disable SSE, and I
end up with SSE instructions (including sqrtpd) if I don't disable it.
On Thu, Jul 18, 2013 at 10:53 PM, Peter Newman <peter at uformia.com> wrote:
> Is there something specifically required to enable SSE? If it's not
> detected as available (based from the target triple?) then I don't think
2013 Aug 22
New routine: FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_16
libFLAC has three SSE-accelerated functions, FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_N (N = 4, 8, 12). They require lpc_order to be less than N.
The best compression preset (flac -8) uses lpc_order up to 12, which means that during encoding FLAC also uses the unaccelerated C function.
I'm not very familiar with asm, so I took FLAC__lpc_compute_autocorrelation_asm_ia32_sse_lag_12, changed it, and
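For reference, the quantity these kernels compute is the autocorrelation autoc[k] = sum over i of data[i] * data[i+k], for 0 <= k < lag. An order-p LPC analysis needs autoc[0..p], that is lag = p + 1, which is why a lag-N kernel only covers orders below N. The sketch below is a plain C version of that definition. The name autocorrelation is hypothetical; this is not libFLAC's actual function or signature.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward (unaccelerated) autocorrelation:
   autoc[k] = sum_{i=0}^{n-1-k} data[i] * data[i+k],  for 0 <= k < lag. */
static void autocorrelation(const float *data, size_t n, size_t lag, float *autoc) {
    assert(lag <= n);
    for (size_t k = 0; k < lag; k++) {
        float sum = 0.0f;
        for (size_t i = 0; i + k < n; i++)
            sum += data[i] * data[i + k];
        autoc[k] = sum;
    }
}
```

For data {1, 2, 3} and lag 3, this yields {14, 8, 3}.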