Hey guys,

I'm working on a bug for x86-64 in LLVM 2.9. Well, it's actually two issues. The assembly generated for large stack offsets has an overflow; and, once the overflow is fixed, the displacement is too large for GNU ld to handle.

    void fool( int long n )
    {
      double w[268435600];
      double z[268435600];
      unsigned long i;
      for ( i = 0; i < n; i++ ) {
        w[i] = 1.0;
        z[i] = 2.0;
      }
      printf(" n: %lld, W %g Z %g\n", n, w[1], z[1] );
    }

Here's one of the offending instructions produced by 2.9:

    movsd -2147482472(%rsp), %xmm0

Fixing the displacement overflow is pretty easy. It's just a matter of changing a few variable types in LLVM from unsigned to uint64_t in the functions that calculate the stack offsets. The real trouble I'm having is finding a good place to break up the displacements during lowering. I would like the offset to be calculated similarly to gcc:

    movabsq $-4294969640, %rdx
    movsd 0(%rbp,%rdx), %xmm0

Any suggestions on the correct lowering pass to do a transformation like this? I'm an LLVM noob, so I'm not sure where it should go.

Tx,
Cameron
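For readers following the arithmetic: each array in the test case is 268435600 * 8 = 2147484800 bytes, which is already larger than the largest signed 32-bit value (2147483647). The C sketch below shows how an offset of that size wraps when it is squeezed into a 32-bit displacement field. The helper name `wrap_to_disp32` and the constant `ARRAY_BYTES` are made up for illustration, and the 24-byte slack used in the usage note is only inferred from the displacement quoted above; none of this is taken from LLVM's sources.

```c
#include <stdint.h>

/* Size of one array from the test case, assuming 8-byte doubles (x86-64). */
#define ARRAY_BYTES (268435600LL * 8)

/* Hypothetical helper: keep the low 32 bits of a 64-bit offset and read
 * them back as a two's-complement signed value, which is what happens
 * when the offset is stored in a 32-bit displacement field. */
static int32_t wrap_to_disp32(int64_t offset)
{
    uint32_t low = (uint32_t)offset;      /* well-defined: modulo 2^32 */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;              /* value fits, nothing is lost */
    /* Top bit set: the field is read as a negative number. */
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

Calling `wrap_to_disp32(ARRAY_BYTES + 24)` yields -2147482472, the displacement in the instruction quoted above. That is consistent with an intended offset of one array plus 24 bytes from %rsp, although the exact frame layout is an assumption here.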
To be pedantic... use of the frame pointer isn't necessary. The stack pointer would be fine. That's just how GCC calculates the offset for this test case.

On Monday, September 26, 2011, Cameron McInally <cameron.mcinally at nyu.edu> wrote:
> I would like the offset to be calculated similar to gcc:
>
>     movabsq $-4294969640, %rdx
>     movsd 0(%rbp,%rdx), %xmm0
On Sep 26, 2011, at 12:02 PM, Cameron McInally wrote:

> Here's one of the offending instructions produced by 2.9:
>
>     movsd -2147482472(%rsp), %xmm0
>
> Fixing the displacement overflow is pretty easy. It's just a matter of changing a few variable types in LLVM from unsigned to uint64_t in the functions that calculate the stack offsets. The real trouble I'm having is finding a good place to break up the displacements during lowering. I would like the offset to be calculated similar to gcc:
>
>     movabsq $-4294969640, %rdx
>     movsd 0(%rbp,%rdx), %xmm0
>
> Any suggestions on the correct lowering pass to do a transformation like this? I'm an LLVM noob, so I'm not sure where it should go.

Hi Cameron,

As you have noticed, the x86 backend only supports stack frames up to 2GB.

Fixing that would require the x86 backend to use the register scavenger during prolog epilog insertion like the ARM backend does.

That particular code was very difficult to get right, and no one has thought it was worth the trouble to get it working for x86.

Your life will be a whole lot easier if you just use malloc().

/jakob
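To illustrate why a scratch register comes into it: an x86-64 memory operand carries at most a signed 32-bit displacement, so any stack offset outside that range has to be materialized in a register first, as in the gcc sequence quoted above. The following C sketch only formats the two instruction shapes as text. The names `fits_in_disp32` and `format_stack_load` are hypothetical, and the scratch register is passed in by the caller because choosing a free one is exactly the job the register scavenger would have to do; this is not how LLVM's frame lowering is written.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Nonzero if the offset can be encoded as a signed 32-bit displacement. */
static int fits_in_disp32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Hypothetical helper: write AT&T-syntax text for loading a double from
 * offset(%rsp) into %xmm0. `scratch` is a register name such as "%rdx"
 * that the caller guarantees is free. Returns snprintf's result. */
static int format_stack_load(char *buf, size_t len, int64_t offset,
                             const char *scratch)
{
    if (fits_in_disp32(offset)) {
        /* Small frame: the offset goes straight into the displacement. */
        return snprintf(buf, len, "movsd %" PRId64 "(%%rsp), %%xmm0",
                        offset);
    }
    /* Large frame: load the 64-bit offset into the scratch register and
     * use it as an index with a zero displacement. */
    return snprintf(buf, len,
                    "movabsq $%" PRId64 ", %s\nmovsd 0(%%rsp,%s), %%xmm0",
                    offset, scratch, scratch);
}
```

With an offset of 24 the helper produces the single-instruction form; with 2147484824 it produces the two-instruction `movabsq` form, which costs one extra register for the duration of the access.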
Jakob Stoklund Olesen <stoklund at 2pi.dk> writes:

Hi Jakob,

Thanks for the responses.

> As you have noticed, the x86 backend only supports stack frames up to 2GB.

That's unfortunate. :(

> Fixing that would require the x86 backend to use the register
> scavenger during prolog epilog insertion like the ARM backend does.

Makes sense.

> That particular code was very difficult to get right, and no one has
> thought it was worth the trouble to get it working for x86.

I wouldn't imagine so, since these kinds of large stack objects are rather rare in the C world. They are somewhat more common in the Fortran world. :)

> Your life will be a whole lot easier if you just use malloc().

Perhaps. This is customer-written code and they will (probably) not be willing to change it. We could replace the allocas with malloc/free under the hood but we haven't needed to do that on past platforms. It's certainly a mildly large change in our compiler in the sense of how resources get allocated. It is certainly doable but for various reasons may be undesirable.

Do you have a feel for the complexity involved with the ARM code? What were the troublesome parts and corner cases, etc.?

-Dave
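For reference, the malloc/free alternative discussed above looks roughly like this when written by hand at the source level. It is a sketch only: the name `fool_heap`, the `count` parameter, and the error handling are assumptions added for illustration, and it says nothing about how a compiler would perform the same replacement on allocas internally.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical heap-based variant of the test case. `count` is the
 * number of doubles per array (268435600 in the original).
 * Returns 0 on success, -1 if count is too small or allocation fails. */
int fool_heap(size_t count, unsigned long n)
{
    double *w, *z;
    unsigned long i;

    if (count < 2)
        return -1;                 /* w[1] and z[1] are read below */

    /* calloc zero-fills and checks count * size for overflow. */
    w = calloc(count, sizeof *w);
    z = calloc(count, sizeof *z);
    if (w == NULL || z == NULL) {
        free(w);                   /* free(NULL) is a no-op */
        free(z);
        return -1;
    }

    /* The extra bound keeps n from running past the allocation. */
    for (i = 0; i < n && i < count; i++) {
        w[i] = 1.0;
        z[i] = 2.0;
    }
    printf(" n: %lu, W %g Z %g\n", n, w[1], z[1]);

    free(w);
    free(z);
    return 0;
}
```

Because the arrays live on the heap, every access goes through a pointer held in a register and the stack frame stays small, so no displacement comes anywhere near the 2GB limit.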