Dale Martin
2014-Nov-26 17:10 UTC
[LLVMdev] crash with large structure values on the stack
Hello, This example input crashes if you run it through llc on x86. [begin example] ; ModuleID = 'test' %struct_2 = type { [90000 x %struct_1] } %struct_1 = type { i8 } define void @testFcn(%struct_2 %in1) { testFcn_entry: %in1_ = alloca %struct_2 store %struct_2 %in1, %struct_2* %in1_, align 8 %localStruct_ = alloca %struct_2 store %struct_2 %in1, %struct_2* %localStruct_, align 8 br label %exit exit: ; preds = %testFcn_entry ret void } [end example] It looks like at some stage of the backend compiler flow there is a "merge_values" instruction generated where the number of inputs exceeds 16k, but the number of inputs is stored in an unsigned short. When this instruction is being translated into x86 machine code, then there is an out of bounds access: ~> llc bug-simple.bc llc: /local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649: llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion `ResNo < NumValues && "Illegal result number!"' failed. Probably the truncation of NumOperands should be caught directly with an assertion in the SDNode constructor. One other interesting aspect of it is that if you make the struct_2 type a smaller matrix, like: %struct_2 = type { [65534 x %struct_1] } Then you don't get a crash, instead it takes about 20+ minutes to process and you get a lot of movb instructions out - like 200k or so. Callgrind says that 33% of the time consumed is directly in llvm::SUnit::ComputeHeight(). Clearly something has gone badly non-linear for this case. I looked at what clang is doing for a similar construct and I see it generates a memcpy intrinsic instead of the direct load/store. I'm now doing that in my own front-end, but it seems like this is intended to be supported so I thought I'd report it. It seems like turning this construct into memcpy as an optimization or part of the back-end lowering might be a better long-term approach. I filed this as bug #21671 in bugzilla. Let me know if you'd like more information or if I can help come up with a fix. (I'm not familiar at all with the backend so I'd need some guidance to be of much help.) Thanks, Dale Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141126/c969bd11/attachment.html>
Reid Kleckner
2014-Nov-26 17:37 UTC
[LLVMdev] crash with large structure values on the stack
While it would be good to get this bug fixed, I want to point out that frontends are generally discouraged from emitting large aggregate loads and stores. It's considered "more canonical" for the frontend to use @llvm.memcpy to move this stuff around, and then load the individual elements as needed. See past discussions with "daedal nix" about optimizer problems in this area. On Wed, Nov 26, 2014 at 9:10 AM, Dale Martin <Dale.Martin at mathworks.com> wrote:> Hello, > > This example input crashes if you run it through llc on x86. > > [begin example] > ; ModuleID = 'test' > > %struct_2 = type { [90000 x %struct_1] } > %struct_1 = type { i8 } > > define void @testFcn(%struct_2 %in1) { > testFcn_entry: > %in1_ = alloca %struct_2 > store %struct_2 %in1, %struct_2* %in1_, align 8 > %localStruct_ = alloca %struct_2 > store %struct_2 %in1, %struct_2* %localStruct_, align 8 > br label %exit > > exit: ; preds > %testFcn_entry > ret void > } > [end example] > > It looks like at some stage of the backend compiler flow there is a > "merge_values" instruction generated where the number of inputs exceeds > 16k, but the number of inputs is stored in an unsigned short. When this > instruction is being translated into x86 machine code, then there is an out > of bounds access: > ~> llc bug-simple.bc > llc: > /local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649: > llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion `ResNo > < NumValues && "Illegal result number!"' failed. > > Probably the truncation of NumOperands should be caught directly with an > assertion in the SDNode constructor. > > One other interesting aspect of it is that if you make the struct_2 type a > smaller matrix, like: > %struct_2 = type { [65534 x %struct_1] } > > Then you don't get a crash, instead it takes about 20+ minutes to > process and you get a lot of movb instructions out - like 200k or so. > Callgrind says that 33% of the time consumed is directly in > llvm::SUnit::ComputeHeight(). Clearly something has gone badly non-linear > for this case. > > I looked at what clang is doing for a similar construct and I see it > generates a memcpy intrinsic instead of the direct load/store. I'm now > doing that in my own front-end, but it seems like this is intended to be > supported so I thought I'd report it. It seems like turning this construct > into memcpy as an optimization or part of the back-end lowering might be a > better long-term approach. > > I filed this as bug #21671 in bugzilla. > > Let me know if you'd like more information or if I can help come up with > a fix. (I'm not familiar at all with the backend so I'd need some guidance > to be of much help.) > > Thanks, > Dale Martin > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141126/bb2404d6/attachment.html>
David Majnemer
2014-Nov-28 07:44 UTC
[LLVMdev] crash with large structure values on the stack
I believe this is the same as http://llvm.org/bugs/show_bug.cgi?id=21513 I had a dumb fix for this which simply bumped up the size of SDNode::NumOperands and SDNode::NumValues. Hal had very reasonable reservations with this approach because it increases the size of a structure which is used a lot. On Wed, Nov 26, 2014 at 12:37 PM, Reid Kleckner <rnk at google.com> wrote:> While it would be good to get this bug fixed, I want to point out that > frontends are generally discouraged from emitting large aggregate loads and > stores. It's considered "more canonical" for the frontend to use > @llvm.memcpy to move this stuff around, and then load the individual > elements as needed. See past discussions with "daedal nix" about optimizer > problems in this area. > > On Wed, Nov 26, 2014 at 9:10 AM, Dale Martin <Dale.Martin at mathworks.com> > wrote: > >> Hello, >> >> This example input crashes if you run it through llc on x86. >> >> [begin example] >> ; ModuleID = 'test' >> >> %struct_2 = type { [90000 x %struct_1] } >> %struct_1 = type { i8 } >> >> define void @testFcn(%struct_2 %in1) { >> testFcn_entry: >> %in1_ = alloca %struct_2 >> store %struct_2 %in1, %struct_2* %in1_, align 8 >> %localStruct_ = alloca %struct_2 >> store %struct_2 %in1, %struct_2* %localStruct_, align 8 >> br label %exit >> >> exit: ; preds >> %testFcn_entry >> ret void >> } >> [end example] >> >> It looks like at some stage of the backend compiler flow there is a >> "merge_values" instruction generated where the number of inputs exceeds >> 16k, but the number of inputs is stored in an unsigned short. When this >> instruction is being translated into x86 machine code, then there is an out >> of bounds access: >> ~> llc bug-simple.bc >> llc: >> /local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649: >> llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion `ResNo >> < NumValues && "Illegal result number!"' failed. >> >> Probably the truncation of NumOperands should be caught directly with >> an assertion in the SDNode constructor. >> >> One other interesting aspect of it is that if you make the struct_2 type >> a smaller matrix, like: >> %struct_2 = type { [65534 x %struct_1] } >> >> Then you don't get a crash, instead it takes about 20+ minutes to >> process and you get a lot of movb instructions out - like 200k or so. >> Callgrind says that 33% of the time consumed is directly in >> llvm::SUnit::ComputeHeight(). Clearly something has gone badly non-linear >> for this case. >> >> I looked at what clang is doing for a similar construct and I see it >> generates a memcpy intrinsic instead of the direct load/store. I'm now >> doing that in my own front-end, but it seems like this is intended to be >> supported so I thought I'd report it. It seems like turning this construct >> into memcpy as an optimization or part of the back-end lowering might be a >> better long-term approach. >> >> I filed this as bug #21671 in bugzilla. >> >> Let me know if you'd like more information or if I can help come up >> with a fix. (I'm not familiar at all with the backend so I'd need some >> guidance to be of much help.) >> >> Thanks, >> Dale Martin >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> >> > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141128/3d80cb46/attachment.html>