thr3ads.net - llvm dev - [LLVMdev] crash with large structure values on the stack [Nov 2014]

If this information is useful, please help other people find it:
Share via:

Dale Martin

2014-Nov-26 17:10 UTC

[LLVMdev] crash with large structure values on the stack

Hello,

This example input crashes if you run it through llc on x86.

[begin example]

; ModuleID = 'test'

%struct_2 = type { [90000 x %struct_1] }
%struct_1 = type { i8 }

define void @testFcn(%struct_2 %in1) {
testFcn_entry:
  %in1_ = alloca %struct_2
  store %struct_2 %in1, %struct_2* %in1_, align 8
  %localStruct_ = alloca %struct_2
  store %struct_2 %in1, %struct_2* %localStruct_, align 8
  br label %exit

exit:                                             ; preds = %testFcn_entry
  ret void
}
[end example]

It looks like at some stage of the backend compiler flow there is a
"merge_values" instruction generated where the number of inputs
exceeds 16k, but the number of inputs is stored in an unsigned short.  When this
instruction is being translated into x86 machine code, then there is an out of
bounds access:
~> llc bug-simple.bc
llc:
/local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649:
llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion `ResNo <
NumValues && "Illegal result number!"' failed.

Probably the truncation of NumOperands should be caught directly with an
assertion in the SDNode constructor.

One other interesting aspect of it is that if you make the struct_2 type a
smaller matrix, like:
%struct_2 = type { [65534 x %struct_1] }

Then you don't get a crash, instead it takes about 20+ minutes to process
and you get a lot of movb instructions out - like 200k or so. Callgrind says
that 33% of the time consumed is directly in llvm::SUnit::ComputeHeight().
Clearly something has gone badly non-linear for this case.

I looked at what clang is doing for a similar construct and I see it generates a
memcpy intrinsic instead of the direct load/store. I'm now doing that in my
own front-end, but it seems like this is intended to be supported so I thought
I'd report it.  It seems like turning this construct into memcpy as an
optimization or part of the back-end lowering might be a better long-term
approach.

I filed this as bug #21671 in bugzilla.

Let me know if you'd like more information or if I can help come up with a
fix.  (I'm not familiar at all with the backend so I'd need some
guidance to be of much help.)

Thanks,
  Dale Martin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141126/c969bd11/attachment.html>

Reid Kleckner

2014-Nov-26 17:37 UTC

head link

[LLVMdev] crash with large structure values on the stack

While it would be good to get this bug fixed, I want to point out that
frontends are generally discouraged from emitting large aggregate loads and
stores. It's considered "more canonical" for the frontend to use
@llvm.memcpy to move this stuff around, and then load the individual
elements as needed. See past discussions with "daedal nix" about
optimizer
problems in this area.

On Wed, Nov 26, 2014 at 9:10 AM, Dale Martin <Dale.Martin at
mathworks.com>
wrote:
>  Hello,
>
> This example input crashes if you run it through llc on x86.
>
> [begin example]
>  ; ModuleID = 'test'
>
>  %struct_2 = type { [90000 x %struct_1] }
> %struct_1 = type { i8 }
>
>  define void @testFcn(%struct_2 %in1) {
> testFcn_entry:
>   %in1_ = alloca %struct_2
>   store %struct_2 %in1, %struct_2* %in1_, align 8
>   %localStruct_ = alloca %struct_2
>   store %struct_2 %in1, %struct_2* %localStruct_, align 8
>   br label %exit
>
>  exit:                                             ; preds >
%testFcn_entry
>   ret void
> }
>  [end example]
>
>  It looks like at some stage of the backend compiler flow there is a
> "merge_values" instruction generated where the number of inputs
exceeds
> 16k, but the number of inputs is stored in an unsigned short.  When this
> instruction is being translated into x86 machine code, then there is an out
> of bounds access:
>  ~> llc bug-simple.bc
> llc:
>
/local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649:
> llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion `ResNo
> < NumValues && "Illegal result number!"' failed.
>
>  Probably the truncation of NumOperands should be caught directly with an
> assertion in the SDNode constructor.
>
> One other interesting aspect of it is that if you make the struct_2 type a
> smaller matrix, like:
>  %struct_2 = type { [65534 x %struct_1] }
>
>  Then you don't get a crash, instead it takes about 20+ minutes to
> process and you get a lot of movb instructions out - like 200k or so.
> Callgrind says that 33% of the time consumed is directly in
> llvm::SUnit::ComputeHeight(). Clearly something has gone badly non-linear
> for this case.
>
>  I looked at what clang is doing for a similar construct and I see it
> generates a memcpy intrinsic instead of the direct load/store. I'm now
> doing that in my own front-end, but it seems like this is intended to be
> supported so I thought I'd report it.  It seems like turning this
construct
> into memcpy as an optimization or part of the back-end lowering might be a
> better long-term approach.
>
>  I filed this as bug #21671 in bugzilla.
>
>  Let me know if you'd like more information or if I can help come up
with
> a fix.  (I'm not familiar at all with the backend so I'd need some
guidance
> to be of much help.)
>
> Thanks,
>   Dale Martin
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141126/bb2404d6/attachment.html>

David Majnemer

2014-Nov-28 07:44 UTC

head link

[LLVMdev] crash with large structure values on the stack

I believe this is the same as http://llvm.org/bugs/show_bug.cgi?id=21513

I had a dumb fix for this which simply bumped up the size of
SDNode::NumOperands and SDNode::NumValues.

Hal had very reasonable reservations with this approach because it
increases the size of a structure which is used a lot.

On Wed, Nov 26, 2014 at 12:37 PM, Reid Kleckner <rnk at google.com> wrote:
> While it would be good to get this bug fixed, I want to point out that
> frontends are generally discouraged from emitting large aggregate loads and
> stores. It's considered "more canonical" for the frontend to
use
> @llvm.memcpy to move this stuff around, and then load the individual
> elements as needed. See past discussions with "daedal nix" about
optimizer
> problems in this area.
>
> On Wed, Nov 26, 2014 at 9:10 AM, Dale Martin <Dale.Martin at
mathworks.com>
> wrote:
>
>>  Hello,
>>
>> This example input crashes if you run it through llc on x86.
>>
>> [begin example]
>>  ; ModuleID = 'test'
>>
>>  %struct_2 = type { [90000 x %struct_1] }
>> %struct_1 = type { i8 }
>>
>>  define void @testFcn(%struct_2 %in1) {
>> testFcn_entry:
>>   %in1_ = alloca %struct_2
>>   store %struct_2 %in1, %struct_2* %in1_, align 8
>>   %localStruct_ = alloca %struct_2
>>   store %struct_2 %in1, %struct_2* %localStruct_, align 8
>>   br label %exit
>>
>>  exit:                                             ; preds >>
%testFcn_entry
>>   ret void
>> }
>>  [end example]
>>
>>  It looks like at some stage of the backend compiler flow there is a
>> "merge_values" instruction generated where the number of
inputs exceeds
>> 16k, but the number of inputs is stored in an unsigned short.  When
this
>> instruction is being translated into x86 machine code, then there is an
out
>> of bounds access:
>>  ~> llc bug-simple.bc
>> llc:
>>
/local/martind/oss/llvm-3.5.0.src/include/llvm/CodeGen/SelectionDAGNodes.h:649:
>> llvm::EVT llvm::SDNode::getValueType(unsigned int) const: Assertion
`ResNo
>> < NumValues && "Illegal result number!"'
failed.
>>
>>  Probably the truncation of NumOperands should be caught directly with
>> an assertion in the SDNode constructor.
>>
>> One other interesting aspect of it is that if you make the struct_2
type
>> a smaller matrix, like:
>>  %struct_2 = type { [65534 x %struct_1] }
>>
>>  Then you don't get a crash, instead it takes about 20+ minutes to
>> process and you get a lot of movb instructions out - like 200k or so.
>> Callgrind says that 33% of the time consumed is directly in
>> llvm::SUnit::ComputeHeight(). Clearly something has gone badly
non-linear
>> for this case.
>>
>>  I looked at what clang is doing for a similar construct and I see it
>> generates a memcpy intrinsic instead of the direct load/store. I'm
now
>> doing that in my own front-end, but it seems like this is intended to
be
>> supported so I thought I'd report it.  It seems like turning this
construct
>> into memcpy as an optimization or part of the back-end lowering might
be a
>> better long-term approach.
>>
>>  I filed this as bug #21671 in bugzilla.
>>
>>  Let me know if you'd like more information or if I can help come
up
>> with a fix.  (I'm not familiar at all with the backend so I'd
need some
>> guidance to be of much help.)
>>
>> Thanks,
>>   Dale Martin
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141128/3d80cb46/attachment.html>

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Nov 2014 - [LLVMdev] crash with large structure values on the stack

[LLVMdev] crash with large structure values on the stack

[LLVMdev] crash with large structure values on the stack

[LLVMdev] crash with large structure values on the stack

Maybe Matching Threads