thr3ads.net - llvm dev - [llvm-dev] target-features attribute prevents inlining? [Jun 2020]

Haoran Xu via llvm-dev

2020-Jun-13 04:20 UTC

[llvm-dev] target-features attribute prevents inlining?

Hello,

I'm new to LLVM and I recently hit a weird problem about inlining behavior.
I managed to get a minimal repro and the symptom of the issue, but I
couldn't understand the root cause or how I should properly handle this
issue.

Below is IR consisting of two functions, '_Z2fnP10TestStructi' and
'testfn', with the latter calling the former. One would expect the
optimizer to inline the calls to '_Z2fnP10TestStructi', but it doesn't.
(The command line I used is 'opt -O3 test.ll -o test2.bc'.)

> source_filename = "a.cpp"
> target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-unknown-linux-gnu"
>
> %struct.TestStruct = type { i8*, i32 }
>
> define dso_local i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1) #0 {
>   %3 = getelementptr inbounds %struct.TestStruct, %struct.TestStruct* %0, i64 0, i32 0
>   %4 = load i8*, i8** %3, align 8
>   %5 = icmp eq i8* %4, null
>   %6 = add nsw i32 %1, 1
>   %7 = shl nsw i32 %1, 1
>   %8 = select i1 %5, i32 %6, i32 %7
>   ret i32 %8
> }
>
> define i32 @testfn(%struct.TestStruct* %0) {
> body:
>   %1 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 1)
>   %2 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1)
>   %3 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %2)
>   %4 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %3)
>   %5 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %4)
>   %6 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %5)
>   %7 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %6)
>   %8 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %7)
>   %9 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %8)
>   %10 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %9)
>   ret i32 %10
> }
>
> attributes #0 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" }
>
It turns out that the failure to inline is caused by the 'target-features'
attribute in the last line. The function inlines properly if I remove the
'target-features' attribute from '_Z2fnP10TestStructi', or if I add
'attribute #0' to 'testfn'.
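
For reference, the second variant described above (giving 'testfn' the same
attribute group as the callee) could be written roughly as follows. This is
a shortened sketch rather than the exact file from the report: the chain of
calls is cut down to two, and it keeps the typed-pointer syntax of the
original, so it assumes an LLVM release from around the time of this thread.

```llvm
%struct.TestStruct = type { i8*, i32 }

define dso_local i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1) #0 {
  %3 = getelementptr inbounds %struct.TestStruct, %struct.TestStruct* %0, i64 0, i32 0
  %4 = load i8*, i8** %3, align 8
  %5 = icmp eq i8* %4, null
  %6 = add nsw i32 %1, 1
  %7 = shl nsw i32 %1, 1
  %8 = select i1 %5, i32 %6, i32 %7
  ret i32 %8
}

; The only change from the failing case: the caller now references #0 too,
; so caller and callee carry identical "target-features" strings.
define i32 @testfn(%struct.TestStruct* %0) #0 {
body:
  %1 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 1)
  %2 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1)
  ret i32 %2
}

attributes #0 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" }
```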

So I think the symptom is that inlining does not work when two functions
have different 'target-features' attributes. However, I could not
understand the reasoning behind this, or how I should properly avoid the
issue.

Just for additional information, in my use case, the function
'_Z2fnP10TestStructi' is automatically extracted from IR generated by
clang++ with -O3, so the IR contains a bunch of attributes and
MetadataNodes. The function 'testfn' is generated by my logic using
llvm::IRBuilder at runtime, so the function does not contain any of those
attributes and MetadataNodes initially. The functions generated by clang++
and my functions are then fed together into optimization passes, and I
expect the optimizer to inline clang++ functions into my functions as
needed.

So, what is the proper workaround for this? Should I delete all the
attributes and MetadataNodes from the clang++-generated IR (and if yes, is
that sufficient to prevent all weird cases like this one)? I thought
it was a bad idea because they provide more info to the optimizer. If not,
what is the proper way of handling this?

Thanks!

Best regards,
Haoran

David Blaikie via llvm-dev

2020-Jun-13 04:53 UTC

(+Eric Christopher for target attributes)
(+Lang Hames for JIT things)

The problem is that those target features allow the compiler to select
instructions that are only valid if the target CPU has those
features (e.g., a function without the "+mmx" attribute might be
intended to run on a CPU that doesn't have the MMX instruction
set). It's possible that a function with mmx could be called from a
function without mmx if the caller checked the CPU features to ensure
they matched before making the call. Since there are any number of ways
that test might be done, LLVM can't be sure, once it inlines the
mmx-using function into the caller without mmx, that it won't
accidentally move the mmx-using code outside the condition. So
the inlining is disabled.
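
To make the scenario above concrete, here is a minimal sketch of a caller
that guards a feature-specific callee behind a runtime check. The names
@cpu_has_mmx, @fast_path and @dispatch are hypothetical and chosen only for
illustration; the check is left as an external declaration because the
thread does not say how such a test would be implemented.

```llvm
; Hypothetical runtime CPU check (assumed to be provided elsewhere).
declare i1 @cpu_has_mmx()

; Marked +mmx: the backend is allowed to use MMX instructions in this body.
define i32 @fast_path(i32 %x) #0 {
  %r = add nsw i32 %x, 1
  ret i32 %r
}

; No target-features attribute: intended to stay runnable without MMX.
define i32 @dispatch(i32 %x) {
entry:
  %ok = call i1 @cpu_has_mmx()
  br i1 %ok, label %fast, label %slow

fast:                                   ; reached only when the check passes
  %a = call i32 @fast_path(i32 %x)
  ret i32 %a

slow:                                   ; fallback for CPUs without the feature
  %b = add nsw i32 %x, 1
  ret i32 %b
}

attributes #0 = { "target-features"="+mmx" }
```

If the call in %fast were inlined, the body of @fast_path would become
ordinary code inside @dispatch, and nothing in the IR would record that it
is only safe to execute after the check has succeeded.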

In the broadest sense, you probably want to compile things the same
way for both your IR generators - lifting whatever set of flags/etc is
used to generate the target and attributes from clang for your runtime
generated code would probably be the best thing.

- Dave


Haoran Xu via llvm-dev

2020-Jun-13 05:10 UTC

Hi David,

Thanks for your quick response!

I now understand why inlining cannot be done on functions with
different 'target-features' attributes. Thanks for your explanation!

However, I don't think I fully understood your solution; it would be great
if you could elaborate a bit more. Here's a bit more info on my
current workflow:

(1) The clang++ compiler builds the C++ source file (a.cpp), which contains
the implementation of '_Z2fnP10TestStructi', into bitcode (a.bc).
(2) A parser parses 'a.bc', extracts the IR of '_Z2fnP10TestStructi', and
generates a data header file (a.h) containing the raw bitcode of that
function.
(3) The data header is then built with the main program, so the main
program has access to the raw bitcode data.
(4) At runtime, the main program generates 'testfn' using llvm::IRBuilder
(similar to what the Kaleidoscope tutorial does). 'testfn' does not
have any of those attributes or MetadataNodes, of course.
(5) The raw bitcode data and 'testfn' are combined into a single module
using LLVM's linkInModule API, then fed into the optimizer.

What do you think is the proper fix for my use case? I can think of a few,
but I don't think I have enough context to determine which is the most
appropriate.
(1) Remove all MetadataNodes and attributes from the bitcode files. Is this
sufficient to prevent all weird cases like this one? What would be the
drawback if all MetadataNodes and attributes are removed?
(2) Remove only the 'target-features' attribute from the bitcode file. Is
this sufficient to prevent all weird cases like this one?
(3) Add the 'target-features' attribute to all the functions I generate. Is
this sufficient to prevent all weird cases like this one? Do I have a
guarantee that the 'target-features' attributes of all bitcode files
generated by clang++ are identical?

Thanks!

Haoran


