thr3ads.net - llvm dev - [llvm-dev] target-features attribute prevents inlining? [Jun 2020]

Haoran Xu via llvm-dev

2020-Jun-13 05:42 UTC

[llvm-dev] target-features attribute prevents inlining?

Thank you so much David! After thinking a bit more, I agree with you that
attempting to add 'target-features' to my functions seems to be the
safest approach of all.

I noticed that if I mark the clang++ function as 'AlwaysInline', the
inlining is performed normally. Is this a potential bug, given what you
said about LLVM possibly moving code that uses advanced CPU features
outside the condition check?

Also, may I ask an additional (somewhat unrelated) question?
The functions I extracted from clang++ output are already optimized. I
want some way to prevent LLVM from wasting time optimizing them again at
runtime, when those functions are fed together with my functions (which
should be optimized) into the optimizer. Is there any way to achieve this?
I do not think the 'optnone' attribute is the solution, since it prevents
the function from being inlined. I am currently giving those clang++
functions 'available_externally' linkage, which, from my understanding of
the documentation, feels closest to what I want, though I'm not sure if
this is the right approach or if there is a better one. Could you kindly
give some pointers on this question?

Thanks again!

Best,
Haoran


On Fri, Jun 12, 2020 at 10:17 PM David Blaikie <dblaikie at gmail.com> wrote:
> On Fri, Jun 12, 2020 at 10:10 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
> >
> > Hi David,
> >
> > Thanks for your quick response!
> >
> > I now understand the reason that inlining cannot be done on functions
> > with different target-attributes. Thanks for your explanation!
> >
> > However, I think I didn't fully understand your solution; it would be
> > nice if you could elaborate a bit more. Here's a bit more info on my
> > current workflow:
> >
> > (1) The clang++ compiler builds a C++ source file (a.cpp), which
> > contains the implementation of '_Z2fnP10TestStructi', into bitcode (a.bc).
> > (2) A parser parses 'a.bc', extracts the IR of '_Z2fnP10TestStructi' and
> > generates a data header file (a.h) containing the raw bitcode of that
> > function.
> > (3) The data header is then built with the main program, so the main
> > program has access to the raw bitcode data.
> > (4) At runtime, the main program generates 'testfn' using
> > llvm::IRBuilder (similar to what the Kaleidoscope tutorial does). The
> > 'testfn' does not have any of those attributes or MetadataNodes, of
> > course.
> > (5) The raw bitcode data and 'testfn' are combined into a single module
> > using LLVM's linkInModule API, then fed into the optimizer.
> >
> > What do you think is the proper fix for my use case? I can think of a
> > few, but I don't think I have enough context to determine which is the
> > most proper fix.
> > (1) Remove all MetadataNodes and attributes from the bitcode files. Is
> > this sufficient to prevent all weird cases like this one? What would be
> > the drawback if all MetadataNodes and attributes are removed?
>
> I don't know if dropping attributes is always safe/correct. (metadata
> is certainly droppable (or at least intended to be) while maintaining
> correctness - they're meant to be optional value-add without being
> mandatory)
>
> > (2) Remove only the 'target-features' attribute from the bitcode file.
> > Is this sufficient to prevent all weird cases like this one?
>
> Don't know for sure.
>
> > (3) Add the 'target-features' attribute to all the functions I generate.
> > Is this sufficient to prevent all weird cases like this one? Do I have
> > the guarantee that the 'target-features' attributes of all bitcode files
> > generated by clang++ are identical?
>
> That's sort of what I was getting at - suggesting you figure out how
> Clang is determining the attributes and replicate or otherwise reuse
> the same logic. Not sure how feasible that approach is - but it'd be
> where I'd look to start at least.
>
> - Dave
>
> >
> > Thanks!
> >
> > Haoran
> >
> >
> > On Fri, Jun 12, 2020 at 9:54 PM David Blaikie <dblaikie at gmail.com> wrote:
> >>
> >> (+Eric Christopher for target attributes)
> >> (+Lang Hames for JIT things)
> >>
> >> The problem is that those target features enable the compiler to select
> >> instructions that would only be valid if the target CPU has those
> >> features (e.g. a function without the "+mmx" attribute might be
> >> intended to run on a CPU that doesn't have the MMX instruction set).
> >> It's possible that a function with MMX could be called from a function
> >> without MMX if the caller checked the CPU features to ensure they
> >> matched before making the call. Since there's any number of ways that
> >> test might be done, once LLVM inlines the MMX-using function into the
> >> caller that lacks MMX, it can't be sure it won't accidentally move the
> >> MMX-using code around beyond the condition. So the inlining is disabled.
> >>
> >> In the broadest sense, you probably want to compile things the same
> >> way for both your IR generators: lifting whatever set of flags/etc. is
> >> used to generate the target and attributes from clang for your
> >> runtime-generated code would probably be the best thing.
> >>
> >> - Dave
> >>
> >> On Fri, Jun 12, 2020 at 9:21 PM Haoran Xu via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I'm new to LLVM and I recently hit a weird problem with inlining
> >> > behavior. I managed to get a minimal repro and the symptom of the
> >> > issue, but I couldn't understand the root cause or how I should
> >> > properly handle this issue.
> >> >
> >> > Below is IR consisting of two functions, '_Z2fnP10TestStructi' and
> >> > 'testfn', with the latter calling the former. One would expect the
> >> > optimizer to inline the calls to '_Z2fnP10TestStructi', but it
> >> > doesn't. (The command line I used is 'opt -O3 test.ll -o test2.bc'.)
> >> >
> >> >> source_filename = "a.cpp"
> >> >> target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> >> >> target triple = "x86_64-unknown-linux-gnu"
> >> >>
> >> >> %struct.TestStruct = type { i8*, i32 }
> >> >>
> >> >> define dso_local i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1) #0 {
> >> >>   %3 = getelementptr inbounds %struct.TestStruct, %struct.TestStruct* %0, i64 0, i32 0
> >> >>   %4 = load i8*, i8** %3, align 8
> >> >>   %5 = icmp eq i8* %4, null
> >> >>   %6 = add nsw i32 %1, 1
> >> >>   %7 = shl nsw i32 %1, 1
> >> >>   %8 = select i1 %5, i32 %6, i32 %7
> >> >>   ret i32 %8
> >> >> }
> >> >>
> >> >> define i32 @testfn(%struct.TestStruct* %0) {
> >> >> body:
> >> >>   %1 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 1)
> >> >>   %2 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %1)
> >> >>   %3 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %2)
> >> >>   %4 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %3)
> >> >>   %5 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %4)
> >> >>   %6 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %5)
> >> >>   %7 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %6)
> >> >>   %8 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %7)
> >> >>   %9 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %8)
> >> >>   %10 = call i32 @_Z2fnP10TestStructi(%struct.TestStruct* %0, i32 %9)
> >> >>   ret i32 %10
> >> >> }
> >> >>
> >> >> attributes #0 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" }
> >> >
> >> >
> >> > It turns out that the failure to inline is caused by the
> >> > 'target-features' attribute in the last line. The function inlines
> >> > properly if I remove the 'target-features' attribute from
> >> > '_Z2fnP10TestStructi', or if I attach 'attributes #0' to 'testfn'.
> >> >
> >> > So I think the symptom is that inlining does not work when two
> >> > functions have different 'target-features' attributes. However, I
> >> > could not understand the reasoning behind this, or how I should
> >> > prevent this issue properly.
> >> >
> >> > Just for additional information, in my use case, the function
> >> > '_Z2fnP10TestStructi' is automatically extracted from IR generated by
> >> > clang++ with -O3, so the IR contains a bunch of attributes and
> >> > MetadataNodes. The function 'testfn' is generated by my logic using
> >> > llvm::IRBuilder at runtime, so the function does not contain any of
> >> > those attributes and MetadataNodes initially. The functions generated
> >> > by clang++ and my functions are then fed together into optimization
> >> > passes, and I expect the optimizer to inline clang++ functions into my
> >> > functions as needed.
> >> >
> >> > So, what is the proper workaround for this? Should I delete all the
> >> > attributes and MetadataNodes from the clang++-generated IR (and if yes,
> >> > is that sufficient to prevent all weird cases like this one)? I
> >> > thought it was a bad idea because they provide more info to the
> >> > optimizer. If not, what is the proper way of handling this?
> >> >
> >> > Thanks!
> >> >
> >> > Best regards,
> >> > Haoran
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > llvm-dev at lists.llvm.org
> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200612/57e02091/attachment.html>
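
The reasoning David gives in the quoted reply above can be sketched as a small, self-contained C++ model. It is a simplification under stated assumptions, not LLVM's code: in LLVM the decision is delegated to the target hook TargetTransformInfo::areInlineCompatible, whose default implementation requires the caller's and callee's "target-cpu" and "target-features" attributes to be identical, while the X86 override compares subtarget feature bits (including implied features) and accepts a callee whose features are a subset of the caller's. The function names enabledFeatures and inlineCompatible are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Parse a "target-features" string such as "+cx8,+sse2,-avx" into the set of
// features it enables. Entries starting with '-' (disabled) are skipped.
std::set<std::string> enabledFeatures(const std::string &attr) {
  std::set<std::string> enabled;
  std::istringstream in(attr);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item.size() > 1 && item[0] == '+')
      enabled.insert(item.substr(1));
  }
  return enabled;
}

// Simplified model of the subset rule: inlining is allowed only if every
// feature enabled for the callee is also enabled for the caller.
// A missing attribute is modeled as "" (no features); real LLVM instead
// falls back to the TargetMachine's default CPU/features for such functions.
bool inlineCompatible(const std::string &callerAttr,
                      const std::string &calleeAttr) {
  const std::set<std::string> caller = enabledFeatures(callerAttr);
  const std::set<std::string> callee = enabledFeatures(calleeAttr);
  // std::includes needs sorted ranges; std::set iterates in sorted order.
  return std::includes(caller.begin(), caller.end(),
                       callee.begin(), callee.end());
}
```

In this model, a caller with no attribute ("") against the repro's "+cx8,+fxsr,+mmx,+sse,+sse2,+x87" is rejected, and giving the caller the same string makes the pair compatible, which mirrors what the original post observed with 'testfn'.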

David Blaikie via llvm-dev

2020-Jun-13 06:48 UTC

[llvm-dev] target-features attribute prevents inlining?

On Fri, Jun 12, 2020 at 10:42 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
>
> Thank you so much David! After thinking a bit more, I agree with you that
> attempting to add 'target-features' to my functions seems to be the
> safest approach of all.
>
> I noticed that if I mark the clang++ function as 'AlwaysInline', the
> inlining is performed normally. Is this a potential bug, given what you
> said about LLVM possibly moving code that uses advanced CPU features
> outside the condition check?

I guess that's probably just one of those "you get what you asked for"
situations: if you're mixing target attributes and forcing inlining, it's
assumed you've weighed the risks/figured out how to make that work safely.
But I'm not entirely sure.

> Also, may I ask an additional (somewhat unrelated) question?
> The functions I extracted from clang++ output are already optimized. I
> want some way to prevent LLVM from wasting time optimizing them again at
> runtime, when those functions are fed together with my functions (which
> should be optimized) into the optimizer. Is there any way to achieve this?
> I do not think the 'optnone' attribute is the solution, since it prevents
> the function from being inlined. I am currently giving those clang++
> functions 'available_externally' linkage, which, from my understanding of
> the documentation, feels closest to what I want, though I'm not sure if
> this is the right approach or if there is a better one. Could you kindly
> give some pointers on this question?

available_externally seems problematic. What it does is: if LLVM fails to
inline the available_externally definition, it can drop/delete the
definition and rely on a definition being available in some other object
file/module that this one is linked to. So if you use that linkage and the
function is not inlined, you'll probably get a linker error about a
missing symbol definition.

"optnone" is about all we have for this sort of thing, so if that's not
what you're looking for, it's probably best to just let LLVM re-optimize
the function. In general, optimizations should be cheap if they're not
doing any work/the function is already optimized.

- Dave

Craig Topper via llvm-dev

2020-Jun-13 06:58 UTC

[llvm-dev] target-features attribute prevents inlining?

On Fri, Jun 12, 2020 at 11:48 PM David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Fri, Jun 12, 2020 at 10:42 PM Haoran Xu <haoranxu510 at gmail.com> wrote:
> >
> > Thank you so much David! After thinking a bit more, I agree with you that
> > attempting to add 'target-features' to my functions seems to be the
> > safest approach of all.
> >
> > I noticed that if I mark the clang++ function as 'AlwaysInline', the
> > inlining is performed normally. Is this a potential bug, given what you
> > said about LLVM possibly moving code that uses advanced CPU features
> > outside the condition check?
>
> I guess that's probably just one of those "you get what you asked for"
> situations: if you're mixing target attributes and forcing inlining,
> it's assumed you've weighed the risks/figured out how to make that
> work safely. But I'm not entirely sure.

Clang checks the target features for attribute(always_inline) and will fail
to compile if the caller's target features aren't a superset of the
callee's.

-- 
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200612/fab33b67/attachment.html>
