thr3ads.net - llvm dev - [llvm-dev] Reducing the number of ptrtoint/inttoptrs that are generated by LLVM [Jan 2019]

Mehdi AMINI via llvm-dev

2019-Jan-15 01:03 UTC

[llvm-dev] Reducing the number of ptrtoint/inttoptrs that are generated by LLVM

On Mon, Jan 14, 2019 at 4:51 PM Chandler Carruth <chandlerc at gmail.com> wrote:
>
> On Mon, Jan 14, 2019, 15:59 Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>> On Mon, Jan 14, 2019 at 9:36 AM Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>> While I'm very interested in the end result here, I have some questions
>>> that don't seem well answered yet around pointer subtraction...
>>>
>>> First and foremost - how do you address correctness issues here? Because
>>> the subtraction `A - B` can escape/capture more things. Specifically, if
>>> one of `A` or `B` is escaped/captured, the subtraction can be used to
>>> escape or capture the other pointer.
>>>
>>
>> Isn't escaping supposed to work at the "address ranges" level and not at
>> the pointer value?
>> I mean that if `A` or `B` is escaped/captured, then any pointer that is
>> associated to the same memory range should be considered as "escaped", and
>> thus the subtraction does not seem to leak anything more to me.
>>
>
> I believe this is true for subtracting "inbounds" (to borrow the gep
> terminology), but just as we support non-inbounds GEP, we support
> non-inbounds subtracting. There it seems like this does escape the other
> global. I know that in the past I've discussed this exact case with
> nlewycky and he believed that to be the case, so I suspect quite a bit of
> LLVM is written under this model. No idea what would be the impact of
> changing it beyond the ability to represent code like the example I gave
> earlier on the thread.
>
That does not match my current reading of LangRef (admittedly
shallow/imperfect): not having inbounds allows you to form pointers that
are outside of the memory range, however the "result value may not
necessarily be used to access memory though, even if it happens to point
into allocated storage".
Also the "pointer aliasing rules" section mentions: "Any memory access must
be done through a pointer value associated with an address range of the
memory access, otherwise the behavior is undefined".
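
The disagreement above turns on a small piece of arithmetic, which the following sketch makes concrete. It is a toy model, not LLVM's semantics: addresses are plain integers, and every name in it (`pointer_difference`, `reconstruct_b`, the example addresses) is hypothetical. It shows only that an unrestricted `A - B` result, combined with a known `A`, reproduces the numeric address of `B`. Whether that counts as an "escape" is exactly what the thread is debating.

```python
# Toy model only: addresses are plain integers and every name here is hypothetical.

def pointer_difference(a_addr: int, b_addr: int) -> int:
    """Integer result of an unrestricted (non-inbounds) `A - B`."""
    return a_addr - b_addr


def reconstruct_b(a_addr: int, difference: int) -> int:
    """Anyone holding A's address and the difference can compute B's address."""
    return a_addr - difference


A_ADDR = 0x1000  # assume A has already escaped
B_ADDR = 0x2468  # assume B was never passed anywhere directly

leaked = pointer_difference(A_ADDR, B_ADDR)
recovered = reconstruct_b(A_ADDR, leaked)
print(recovered == B_ADDR)  # True
```

Under the "address range" reading of LangRef quoted above, the recovered integer is not associated with B's memory, so an access through it would still be undefined.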


>
>>
>>
>>> So *some* of the conservative treatment is necessary. What is the plan
>>> to update all the analyses to remain correct? What correctness testing have
>>> you done?
>>>
>>> Second - an intrinsic seems a poor fit here given the significance of
>>> this operation. We have an instruction that covers most pointer arithmetic
>>> (`getelementptr`), and I can imagine growing pointer subtraction, but it
>>> seems like it should be an instruction if we're going to have it. Based on
>>> the above, we will need to use it very often in analysis.
>>>
>>> Regarding the instcombine, it should be very easy to keep loads and
>>> stores of pointers as pointer typed in instcombine. Likely just a missing
>>> case in the code I added/touched there.
>>>
>>> On Mon, Jan 14, 2019 at 3:23 AM Juneyoung Lee via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hello all,
>>>>
>>>> This is a proposal for reducing the number of ptrtoint/inttoptr casts
>>>> that are not written by programmers but rather generated by LLVM passes.
>>>> Currently the majority of ptrtoint/inttoptr casts are generated by LLVM:
>>>> when compiling SPEC 2017 with LLVM r348082 (Dec 2, 2018) with -O3,
>>>> the output IR contains 22,771 inttoptr instructions. However, when
>>>> compiling it with -O0, there are only 1,048 inttoptrs, meaning that 95.4%
>>>> of them are generated by LLVM passes.
>>>>
>>>> The trend is similar for ptrtoint instructions. When compiling SPEC 2017
>>>> with -O0, there are 23,208 ptrtoint instructions, but 22,016 of them
>>>> (94.8%) are generated by the Clang frontend to represent pointer
>>>> subtraction. They aren't effectively optimized out, because there are
>>>> even more ptrtoints (31,721) after -O3.
>>>> This is bad for performance because the existence of a ptrtoint makes
>>>> analyses return conservative results, as a pointer can escape through
>>>> the cast. A memory access through a pointer that came from inttoptr is
>>>> assumed to possibly access anywhere, so it may block store-to-load
>>>> forwarding, merging of two identical loads, etc.
>>>>
>>>> I believe this can be addressed by applying two patches: the first
>>>> represents pointer subtraction with a dedicated intrinsic function,
>>>> llvm.psub, and the second disables the following InstCombine
>>>> transformation:
>>>>
>>>>     %q = load i8*, i8** %p1
>>>>     store i8* %q, i8** %p2
>>>> =>
>>>>     %1 = bitcast i8** %p1 to i64*
>>>>     %q1 = load i64, i64* %1, align 8
>>>>     %2 = bitcast i8** %p2 to i64*
>>>>     store i64 %q1, i64* %2, align 8
>>>>
>>>> This transformation can introduce inttoptrs later if loads are followed
>>>> (https://godbolt.org/z/wsZ3II). Both are discussed in
>>>> https://bugs.llvm.org/show_bug.cgi?id=39846 as well.
>>>> After llvm.psub is used and this transformation is disabled, the number
>>>> of inttoptrs decreases from 22,771 to 1,565 (6.9%), and the number of
>>>> ptrtoints decreases from 31,721 to 7,772 (24.5%).
>>>>
>>>> I'll introduce the llvm.psub patch first.
>>>>
>>>>
>>>> --- Adding llvm.psub ---
>>>>
>>>> By defining a pointer subtraction intrinsic, we can get a performance
>>>> gain because it has more undefined behavior than just subtracting two
>>>> ptrtoints.
>>>>
>>>> Patch https://reviews.llvm.org/D56598 adds the llvm.psub(p1, p2)
>>>> intrinsic function, which subtracts two pointers and returns the
>>>> difference. Its semantics are as follows.
>>>> If p1 and p2 point to different objects, and neither of them is based
>>>> on a pointer cast from an integer, `llvm.psub(p1, p2)` returns poison.
>>>> For example,
>>>>
>>>> %p = alloca
>>>> %q = alloca
>>>> %i = llvm.psub(p, q) ; %i is poison
>>>>
>>>> This allows aggressive escape analysis on pointers. Given
>>>> i = llvm.psub(p1, p2), if neither p1 nor p2 is based on a pointer cast
>>>> from an integer, the llvm.psub call does not make p1 or p2 escape
>>>> (https://reviews.llvm.org/D56601).
>>>>
>>>> If either p1 or p2 is based on a pointer cast from an integer, or p1 and
>>>> p2 point to the same object, it returns the result of the subtraction
>>>> (in bytes); for example,
>>>>
>>>> %p = alloca
>>>> %q = inttoptr %x
>>>> %i = llvm.psub(p, q) ; %i is equivalent to (ptrtoint %p) - %x
>>>>
>>>> `null` is regarded as a pointer casted from an integer because
>>>> it is equivalent to `inttoptr 0`.
>>>>
>>>> Adding llvm.psub allows LLVM to remove a significant portion of
>>>> ptrtoints and a portion of inttoptrs. After llvm.psub is used, when
>>>> SPECrate 2017 is compiled with -O3, the number of inttoptrs decreases to
>>>> ~13,500 (59%) and the number of ptrtoints decreases to ~14,300 (45%).
>>>>
>>>> To see the performance change, I ran SPECrate 2017 (thread # = 1) with
>>>> three versions of LLVM: r313797 (Sep 21, 2017), LLVM 6.0 official, and
>>>> r348082 (Dec 2, 2018).
>>>> Running r313797 shows that 505.mcf_r has a consistent 2.0% speedup across
>>>> 3 different machines (i3-6100, i5-6600, i7-7700). For LLVM 6.0 and
>>>> r348082, there's neither a consistent speedup nor a consistent slowdown;
>>>> the average speedup is near 0. I believe there's still room for
>>>> improvement because there are passes which are not aware of llvm.psub.
>>>>
>>>> Thank you for reading this, and any comment is welcome.
>>>>
>>>> Best Regards,
>>>> Juneyoung Lee
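
The llvm.psub rules quoted in the proposal above can be restated as a small executable model. This is only a sketch of the rules as worded in the email, not the implementation in D56598: the `Ptr` record, its `obj` and `from_int` fields, and the `POISON` sentinel are all hypothetical stand-ins introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union

POISON = "poison"  # hypothetical stand-in for LLVM's poison value


@dataclass(frozen=True)
class Ptr:
    addr: int                # numeric address
    obj: Optional[str]       # underlying object, or None if unknown
    from_int: bool = False   # based on a pointer cast from an integer (or null)


def psub(p1: Ptr, p2: Ptr) -> Union[int, str]:
    """Model of the proposed rule: byte difference, or poison."""
    # Either operand came from an integer, or both point into the same
    # object: the result is the plain byte difference.
    if p1.from_int or p2.from_int or p1.obj == p2.obj:
        return p1.addr - p2.addr
    # Different objects and no integer-derived operand: poison.
    return POISON


p = Ptr(0x1000, "p")                   # like `%p = alloca`
q = Ptr(0x2000, "q")                   # like `%q = alloca`
x = Ptr(0x0FF0, None, from_int=True)   # like `%q = inttoptr %x`
null = Ptr(0, None, from_int=True)     # null is treated as `inttoptr 0`

print(psub(p, q))                      # poison
print(psub(p, x))                      # 16
print(psub(Ptr(0x1008, "p"), p))       # 8
print(psub(p, null))                   # 4096
```

The poison case is what lets an escape analysis ignore the call: when neither operand is integer-derived, no defined result can reveal the distance between two distinct objects.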

Chandler Carruth via llvm-dev

2019-Jan-15 01:44 UTC


On Mon, Jan 14, 2019 at 5:03 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>
> On Mon, Jan 14, 2019 at 4:51 PM Chandler Carruth <chandlerc at gmail.com> wrote:
>
>> On Mon, Jan 14, 2019, 15:59 Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>> On Mon, Jan 14, 2019 at 9:36 AM Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>>> While I'm very interested in the end result here, I have some questions
>>>> that don't seem well answered yet around pointer subtraction...
>>>>
>>>> First and foremost - how do you address correctness issues here?
>>>> Because the subtraction `A - B` can escape/capture more things.
>>>> Specifically, if one of `A` or `B` is escaped/captured, the subtraction
>>>> can be used to escape or capture the other pointer.
>>>>
>>>
>>> Isn't escaping supposed to work at the "address ranges" level and not at
>>> the pointer value?
>>> I mean that if `A` or `B` is escaped/captured, then any pointer that is
>>> associated to the same memory range should be considered as "escaped", and
>>> thus the subtraction does not seem to leak anything more to me.
>>>
>>
>> I believe this is true for subtracting "inbounds" (to borrow the gep
>> terminology), but just as we support non-inbounds GEP, we support
>> non-inbounds subtracting. There it seems like this does escape the other
>> global. I know that in the past I've discussed this exact case with
>> nlewycky and he believed that to be the case, so I suspect quite a bit of
>> LLVM is written under this model. No idea what would be the impact of
>> changing it beyond the ability to represent code like the example I gave
>> earlier on the thread.
>>
>
> That does not match my current reading of LangRef (admittedly
> shallow/imperfect): not having inbounds allows you to form pointers that
> are outside of the memory range, however the "result value may not
> necessarily be used to access memory though, even if it happens to point
> into allocated storage".
> Also the "pointer aliasing rules" section mentions: "Any memory access
> must be done through a pointer value associated with an address range of
> the memory access, otherwise the behavior is undefined".
>
I think there are still two issues here:

1) Escape of the address doesn't necessarily imply you can access the
memory. It can still be an escaped address. It may not *alias* the memory,
but that's a different question (confusingly). This means, for example,
that a global may still be address-taken even though it cannot be accessed.
2) This model does not play well with the way we manage things like CSE. If
we prove bit equivalence of pointers we will (AFAIK) fold from one to the
other even if one of them would make memory accesses well defined and the
other would make them undefined.
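
Issue 2 can be illustrated with a toy model of pointers that carry an "associated object" alongside their numeric address. This is a sketch under stated assumptions, not how LLVM represents provenance: the `Ptr` record, the `OBJECTS` table, and `access_is_defined` are all hypothetical names. It shows two pointers with identical bits where an access is defined through one and undefined through the other, so replacing one with the other on the basis of bit equality changes the program's meaning.

```python
from dataclasses import dataclass

# Hypothetical memory layout: two adjacent 4-byte objects.
OBJECTS = {
    "a": range(0x1000, 0x1004),
    "b": range(0x1004, 0x1008),
}


@dataclass(frozen=True)
class Ptr:
    addr: int  # the pointer's bits
    obj: str   # the object (address range) the pointer is associated with


def access_is_defined(p: Ptr) -> bool:
    """An access is defined only inside the range the pointer is associated with."""
    return p.addr in OBJECTS[p.obj]


one_past_a = Ptr(0x1004, "a")  # e.g. formed by non-inbounds arithmetic on `a`
start_of_b = Ptr(0x1004, "b")

bits_equal = one_past_a.addr == start_of_b.addr

print(bits_equal)                     # True
print(access_is_defined(start_of_b))  # True
print(access_is_defined(one_past_a))  # False
```

A CSE-style pass that proves `bits_equal` and then rewrites uses of `start_of_b` to `one_past_a` would, in this model, turn a defined access into an undefined one.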

Maybe #2 has been fixed? I would honestly be surprised -- it's been a long
standing tension in compiler optimization that I'm aware of... Adding
+Sanjoy Das <sanjoy at playingwithpointers.com> who I think has a lot of
experience with these semantic parts of LLVM. Also +George Burgess
<gbiv at google.com>.

But what I really want to see is a careful and thorough examination of this
before we shift the representation of the IR to embed a fundamental
reliance on this property.

For example: do we have a sanitizer that verifies these properties? Has
anyone used that on significant application code?

-Chandler


Mehdi AMINI via llvm-dev

2019-Jan-15 01:53 UTC


On Mon, Jan 14, 2019 at 5:44 PM Chandler Carruth <chandlerc at gmail.com> wrote:
> On Mon, Jan 14, 2019 at 5:03 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>
>> On Mon, Jan 14, 2019 at 4:51 PM Chandler Carruth <chandlerc at gmail.com> wrote:
>>
>>> On Mon, Jan 14, 2019, 15:59 Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>
>>>> On Mon, Jan 14, 2019 at 9:36 AM Chandler Carruth via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> While I'm very interested in the end result here, I have some
>>>>> questions that don't seem well answered yet around pointer subtraction...
>>>>>
>>>>> First and foremost - how do you address correctness issues here?
>>>>> Because the subtraction `A - B` can escape/capture more things.
>>>>> Specifically, if one of `A` or `B` is escaped/captured, the subtraction
>>>>> can be used to escape or capture the other pointer.
>>>>>
>>>>
>>>> Isn't escaping supposed to work at the "address ranges" level and not
>>>> at the pointer value?
>>>> I mean that if `A` or `B` is escaped/captured, then any pointer that is
>>>> associated to the same memory range should be considered as "escaped", and
>>>> thus the subtraction does not seem to leak anything more to me.
>>>>
>>>
>>> I believe this is true for subtracting "inbounds" (to borrow the gep
>>> terminology), but just as we support non-inbounds GEP, we support
>>> non-inbounds subtracting. There it seems like this does escape the other
>>> global. I know that in the past I've discussed this exact case with
>>> nlewycky and he believed that to be the case, so I suspect quite a bit of
>>> LLVM is written under this model. No idea what would be the impact of
>>> changing it beyond the ability to represent code like the example I gave
>>> earlier on the thread.
>>>
>>
>> That does not match my current reading of LangRef (admittedly
>> shallow/imperfect): not having inbounds allows you to form pointers that
>> are outside of the memory range, however the "result value may not
>> necessarily be used to access memory though, even if it happens to point
>> into allocated storage".
>> Also the "pointer aliasing rules" section mentions: "Any memory access
>> must be done through a pointer value associated with an address range of
>> the memory access, otherwise the behavior is undefined".
>>
>
> I think there are still two issues here:
>
> 1) Escape of the address doesn't necessarily imply you can access the
> memory. It can still be an escaped address. It may not *alias* the memory,
> but that's a different question (confusingly). This means, for example,
> that a global may still be address-taken even though it cannot be accessed.
>
I suspect we are out of the LangRef domain, but I'm surprised by this
definition. Why is it useful to consider such an address as "escaping"?
I'm not an expert on the literature, but I have always encountered the
notion of escape analysis in relation to aliasing: it isn't the value of
the pointer that is interesting to consider "escaped" but the memory it is
pointing to.

-- 
Mehdi



