thr3ads.net - llvm dev - [llvm-dev] RFC: LLD range extension thunks [Jan 2017]

If this information is useful, please help other people find it:
Share via:

Sean Silva via llvm-dev

2017-Jan-06 22:33 UTC

[llvm-dev] RFC: LLD range extension thunks

After looking at this for a while, I do not think that this problem is
NP-hard. With a finite "short branch" displacement k, I was not able
to
come up with a gadget that could create global constraints as would be
needed to e.g. model an instance of 3SAT or vertex cover in terms of this
problem.

The problem is hard though. I believe that it is likely to be exponential
in the "short branch" displacement k, and k is typically "pretty
big".

-- Sean Silva

On Fri, Jan 6, 2017 at 1:12 PM, Sean Silva <chisophugis at gmail.com>
wrote:
>
>
> On Fri, Jan 6, 2017 at 12:41 AM, Bruce Hoult via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Fri, Jan 6, 2017 at 6:21 AM, Rui Ueyama via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <peter.smith at
linaro.org>
>>> wrote:
>>>
>>>> Hello Rui,
>>>>
>>>> Thanks for the comments
>>>>
>>>> - Synthetic sections and rewriting relocations
>>>> I think that this would definitely be worth trying. It should
remove
>>>> the need for thunks to be represented in the core data
structures, and
>>>> would allow .
>>>>
>>>
>>> Creating symbols for thunks would have another benefit: it makes
>>> disassembled output easier to read because thunks have names.
>>>
>>>
>>>> It would also mean that we wouldn't have to associate
symbols with
>>>> thunks as the relocations would directly target the thunks. ARM
>>>> interworking makes reusing thunks more difficult as not every
thunk is
>>>> compatible with every caller. For example:
>>>> ARM B target and Thumb2 B.W target can't reuse the same
thunk even if
>>>> in range as the branch instruction can't change state.
>>>>
>>>> I think it is worth an experiment to make the existing
implementation
>>>> of thunks use synthetic sections and rewriting relocations
before
>>>> trying to implement range extension thunks.
>>>>
>>>> - Yes the scan is linear it is essentially:
>>>> do
>>>>     assign addresses to input sections
>>>>     for each relocation
>>>>         if (thunk needed)
>>>>             create thunk or reuse existing one
>>>> while (no more thunks added)
>>>>
>>>> There's quite a lot of complexity that can be added with
respect to
>>>> the placement of thunks within the output section. For example
if
>>>> there is a caller with a low address and a caller with a high
address,
>>>> both might be able to reuse a thunk placed in the middle. I
think it
>>>> is worth starting simple though.
>>>
>>>
>>> I agree. I believe that computing the best thunk positions is
NP-hard,
>>> but the best layout and a layout produced by a naive algorithm
wouldn't be
>>> that different.
>>>
>>
>> Correct conclusion, but there's no way the problem is NP.
>>
>> Reordering functions (or even individual basic blocks) to minimize the
>> needed thunks is a complex problem.
>>
>> But you're not doing that. Once an ordering is selected a simple
greedy
>> algorithm is optimal.
>>
>> There is no cost difference between a thunk that is right next to the
>> short jump and a thunk that is only juuust within range. So you can
find
>> the lowest address jump needing a thunk to a particular target and put
the
>> thunk the maximum possible distance after it (after the end of a
function,
>> or even after any unconditional branch). Find everything else within
range
>> of that thunk and fix it up. Repeat.
>>
>
> I don't think this analysis is correct. Assume a 1M branch displacement
> for short jumps. Consider:
>
> secA: 512K (contains a jump "jumpA" at offset 0 that needs a
thunk (it
> doesn't matter where it needs to jump to, just that it definitely needs
a
> thunk))
> secB: 512K
> secC: 512K (contains a jump "jumpC" at offset 0 that jumps to
offset 0 in
> secA (i.e., just barely in range of a short jump))
>
> If the thunk for jumpA is placed between secB and secC (as it would be
> based on your description) it will push the branch at the beginning of secC
> out of range, forcing another thunk to be needed. In this small example,
> the thunk for jumpA must be placed before secA in order to avoid needing a
> thunk for jumpC. In other words, placing thunks can cause you to need even
> more thunks.
>
> -- Sean Silva
>
>
>
>>
>> Other algorithms will give smaller average displacements to the thunks,
>> but there is no advantage in that. No other algorithm will generate
fewer
>> thunks.
>>
>> That's assuming all short branches have the same code size and
>> displacement distance.
>>
>> If there are multiple branch distances and code sizes (and you have a
>> choice between them at given call sites) then it's still just a
simple
>> dynamic programming problem, solvable in linear [1] time by trying each
>> branch size at the first available call site, with a cache of the
minimum
>> cost assuming the first 0, 1, 2 .. N call sites have already been
covered.
>>
>> [1] or at least nCallSites * nBranchSizes
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170106/507ffbfa/attachment.html>

Mehdi Amini via llvm-dev

2017-Jan-06 23:12 UTC

head link

[llvm-dev] RFC: LLD range extension thunks

Isn’t it the same problem as what ARMConstantIslandPass is trying to address?

— 
Mehdi
> On Jan 6, 2017, at 2:33 PM, Sean Silva via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> After looking at this for a while, I do not think that this problem is
NP-hard. With a finite "short branch" displacement k, I was not able
to come up with a gadget that could create global constraints as would be needed
to e.g. model an instance of 3SAT or vertex cover in terms of this problem.
> 
> The problem is hard though. I believe that it is likely to be exponential
in the "short branch" displacement k, and k is typically "pretty
big".
> 
> -- Sean Silva
> 
> On Fri, Jan 6, 2017 at 1:12 PM, Sean Silva <chisophugis at gmail.com
<mailto:chisophugis at gmail.com>> wrote:
> 
> 
> On Fri, Jan 6, 2017 at 12:41 AM, Bruce Hoult via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> 
> 
> On Fri, Jan 6, 2017 at 6:21 AM, Rui Ueyama via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <peter.smith at linaro.org
<mailto:peter.smith at linaro.org>> wrote:
> Hello Rui,
> 
> Thanks for the comments
> 
> - Synthetic sections and rewriting relocations
> I think that this would definitely be worth trying. It should remove
> the need for thunks to be represented in the core data structures, and
> would allow .
> 
> Creating symbols for thunks would have another benefit: it makes
disassembled output easier to read because thunks have names.
>  
> It would also mean that we wouldn't have to associate symbols with
> thunks as the relocations would directly target the thunks. ARM
> interworking makes reusing thunks more difficult as not every thunk is
> compatible with every caller. For example:
> ARM B target and Thumb2 B.W target can't reuse the same thunk even if
> in range as the branch instruction can't change state.
> 
> I think it is worth an experiment to make the existing implementation
> of thunks use synthetic sections and rewriting relocations before
> trying to implement range extension thunks.
> 
> - Yes the scan is linear it is essentially:
> do
>     assign addresses to input sections
>     for each relocation
>         if (thunk needed)
>             create thunk or reuse existing one
> while (no more thunks added)
> 
> There's quite a lot of complexity that can be added with respect to
> the placement of thunks within the output section. For example if
> there is a caller with a low address and a caller with a high address,
> both might be able to reuse a thunk placed in the middle. I think it
> is worth starting simple though.
> 
> I agree. I believe that computing the best thunk positions is NP-hard, but
the best layout and a layout produced by a naive algorithm wouldn't be that
different.
> 
> Correct conclusion, but there's no way the problem is NP.
> 
> Reordering functions (or even individual basic blocks) to minimize the
needed thunks is a complex problem.
> 
> But you're not doing that. Once an ordering is selected a simple greedy
algorithm is optimal.
> 
> There is no cost difference between a thunk that is right next to the short
jump and a thunk that is only juuust within range. So you can find the lowest
address jump needing a thunk to a particular target and put the thunk the
maximum possible distance after it (after the end of a function, or even after
any unconditional branch). Find everything else within range of that thunk and
fix it up. Repeat.
> 
> I don't think this analysis is correct. Assume a 1M branch displacement
for short jumps. Consider:
> 
> secA: 512K (contains a jump "jumpA" at offset 0 that needs a
thunk (it doesn't matter where it needs to jump to, just that it definitely
needs a thunk))
> secB: 512K
> secC: 512K (contains a jump "jumpC" at offset 0 that jumps to
offset 0 in secA (i.e., just barely in range of a short jump))
> 
> If the thunk for jumpA is placed between secB and secC (as it would be
based on your description) it will push the branch at the beginning of secC out
of range, forcing another thunk to be needed. In this small example, the thunk
for jumpA must be placed before secA in order to avoid needing a thunk for
jumpC. In other words, placing thunks can cause you to need even more thunks.
> 
> -- Sean Silva
> 
>  
> 
> Other algorithms will give smaller average displacements to the thunks, but
there is no advantage in that. No other algorithm will generate fewer thunks.
> 
> That's assuming all short branches have the same code size and
displacement distance.
> 
> If there are multiple branch distances and code sizes (and you have a
choice between them at given call sites) then it's still just a simple
dynamic programming problem, solvable in linear [1] time by trying each branch
size at the first available call site, with a cache of the minimum cost assuming
the first 0, 1, 2 .. N call sites have already been covered.
> 
> [1] or at least nCallSites * nBranchSizes
>  
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170106/b988ee3e/attachment.html>

Sean Silva via llvm-dev

2017-Jan-06 23:23 UTC

head link

[llvm-dev] RFC: LLD range extension thunks

I think so. It uses essentially the same iterative approach being discussed
in this thread.

On Fri, Jan 6, 2017 at 3:12 PM, Mehdi Amini <mehdi.amini at apple.com>
wrote:
> Isn’t it the same problem as what ARMConstantIslandPass is trying to
> address?
>
> —
> Mehdi
>
> On Jan 6, 2017, at 2:33 PM, Sean Silva via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> After looking at this for a while, I do not think that this problem is
> NP-hard. With a finite "short branch" displacement k, I was not
able to
> come up with a gadget that could create global constraints as would be
> needed to e.g. model an instance of 3SAT or vertex cover in terms of this
> problem.
>
> The problem is hard though. I believe that it is likely to be exponential
> in the "short branch" displacement k, and k is typically
"pretty big".
>
> -- Sean Silva
>
> On Fri, Jan 6, 2017 at 1:12 PM, Sean Silva <chisophugis at gmail.com>
wrote:
>
>>
>>
>> On Fri, Jan 6, 2017 at 12:41 AM, Bruce Hoult via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Fri, Jan 6, 2017 at 6:21 AM, Rui Ueyama via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> On Thu, Jan 5, 2017 at 8:15 PM, Peter Smith <peter.smith at
linaro.org>
>>>> wrote:
>>>>
>>>>> Hello Rui,
>>>>>
>>>>> Thanks for the comments
>>>>>
>>>>> - Synthetic sections and rewriting relocations
>>>>> I think that this would definitely be worth trying. It
should remove
>>>>> the need for thunks to be represented in the core data
structures, and
>>>>> would allow .
>>>>>
>>>>
>>>> Creating symbols for thunks would have another benefit: it
makes
>>>> disassembled output easier to read because thunks have names.
>>>>
>>>>
>>>>> It would also mean that we wouldn't have to associate
symbols with
>>>>> thunks as the relocations would directly target the thunks.
ARM
>>>>> interworking makes reusing thunks more difficult as not
every thunk is
>>>>> compatible with every caller. For example:
>>>>> ARM B target and Thumb2 B.W target can't reuse the same
thunk even if
>>>>> in range as the branch instruction can't change state.
>>>>>
>>>>> I think it is worth an experiment to make the existing
implementation
>>>>> of thunks use synthetic sections and rewriting relocations
before
>>>>> trying to implement range extension thunks.
>>>>>
>>>>> - Yes the scan is linear it is essentially:
>>>>> do
>>>>>     assign addresses to input sections
>>>>>     for each relocation
>>>>>         if (thunk needed)
>>>>>             create thunk or reuse existing one
>>>>> while (no more thunks added)
>>>>>
>>>>> There's quite a lot of complexity that can be added
with respect to
>>>>> the placement of thunks within the output section. For
example if
>>>>> there is a caller with a low address and a caller with a
high address,
>>>>> both might be able to reuse a thunk placed in the middle. I
think it
>>>>> is worth starting simple though.
>>>>
>>>>
>>>> I agree. I believe that computing the best thunk positions is
NP-hard,
>>>> but the best layout and a layout produced by a naive algorithm
wouldn't be
>>>> that different.
>>>>
>>>
>>> Correct conclusion, but there's no way the problem is NP.
>>>
>>> Reordering functions (or even individual basic blocks) to minimize
the
>>> needed thunks is a complex problem.
>>>
>>> But you're not doing that. Once an ordering is selected a
simple greedy
>>> algorithm is optimal.
>>>
>>> There is no cost difference between a thunk that is right next to
the
>>> short jump and a thunk that is only juuust within range. So you can
find
>>> the lowest address jump needing a thunk to a particular target and
put the
>>> thunk the maximum possible distance after it (after the end of a
function,
>>> or even after any unconditional branch). Find everything else
within range
>>> of that thunk and fix it up. Repeat.
>>>
>>
>> I don't think this analysis is correct. Assume a 1M branch
displacement
>> for short jumps. Consider:
>>
>> secA: 512K (contains a jump "jumpA" at offset 0 that needs a
thunk (it
>> doesn't matter where it needs to jump to, just that it definitely
needs a
>> thunk))
>> secB: 512K
>> secC: 512K (contains a jump "jumpC" at offset 0 that jumps to
offset 0 in
>> secA (i.e., just barely in range of a short jump))
>>
>> If the thunk for jumpA is placed between secB and secC (as it would be
>> based on your description) it will push the branch at the beginning of
secC
>> out of range, forcing another thunk to be needed. In this small
example,
>> the thunk for jumpA must be placed before secA in order to avoid
needing a
>> thunk for jumpC. In other words, placing thunks can cause you to need
even
>> more thunks.
>>
>> -- Sean Silva
>>
>>
>>
>>>
>>> Other algorithms will give smaller average displacements to the
thunks,
>>> but there is no advantage in that. No other algorithm will generate
fewer
>>> thunks.
>>>
>>> That's assuming all short branches have the same code size and
>>> displacement distance.
>>>
>>> If there are multiple branch distances and code sizes (and you have
a
>>> choice between them at given call sites) then it's still just a
simple
>>> dynamic programming problem, solvable in linear [1] time by trying
each
>>> branch size at the first available call site, with a cache of the
minimum
>>> cost assuming the first 0, 1, 2 .. N call sites have already been
covered.
>>>
>>> [1] or at least nCallSites * nBranchSizes
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170106/800b9c2e/attachment.html>

llvm dev - Jan 2017 - RFC: LLD range extension thunks

[llvm-dev] RFC: LLD range extension thunks

[llvm-dev] RFC: LLD range extension thunks

[llvm-dev] RFC: LLD range extension thunks