thr3ads.net - llvm dev - [llvm-dev] [GSoC 2016] Code Generation Improvements task [Mar 2016]

Tim Northover via llvm-dev

2016-Mar-01 04:53 UTC

[llvm-dev] [GSoC 2016] Code Generation Improvements task

Hi Vivek,

(Mostly responding with AArch64 hints, though anything I happen to
know from elsewhere too).

On 29 February 2016 at 13:00, vivek pandya via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> 2. lib/Target/AArch64/AArch64AddressTypePromotion.cpp
> As far as I understand, this pass promotes sign extension for 32-bit
> integer (address) values and performs the calculation on 64-bit numbers,
> so the process need not switch execution mode to 32-bit.

Switching execution mode isn't an option on AArch64 (it can only
happen with OS support and never happens within a single process on a
sane OS).

This pass is more a matter of putting the IR in a form that precisely
matches the addressing modes that are actually available. AArch64 can
encode addresses like "base64 + sext(offset32)" into the actual
load/store instruction so it's advantageous to put the sext as close
as possible to the pointer dereference.
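
As a rough illustration of the addressing form mentioned above, here is a toy Python model of "base64 + sext(offset32)". It is not LLVM code, and all helper names are made up for this sketch. It also shows one arithmetic fact that any transformation moving a sext past an add has to respect: sext(a + b) and sext(a) + sext(b) agree only when the 32-bit add does not overflow.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & (1 << 31) else value


def add32(a, b):
    """32-bit wrapping add, returned as an unsigned 32-bit pattern."""
    return (a + b) & MASK32


def effective_address(base64, offset32):
    """Toy model of a 'base64 + sext(offset32)' addressing mode."""
    return (base64 + sext32(offset32)) & MASK64


def sext_after_add(a, b):
    """sext(a + b): the add happens in 32 bits, then the result is extended."""
    return sext32(add32(a, b))


def sext_before_add(a, b):
    """sext(a) + sext(b): both operands are extended, the add is done wide."""
    return sext32(a) + sext32(b)
```

With an offset pattern of 0xFFFFFFFC (that is, -4), `effective_address(0x1000, 0xFFFFFFFC)` lands four bytes below the base. The two sext placements differ for inputs such as 0x7FFFFFFF and 1, where the narrow add wraps.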

I'm afraid I don't really know enough about other architectures to say
which could benefit. It's obviously only beneficial if they have the
addressing modes to support it.

> 3. lib/Target/AArch64/AArch64PromoteConstant.cpp
> This pass tries to simplify aggregate data, like a struct of constants, with
> special SIMD instructions available on the system. On ARM that is NEON;
> other architectures have SIMD support too, specifically MIPS, IBM
> System Z, PowerPC with VMX/AltiVec, and x86 with Intel's AVX.

Possibly. It seems to rely pretty strongly on ARM's "load more than
you can actually use" instructions: vldN instructions can load up to 4
128-bit vectors, but they can still only be used as 128-bit vectors.
If other targets possess similar, then they could well benefit; if
not, then it's probably pointless.

> I have a question regarding target hooks. Does it mean using the TargetInfo
> and SubTargetInfo classes to decide the architecture type at runtime and,
> based on that, perform optimizations (i.e. use target-specific instructions)?

I think they more normally live in TargetTransformInfo.
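
The general shape of a target hook can be sketched in a few lines of Python. This is only an illustration of the pattern (a generic pass queries an interface that targets override); the class and method names are hypothetical and are not the real TargetTransformInfo API.

```python
class TargetHooks:
    """Target-independent defaults; a generic pass only talks to this interface."""

    def folds_sext_into_address(self):
        # Conservative default: assume no such addressing mode exists.
        return False


class AArch64LikeHooks(TargetHooks):
    """Hypothetical target that can encode base + sext(offset32) in a load/store."""

    def folds_sext_into_address(self):
        return True


def should_promote_address_types(hooks):
    """A generic pass asks the hook instead of checking the target's name."""
    return hooks.folds_sext_into_address()
```

The point of the pattern is that the pass itself stays target-independent: a new target opts in by overriding the hook, and every other target keeps the conservative default.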

> Please help me! Am I going in the right direction? Please suggest some code
> or documents to look at for further ideas. Also, would anyone like to mentor
> me for this project?

It sounds like a plausible direction, but documentation is always
lacking in these kinds of things.

As a complete outsider to targets with delay slots, merging their
logic sounds like a nice improvement to me (especially as Lanai is
probably incoming as another ISA that has decided delay slots are a
good idea). But (also as an outsider) I have no idea how practical
that really is.

Cheers.

Tim.

vivek pandya via llvm-dev

2016-Mar-02 20:34 UTC


*Vivek Pandya*


On Tue, Mar 1, 2016 at 10:56 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
>
>
> *Vivek Pandya*
>
>
> On Tue, Mar 1, 2016 at 10:23 AM, Tim Northover <t.p.northover at gmail.com> wrote:
>
>> Hi Vivek,
>>
>> (Mostly responding with AArch64 hints, though anything I happen to
>> know from elsewhere too).
>>
>> On 29 February 2016 at 13:00, vivek pandya via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > 2. lib/Target/AArch64/AArch64AddressTypePromotion.cpp
>> > As far as I understand, this pass promotes sign extension for 32-bit
>> > integer (address) values and performs the calculation on 64-bit numbers,
>> > so the process need not switch execution mode to 32-bit.
>>
>> Switching execution mode isn't an option on AArch64 (it can only
>> happen with OS support and never happens within a single process on a
>> sane OS).
>>
>> This pass is more a matter of putting the IR in a form that precisely
>> matches the addressing modes that are actually available. AArch64 can
>> encode addresses like "base64 + sext(offset32)" into the actual
>> load/store instruction so it's advantageous to put the sext as close
>> as possible to the pointer dereference.
>>
>> I'm afraid I don't really know enough about other architectures to say
>> which could benefit. It's obviously only beneficial if they have the
>> addressing modes to support it.
>>

I did some research on other architectures that have similar addressing modes:

MIPS:
Base addressing: sext(16-bit immediate) + 32/64-bit register value
PC-relative: PC value + sext(16-bit immediate << 2)

ARM: ARM has immediate-offset addressing, but whether the offset is sign- or
zero-extended remains to be determined.

PowerPC: Register Indirect with Immediate Index addressing for integer
loads and stores.

It seems that most architectures that support immediate offsets must
sign-extend them (sext) to preserve the sign before adding them to the
register value.

So this pass seems to be useful for other architectures.
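
The two MIPS forms listed above can be written out as a small Python sketch. The function names are made up for illustration, and the model deliberately leaves open which PC value the hardware uses for the PC-relative case (the caller passes it in).

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def mips_base_address(base_reg, imm16, width=32):
    """Base addressing: register value + sext(16-bit immediate)."""
    return (base_reg + sign_extend(imm16, 16)) & ((1 << width) - 1)


def mips_pc_relative(pc, imm16, width=32):
    """PC-relative: PC value + sext(16-bit immediate << 2).

    Shifting a 16-bit field left by 2 gives an 18-bit quantity, so the
    sign bit of the shifted value is bit 17.
    """
    return (pc + sign_extend(imm16 << 2, 18)) & ((1 << width) - 1)
```

For example, an immediate field of 0xFFF8 encodes -8, so `mips_base_address(0x1000, 0xFFF8)` addresses eight bytes below the base register.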

- Vivek
>
> Thanks, Tim, for providing more insight; I will gather more information in
> the given direction. Furthermore, the 3 tasks mentioned here may not be much
> work for someone who has a good grasp of LLVM, but for me they may be
> sufficient for the GSoC duration. Google may not be able to provide funding
> for such a limited number of improvements, so I am thinking of including some
> TODOs from StackColoring.cpp and StackSlotColoring.cpp in the proposal too.
> Will that be enough to demonstrate in the proposal?
>
> I am still looking for feedback on the RDF part, and also for someone
> willing to mentor me.
>
> Sincerely,
> Vivek

Krzysztof Parzyszek via llvm-dev

2016-Mar-23 16:58 UTC


On 3/1/2016 11:26 AM, vivek pandya via llvm-dev wrote:
> Still I am looking for feedback on RDF part and also if some one is
> willing to mentor me.

Hi Vivek,
Sorry, I missed this email.  I wrote the RDF stuff and I'd be happy to 
help you out with it if you are interested.

The idea was to have a utility class that would represent the data flow 
between registers.  The registers could be a mixture of virtual and 
physical, although the main application would be to use it on post-RA 
code.  I decided against having it as a part of the pass manager, 
because the user does not have any direct control over the creation and 
invalidation of analyses, at least in the current version of the pass 
manager.  This does not mean that it cannot (or shouldn't) be used in an 
analysis, just that it should also be available as a standalone utility.

The missing bits are:

1. Handling of regmasks
This shouldn't be too hard.  All reference nodes (except those in phi 
nodes) have a pointer to the machine operand, from which the actual 
register is obtained.  Regmasks are different, since a single operand 
references multiple registers at once.  The way to handle them would be 
to treat a regmask as a register of its own that is aliased with the 
registers whose clobbering it represents.
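
A minimal sketch of that idea, in Python rather than in terms of the actual RDF classes: the regmask becomes one more entry in an alias map, aliased with exactly the registers it clobbers. The register and mask names, and both function names, are hypothetical; the input alias map is assumed to be symmetric already.

```python
def build_alias_map(register_aliases, regmasks):
    """Return an alias map that also contains regmask pseudo-registers.

    register_aliases: dict mapping a register name to the set of registers
        it overlaps (assumed symmetric).
    regmasks: dict mapping a regmask name to the set of registers it clobbers.
    """
    aliases = {reg: set(others) for reg, others in register_aliases.items()}
    for mask_name, clobbered in regmasks.items():
        mask_aliases = aliases.setdefault(mask_name, set())
        for reg in clobbered:
            # Record the alias in both directions to keep the map symmetric.
            mask_aliases.add(reg)
            aliases.setdefault(reg, set()).add(mask_name)
    return aliases


def clobbers(aliases, def_reg, other_reg):
    """True if a definition of def_reg overwrites (part of) other_reg."""
    return def_reg == other_reg or other_reg in aliases.get(def_reg, set())
```

With this representation a call's regmask operand can be handled like any other def: code that already asks "does this def alias that register?" needs no special case for masks.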

2. Recomputing liveness information on instruction level.
The MI-level IR uses implicit operands to keep track of the liveness of 
aliased registers. These implicit operands serve no other purpose, but 
they may introduce apparent dependencies (that do not, in fact, exist). 
RDF will ignore these implicit operands when constructing the DFG, and 
optimizations using RDF could produce code where the liveness 
information carried by these operands is no longer valid (the same goes 
for <kill> flags).  This information would need to be recomputed.  There 
is some code in there that does that for the <kill> flags, but it does 
not deal with the implicit operands at all.
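
For reference, recomputing kill flags within one block is a standard backward scan. The Python sketch below is a toy model, not the existing RDF code: instructions are plain (defs, uses) pairs of register-name sets, register aliasing is ignored, and implicit operands are not modelled at all.

```python
def recompute_kill_flags(block, live_out):
    """Return, per instruction, the set of used registers that are killed there.

    block: list of (defs, uses) pairs, each a set of register names,
        in program order.
    live_out: set of registers live at the end of the block.
    """
    live = set(live_out)
    kills = [set() for _ in block]
    for index in range(len(block) - 1, -1, -1):
        defs, uses = block[index]
        # Registers defined here do not carry their old value past this point.
        live -= defs
        # A use is a kill if the value it reads is not needed afterwards.
        kills[index] = {reg for reg in uses if reg not in live}
        # Everything this instruction reads is live before it.
        live |= uses
    return kills
```

A real implementation would also have to account for sub- and super-registers, which is exactly where the implicit operands mentioned above come in.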

3. Making it work with other targets.
RDF is intended to handle code that contains both physical and virtual 
registers on any target, but it has only been tested (in some capacity) 
on post-RA code and only on Hexagon.  Making it fully target-independent 
would involve testing it with other targets.
- There are "copy propagation" and "dead code elimination" passes that
use RDF.  Both are also meant to be target-independent and could serve
as a testing tool.
- RDF liveness would need to be verified to work on other targets.  It 
is meant to recalculate block live-ins.
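
As a point of comparison for such verification, block live-ins can be recomputed with the textbook iterative dataflow algorithm. The sketch below is a generic Python version under simplifying assumptions (no register aliasing, instructions given as (defs, uses) pairs); it is not how RDF liveness is implemented, and the function name is made up.

```python
def compute_live_ins(blocks, successors):
    """Compute live-in sets for each block by iterating to a fixed point.

    blocks: dict mapping block name to a list of (defs, uses) pairs
        in program order.
    successors: dict mapping block name to a list of successor block names.
    """
    # Summarise each block: registers read before any write, registers written.
    upward_exposed = {}
    defined = {}
    for name, instructions in blocks.items():
        exposed, written = set(), set()
        for defs, uses in instructions:
            exposed |= uses - written
            written |= defs
        upward_exposed[name], defined[name] = exposed, written

    live_in = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name in blocks:
            # Live-out of a block is the union of its successors' live-ins.
            live_out = set()
            for succ in successors.get(name, []):
                live_out |= live_in[succ]
            new_live_in = upward_exposed[name] | (live_out - defined[name])
            if new_live_in != live_in[name]:
                live_in[name] = new_live_in
                changed = True
    return live_in
```

Comparing the output of an independent computation like this against the live-ins produced on a new target would be one way to cross-check the results.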

4. It is unknown what RDF will do with bundles.
In theory, it should use the summary information from each bundle 
(without looking inside of bundles), but I have no idea whether there 
are any cases that would break it.  There is nothing to represent the 
data flow within a bundle: besides not having any representation for it 
now, the actual data flow there may be highly target-dependent.  This is 
more of a hypothetical question, at least for now, since it may be 
fairly complex to design and implement.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation
