Tim Northover via llvm-dev
2016-Mar-01 04:53 UTC
[llvm-dev] [GSoC 2016] Code Generation Improvements task
Hi Vivek, (Mostly responding with AArch64 hints, though anything I happen to know from elsewhere too). On 29 February 2016 at 13:00, vivek pandya via llvm-dev <llvm-dev at lists.llvm.org> wrote:> 2. lib/Target/AArch64/AArch64AddressTypePromotion.cpp > As far as I understand this pass promotes sign exertion for 32 bit integer ( > address) and performs calculation on 64 bit number thus processes need not > switch execution mode to 32 bit.Switching execution mode isn't an option on AArch64 (it can only happen with OS support and never happens within a single process on a sane OS). This pass is more a matter of putting the IR in a form that precisely matches the addressing modes that are actually available. AArch64 can encode addresses like "base64 + sext(offset32)" into the actual load/store instruction so it's advantageous to put the sext as close as possible to the pointer dereference. I'm afraid I don't really know enough about other architectures to say which could benefit. It's obviously only beneficial if they have the addressing modes to support it.> 3. lib/Target/AArch64/AArch64PromoteConstant.cpp > This pass tries to simplify aggregate data like struct of const with special > SIMD instructions available on the system. For example on ARM its NEON > similarly other architectures have SIMD support specifically MIPS, IBM > System Z, Power PC with MMX/AltiVee and x86 with Intel’s AVX.Possibly. It seems to rely pretty strongly on ARM's "load more than you can actually use" instructions: vldN instructions can load up to 4 128-bit vectors, but they can still only be used as 128-bit vectors. If other targets possess similar, then they could well benefit; if not, then it's probably pointless.> I have question regarding Target hooks. Does it means using TargetInfo an > SubTargetInfo class and at runtime decide architecture type and based on > that perform optimization ( i.e use target specific instructions ) ?I think they more normally live in TargetTransformInfo.> Please help me ! Am I going in right direction ? Suggest some code , > document to look for further ideas. Also if any one like to mentor me for > this project.It sounds like a plausible direction, but documentation is always lacking in these kinds of things. As a complete outsider to targets with delay slots, merging their logic sounds like a nice improvement to me (especially as Lanai is probably incoming as another ISA that has decided delay slots are a good idea). But (also as an outsider) I have no idea how practical that really is. Cheers. Tim.
vivek pandya via llvm-dev
2016-Mar-02 20:34 UTC
[llvm-dev] [GSoC 2016] Code Generation Improvements task
*Vivek Pandya* On Tue, Mar 1, 2016 at 10:56 PM, vivek pandya <vivekvpandya at gmail.com> wrote:> > > *Vivek Pandya* > > > On Tue, Mar 1, 2016 at 10:23 AM, Tim Northover <t.p.northover at gmail.com> > wrote: > >> Hi Vivek, >> >> (Mostly responding with AArch64 hints, though anything I happen to >> know from elsewhere too). >> >> On 29 February 2016 at 13:00, vivek pandya via llvm-dev >> <llvm-dev at lists.llvm.org> wrote: >> > 2. lib/Target/AArch64/AArch64AddressTypePromotion.cpp >> > As far as I understand this pass promotes sign exertion for 32 bit >> integer ( >> > address) and performs calculation on 64 bit number thus processes need >> not >> > switch execution mode to 32 bit. >> >> Switching execution mode isn't an option on AArch64 (it can only >> happen with OS support and never happens within a single process on a >> sane OS). >> >> This pass is more a matter of putting the IR in a form that precisely >> matches the addressing modes that are actually available. AArch64 can >> encode addresses like "base64 + sext(offset32)" into the actual >> load/store instruction so it's advantageous to put the sext as close >> as possible to the pointer dereference. >> >> I'm afraid I don't really know enough about other architectures to say >> which could benefit. It's obviously only beneficial if they have the >> addressing modes to support it. >> > I did some research for other architectures which has similar addressingmode: MIPS : Base Addressing : sext(Immediate 16 bit value) + 32 / 64 bit register value PC Relative : PC value + sext(Immediate 16 bit value << 2 bit) ARM : ARM has immediate addressing but offset is used as singned or zero extedned is to be determined. Power PC: Register Indirect with Immediate Index Addressing for Integer Loads and Store It seems that most of the architecture which supports immediate offset are required to use sext to preserve sign before adding them to register value. So this pass seems to be useful for other architecture. - Vivek> >> > 3. lib/Target/AArch64/AArch64PromoteConstant.cpp >> > This pass tries to simplify aggregate data like struct of const with >> special >> > SIMD instructions available on the system. For example on ARM its NEON >> > similarly other architectures have SIMD support specifically MIPS, IBM >> > System Z, Power PC with MMX/AltiVee and x86 with Intel’s AVX. >> >> Possibly. It seems to rely pretty strongly on ARM's "load more than >> you can actually use" instructions: vldN instructions can load up to 4 >> 128-bit vectors, but they can still only be used as 128-bit vectors. >> If other targets possess similar, then they could well benefit; if >> not, then it's probably pointless. >> >> > I have question regarding Target hooks. Does it means using TargetInfo >> an >> > SubTargetInfo class and at runtime decide architecture type and based on >> > that perform optimization ( i.e use target specific instructions ) ? >> >> I think they more normally live in TargetTransformInfo. >> >> > Please help me ! Am I going in right direction ? Suggest some code , >> > document to look for further ideas. Also if any one like to mentor me >> for >> > this project. >> >> It sounds like a plausible direction, but documentation is always >> lacking in these kinds of things. >> >> As a complete outsider to targets with delay slots, merging their >> logic sounds like a nice improvement to me (especially as Lanai is >> probably incoming as another ISA that has decided delay slots are a >> good idea). But (also as an outsider) I have no idea how practical >> that really is. >> > > Thanks Tim for providing more insights, I would gather more information in > given direction. Further more here mentioned 3 tasks may be not a much work > for some one who has a good grasp on llvm but for me it may be sufficient > for GSoC duration. It may not be possible for Google to provide fundings > for limited number of improvements. So I am thinking to include some TODOs > in StackColoring.cpp and StackSlotColoring.cpp in proposal too. Will it be > enough to demonstrate in proposal ? > > Still I am looking for feedback on RDF part and also if some one is > willing to mentor me. > > Sincerely, > Vivek > > >> Cheers. >> >> Tim. >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160303/8ae53fd7/attachment.html>
Krzysztof Parzyszek via llvm-dev
2016-Mar-23 16:58 UTC
[llvm-dev] [GSoC 2016] Code Generation Improvements task
On 3/1/2016 11:26 AM, vivek pandya via llvm-dev wrote:> > Still I am looking for feedback on RDF part and also if some one is > willing to mentor me.Hi Vivek, Sorry, I missed this email. I wrote the RDF stuff and I'd be happy to help you out with it if you are interested. The idea was to have a utility class that would represent the data flow between registers. The registers could be a mixture of virtual and physical, although the main application would be to use it on a post-RA code. I decided against having it as a part of the pass manager, because the user does not have any direct control over the creation and invalidation of analyses, at least in the current version of the pass manager. This does not mean that it cannot (or shouldn't) be used in an analysis, just that it should also be available as a standalone utility. The missing bits are: 1. Handling of regmasks This shouldn't be too hard. All reference nodes (except those in phi nodes) have a pointer to the machine operand, from which the actual register is obtained. Regmasks are different, since a single operand references multiple registers at once. The way to handle them would be to treat a regmask as a register of its own that is aliased with the registers, whose clobbering it represents. 2. Recomputing liveness information on instruction level. The MI-level IR uses implicit operands to keep track of the liveness of aliased registers. These implicit operands serve no other purpose, but they may introduce apparent dependencies (that do not, in fact exist). RDF will ignore these implicit operands when constructing the DFG, and optimizations using RDF could produce code where the liveness information carried by these operands is no longer valid (the same goes for <kill> flags). This information would need to be recomputed. There is some code in there that does that for the <kill> flags, but it does not deal with the implicit operands at all. 3. Making it work with ther targets. RDF is intended to handle code that contains both physical and virtual registers on any target, but it has only been tested (in some capacity) on post-RA code and only on Hexagon. Making it fully target-independent would involve testing it with other targets. - There are "copy propagation" and "dead code elimination" passes that use RDF. Both are also meant to be target-independent and could serve as a testing tool. - RDF liveness would need to be verified to work on other targets. It is meant to recalculate block live-ins. 4. It is unknown what RDF will do with bundles. In theory, it should use the summary information from each bundle (without looking inside of bundles), but I have no idea whether there are any cases that would break it. There is nothing to represent the data flow within a bundle: besides not having any representation for it now, the actual data flow there may be highly target-dependent. This is more of a hypothetical question, at least for now, since it may be fairly complex to design and implement. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation