Hi again,
I'm still looking for a way to force a particular order of instructions that
takes precedence over machine instruction scheduling. IR ordering comes quite
close and does help in getting the instructions into the correct order, but
either it's not enough to turn off scheduling or I'm using it wrong (comments
in the code and the commits that introduced it are pretty much all the
documentation I've found on this subject).
I still have basically the same questions, but I'll try to rephrase them
hoping that it'll make them clearer:
1. Is it OK to block the generic load&store sequence generator by setting
the limits for memcpy() generation to very low values? This will force
the architecture-specific callback for inlining memcpy() to be called.
2. Would it be better to provide a flag in TargetSelectionDAGInfo to
alter the behaviour of the generic load&store function, or maybe to use the
existing flags that say whether the platform supports paired loads&stores?
3. Is there a way to ignore latencies for particular instructions? They
can cause undesired reordering even for instructions for which IR
order is set explicitly.
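To make question 1 concrete, here is a toy model of the three-step dispatch
described in my quoted message below. It is only a sketch: none of the names
are real LLVM APIs, maxStores merely stands in for the per-target memcpy()
store limit, and the final library-call fallback is my assumption about what
happens when nothing else applies.

```cpp
#include <cassert>

// Toy model only; these types and functions are hypothetical, not LLVM's.
enum class Lowering { GenericInline, TargetHook, GenericForced, LibCall };

struct MemcpyPolicy {
  unsigned maxStores;      // stand-in for the target's memcpy() store limit
  bool targetHookHandles;  // does the target-specific callback emit code?
  bool forceInline;        // is inline expansion being forced?
};

// Number of stores needed to copy `bytes` using `width`-byte accesses.
unsigned storesNeeded(unsigned bytes, unsigned width) {
  assert(width != 0 && "access width must be non-zero");
  return (bytes + width - 1) / width;
}

Lowering chooseLowering(const MemcpyPolicy &p, unsigned bytes, unsigned width) {
  // Step 1: generic load&store generator, only if within the limit.
  if (storesNeeded(bytes, width) <= p.maxStores)
    return Lowering::GenericInline;
  // Step 2: target-specific load&store generator.
  if (p.targetHookHandles)
    return Lowering::TargetHook;
  // Step 3: generic generator again, this time ignoring the limit.
  if (p.forceInline)
    return Lowering::GenericForced;
  // Assumed fallback: leave it as a call to memcpy().
  return Lowering::LibCall;
}
```

In this model, chooseLowering({8, true, false}, 32, 8) stops at step 1, while
lowering the limit to 0 for the same copy makes step 1 fail and lets the
target hook in step 2 run, which is the effect I am after.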
Is there any documentation on using IROrder? Should it even guarantee
strict order at all? I'm asking because assigning an order can
produce quite surprising sequences, e.g.: 100->101->1->102->103.
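To illustrate the kind of reordering I mean, here is a toy list scheduler.
It is a sketch under loud assumptions: it is not GenericScheduler's algorithm,
the latencies (4 for loads, 1 for stores) are made up, and the "pick the ready
node with the highest latency" rule is just the simplest heuristic that
reproduces the effect. Whether an edge created at the SelectionDAG level
survives into the machine scheduler's dependency graph is exactly what I am
unsure about.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for the toy model; not an LLVM class.
struct Node {
  std::string name;
  unsigned latency;                // made-up latency
  std::vector<std::size_t> preds;  // indices of nodes that must come first
};

// Greedy scheduling: repeatedly pick the ready node with the highest
// latency; ties go to the node that appears earlier in the input.
std::vector<std::string> schedule(const std::vector<Node> &nodes) {
  std::vector<bool> done(nodes.size(), false);
  std::vector<std::string> order;
  while (order.size() < nodes.size()) {
    std::size_t best = nodes.size();
    for (std::size_t i = 0; i < nodes.size(); ++i) {
      if (done[i]) continue;
      bool ready = true;
      for (std::size_t p : nodes[i].preds) {
        assert(p < nodes.size() && "predecessor index out of range");
        if (!done[p]) ready = false;
      }
      if (!ready) continue;
      // Strict '>' keeps the earlier node on ties.
      if (best == nodes.size() || nodes[i].latency > nodes[best].latency)
        best = i;
    }
    assert(best != nodes.size() && "dependency cycle");
    done[best] = true;
    order.push_back(nodes[best].name);
  }
  return order;
}

// Two load/store pairs; optionally make load1 depend on store0.
std::vector<Node> copyPairs(bool chainLoadToPreviousStore) {
  std::vector<Node> n = {
      {"load0", 4, {}},
      {"store0", 1, {0}},
      {"load1", 4, {}},
      {"store1", 1, {2}},
  };
  if (chainLoadToPreviousStore) n[2].preds.push_back(1);
  return n;
}
```

In this model schedule(copyPairs(false)) hoists load1 above store0 purely
because of its latency, while schedule(copyPairs(true)) keeps the pairs
interleaved because the extra edge leaves only one ready node at each step.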
I also found ScheduleDAGSDNodes::AddGlue(...); could it be useful in my case?
Could somebody with experience in this part of LLVM say whether I'm
digging in the right direction or it's all wrong?
Thanks,
Sergey
On Fri, Jul 18, 2014 at 05:29:55AM -0700, Sergey Dmitrouk wrote:
> Hi,
>
> I'm trying to improve performance of code generated for AArch64, as
> described in this thread [0] about memcpy() inlining, and have two issues
> with it:
>
> 1) Some output sequences look like this:
> ...something...
> load
> ...something...
> store
> load
> store
> By chaining each next load with previous store I can turn it into:
> ...something...
> ...something...
> load
> store
> load
> store
> It can be done directly in SelectionDAG.cpp, but it's target-specific
> and shouldn't go there. If I make it target-specific, it never gets
> called, because SelectionDAG uses the following sequence of calls:
> 1. Try generic load&store generator.
> 2. Try target-specific load&store generator.
> 3. Try generic load&store generator and force generation.
>
> My question is how can I give target-specific generator bigger
> priority in this case? Is there any flag for this, or maybe it's worth
> adding one?
>
> 2) Second issue seems to be harder, I'd like to prevent Machine Instruction
> Scheduler from reordering
> load
> store
> load
> store
> into
> load
> load
> store
> store
> Chaining and specifying IROrder doesn't help (assuming I implement it
> correctly). I don't see particular order on picking instructions
> in GenericScheduler that don't really depend on each other. Reordering
> seems to occur as a side effect of something else. If there are more
> load&store operations (e.g. four pairs of load&store), only the last
> four instructions are reordered.
>
> I don't see how scheduling can be controlled other than by providing
> custom scheduler, but will it help? I do not see enough ordering
> information at this level and don't understand how it can be forced.
>
> Can somebody advise me on this? If it's documented somewhere and I miss
> that, you could just give me a link.
>
> Thanks,
> Sergey
>
> [0]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140714/226044.html
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev