thr3ads.net - llvm dev - [llvm-dev] beneficial optimization of undef examples needed [Jun 2017]

Jon Chesterfield via llvm-dev
2017-Jun-17 10:49 UTC
[llvm-dev] beneficial optimization of undef examples needed

Hi Peter,

Undef is certainly useful for vector operations in the back end. It allows
shorter instruction sequences for vectors which have some, but not all,
elements marked as undef: lowering a vector shuffle as a swap, combining
arithmetic, and similar.

For example, in slightly lispy notation, folding
(+ x (vector i32 undef 5))
and
(+ x (vector i32 4 undef))
to
(+ x (vector i32 4 5))
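
As a rough illustration of the fold above, here is a minimal Python sketch. It assumes the two additions are chained, i.e. (x + a) + b, and it models an undef lane as None; both the function name and that representation are made up for this example and are not LLVM APIs. For an undef lane the sketch picks 0, which is one legal choice among many (folding that lane to undef would also be legal).

```python
from typing import List, Optional

Lane = Optional[int]  # None stands in for an undef lane (illustrative only)


def fold_add_constants(a: List[Lane], b: List[Lane], bits: int = 32) -> List[Lane]:
    """Combine the constant operands of (x + a) + b into a single constant."""
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    mask = (1 << bits) - 1
    out: List[Lane] = []
    for lane_a, lane_b in zip(a, b):
        if lane_a is None and lane_b is None:
            out.append(None)            # both undef: the lane stays undef
        elif lane_a is None:
            out.append(lane_b & mask)   # choose undef := 0, keep the other lane
        elif lane_b is None:
            out.append(lane_a & mask)   # choose undef := 0, keep the other lane
        else:
            out.append((lane_a + lane_b) & mask)  # wrapping add of two constants
    return out


print(fold_add_constants([None, 5], [4, None]))  # prints [4, 5]
```

With both constants merged, only one vector add against x remains.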

There should also be optimisations available for bitwise operations on
machine words that are partially undef, but I haven't written any yet.
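
To make the idea concrete, here is a hypothetical Python sketch of what such a bitwise fold could look like. The PartialWord representation (a value plus a mask of undef bits) and both helper functions are assumptions invented for this example; nothing here corresponds to an existing LLVM data structure or pass.

```python
from typing import NamedTuple


class PartialWord(NamedTuple):
    value: int       # bit values; positions flagged in undef_mask are ignored
    undef_mask: int  # 1 = this bit is undef and may be chosen freely


def and_is_identity(c: PartialWord, bits: int = 32) -> bool:
    """True if (x & c) can be replaced by x for some choice of the undef bits."""
    full = (1 << bits) - 1
    defined = full & ~c.undef_mask
    # Every defined bit must already be 1; the undef bits are then chosen as 1.
    return (c.value & defined) == defined


def or_is_identity(c: PartialWord, bits: int = 32) -> bool:
    """True if (x | c) can be replaced by x for some choice of the undef bits."""
    full = (1 << bits) - 1
    defined = full & ~c.undef_mask
    # Every defined bit must already be 0; the undef bits are then chosen as 0.
    return (c.value & defined) == 0


# Low half is all ones, high half is undef: the AND can be dropped entirely.
print(and_is_identity(PartialWord(0x0000FFFF, 0xFFFF0000)))  # prints True
```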

Working with variables that are entirely undef is of less interest to me.
For example, folding (add 5 undef) to undef leads to less code, but it's
still not code that does anything useful.

Cheers,

Jon

On Sat, Jun 17, 2017 at 1:02 AM, via llvm-dev <llvm-dev at lists.llvm.org>
wrote:
> Today's Topics:
>
>    1. beneficial optimization of undef examples needed
>       (Peter Lawrence via llvm-dev)
>    2. Re: [GlobalISel][AArch64] Toward flipping the switch for O0:
>       Please give it a try! (Quentin Colombet via llvm-dev)
>    3. Re: beneficial optimization of undef examples needed
>       (John Regehr via llvm-dev)
>    4. Re: How does sanitizers in compiler-rt work?
>       (Dipanjan Das via llvm-dev)
>    5. Re: LLC does not do proper copy propagation (or copy
>       coalescing) (Alex Susu via llvm-dev)
>    6. Re: [GlobalISel][AArch64] Toward flipping the switch for O0:
>       Please give it a try! (Quentin Colombet via llvm-dev)
>    7. Re: beneficial optimization of undef examples needed
>       (Matthias Braun via llvm-dev)
>    8. Re: [GlobalISel][AArch64] Toward flipping the switch for O0:
>       Please give it a try! (Eric Christopher via llvm-dev)
>    9. Re: Wide load/store optimization question
>       (Matthias Braun via llvm-dev)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 16 Jun 2017 15:03:32 -0700
> From: Peter Lawrence via llvm-dev <llvm-dev at lists.llvm.org>
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] beneficial optimization of undef examples needed
> Message-ID: <E4E7669D-0308-4F85-B58E-87863717CD85 at sbcglobal.net>
> Content-Type: text/plain; charset=utf-8
>
> All,
>      These discussions seem to be based on the premise that there is a
> need for the compiler to exploit undefined behavior for performance
> optimization reasons.
>
> So far the only beneficial optimization I am aware of that relies on some
> form of “undefined” is Dan Gohman’s original project for LP64 targets of
> promoting i32 induction variables to i64 and hoisting sign-extension out
> of the loop.
>
> But “undef” / “poison” never appears in either the original or the
> transformed IR for these types of loops; instead, properties of “+nsw” are
> used to justify the transformation. The transformation does not just fall
> out because we’ve done a good job at defining “undef” / “poison” IR nodes.
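
A small Python sketch may help show why “+nsw” is what justifies hoisting the sign-extension. It simulates a wrapping 32-bit increment and compares "increment in i32, then sign-extend" with "sign-extend, then increment in i64". The helper names are invented for this illustration. The two orders agree everywhere except at the point where the 32-bit add overflows, which is exactly the case nsw rules out.

```python
def wrap_i32(v: int) -> int:
    """Interpret v modulo 2**32 as a signed 32-bit integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def narrow_then_extend(i: int) -> int:
    """sext(add i32 i, 1): increment with 32-bit wraparound, then widen."""
    return wrap_i32(i + 1)


def extend_then_add(i: int) -> int:
    """add i64 (sext i), 1: widen first, then increment (cannot overflow i64)."""
    return i + 1


INT32_MAX = 2**31 - 1

# Without signed overflow the two orders agree, so the sext can leave the loop.
for i in (0, 41, INT32_MAX - 1):
    assert narrow_then_extend(i) == extend_then_add(i)

# At the overflow point they differ; nsw lets the compiler ignore this case.
assert narrow_then_extend(INT32_MAX) == -2**31
assert extend_then_add(INT32_MAX) == 2**31
```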
>
> So I’d like to see some concrete examples of where the compiler can
> do useful optimization based on “undef” / “poison” appearing explicitly
> in the IR. Finding some would surely advance this discussion.
>
>
>
> Peter Lawrence.
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 16 Jun 2017 15:06:36 -0700
> From: Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org>
> To: Diana Picus <diana.picus at linaro.org>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, Justin Bogner
>         <jbogner at apple.com>, Ahmed Bougacha <abougacha at apple.com>,
>         Aditya Nandakumar <aditya_nandakumar at apple.com>, nd <nd at arm.com>
> Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the
>         switch for O0: Please give it a try!
> Message-ID: <7923B567-B229-4FD8-8791-F9083A006FDE at apple.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> > On Jun 14, 2017, at 7:27 AM, Diana Picus <diana.picus at linaro.org> wrote:
> >
> > On 12 June 2017 at 18:54, Diana Picus <diana.picus at linaro.org> wrote:
> > Hi all,
> >
> > I added a buildbot [1] running the test-suite with -O0 -global-isel. It
> > runs into the same 2 timeouts that I reported previously on this thread
> > (paq8p and scimark2). It would be nice to make it green before flipping the
> > switch.
> >
> >
> > I did some more investigations on a machine similar to the one running
> > the buildbot. For paq8p and scimark2, I get these results for O0:
> >
> > PAQ8p:
> > Fast isel: 666.344
> > Global isel: 731.384
> >
> > SciMark2-C:
> > Fast isel: 463.908
> > Global isel: 496.22
> >
> > The current timeout is 500s (so in this particular case we didn't hit it
> > for scimark2, and it ran successfully to completion). I don't think the
> > difference between FastISel and GlobalISel is too atrocious, so I would
> > propose increasing the timeout for these 2 benchmarks. I'm not sure if we
> > can do this on a per-bot basis, but I see some precedent for setting custom
> > timeout thresholds for various benchmarks on different architectures
> > (sometimes with comments that it's done so we can run O0 on that particular
> > benchmark).
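
For reference, the relative slowdown implied by the figures above can be worked out with a few lines of Python. This is only arithmetic on the numbers quoted in the message; it assumes the timings are in seconds, like the 500s timeout, and the function and variable names are made up for this example.

```python
def overhead_percent(baseline: float, candidate: float) -> float:
    """Relative slowdown of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


TIMEOUT_S = 500.0  # timeout quoted in the message

# (FastISel, GlobalISel) timings as quoted in the message
timings = {
    "PAQ8p": (666.344, 731.384),
    "SciMark2-C": (463.908, 496.22),
}

for name, (fast_isel, global_isel) in timings.items():
    slowdown = overhead_percent(fast_isel, global_isel)
    over = global_isel > TIMEOUT_S
    print(f"{name}: +{slowdown:.2f}% with GlobalISel, over timeout: {over}")
```

On these figures GlobalISel is roughly 9.8% slower on PAQ8p and 7.0% slower on SciMark2-C, and the SciMark2-C run lands just under the 500s limit.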
> >
> > Something along these lines works:
> > https://reviews.llvm.org/differential/diff/102547/
> >
> > What do you guys think about this approach?
>
> Looks reasonable to me.
>
> >
> > Thanks,
> > Diana
> >
> > PS: The buildbot is using the Makefiles because that's what our other
> > AArch64 test-suite bots use. Moving all of them to CMake is a transition
> > for another time.
> >
> > At the moment, it lives in an internal buildmaster that I've set up for
> > this purpose. If we fix it and it proves to be stable for a week or two,
> > I'll move it to the public master.
> >
> > Cheers,
> > Diana
> >
> > [1] http://master2.llvm.validation.linaro.org/builders/clang-cmake-aarch64-global-isel
> >
> >
> > On 6 June 2017 at 19:11, Quentin Colombet <qcolombet at apple.com> wrote:
> > Thanks Kristof.
> >
> > Sounds like we'll need to investigate though I'd say it is not blocking
> > the switch.
> >
> > At this point I think everybody is on board to flip the switch.
> > @Eric, how does that sound to you?
> >
> > Thanks,
> > Q
> >
> > On 1 June 2017 at 07:46, Kristof Beyls <Kristof.Beyls at arm.com> wrote:
> >
> >>
> >>> On 31 May 2017, at 17:07, Quentin Colombet <qcolombet at apple.com> wrote:
> >>>>
> >>>> Latest comparisons on my side, after picking up r304244, i.e. the
> >>>> correct Localizer pass.
> >>>> * CTMark compile time, comparing "-O0 -g" vs "-O0 -g -mllvm
> >>>> -global-isel=true -mllvm -global-isel-abort=0": about 6% increase with
> >>>> globalisel. This was about 3.5% before the Localizer pass landed.
> >>>
> >>> That one is surprising too. I wouldn’t have expected this pass to show
> >>> up in the compile time profile. At least not to this extent.
> >>> What is the biggest offender?
> >>
> >> Hmmm. So I took the 3.5% compile time overhead from my last measurement
> >> before the localizer landed, from around 24th of May.
> >> When using -ftime-report, I see the Localizer pass typically taking
> >> very roughly about 1% of compile time.
> >> Maybe another part of GlobalISel became a bit slower since I did that
> >> 3.5% measurement?
> >> Or maybe the Localizer pass changes the structure of the program so
> >> that another later pass gets a different compile time profile?
> >> Basically, I'd have to do more experiments to figure that one out.
> >>
> >> As far as where time is spent in the gisel-passes itself, on average, I
> >> saw the following on the latest CTMark experiment I ran:
> >> Avg compile time spent in IRTranslator: 4.61%
> >> Avg compile time spent in InstructionSelect: 7.51%
> >> Avg compile time spent in Legalizer: 1.06%
> >> Avg compile time spent in Localizer: 0.76%
> >> Avg compile time spent in RegBankSelect: 2.12%
> >>
> >>>
> >>>> * My usual performance benchmarking run: 8.5% slow-down. This was
> >>>> about 9.5% before the Localizer pass landed, so a slight improvement.
> >>>> * Code size: 3.14% larger. This was about 2.8% before the Localizer
> >>>> pass landed, so a slight regression.
> >>>
> >>> That one is surprising. Do you have an idea of what is happening?
> >>> Alternatively if you can point me to the biggest offender, I can have
> >>> a look.
> >>
> >> So the biggest offenders on the mem_bytes metric in LNT are:
> >> Columns: O0 -g | O0 -g gisel-with-localizer | O0 -g gisel-without-localizer | %
> >> SingleSource/Benchmarks/Misc/perlin                       | 14272 | 14640 | 18344 | 25.95%
> >> SingleSource/Benchmarks/Dhrystone/dry                     | 16560 | 17144 | 20160 | 18.21%
> >> SingleSource/Benchmarks/Stanford/QueensProfile            | 13912 | 14192 | 15136 |  6.79%
> >> MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url | 71400 | 72272 | 75504 |  4.53%
> >>
> >> I haven't had time to investigate what exact changes make the code size
> >> go up that much with the localizer pass in those cases...
> >>
> >>>
> >>> The only thing I can think of is that we duplicate constants that are
> >>> expensive to materialize. If that’s the case, we were discussing with Ahmed
> >>> an alternative to the localizer pass that would operate during
> >>> InstructionSelect so may be worth pursuing.
> >>
> >
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 16 Jun 2017 16:19:09 -0600
> From: John Regehr via llvm-dev <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] beneficial optimization of undef examples
>         needed
> Message-ID: <4caa44fc-9bb7-58e7-9f44-f711edce1f5e at cs.utah.edu>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> I'll repeat that open-ended requests that would end up generating lots
> of work for other people probably aren't going to get great results here.
>
> John
>
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 16 Jun 2017 15:23:06 -0700
> From: Dipanjan Das via llvm-dev <llvm-dev at lists.llvm.org>
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] How does sanitizers in compiler-rt work?
> Message-ID:
>         <CAEK-7JLpnet2zF82z5v-RvUKaYrbmZRGtHpc-=dTTQuVKwDosg
> @mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Vedant,
>
> Thanks for the pointers. Please find my replies inline.
>
> On 16 June 2017 at 14:48, Vedant Kumar <vsk at apple.com> wrote:
>
> >
> > On Jun 16, 2017, at 4:11 AM, Dipanjan Das via llvm-dev <
> > llvm-dev at lists.llvm.org> wrote:
> >
> >
> > Can anybody give me any pointers on how compiler-rt, especially the
> > sanitizers, works? Do they operate on IR like any other LLVM pass, or are
> > they an integral part of the frontend itself? I couldn't find any
> > documentation on the internals of the compiler-rt project. What happens
> > (sequence of actions) when I pass -fsanitize=dataflow to clang?
> >
> >
> > Passing -fsanitize=dataflow tells clang to insert the dataflow sanitizer's
> > instrumentation pass into the normal compilation pipeline. The
> > instrumentation occurs at the LLVM IR level. The pass may insert calls into
> > runtime functions which are provided by compiler-rt. Therefore, in order to
> > link a program compiled with -fsanitize=dataflow, the appropriate runtime
> > library from compiler-rt is required.
> >
> >
> > Precisely, I intend to alter the behaviour of DFSan to suit my need.
> >
> >
> > What is your need, exactly?
> >
> >
> Instead of manually inserting the dfsan_create_label() and
> dfsan_set_label() calls in the source, I want to automatically insert those
> calls in the IR for all the input variables in scanf(). I intend to run the
> DFSan pass afterwards, thus instrumenting the IR further as required.
>
>
> > Therefore, I need to know how it gets integrated in the tool-chain.
> > Initially, my idea was to insert the dfsan_set_label() calls to the IR and
> > pass it to DFSan. However, I am not sure if it's designed to run on the
> > source only, not on IR.
> >
> >
> > You should take a look at
> > lib/Transforms/Instrumentation/DataFlowSanitizer.cpp.
> > There doesn't appear to be much done at the source level.
> >
> > best,
> > vedant
> >
> >
> > --
> >
> > Thanks & Regards,
> > Dipanjan
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
> >
>
>
> --
>
> Thanks & Regards,
> Dipanjan
>
> ------------------------------
>
> Message: 5
> Date: Sat, 17 Jun 2017 02:28:22 +0300
> From: Alex Susu via llvm-dev <llvm-dev at lists.llvm.org>
> To: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] LLC does not do proper copy propagation (or
>         copy coalescing)
> Message-ID: <87cb1d9f-55ed-c9a6-489c-3d87c6a0aaf1 at gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
>    Hello.
>      Wei-Ren, as I've pointed out in the previous email: the piece of code
> below has the deficiency that it uses register R5 instead of using R0. This
> happens because in LLVM IR I created 2 variables, varIndexInner and
> varIndexOuter, since I have 2 loops and the variable has to be iterated in
> the inner loop and I need to preserve its value when going to the next
> iteration for the outer loop.
>        // NOTE: my processor accepts loops in the form of REPEAT(num_times)..END_REPEAT
>        R0 = ...
>        REPEAT(256)
>          R5 = R0; // basically unnecessary reg. copy
>          REPEAT(256)
>            R10 = LS[R4];
>            R2 = LS[R5];
>            R4 = R4 + R1;
>            R5 = R5 + R1; // should be R0 = R0 + R1
>            R10 = R2 * R10;
>            R3 = R3 + R10;
>          END_REPEAT;
>          REDUCE R3;
>          R0 = R5; // basically unnecessary reg. copy
>        END_REPEAT;
>
>
>      The reason RegisterCoalescer.cpp is not able to optimize the problem I
> mentioned is that R0 and R5 have interfering live intervals.
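
As a simplified illustration of what "interfering live intervals" means, the following Python sketch models a live interval as a list of half-open [start, end) segments over instruction slots and tests two intervals for overlap. The slot numbers and names are made up, and this is only a toy model: the real RegisterCoalescer is considerably more refined than a plain overlap test.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open [start, end) range of instruction slots


def interferes(a: List[Segment], b: List[Segment]) -> bool:
    """True if any segment of a overlaps any segment of b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


def can_merge_naively(a: List[Segment], b: List[Segment]) -> bool:
    """Toy rule: two registers may share one location only if never live together."""
    return not interferes(a, b)


# Hypothetical intervals: the outer index stays live across the inner loop,
# while the inner index is live inside it, so the two overlap.
outer_index = [(0, 12)]
inner_index = [(2, 10)]
print(can_merge_naively(outer_index, inner_index))  # prints False

# Hypothetical intervals that never overlap: merging would be allowed.
print(can_merge_naively([(0, 2)], [(2, 10)]))  # prints True
```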
>
>      I'm trying to implement a case to handle this optimization I want in
> RegisterCoalescer.cpp, but it seems a bit complicated. (However, it seems
> more natural to do a standard copy propagation with Data Flow Analysis on the
> MachineBasicBlocks with virtual registers, after coming out of SSA form.
> Muchnick's book from 1997 talks in detail about this in Section 12.5.)
>
>      More exactly the registers and copies concerned for the above ASM
> code (copying text from the stderr of llc) are:
>        BB#0:
>          vreg99 = 0 // IMPORTANT: this instruction is dead, and I guess that
>                     // if it is DCE-ed, RegisterCoalescer.cpp would be able
>                     // to optimize my code
>
>        BB#1:
>          vreg94 = some_data_offset
>
>        BB#3:
>          vreg99 = COPY vreg94 // This copy does propagate
>
>        BB#4:
>          vreg61 = LOAD vreg99
>          vreg99 = ADD vreg99, 1
>          jmp_cond BB#4, BB#9
>
>        BB#9:
>          vreg94 = COPY vreg99 // This copy does NOT propagate
>          jmp_cond BB#3
>
>      Can somebody tell me how I can run Dead Code Elimination and then
> RegisterCoalescer again in LLC in order to see if I can maybe optimize
> this piece of code?
>
>      I'm interested in doing this optimization since the code runs on a
> very wide SIMD processor and every instruction counts.
>
>    Thank you very much,
>      Alex
>
>
>
> On 6/15/2017 11:41 PM, 陳韋任 wrote:
> >         I see 3 options to address my problem:
> >           - implement a case that handles this in PHI elimination
> >     (PHIElimination.cpp);
> >           - create a new pass that does copy propagation (based on DFA)
> >     on machine instructions before Register Allocation;
> >           - optimize copy coalescing such as the standard one or the one
> >     activated by -pbqp-coalescing in lib/CodeGen/RegAllocPBQP.cpp (there is
> >     an email also about PBQP coalescing at
> >     http://lists.llvm.org/pipermail/llvm-dev/2016-June/100523.html).
> >
> >
> > Usually this is done by copy coalescing. Do you know why yours cannot be
> > eliminated? Is your case not handled well by the existing copy coalescing
> > (RegisterCoalescer.cpp for example)?
> >
> > HTH,
> > chenwj
> >
> > --
> > Wei-Ren Chen (陳韋任)
> > Homepage: https://people.cs.nctu.edu.tw/~chenwj
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 16 Jun 2017 16:43:35 -0700
> From: Quentin Colombet via llvm-dev <llvm-dev at lists.llvm.org>
> To: Quentin Colombet <qcolombet at apple.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, Justin Bogner
>         <jbogner at apple.com>, Ahmed Bougacha <abougacha at apple.com>,
>         Aditya Nandakumar <aditya_nandakumar at apple.com>, nd <nd at arm.com>
> Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the
>         switch for O0: Please give it a try!
> Message-ID: <71BBA103-3C49-4AFD-92E2-A8C9EABCE242 at apple.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi all,
>
> We had some internal discussions about flipping the default for O0 and we
> concluded that we wanted to postpone it.
>
>
> *** Why Is That? ***
>
> We don’t want to send the wrong message that GlobalISel’s design is set in
> stone and ready for broader adoption.
> In particular,
> 1. The APIs are still evolving and can still possibly change significantly
> 2. The TableGen backend to reuse the existing SD patterns is still at its
> early stage
> 3. We want to investigate closely the performance of global-isel
> (compile-time, runtime, code size, fallbacks)
>
> The rationale behind those items is that we want to minimize the pain of
> moving forward for everybody. We also want the out-of-the-box experience to
> be pleasant (like all/most of the tablegen patterns just work, we have
> documentation on how to target a new backend, etc.) Finally, we want to
> gain confidence we are going to be able to address the performance issues
> we have with the current design and if not, derive a plan for that.
>
> We purposely left out of the conversation what will be the right time and
> requirements to flip the switch. We want to gather more data first. Your
> help would be appreciated!
>
>
> *** Short-Term Proposal ***
>
> What we would like to do instead short-term is:
> A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to
> enable GISel with fallbacks and warnings enabled (i.e., equivalent of
> -global-isel -global-isel-abort=2)
> B. Advertise this option in the next open source release to allow compiler
> enthusiasts to try it and report problems
> C. Have GISel always built so we can push things in the right place,
> MachineVerifier in mind, and stop doing some weird gymnastics
>
> What do people think?
>
>
> *** Your Help Is Needed ***
>
> - Please share your experience in using the GISel APIs and how we can make
> them better. Moving forward we’ll have those conversations on open source
> instead of internally/with a narrower audience.
> - Report any performance problem you identify
> - Propose patches!
>
> Cheers,
> -Quentin
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 16 Jun 2017 16:48:15 -0700
> From: Matthias Braun via llvm-dev <llvm-dev at lists.llvm.org>
> To: John Regehr <regehr at cs.utah.edu>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] beneficial optimization of undef examples
>         needed
> Message-ID: <C94F7BAB-BC74-49FA-9F2B-3104736EDA7C at apple.com>
> Content-Type: text/plain; charset="utf-8"
>
> Luckily someone already did the work writing a bunch of examples down:
> http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
>
> And +1 for keeping this on-topic on how to implement poison.
>
> - Matthias
>
>
>
> ------------------------------
>
> Message: 8
> Date: Fri, 16 Jun 2017 23:58:21 +0000
> From: Eric Christopher via llvm-dev <llvm-dev at lists.llvm.org>
> To: Quentin Colombet <qcolombet at apple.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>, Justin Bogner
>         <jbogner at apple.com>, Ahmed Bougacha <abougacha at apple.com>,
>         Aditya Nandakumar <aditya_nandakumar at apple.com>, nd <nd at arm.com>
> Subject: Re: [llvm-dev] [GlobalISel][AArch64] Toward flipping the
>         switch for O0: Please give it a try!
> Message-ID:
>         <CALehDX5M=+G3LJ4q4b-P0WN3D5WqiYPK+j3zGRJVANNMfHRsWg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Fri, Jun 16, 2017 at 4:43 PM Quentin Colombet <qcolombet at apple.com> wrote:
>
> > Hi all,
> >
> > We had some internal discussions about flipping the default for O0 and we
> > concluded that we wanted to postpone it.
> >
> >
> > *** Why Is That? ***
> >
> > We don’t want to send the wrong message that GlobalISel’s design is set in
> > stone and ready for broader adoption.
> > In particular,
> > 1. The APIs are still evolving and can still possibly change significantly
> > 2. The TableGen backend to reuse the existing SD patterns is still at its
> > early stage
> > 3. We want to investigate closely the performance of global-isel
> > (compile-time, runtime, code size, fallbacks)
> >
> > The rationale behind those items is that we want to minimize the pain of
> > moving forward for everybody. We also want the out-of-the-box experience to
> > be pleasant (like all/most of the tablegen patterns just work, we have
> > documentation on how to target a new backend, etc.) Finally, we want to
> > gain confidence we are going to be able to address the performance issues
> > we have with the current design and if not, derive a plan for that.
> >
> > We purposely left out of the conversation what will be the right time and
> > requirements to flip the switch. We want to gather more data first. Your
> > help would be appreciated!
> >
> >
> > *** Short-Term Proposal ***
> >
> > What we would like to do instead short-term is:
> > A. Repurpose or create an option “-aarch64-enable-global-isel-at-O” to
> > enable GISel with fallbacks and warnings enabled (i.e., equivalent of
> > -global-isel -global-isel-abort=2)
> > B. Advertise this option in the next open source release to allow compiler
> > enthusiasts to try it and report problems
> > C. Have GISel always built so we can push things in the right place,
> > MachineVerifier in mind, and stop doing some weird gymnastics
> >
> > What do people think?
> >
> >
> How about -fexperimental-global-isel as a flag to clang?
>
> -eric
>
>
>
> ------------------------------
>
> Message: 9
> Date: Fri, 16 Jun 2017 17:05:46 -0700
> From: Matthias Braun via llvm-dev <llvm-dev at lists.llvm.org>
> To: 陳韋任 <chenwj.cs97g at g2.nctu.edu.tw>
> Cc: LLVM Developers Mailing List <llvm-dev at lists.llvm.org>, upcfrost
>         <upcfrost at gmail.com>
> Subject: Re: [llvm-dev] Wide load/store optimization question
> Message-ID: <5517D905-C871-45AB-A985-73DEF2C23B58 at apple.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> > On Jun 16, 2017, at 2:43 PM, 陳韋任 via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> >
> >
> > 2017-06-17 4:36 GMT+08:00 upcfrost <upcfrost at gmail.com>:
> > Hi,
> >
> > Same here, my backend only has 64bit load/store. But I still use 64bit
> > virt regs and expand/declare missing instructions by myself.
> >
> > I'll try looking into sparc backend, thanks. Also, only after writing
> > this post I found a bunch of built-in transforms. Still trying to
> > understand how to use those.
> >
> > By the way, constraint-wise (alignment), is there any difference between
> > virt regclass and regtuple?
>
> That question makes no sense.
> - Every virtual register has a register class assigned.
> - You can construct special register classes that represent register
> tuples so that when the allocator chooses an entry from that register class
> it really has chosen a tuple of machine registers (even though it looks
> like a single register with funny aliasing as far as llvm codegen is
> concerned).
>
> - Matthias
>
> ------------------------------
>
> End of llvm-dev Digest, Vol 156, Issue 97
> *****************************************