On Jan 16, 2013, at 11:26 PM, Dimitri Tcaciuc <dtcaciuc at gmail.com> wrote:
> Hello everyone,
>
> For context, I have a small loop written in a custom front-end, which can be
> expressed fairly accurately with the following C program:
>
> struct Array {
>     double * data;
>     long n;
> };
>
> #define X 0
> #define Y 1
> #define Z 2
>
> void f(struct Array * restrict d, struct Array * restrict out, const long n)
> {
>     for (long i = 0; i < n; ++ i) {
>         for (long j = i + 1; j < n; ++ j) {
>             out->data[X] = d->data[i * 3 + X] * d->data[j * 3 + X];
>             out->data[Y] = d->data[i * 3 + Y] * d->data[j * 3 + Y];
>             out->data[Z] = d->data[i * 3 + Z] * d->data[j * 3 + Z];
>         }
>     }
> }
>
>
> I'm looking through the IR transformations performed by the passes that
> LLVMTargetMachine::addPassesToEmitFile adds, and I'm seeing something I could
> use some help explaining. The point of interest is between the
> 'unreachableblockelim' and 'codegenprepare' passes. Here is a paste of the IR
> after each pass:
>
> http://pastebin.com/42xLT4ZN
>
>
> I've annotated 3 spots in the code with stars. In (1), after
> unreachableblockelim, addr89 is precomputed once outside the loop and is used
> in the store in (2). However, in (3), after codegenprepare, there is now a
> bunch of math being done every loop iteration to get the address for the same
> store. Additionally, it looks like the same thing is happening for several
> addresses above as well.
>
> Does this look right? Why would those calculations be moved back into the
> loop?

Basically, CodeGenPrepare thinks that the loads within the loop are using a
free(ish) addressing mode that your target will be able to fold into the load.
So it sinks the address computations into the loop on the assumption that
addressing-mode matching in ISel will kick in, giving roughly the same
performance with less register pressure.
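
To make that trade-off concrete, here is a minimal C sketch of the two shapes of code involved. It is not taken from the pastebin IR: the function names `sum_hoisted` and `sum_sunk` are hypothetical, and the loop is a simplified stand-in for the one in the question. Both functions compute the same result; they differ only in where the address of the i-th element is formed.

```c
/* Hypothetical illustration; names and loop body are assumptions,
   not code from the original post. */

/* Hoisted form: the address of element i (Y component) is computed once,
   before the loop, and must stay live in a register for the whole loop. */
double sum_hoisted(const double *data, long i, long n)
{
    const double *pi = data + i * 3 + 1;   /* precomputed address */
    double acc = 0.0;
    for (long j = i + 1; j < n; ++j)
        acc += *pi * data[j * 3 + 1];
    return acc;
}

/* Sunk form: the address expression is written at the point of use, so it
   is nominally recomputed every iteration. A target with a suitable
   addressing mode may fold that arithmetic into the load itself. */
double sum_sunk(const double *data, long i, long n)
{
    double acc = 0.0;
    for (long j = i + 1; j < n; ++j)
        acc += data[i * 3 + 1] * data[j * 3 + 1];
    return acc;
}
```

On a target with base + index * scale + displacement addressing (x86-64, for example), the arithmetic in the sunk form can typically be absorbed into the memory operand, so the extra per-iteration math visible in the IR need not become extra instructions. Whether that happens for a given target and loop is best checked in the final assembly.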
--Owen