Hello everyone,
For context, I have a small loop written in a custom
front-end that can be expressed fairly accurately by the following C
program:
struct Array {
    double * data;
    long n;
};
#define X 0
#define Y 1
#define Z 2
void f(struct Array * restrict d, struct Array * restrict out, const long n)
{
    for (long i = 0; i < n; ++ i) {
        for (long j = i + 1; j < n; ++ j) {
            out->data[X] = d->data[i * 3 + X] * d->data[j * 3 + X];
            out->data[Y] = d->data[i * 3 + Y] * d->data[j * 3 + Y];
            out->data[Z] = d->data[i * 3 + Z] * d->data[j * 3 + Z];
        }
    }
}
I'm looking through the IR transformations performed by the passes that
LLVMTargetMachine::addPassesToEmitFile adds, and I'm seeing something I
could use some help explaining. The point of interest is between the
'unreachableblockelim' and 'codegenprepare' passes. Here is a paste of the
IR after each pass:
http://pastebin.com/42xLT4ZN
I've annotated three spots in the code with stars. In (1), after
unreachableblockelim, addr89 is computed once outside the loop and is
used by the store in (2). However, in (3), after codegenprepare, a bunch
of math is now done on every loop iteration to get the address for the
same store. It also looks like the same thing is happening for several of
the addresses above.
Does this look right? Why would those calculations be moved back into the
loop?
Thanks,
Dimitri.
On Jan 16, 2013, at 11:26 PM, Dimitri Tcaciuc <dtcaciuc at gmail.com> wrote:
> Does this look right? Why would those calculations be moved back into the
> loop?
Basically, it thinks that the loads within the loop are using a free(ish)
addressing mode that your target will be able to fold into the load. So it
sinks the address computations into the loop on the assumption that
addressing-mode ISel will kick in, for the same net performance with less
register pressure.
--Owen
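To make the trade-off in the reply concrete, here is a rough source-level sketch of the two shapes. It is only an illustration, not what codegenprepare emits: the function names pair_products_hoisted and pair_products_sunk are hypothetical, the struct from the original program is replaced with raw pointers for brevity, and a C compiler is free to turn either form into the other. The first version keeps a precomputed row pointer live across the inner loop. The second writes the index arithmetic next to each access, which is the shape that lets instruction selection fold part of it into the memory operand if the target's addressing modes allow it.

```c
#include <stddef.h>

/* Hoisted shape: the address of row i is formed once per outer iteration
   and stays live in a register for the whole inner loop. */
void pair_products_hoisted(const double *d, double *out, long n)
{
    for (long i = 0; i < n; ++i) {
        const double *di = d + i * 3;      /* precomputed outside inner loop */
        for (long j = i + 1; j < n; ++j) {
            const double *dj = d + j * 3;
            out[0] = di[0] * dj[0];
            out[1] = di[1] * dj[1];
            out[2] = di[2] * dj[2];
        }
    }
}

/* Sunk shape: every access spells out base + index * 3 + offset at the
   point of use, so no extra pointer has to be kept alive. Whether the
   arithmetic is actually folded into the load or store depends on the
   target (an assumption here, not something this code can show). */
void pair_products_sunk(const double *d, double *out, long n)
{
    for (long i = 0; i < n; ++i) {
        for (long j = i + 1; j < n; ++j) {
            out[0] = d[i * 3 + 0] * d[j * 3 + 0];
            out[1] = d[i * 3 + 1] * d[j * 3 + 1];
            out[2] = d[i * 3 + 2] * d[j * 3 + 2];
        }
    }
}
```

Both functions compute the same values. The difference is only in how many addresses stay live across the loop, which is the register-pressure point made in the reply.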