Fernando Magno Quintao Pereira via llvm-dev
2016-Dec-31 19:37 UTC
[llvm-dev] Automatic Insertion of OpenACC/OpenMP directives
Dear Mehdi,

I've changed your example a little bit:

float saxpy(float a, float *x, float *y, int n) {
  int j = 0;
  for (int i = 0; i < n; ++i) {
    y[j] = a*x[i] + y[I]; // Change 'I' into 'j'?
    ++j;
  }
}

I get this code below, once I replace 'I' with 'j'. We are copying n
positions of both arrays, 'x' and 'y':

float saxpy(float a, float *x, float *y, int n) {
  int j = 0;

  long long int AI1[6];
  AI1[0] = n + -1;
  AI1[1] = 4 * AI1[0];
  AI1[2] = AI1[1] + 4;
  AI1[3] = AI1[2] / 4;
  AI1[4] = (AI1[3] > 0);
  AI1[5] = (AI1[4] ? AI1[3] : 0);
  #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
  #pragma acc kernels
  for (int i = 0; i < n; ++i) {
    y[j] = a * x[i] + y[j];
    ++j;
  }
}

Regards,

Fernando

On Sat, Dec 31, 2016 at 3:58 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Hi,
>
>> On Dec 31, 2016, at 8:33 AM, Fernando Magno Quintao Pereira via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> Dear LLVMers,
>>
>> we have released a tool that uses LLVM to insert OpenACC or OpenMP
>> 4.0 directives in programs. You can use the tool online here:
>> http://cuda.dcc.ufmg.br/dawn/. Our tool, dawn-cc, analyzes the LLVM IR
>> to infer the sizes of memory chunks, and to find dependences within
>> loops. After that, we use debug information to translate the low-level
>> information back into annotations that we insert into C/C++ programs.
>> For instance, if we take a program like this one below:
>>
>> void saxpy(float a, float *x, float *y, int n) {
>>   for (int i = 0; i < n; ++i)
>>     y[i] = a*x[i] + y[i];
>> }
>>
>> Then dawn-cc produces the code below:
>>
>> void saxpy(float a, float *x, float *y, int n) {
>>   long long int AI1[6];
>>   AI1[0] = n - 1;
>>   AI1[1] = 4 * AI1[0];
>>   AI1[2] = AI1[1] + 4;
>>   AI1[3] = AI1[2] / 4;
>>   AI1[4] = (AI1[3] > 0);
>>   AI1[5] = (AI1[4] ? AI1[3] : 0);
>>   #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
>>   #pragma acc kernels
>>   for (int i = 0; i < n; ++i)
>>     y[i] = a * x[i] + y[i];
>> }
>
> That's nice!
>
> I'm wondering how you would handle the fact that retrofitting the
> pragma onto the original code from the optimized IR can be fragile.
> For example, if I use a different variable for indexing into the
> output array:
>
> float saxpy(float a, float *x, float *y, int n) {
>   int j = 0;
>
>   for (int i = 0; i < n; ++i) {
>     y[j] = a*x[i] + y[I];
>     ++j;
>   }
> }
>
> The optimized LLVM IR is likely identical to the original code, but
> simply adding the pragma wouldn't be correct.
>
> --
> Mehdi
>
>>
>> I was wondering if we could add a link to dawn-cc in the LLVM's
>> project page (http://llvm.org/ProjectsWithLLVM/). There are a number
>> of papers that describe what dawn-cc does. The main publication is
>> this paper:
>>
>> * Automatic Insertion of Copy Annotation in Data-Parallel Programs -
>>   SBAC-PAD 2016
>>
>> The array size inference analysis comes from this work:
>>
>> * Runtime Pointer Disambiguation - OOPSLA 2015
>>
>> The source code of dawn-cc, including all the static analyses, is available at:
>>
>> * https://github.com/gleisonsdm/DawnCC-Compiler
>>
>> And, as I've mentioned, you can try it through an online interface:
>>
>> * cuda.dcc.ufmg.br/dawn
>>
>> Feel free to report bugs, or send us questions.
>>
>> Fernando
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
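For readers following the AI1[] arithmetic in the generated code, here is a small sketch that mirrors it in isolation. The function name inferred_length and the per-line comments are assumptions made for illustration (in particular, reading the constant 4 as sizeof(float)); they are not taken from dawn-cc itself. Under that reading, the six steps reduce to "n if n is positive, otherwise 0", which matches the statement that n positions of 'x' and 'y' are copied.

```c
#include <assert.h>

/* Hypothetical helper: replays the AI1[] computation emitted in the
   example above and returns the final array-section length. */
long long inferred_length(int n) {
    long long AI1[6];
    AI1[0] = n + -1;                 /* index of the last element touched */
    AI1[1] = 4 * AI1[0];             /* presumably its byte offset (4-byte float) */
    AI1[2] = AI1[1] + 4;             /* one past the last byte touched */
    AI1[3] = AI1[2] / 4;             /* back to an element count */
    AI1[4] = (AI1[3] > 0);           /* is the range non-empty? */
    AI1[5] = (AI1[4] ? AI1[3] : 0);  /* clamp negative counts to zero */
    return AI1[5];
}

/* Self-check for a few representative trip counts. */
void check_inferred_length(void) {
    assert(inferred_length(8) == 8);
    assert(inferred_length(1) == 1);
    assert(inferred_length(0) == 0);
    assert(inferred_length(-3) == 0);
}
```

Calling check_inferred_length() from a test driver exercises positive, zero, and negative values of n; extreme values such as INT_MIN are deliberately left out because n + -1 is evaluated in int, as in the generated code.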
Jonathan Roelofs via llvm-dev
2017-Jan-03 15:17 UTC
[llvm-dev] Automatic Insertion of OpenACC/OpenMP directives
On 12/31/16 12:37 PM, Fernando Magno Quintao Pereira via llvm-dev wrote:
> Dear Mehdi,
>
> I've changed your example a little bit:
>
> float saxpy(float a, float *x, float *y, int n) {
>   int j = 0;
>   for (int i = 0; i < n; ++i) {
>     y[j] = a*x[i] + y[I]; // Change 'I' into 'j'?
>     ++j;
>   }
> }
>
> I get this code below, once I replace 'I' with 'j'. We are copying n
> positions of both arrays, 'x' and 'y':
>
> float saxpy(float a, float *x, float *y, int n) {
>   int j = 0;
>
>   long long int AI1[6];
>   AI1[0] = n + -1;
>   AI1[1] = 4 * AI1[0];
>   AI1[2] = AI1[1] + 4;
>   AI1[3] = AI1[2] / 4;
>   AI1[4] = (AI1[3] > 0);
>   AI1[5] = (AI1[4] ? AI1[3] : 0);
>   #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
>   #pragma acc kernels
>   for (int i = 0; i < n; ++i) {
>     y[j] = a * x[i] + y[j];
>     ++j;
>   }

I'm not familiar with OpenACC, but doesn't this still have a loop carried
dependence on j, and therefore isn't correctly parallelizable as written?

Jon

> }
>
> Regards,
>
> Fernando
>

--
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded
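To make the dependence question concrete, here is a minimal sketch in plain C (no pragmas). It assumes only what the example shows: 'j' starts at 0 and is incremented once per iteration, so it equals 'i' at the top of every iteration and can be replaced by it. The names saxpy_j, saxpy_i, and check_equivalence are hypothetical, and nothing here claims that dawn-cc or any OpenACC compiler performs this rewrite.

```c
#include <assert.h>

/* Loop as written in the thread: j is carried from one iteration to the next. */
void saxpy_j(float a, const float *x, float *y, int n) {
    int j = 0;
    for (int i = 0; i < n; ++i) {
        y[j] = a * x[i] + y[j];
        ++j;
    }
}

/* Same loop after substituting j with its closed form (j == i):
   each iteration now depends only on its own index. */
void saxpy_i(float a, const float *x, float *y, int n) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Self-check: both versions produce the same output on a small input. */
void check_equivalence(void) {
    float x[4]  = {1.0f, 2.0f, 3.0f, 4.0f};
    float y1[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float y2[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    saxpy_j(2.0f, x, y1, 4);
    saxpy_i(2.0f, x, y2, 4);
    for (int k = 0; k < 4; ++k)
        assert(y1[k] == y2[k]);
}
```

The dependence on 'j' is real in the source as written; it disappears only once the induction variable is rewritten in terms of 'i'. Whether a given compiler does that before deciding to parallelize is a separate question.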
Mehdi Amini via llvm-dev
2017-Jan-03 17:13 UTC
[llvm-dev] Automatic Insertion of OpenACC/OpenMP directives
> On Jan 3, 2017, at 7:17 AM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:
>
> On 12/31/16 12:37 PM, Fernando Magno Quintao Pereira via llvm-dev wrote:
>> Dear Mehdi,
>>
>> I've changed your example a little bit:
>>
>> float saxpy(float a, float *x, float *y, int n) {
>>   int j = 0;
>>   for (int i = 0; i < n; ++i) {
>>     y[j] = a*x[i] + y[I]; // Change 'I' into 'j'?
>>     ++j;
>>   }
>> }
>>
>> I get this code below, once I replace 'I' with 'j'. We are copying n
>> positions of both arrays, 'x' and 'y':
>>
>> float saxpy(float a, float *x, float *y, int n) {
>>   int j = 0;
>>
>>   long long int AI1[6];
>>   AI1[0] = n + -1;
>>   AI1[1] = 4 * AI1[0];
>>   AI1[2] = AI1[1] + 4;
>>   AI1[3] = AI1[2] / 4;
>>   AI1[4] = (AI1[3] > 0);
>>   AI1[5] = (AI1[4] ? AI1[3] : 0);
>>   #pragma acc data pcopy(x[0:AI1[5]],y[0:AI1[5]])
>>   #pragma acc kernels
>>   for (int i = 0; i < n; ++i) {
>>     y[j] = a * x[i] + y[j];
>>     ++j;
>>   }
>
> I'm not familiar with OpenACC, but doesn't this still have a loop carried
> dependence on j, and therefore isn't correctly parallelizable as written?

That was my original concern as well, but I had forgotten that OpenACC
pragmas do not necessarily tell the compiler that the loop is parallel:

  #pragma acc kernels

only tells the compiler to “try” to parallelize the loop if it can prove
it safe, but:

  #pragma acc parallel

bypasses the compiler checks and forces parallelization.

The tool takes care of figuring out the sizes of the arrays, AFAIK (I
haven't read the paper yet to understand the novelty of the approach
here).

--
Mehdi
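As a rough illustration of the kernels/parallel distinction described above, the sketch below annotates the same dependence-free saxpy loop both ways. The function names and the copyin/copy data clauses are choices made for this example, not output of dawn-cc. Without an OpenACC-enabled compiler the pragmas are ignored and both functions run serially, so the code still builds and runs as ordinary C.

```c
#include <assert.h>

/* kernels: the compiler analyzes the region and offloads or parallelizes
   the loop only if it can show that doing so is safe. */
void saxpy_kernels(float a, const float *x, float *y, int n) {
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* parallel loop: the programmer asserts that iterations are independent;
   correctness is the programmer's responsibility, not the compiler's. */
void saxpy_parallel(float a, const float *x, float *y, int n) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Self-check: for this dependence-free loop both forms give the same result. */
void check_saxpy_variants(void) {
    float x[4]  = {1.0f, 2.0f, 3.0f, 4.0f};
    float y1[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    float y2[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    saxpy_kernels(2.0f, x, y1, 4);
    saxpy_parallel(2.0f, x, y2, 4);
    for (int k = 0; k < 4; ++k) {
        assert(y1[k] == 2.0f * x[k] + 1.0f);
        assert(y1[k] == y2[k]);
    }
}
```

The loop here is indexed by 'i' alone on purpose: placing the parallel form on the 'j' variant from the thread would assert an independence that the source as written does not have.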