----- Original Message -----> From: "Hal Finkel" <hfinkel at anl.gov> > To: "Paul Redmond" <paul.redmond at intel.com> > Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > Sent: Friday, March 1, 2013 11:13:06 AM > Subject: Re: [LLVMdev] parallel loop metadata simplification > > ----- Original Message ----- > > From: "Paul Redmond" <paul.redmond at intel.com> > > To: "Hal Finkel" <hfinkel at anl.gov> > > Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > > Sent: Friday, March 1, 2013 10:49:24 AM > > Subject: Re: [LLVMdev] parallel loop metadata simplification > > > > > > On 2013-03-01, at 11:35 AM, Hal Finkel wrote: > > > > > ----- Original Message ----- > > >> From: "Paul Redmond" <paul.redmond at intel.com> > > >> To: "Hal Finkel" <hfinkel at anl.gov> > > >> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > > >> Sent: Friday, March 1, 2013 10:06:51 AM > > >> Subject: Re: [LLVMdev] parallel loop metadata simplification > > >> > > >> Hi Hal, > > >> > > >> On 2013-02-28, at 9:33 PM, Hal Finkel wrote: > > >> > > >>> ----- Original Message ----- > > >>>> From: "Paul Redmond" <paul.redmond at intel.com> > > >>>> To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> > > >>>> Sent: Thursday, February 28, 2013 1:30:57 PM > > >>>> Subject: [LLVMdev] parallel loop metadata simplification > > >>>> > > >>>> Hi, > > >>>> > > >>>> I've been working on clang codegen for #pragma ivdep and > > >>>> creating > > >>>> the > > >>>> llvm.mem.parallel_loop_access metadata seems quite difficult. > > >>>> The > > >>>> main problem is that there are so many places where loads and > > >>>> stores > > >>>> are created and all of them need to be changed when emitting a > > >>>> parallel loop. Note that creating llvm.loop.parallel is not a > > >>>> problem. > > >>>> > > >>>> One option is to modify IRBuilder to enable attaching the > > >>>> metadata > > >>>> for loads and stores but that seems like a huge hack. > > >>> > > >>> Can you please explain why this is a bad option? To be honest, > > >>> this > > >>> is what I expected you to do. The IRBuilder sits on top of all > > >>> of > > >>> the loads and stores, and seems like the perfect place to keep > > >>> state for something that affects "all" loads and stores in some > > >>> code region. In addition, putting this in IRBuilder should also > > >>> make it easier to use this feature from other frontends. > > >> > > >> One concern I have is that the llvm.mem.parallel_loop_access can > > >> refer to nested loops. Is it possible that instructions in the > > >> same > > >> loop require different metadatas (one is parallel in both inner > > >> and > > >> outer loop and another is only parallel in the inner loop?) This > > >> is > > >> a more general problem I think but the IRBuilder would only be > > >> useful for brute force adding the same metadata to all > > >> instructions > > >> with mayReadOrWriteMemory(). > > > > > > Well, it would be more complicated than that anyway because you > > > specifically don't want to tag the loads and stores to the local > > > alloca'd variable storage locations, only the "explicit" memory > > > references. I think an extra boolean parameter to CreateLoad, > > > etc. > > > could take care of those? Nevertheless, IRBuilder could keep a > > > stack of parallel loops to handle nested cases, right? > > > > > > > Hmm, I guess I'm missing something. It is my understanding that all > > loads and stores require the metadata (and based on the > > implementation of isAnnotatedParallel). What you describe suggests > > that a loop goes from non-parallel to parallel after SROA? > > Or after mem2reg; something like that, yes. I think that's right.On second thought, to be more specific, it might depend on whether the alloca is inside or outside the loop. -Hal> > -Hal > > > > > paul > > > > > -Hal > > > > > >> > > >> Thoughts? > > >> > > >> paul > > >> > > >>> > > >>> -Hal > > >>> > > >>>> > > >>>> I'd like to reopen the discussion on requiring the > > >>>> llvm.mem.parallel_loop_access metadata. I understand the > > >>>> reason > > >>>> for > > >>>> the metadata is to protect against transformations that may > > >>>> introduce unsafe parallel memory accesses (the reg2mem > > >>>> example.) > > >>>> I'm > > >>>> wondering if perhaps we can make the metadata more > > >>>> user-friendly > > >>>> by > > >>>> providing a single loop metadata which can be expanded into > > >>>> "safer" > > >>>> metadata as required. Specifically a loop pass could be added > > >>>> that > > >>>> expands llvm.loop.parallel into llvm.loop.parallel_protected + > > >>>> llvm.mem.parallel_loop_access. > > >>>> > > >>>> Perhaps there are simpler options I've overlooked? > > >>>> > > >>>> paul > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> _______________________________________________ > > >>>> LLVM Developers mailing list > > >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >>>> > > >> > > >> > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
On 2013-03-01, at 12:23 PM, Hal Finkel wrote:> ----- Original Message ----- >> From: "Hal Finkel" <hfinkel at anl.gov> >> To: "Paul Redmond" <paul.redmond at intel.com> >> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> >> Sent: Friday, March 1, 2013 11:13:06 AM >> Subject: Re: [LLVMdev] parallel loop metadata simplification >> >> ----- Original Message ----- >>> From: "Paul Redmond" <paul.redmond at intel.com> >>> To: "Hal Finkel" <hfinkel at anl.gov> >>> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> >>> Sent: Friday, March 1, 2013 10:49:24 AM >>> Subject: Re: [LLVMdev] parallel loop metadata simplification >>> >>> >>> On 2013-03-01, at 11:35 AM, Hal Finkel wrote: >>> >>>> ----- Original Message ----- >>>>> From: "Paul Redmond" <paul.redmond at intel.com> >>>>> To: "Hal Finkel" <hfinkel at anl.gov> >>>>> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> >>>>> Sent: Friday, March 1, 2013 10:06:51 AM >>>>> Subject: Re: [LLVMdev] parallel loop metadata simplification >>>>> >>>>> Hi Hal, >>>>> >>>>> On 2013-02-28, at 9:33 PM, Hal Finkel wrote: >>>>> >>>>>> ----- Original Message ----- >>>>>>> From: "Paul Redmond" <paul.redmond at intel.com> >>>>>>> To: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu> >>>>>>> Sent: Thursday, February 28, 2013 1:30:57 PM >>>>>>> Subject: [LLVMdev] parallel loop metadata simplification >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I've been working on clang codegen for #pragma ivdep and >>>>>>> creating >>>>>>> the >>>>>>> llvm.mem.parallel_loop_access metadata seems quite difficult. >>>>>>> The >>>>>>> main problem is that there are so many places where loads and >>>>>>> stores >>>>>>> are created and all of them need to be changed when emitting a >>>>>>> parallel loop. Note that creating llvm.loop.parallel is not a >>>>>>> problem. >>>>>>> >>>>>>> One option is to modify IRBuilder to enable attaching the >>>>>>> metadata >>>>>>> for loads and stores but that seems like a huge hack. >>>>>> >>>>>> Can you please explain why this is a bad option? To be honest, >>>>>> this >>>>>> is what I expected you to do. The IRBuilder sits on top of all >>>>>> of >>>>>> the loads and stores, and seems like the perfect place to keep >>>>>> state for something that affects "all" loads and stores in some >>>>>> code region. In addition, putting this in IRBuilder should also >>>>>> make it easier to use this feature from other frontends. >>>>> >>>>> One concern I have is that the llvm.mem.parallel_loop_access can >>>>> refer to nested loops. Is it possible that instructions in the >>>>> same >>>>> loop require different metadatas (one is parallel in both inner >>>>> and >>>>> outer loop and another is only parallel in the inner loop?) This >>>>> is >>>>> a more general problem I think but the IRBuilder would only be >>>>> useful for brute force adding the same metadata to all >>>>> instructions >>>>> with mayReadOrWriteMemory(). >>>> >>>> Well, it would be more complicated than that anyway because you >>>> specifically don't want to tag the loads and stores to the local >>>> alloca'd variable storage locations, only the "explicit" memory >>>> references. I think an extra boolean parameter to CreateLoad, >>>> etc. >>>> could take care of those? Nevertheless, IRBuilder could keep a >>>> stack of parallel loops to handle nested cases, right? >>>> >>> >>> Hmm, I guess I'm missing something. It is my understanding that all >>> loads and stores require the metadata (and based on the >>> implementation of isAnnotatedParallel). What you describe suggests >>> that a loop goes from non-parallel to parallel after SROA? >> >> Or after mem2reg; something like that, yes. I think that's right. > > On second thought, to be more specific, it might depend on whether the alloca is inside or outside the loop. >I have discovered that you can provide a custom inserter to IRBuilder (who knew!). This has basically solved all my problems and allowed me to generate the proper metadata with minimal changes to clang codegen. Currently it adds the metadata to all loads and stores but I don't think this is a problem and can be refined later if necessary. paul> -Hal > >> >> -Hal >> >>> >>> paul >>> >>>> -Hal >>>> >>>>> >>>>> Thoughts? >>>>> >>>>> paul >>>>> >>>>>> >>>>>> -Hal >>>>>> >>>>>>> >>>>>>> I'd like to reopen the discussion on requiring the >>>>>>> llvm.mem.parallel_loop_access metadata. I understand the >>>>>>> reason >>>>>>> for >>>>>>> the metadata is to protect against transformations that may >>>>>>> introduce unsafe parallel memory accesses (the reg2mem >>>>>>> example.) >>>>>>> I'm >>>>>>> wondering if perhaps we can make the metadata more >>>>>>> user-friendly >>>>>>> by >>>>>>> providing a single loop metadata which can be expanded into >>>>>>> "safer" >>>>>>> metadata as required. Specifically a loop pass could be added >>>>>>> that >>>>>>> expands llvm.loop.parallel into llvm.loop.parallel_protected + >>>>>>> llvm.mem.parallel_loop_access. >>>>>>> >>>>>>> Perhaps there are simpler options I've overlooked? >>>>>>> >>>>>>> paul >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> LLVM Developers mailing list >>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>>>>>> >>>>> >>>>> >>> >>> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >>
On 03/01/2013 11:05 PM, Redmond, Paul wrote:> I have discovered that you can provide a custom inserter to IRBuilder (who > knew!). This has basically solved all my problems and allowed me to generate > the proper metadata with minimal changes to clang codegen. Currently it adds > the metadata to all loads and stores but I don't think this is a problem and > can be refined later if necessary.Good. I do not see why allocas should cause problems/special handling here: if the programmer is declaring a loop as "parallel" and still writes/reads a stack object in such a way that it adds loop-carried dependencies, it's a programmer-error (it's not a parallel loop). -- --Pekka
On 03/01/2013 10:05 PM, Redmond, Paul wrote: [...]> I have discovered that you can provide a custom inserter to IRBuilder (who knew!). This has basically solved all my problems and allowed me to generate the proper metadata with minimal changes to clang codegen. Currently it adds the metadata to all loads and stores but I don't think this is a problem and can be refined later if necessary.Great that you found a good solution. I have one example, which we may want to have a look into: Given the following input (test.c): #pragma ivdep for (long i = 0; i < 100; i++) { long t = i + 2; A[i] = t; } clang creates the following IR for the loop body: %1 = load i64* %i, align 8 %add = add nsw i64 %1, 2 store i64 %add, i64* %t, align 8 %2 = load i64* %t, align 8 %3 = load i64* %i, align 8 %4 = load i64** %A.addr, align 8 %arrayidx = getelementptr inbounds i64* %4, i64 %3 store i64 %2, i64* %arrayidx, align 8 br label %for.inc What are the implications of pragma ivdep on this code? My intuition would be that 't' is iteration private and that using pragma ivdep for this loop is correct. If the use of ivdep is correct, it seems necessary to _not_ annotate the loads and stores from and to 't'. Only after 't' is moved into a register, the loop is actually parallel on the IR level. Cheers, Tobi -------------- next part -------------- A non-text attachment was scrubbed... Name: test.c Type: text/x-csrc Size: 89 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130302/adb705ea/attachment.c>