thr3ads.net - llvm dev - [LLVMdev] parallel loop metadata simplification [Mar 2013]

If this information is useful, please help other people find it:
Share via:

Hal Finkel

2013-Mar-01 17:23 UTC

[LLVMdev] parallel loop metadata simplification

----- Original Message -----> From: "Hal Finkel" <hfinkel at anl.gov>
> To: "Paul Redmond" <paul.redmond at intel.com>
> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Friday, March 1, 2013 11:13:06 AM
> Subject: Re: [LLVMdev] parallel loop metadata simplification
> 
> ----- Original Message -----
> > From: "Paul Redmond" <paul.redmond at intel.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at
cs.uiuc.edu>
> > Sent: Friday, March 1, 2013 10:49:24 AM
> > Subject: Re: [LLVMdev] parallel loop metadata simplification
> > 
> > 
> > On 2013-03-01, at 11:35 AM, Hal Finkel wrote:
> > 
> > > ----- Original Message -----
> > >> From: "Paul Redmond" <paul.redmond at
intel.com>
> > >> To: "Hal Finkel" <hfinkel at anl.gov>
> > >> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at
cs.uiuc.edu>
> > >> Sent: Friday, March 1, 2013 10:06:51 AM
> > >> Subject: Re: [LLVMdev] parallel loop metadata simplification
> > >> 
> > >> Hi Hal,
> > >> 
> > >> On 2013-02-28, at 9:33 PM, Hal Finkel wrote:
> > >> 
> > >>> ----- Original Message -----
> > >>>> From: "Paul Redmond" <paul.redmond at
intel.com>
> > >>>> To: "llvmdev at cs.uiuc.edu Dev"
<llvmdev at cs.uiuc.edu>
> > >>>> Sent: Thursday, February 28, 2013 1:30:57 PM
> > >>>> Subject: [LLVMdev] parallel loop metadata
simplification
> > >>>> 
> > >>>> Hi,
> > >>>> 
> > >>>> I've been working on clang codegen for #pragma
ivdep and
> > >>>> creating
> > >>>> the
> > >>>> llvm.mem.parallel_loop_access metadata seems quite
difficult.
> > >>>> The
> > >>>> main problem is that there are so many places where
loads and
> > >>>> stores
> > >>>> are created and all of them need to be changed when
emitting a
> > >>>> parallel loop. Note that creating llvm.loop.parallel
is not a
> > >>>> problem.
> > >>>> 
> > >>>> One option is to modify IRBuilder to enable attaching
the
> > >>>> metadata
> > >>>> for loads and stores but that seems like a huge hack.
> > >>> 
> > >>> Can you please explain why this is a bad option? To be
honest,
> > >>> this
> > >>> is what I expected you to do. The IRBuilder sits on top
of all
> > >>> of
> > >>> the loads and stores, and seems like the perfect place to
keep
> > >>> state for something that affects "all" loads
and stores in some
> > >>> code region. In addition, putting this in IRBuilder
should also
> > >>> make it easier to use this feature from other frontends.
> > >> 
> > >> One concern I have is that the llvm.mem.parallel_loop_access
can
> > >> refer to nested loops. Is it possible that instructions in
the
> > >> same
> > >> loop require different metadatas (one is parallel in both
inner
> > >> and
> > >> outer loop and another is only parallel in the inner loop?)
This
> > >> is
> > >> a more general problem I think but the IRBuilder would only
be
> > >> useful for brute force adding the same metadata to all
> > >> instructions
> > >> with mayReadOrWriteMemory().
> > > 
> > > Well, it would be more complicated than that anyway because you
> > > specifically don't want to tag the loads and stores to the
local
> > > alloca'd variable storage locations, only the
"explicit" memory
> > > references. I think an extra boolean parameter to CreateLoad,
> > > etc.
> > > could take care of those? Nevertheless, IRBuilder could keep a
> > > stack of parallel loops to handle nested cases, right?
> > > 
> > 
> > Hmm, I guess I'm missing something. It is my understanding that
all
> > loads and stores require the metadata (and based on the
> > implementation of isAnnotatedParallel). What you describe suggests
> > that a loop goes from non-parallel to parallel after SROA?
> 
> Or after mem2reg; something like that, yes. I think that's right.
On second thought, to be more specific, it might depend on whether the alloca is
inside or outside the loop.

 -Hal
> 
>  -Hal
> 
> > 
> > paul
> > 
> > > -Hal
> > > 
> > >> 
> > >> Thoughts?
> > >> 
> > >> paul
> > >> 
> > >>> 
> > >>> -Hal
> > >>> 
> > >>>> 
> > >>>> I'd like to reopen the discussion on requiring
the
> > >>>> llvm.mem.parallel_loop_access metadata. I understand
the
> > >>>> reason
> > >>>> for
> > >>>> the metadata is to protect against transformations
that may
> > >>>> introduce unsafe parallel memory accesses (the
reg2mem
> > >>>> example.)
> > >>>> I'm
> > >>>> wondering if perhaps we can make the metadata more
> > >>>> user-friendly
> > >>>> by
> > >>>> providing a single loop metadata which can be
expanded into
> > >>>> "safer"
> > >>>> metadata as required. Specifically a loop pass could
be added
> > >>>> that
> > >>>> expands llvm.loop.parallel into
llvm.loop.parallel_protected +
> > >>>> llvm.mem.parallel_loop_access.
> > >>>> 
> > >>>> Perhaps there are simpler options I've
overlooked?
> > >>>> 
> > >>>> paul
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> _______________________________________________
> > >>>> LLVM Developers mailing list
> > >>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
> > >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> > >>>> 
> > >> 
> > >> 
> > 
> > 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

Redmond, Paul

2013-Mar-01 21:05 UTC

head link

[LLVMdev] parallel loop metadata simplification

On 2013-03-01, at 12:23 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Hal Finkel" <hfinkel at anl.gov>
>> To: "Paul Redmond" <paul.redmond at intel.com>
>> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at
cs.uiuc.edu>
>> Sent: Friday, March 1, 2013 11:13:06 AM
>> Subject: Re: [LLVMdev] parallel loop metadata simplification
>> 
>> ----- Original Message -----
>>> From: "Paul Redmond" <paul.redmond at intel.com>
>>> To: "Hal Finkel" <hfinkel at anl.gov>
>>> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at
cs.uiuc.edu>
>>> Sent: Friday, March 1, 2013 10:49:24 AM
>>> Subject: Re: [LLVMdev] parallel loop metadata simplification
>>> 
>>> 
>>> On 2013-03-01, at 11:35 AM, Hal Finkel wrote:
>>> 
>>>> ----- Original Message -----
>>>>> From: "Paul Redmond" <paul.redmond at
intel.com>
>>>>> To: "Hal Finkel" <hfinkel at anl.gov>
>>>>> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at
cs.uiuc.edu>
>>>>> Sent: Friday, March 1, 2013 10:06:51 AM
>>>>> Subject: Re: [LLVMdev] parallel loop metadata
simplification
>>>>> 
>>>>> Hi Hal,
>>>>> 
>>>>> On 2013-02-28, at 9:33 PM, Hal Finkel wrote:
>>>>> 
>>>>>> ----- Original Message -----
>>>>>>> From: "Paul Redmond" <paul.redmond at
intel.com>
>>>>>>> To: "llvmdev at cs.uiuc.edu Dev"
<llvmdev at cs.uiuc.edu>
>>>>>>> Sent: Thursday, February 28, 2013 1:30:57 PM
>>>>>>> Subject: [LLVMdev] parallel loop metadata
simplification
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I've been working on clang codegen for #pragma
ivdep and
>>>>>>> creating
>>>>>>> the
>>>>>>> llvm.mem.parallel_loop_access metadata seems quite
difficult.
>>>>>>> The
>>>>>>> main problem is that there are so many places where
loads and
>>>>>>> stores
>>>>>>> are created and all of them need to be changed when
emitting a
>>>>>>> parallel loop. Note that creating
llvm.loop.parallel is not a
>>>>>>> problem.
>>>>>>> 
>>>>>>> One option is to modify IRBuilder to enable
attaching the
>>>>>>> metadata
>>>>>>> for loads and stores but that seems like a huge
hack.
>>>>>> 
>>>>>> Can you please explain why this is a bad option? To be
honest,
>>>>>> this
>>>>>> is what I expected you to do. The IRBuilder sits on top
of all
>>>>>> of
>>>>>> the loads and stores, and seems like the perfect place
to keep
>>>>>> state for something that affects "all" loads
and stores in some
>>>>>> code region. In addition, putting this in IRBuilder
should also
>>>>>> make it easier to use this feature from other
frontends.
>>>>> 
>>>>> One concern I have is that the
llvm.mem.parallel_loop_access can
>>>>> refer to nested loops. Is it possible that instructions in
the
>>>>> same
>>>>> loop require different metadatas (one is parallel in both
inner
>>>>> and
>>>>> outer loop and another is only parallel in the inner loop?)
This
>>>>> is
>>>>> a more general problem I think but the IRBuilder would only
be
>>>>> useful for brute force adding the same metadata to all
>>>>> instructions
>>>>> with mayReadOrWriteMemory().
>>>> 
>>>> Well, it would be more complicated than that anyway because you
>>>> specifically don't want to tag the loads and stores to the
local
>>>> alloca'd variable storage locations, only the
"explicit" memory
>>>> references. I think an extra boolean parameter to CreateLoad,
>>>> etc.
>>>> could take care of those? Nevertheless, IRBuilder could keep a
>>>> stack of parallel loops to handle nested cases, right?
>>>> 
>>> 
>>> Hmm, I guess I'm missing something. It is my understanding that
all
>>> loads and stores require the metadata (and based on the
>>> implementation of isAnnotatedParallel). What you describe suggests
>>> that a loop goes from non-parallel to parallel after SROA?
>> 
>> Or after mem2reg; something like that, yes. I think that's right.
> 
> On second thought, to be more specific, it might depend on whether the
alloca is inside or outside the loop.
> 
I have discovered that you can provide a custom inserter to IRBuilder (who
knew!). This has basically solved all my problems and allowed me to generate the
proper metadata with minimal changes to clang codegen. Currently it adds the
metadata to all loads and stores but I don't think this is a problem and can
be refined later if necessary.

paul
> -Hal
> 
>> 
>> -Hal
>> 
>>> 
>>> paul
>>> 
>>>> -Hal
>>>> 
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> paul
>>>>> 
>>>>>> 
>>>>>> -Hal
>>>>>> 
>>>>>>> 
>>>>>>> I'd like to reopen the discussion on requiring
the
>>>>>>> llvm.mem.parallel_loop_access metadata. I
understand the
>>>>>>> reason
>>>>>>> for
>>>>>>> the metadata is to protect against transformations
that may
>>>>>>> introduce unsafe parallel memory accesses (the
reg2mem
>>>>>>> example.)
>>>>>>> I'm
>>>>>>> wondering if perhaps we can make the metadata more
>>>>>>> user-friendly
>>>>>>> by
>>>>>>> providing a single loop metadata which can be
expanded into
>>>>>>> "safer"
>>>>>>> metadata as required. Specifically a loop pass
could be added
>>>>>>> that
>>>>>>> expands llvm.loop.parallel into
llvm.loop.parallel_protected +
>>>>>>> llvm.mem.parallel_loop_access.
>>>>>>> 
>>>>>>> Perhaps there are simpler options I've
overlooked?
>>>>>>> 
>>>>>>> paul
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>

Pekka Jääskeläinen

2013-Mar-01 22:10 UTC

head link

[LLVMdev] parallel loop metadata simplification

On 03/01/2013 11:05 PM, Redmond, Paul wrote:> I have discovered that you can provide a custom inserter to IRBuilder (who
> knew!). This has basically solved all my problems and allowed me to
generate
> the proper metadata with minimal changes to clang codegen. Currently it
adds
> the metadata to all loads and stores but I don't think this is a
problem and
> can be refined later if necessary.
Good. I do not see why allocas should cause problems/special handling
here: if the programmer is declaring a loop as "parallel" and still
writes/reads a stack object in such a way that it adds loop-carried 
dependencies, it's a programmer-error (it's not a parallel loop).

-- 
--Pekka

Tobias Grosser

2013-Mar-02 18:44 UTC

head link

[LLVMdev] parallel loop metadata simplification

On 03/01/2013 10:05 PM, Redmond, Paul wrote:
[...]> I have discovered that you can provide a custom inserter to IRBuilder (who
knew!). This has basically solved all my problems and allowed me to generate the
proper metadata with minimal changes to clang codegen. Currently it adds the
metadata to all loads and stores but I don't think this is a problem and can
be refined later if necessary.
Great that you found a good solution. I have one example, which we may 
want to have a look into:

Given the following input (test.c):

         #pragma ivdep
	for (long i = 0; i < 100; i++) {
		long t = i + 2;
		A[i] = t;
	}

clang creates the following IR for the loop body:

   %1 = load i64* %i, align 8
   %add = add nsw i64 %1, 2
   store i64 %add, i64* %t, align 8
   %2 = load i64* %t, align 8
   %3 = load i64* %i, align 8
   %4 = load i64** %A.addr, align 8
   %arrayidx = getelementptr inbounds i64* %4, i64 %3
   store i64 %2, i64* %arrayidx, align 8
   br label %for.inc

What are the implications of pragma ivdep on this code? My intuition 
would be that 't' is iteration private and that using pragma ivdep for
this loop is correct. If the use of ivdep is correct, it seems necessary 
to _not_ annotate the loads and stores from and to 't'. Only after
't'
is moved into a register, the loop is actually parallel on the IR level.

Cheers,
Tobi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-csrc
Size: 89 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130302/adb705ea/attachment.c>

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Mar 2013 - [LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

[LLVMdev] parallel loop metadata simplification

Apparently Analagous Threads