thr3ads.net - search: "int_fetch

Displaying 5 results from an estimated 5 matches for "int_fetch_add".

2012 Feb 07

[LLVMdev] Vectorization: Next Steps

...i] = 0; > > for (unsigned i = 0; i < n; i++) > count[src[i]]++; > > start[0] = 0; > for (unsigned i = 1; i < buckets; i++) > start[i] = start[i - 1] + count[i - 1]; > > #pragma assert parallel > for (unsigned i = 0; i < n; i++) { > unsigned loc = int_fetch_add(start + src[i], 1); Should this be: unsigned loc = int_fetch_add(start[src[i]], 1); > dst[loc] = src[i]; > } > > > The 1st loop is trivially parallel. I think Polly would recognize > this and do good things. This case is trivial. But keep in mind that unsigned loop ivs a...


2012 Feb 07

[LLVMdev] Vectorization: Next Steps

...i = 0; i < n; i++) >> count[src[i]]++; >> >> start[0] = 0; >> for (unsigned i = 1; i < buckets; i++) >> start[i] = start[i - 1] + count[i - 1]; >> >> #pragma assert parallel >> for (unsigned i = 0; i < n; i++) { >> unsigned loc = int_fetch_add(start + src[i], 1); >> dst[loc] = src[i]; >> } > Should this be: > > unsigned loc = int_fetch_add(start[src[i]], 1); Our intrinsic wants a pointer, so either int_fetch_add(start + src[i], 1) or int_fetch_add(&start[src[i]], 1) will work. >> The 1st loop is tri...
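The reply above says the intrinsic takes a pointer, so `start + src[i]` and `&start[src[i]]` are interchangeable. A minimal sketch of that equivalence follows, assuming C11 `atomic_fetch_add` as a stand-in for `int_fetch_add` (whose real declaration is not shown in the thread); the helper name `pointer_forms_agree` and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical check: both spellings name the same element, so two
   fetch-and-add calls through them bump the same counter.
   Returns 1 when everything agrees, 0 otherwise. */
static int pointer_forms_agree(unsigned idx) {
    atomic_uint start[4] = {0, 10, 20, 30};   /* sample data, not from the thread */
    assert(idx < 4);

    unsigned before = idx * 10;               /* initial value of start[idx] */
    int same_address = (start + idx) == &start[idx];

    /* Each call returns the value held before the increment. */
    unsigned first  = atomic_fetch_add(start + idx, 1);
    unsigned second = atomic_fetch_add(&start[idx], 1);

    return same_address
        && first == before
        && second == before + 1
        && atomic_load(&start[idx]) == before + 2;
}
```

Passing `start[src[i]]` instead would hand over the counter's value rather than its address, which is why the by-value spelling asked about in the thread does not fit a pointer-taking intrinsic.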


2012 Feb 06

[LLVMdev] Vectorization: Next Steps

...r (unsigned i = 0; i < buckets; i++)
    count[i] = 0;

for (unsigned i = 0; i < n; i++)
    count[src[i]]++;

start[0] = 0;
for (unsigned i = 1; i < buckets; i++)
    start[i] = start[i - 1] + count[i - 1];

#pragma assert parallel
for (unsigned i = 0; i < n; i++) {
    unsigned loc = int_fetch_add(start + src[i], 1);
    dst[loc] = src[i];
}

The 1st loop is trivially parallel. I think Polly would recognize this and do good things. The 2nd loop has a race condition that can be handled by using an atomic increment provided by the architecture, if the compiler knows about such things. I don...
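The bucket scatter quoted above can be sketched as a self-contained C function. This is an illustration only: it assumes C11 `atomic_fetch_add` behaves like the `int_fetch_add` intrinsic (atomically add, return the old value), it runs the loops sequentially, and the names `bucket_scatter` and `running` are hypothetical. The `#pragma assert parallel` annotation from the post is left out because it is not standard C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumed stand-in for the intrinsic in the thread: atomically adds
   `inc` to *p and returns the previous value. */
static unsigned int_fetch_add(atomic_uint *p, unsigned inc) {
    return atomic_fetch_add(p, inc);
}

/* Scatter src[0..n) into dst, grouped by key. Every key must be < buckets.
   Returns 0 on success, -1 if allocation fails. */
static int bucket_scatter(const unsigned *src, unsigned *dst,
                          unsigned n, unsigned buckets) {
    assert(buckets > 0);
    unsigned *count = calloc(buckets, sizeof *count);   /* zeroed counts */
    atomic_uint *start = malloc(buckets * sizeof *start);
    if (!count || !start) {
        free(count);
        free(start);
        return -1;
    }

    /* Histogram of keys. */
    for (unsigned i = 0; i < n; i++)
        count[src[i]]++;

    /* Exclusive prefix sum: start[b] is the first slot of bucket b. */
    unsigned running = 0;
    atomic_init(&start[0], 0);
    for (unsigned i = 1; i < buckets; i++) {
        running += count[i - 1];
        atomic_init(&start[i], running);
    }

    /* Each element claims the next free slot in its bucket. The atomic
       fetch-and-add is what would keep two concurrent iterations from
       claiming the same slot. */
    for (unsigned i = 0; i < n; i++) {
        unsigned loc = int_fetch_add(start + src[i], 1);
        dst[loc] = src[i];
    }

    free(count);
    free(start);
    return 0;
}
```

Run sequentially, as here, the scatter is stable. If the last loop were actually executed in parallel, the atomic increment would still give every element a distinct slot inside its bucket, but the order of elements within a bucket would depend on scheduling.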


2012 Feb 06

[LLVMdev] Vectorization: Next Steps

On Sat, Feb 4, 2012 at 2:27 PM, Hal Finkel <hfinkel at anl.gov> wrote: > On Fri, 2012-02-03 at 20:59 -0800, Preston Briggs wrote: >> so are building a dependence graph for a complete function. Of >> course, such a thing is useful for vectorization and all sorts of >> other dependence-based loop transforms. >> >> I'm looking at the problem in two parts:


2012 Feb 08

[LLVMdev] Vectorization: Next Steps

...tools people commonly use annotations to provide information about live-in, live-out values, the sizes of arrays, ... It makes perfect sense to state the absence of dependences. >>> #pragma assert parallel >>> for (unsigned i = 0; i< n; i++) { >>> unsigned loc = int_fetch_add(&start[src[i]], 1); >>> dst[loc] = src[i]; >>> } >> >> As the int_fetch_add is side effect free, it is fully >> polyhedral. It can be implemented with the relevant LLVM >> atomicrmw instruction [1]. Polly does not yet allow atomicrmw instructions, ...
