Displaying 20 results from an estimated 10000 matches similar to: "Is this undefined behavior optimization legal?"
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
On 06/13/2017 07:33 PM, Matt Arsenault wrote:
>
>> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com> wrote:
>>
>> On 06/12/2017 08:03 PM, Connor Abbott wrote:
>>> On Mon, Jun 12, 2017 at 4:56 PM, Tom Stellard <tstellar at redhat.com> wrote:
>>>> On
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
On 06/14/2017 05:05 PM, Connor Abbott wrote:
> On Tue, Jun 13, 2017 at 6:13 PM, Tom Stellard <tstellar at redhat.com> wrote:
>> On 06/13/2017 07:33 PM, Matt Arsenault wrote:
>>>
>>>> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com> wrote:
>>>>
>>>> On 06/12/2017 08:03 PM, Connor
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
I'm wondering about the focus on bound_cntl. Any cleared bit in the row_mask or bank_mask will also disable updating the result.
Brian
-----Original Message-----
From: Connor Abbott [mailto:cwabbott0 at gmail.com]
Sent: Wednesday, June 14, 2017 6:13 PM
To: tstellar at redhat.com
Cc: Matt Arsenault; llvm-dev at lists.llvm.org; Kolton, Sam; Sumner, Brian; Pykhtin, Valery
Subject: Re:
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
Hi all,
I've been looking into how to implement the more advanced Shader Model
6 reduction operations in radv (and obviously most of the work would
be useful for radeonsi too). They're explained in the spec for
GL_AMD_shader_ballot at
https://www.khronos.org/registry/OpenGL/extensions/AMD/AMD_shader_ballot.txt,
but I'll summarize them here. There are two types of operations:
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
On 06/12/2017 08:03 PM, Connor Abbott wrote:
> On Mon, Jun 12, 2017 at 4:56 PM, Tom Stellard <tstellar at redhat.com> wrote:
>> On 06/12/2017 07:15 PM, Tom Stellard via llvm-dev wrote:
>>> cc some people who have worked on this.
>>>
>>> On 06/12/2017 05:58 PM, Connor Abbott via llvm-dev wrote:
>>>> Hi all,
>>>>
>>>>
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
Sorry about the formatting...
Anyway, I think there may be a misinterpretation of bound_cntl. My understanding is that:
0 => if the source is invalid or disabled, do not write a result
1 => if the source is invalid or disabled, use a 0 instead
So the problematic case is where bound_cntl is 0, not when it is 1.
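The two bound_cntl cases described above can be written down as a small Python sketch. This is only a toy model of the semantics as stated in this message, not a description of the actual hardware; the function name `dpp_lane_result` and its parameters are hypothetical and exist only for illustration.

```python
def dpp_lane_result(old, src, src_ok, bound_cntl):
    """Toy model of what one destination lane holds afterwards.

    old        -- value already in the destination lane
    src        -- value read from the source lane (meaningful only if src_ok)
    src_ok     -- False when the source lane is invalid or disabled
    bound_cntl -- 0 or 1, interpreted as in the message above
    """
    if src_ok:
        return src   # normal case: the source value is written
    if bound_cntl == 1:
        return 0     # invalid/disabled source: a 0 is used instead
    return old       # bound_cntl == 0: no result is written, old value stays


# Example: a lane whose source is disabled
kept = dpp_lane_result(old=7, src=3, src_ok=False, bound_cntl=0)
zeroed = dpp_lane_result(old=7, src=3, src_ok=False, bound_cntl=1)
```

In this model the bound_cntl = 0 case leaves whatever was previously in the destination lane, which is consistent with the message singling it out as the case that needs care.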
-----Original Message-----
From: Tom Stellard [mailto:tstellar at redhat.com]
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
On 06/12/2017 07:15 PM, Tom Stellard via llvm-dev wrote:
> cc some people who have worked on this.
>
> On 06/12/2017 05:58 PM, Connor Abbott via llvm-dev wrote:
>> Hi all,
>>
>> I've been looking into how to implement the more advanced Shader Model
>> 6 reduction operations in radv (and obviously most of the work would
>> be useful for radeonsi too).
2016 Sep 19
3
[arm, aarch64] Alignment checking in interleaved access pass
Hi,
As a follow up to Patch D23646 <https://reviews.llvm.org/D23646>, I'm
trying to figure out if there should be an alignment check and what the
correct approach is.
Some background:
For stores, the pass turns:
%i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1,
<0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
store <12 x i32> %i.vec, <12 x i32>* %ptr
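The shuffle mask in the snippet above follows a regular pattern: it interleaves three groups of four lanes. A minimal Python sketch of that index arithmetic is shown here; the helper names `interleave_mask` and `shuffle` are hypothetical and are not part of LLVM, and the model ignores element types and alignment entirely.

```python
def interleave_mask(factor, lanes):
    """Mask that interleaves `factor` sub-vectors of `lanes` elements each."""
    return [j * lanes + i for i in range(lanes) for j in range(factor)]


def shuffle(v0, v1, mask):
    """Model of a two-operand shuffle: indices run over v0 followed by v1."""
    both = list(v0) + list(v1)
    return [both[m] for m in mask]


mask = interleave_mask(3, 4)
# mask is [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11], the mask used above

v0 = list(range(0, 8))    # stands in for %v0
v1 = list(range(8, 16))   # stands in for %v1
i_vec = shuffle(v0, v1, mask)
```

With these stand-in inputs, each element's value equals its index in the concatenated vector, so `i_vec` simply reproduces the mask.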
2016 Jun 24
6
RFC: Strong GC References in LLVM
This is a proposal to add strong GC reference types to LLVM.
We have some local (downstream) patches that are needed to prevent
LLVM's optimizer from making transforms that are problematic in the
presence of a precise relocating GC. Adding a notion of a strong GC
reference to LLVM will let us upstream these patches in a principled
manner, and will act as a measure to avoid new problematic
2017 Nov 14
2
[SCEV][ScalarEvolution] SE limitation impacting LV
Hi!
I would appreciate some feedback from someone with experience in SCEV/SE. D39346 tries to fix an issue in LV (PR34965) that exposes a limitation in SCEV/SE. The best solution to the LV issue might not be a fix at SCEV/SE level but we may want to report/address SCEV/SE limitation as well.
For the snippet below, LV expects SE to return a SCEVAddRecExpr for %21. However, SE returns ((4 * (zext
2015 Oct 24
2
[AMDGPU] AMDGPUAsmParser fails to parse several instructions
Thank you. I'm new to the LLVM backend, so the help is much appreciated.
On Sat, Oct 24, 2015 at 2:12 AM, Matt Arsenault <arsenm2 at gmail.com> wrote:
>
> > On Oct 23, 2015, at 3:36 AM, 李弘宇 via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
>
> > The first line has the following error message:
> >
> > sop1-playground.s:1:15: error: invalid immediate:
2015 Feb 09
2
[LLVMdev] DataLayout missing in isDereferenceablePointer()
Eric Christopher wrote:
> How are you trying to call it? Do you have a DataLayout?
In test/Analysis/ValueTracking/memory-dereferenceable.ll, just change
byval to dereferenceable(8), and %dparam won't match (see
lib/IR/Value.cpp:521 for the logic that is supposed to fire). How do I
get it to pass? I tried introducing a target-triple and
target-datalayout, but it didn't help.
2013 Nov 28
2
[LLVMdev] [llvm] r195903 - AArch64: Fix a bug about disassembling post-index load single element to 4 vectors
I'm getting build errors, I think from one of your patches.
You need to have a build area that builds with clang and treats warnings
as errors to avoid these issues on putback.
Here is my configure step, for example:
/home/rkotler/llvm_trunk/configure --enable-werror
--prefix=/home/rkotler/llvm/install CC=/home/rkotler/llvm_3_2/install/bin/clang
CXX=/home/rkotler/llvm_3_
2013 Nov 28
1
[LLVMdev] [llvm] r195903 - AArch64: Fix a bug about disassembling post-index load single element to 4 vectors
I'm still seeing this problem.
On 11/28/2013 09:37 AM, NAKAMURA Takumi wrote:
> It is r195843 and fixed in r195905, FYI.
>
> 2013/11/29 Reed Kotler <rkotler at mips.com>:
>> I'm getting build errors, I think from one of your patches.
>>
>> You need to have a build area that builds with clang and does warnings as
>> errors to avoid these
2017 Nov 22
2
[SCEV][ScalarEvolution] SE limitation impacting LV
Thanks for the feedback, Sanjoy.
> SCEV is fairly conservative around PHI nodes that aren't recurrences and aren't obviously equivalent to a min-max branch-phi idiom. Is that the limitation you're running into here?
Yes, that's exactly the problem. The problematic PHI nodes (%bc.resume.val and %bc.resume.val1) are neither recurrences nor related to min-max idioms. I
2011 Sep 26
4
Testing for arguments in a function
I don't understand how this function can subset by i when i is missing....
## My function:
myfun = function(vec, i){
ret = vec[i]
ret
}
## My data:
i = 10
vec = 1:100
## Expected input and behavior:
myfun(vec, i)
## Missing an argument, but error is not caught!
## How is subsetting even possible here???
myfun(vec)
Is there a way to check for missing function arguments, *and*
2007 Dec 17
2
[LLVMdev] Elsa and LLVM and LLVM submissions
Devang Patel wrote:
> On Dec 15, 2007, at 12:15 PM, Richard Pennington wrote:
>
>> I got the current version of LLVM via svn yesterday and modified my
>> code to
>> use the LLVMFoldingBuilder. Very nice!
>>
>> My question is this: I noticed that the folding builder doesn't fold
>> some
>> operations, e.g. casts. Is there some reason why? If
2014 Apr 04
2
[LLVMdev] How should I update LiveIntervals after removing a use of a register?
Hi,
I am working on a simple copy propagation pass for the R600 backend that
propagates immediates rather than registers. For example, I want to
transform:
...
%vreg1 = V_MOV_B32 1
%vreg2 = V_ADD_I32 %vreg1, %vreg0
...
into:
%vreg1 = V_MOV_B32 1 ; <- Only delete this if it is dead
%vreg2 = V_ADD_I32 1, %vreg0
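The transformation in the example above can be sketched on a toy instruction list in Python. This is only an illustration and not the R600 pass itself: the tuple representation, the function `propagate_immediates`, and the `foldable_ops` parameter are all hypothetical, and the sketch assumes each register is defined exactly once and has no uses outside the list.

```python
def propagate_immediates(instrs, foldable_ops, mov_op="V_MOV_B32"):
    """Replace uses of registers holding an immediate with the immediate.

    instrs       -- list of (dst, opcode, operands); registers are strings,
                    immediates are ints
    foldable_ops -- opcodes assumed to accept an immediate operand
    """
    imm = {}   # register -> immediate it was defined with
    out = []
    for dst, op, operands in instrs:
        if op in foldable_ops:
            operands = [imm.get(o, o) for o in operands]
        if op == mov_op and len(operands) == 1 and isinstance(operands[0], int):
            imm[dst] = operands[0]
        out.append((dst, op, list(operands)))

    # A mov of an immediate is deleted only if nothing reads its result.
    still_used = {o for _, _, ops in out for o in ops if isinstance(o, str)}
    return [ins for ins in out
            if not (ins[0] in imm and ins[0] not in still_used)]


before = [
    ("%vreg1", "V_MOV_B32", [1]),
    ("%vreg2", "V_ADD_I32", ["%vreg1", "%vreg0"]),
]
after = propagate_immediates(before, foldable_ops={"V_ADD_I32"})
```

If some other instruction that cannot take an immediate still reads %vreg1, the mov stays in place, which corresponds to the "only delete this if it is dead" comment in the example. The sketch does not address the LiveIntervals update that the question is actually about.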
For best results, I am trying to run this pass after the
TwoAddressInstruction
2014 Oct 03
2
[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
Hi Tom, Matt,
I'm running into strange issues with the cos test (piglit
generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)
I have been seeing random failures (incorrect results) for some time and
tried to investigate. The weird part is that the failures are not 100%
reproducible, sometimes the tests pass, or partly pass
(it's usually float8 and float16 subtests that
2016 May 18
3
sum elements in the vector
Hi Rail,
We used a very simple pattern expansion (actually, not a pattern fragment). For example, for AND, ADD (horizontal sum), OR and XOR of 4 elements we use something like the following TableGen structure:
class HORIZ_Op4<SDNode opc, RegisterClass regVT, ValueType rt, ValueType vt, string asmstr> :
SHAVE_Instr<(outs regVT:$dst), (ins VRF128:$src),