Displaying 5 results from an estimated 5 matches for "gpuocelot".
2012 Apr 08
1
[LLVMdev] LLVM show error preprocessor "Must #define __STDC_LIMIT_MACROS before #including Support/DataTypes.h"
Hello All,
I am building the source code of Ocelot [http://code.google.com/p/gpuocelot/],
which depends on LLVM. To build Ocelot, llvm-config reports the
following cppflags:
./llvm-config --cppflags
-I/home/chatsiri/workspacecpp/llvm/include
-I/home/chatsiri/workspacecpp/llvm/include -D_DEBUG -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS...
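The error in the subject line is raised when the C99 limit macros are not requested before LLVM's DataTypes.h is included. A minimal sketch of the usual workaround follows, assuming the flag cannot simply be added on the compiler command line (e.g. -D__STDC_LIMIT_MACROS). The helper limit_macros_visible is hypothetical and only checks that the macros are visible.

```cpp
// Request the C99 limit and constant macros before any header pulls in
// <stdint.h>. These defines must come first in the translation unit.
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS
#endif
#ifndef __STDC_CONSTANT_MACROS
#define __STDC_CONSTANT_MACROS
#endif

#include <stdint.h>

// Hypothetical helper (not part of LLVM or Ocelot): reports whether the
// limit macros such as INT64_MAX are visible after the include above.
bool limit_macros_visible() {
#ifdef INT64_MAX
    return true;
#else
    return false;
#endif
}
```

In a real build the LLVM headers would be included after the two defines, in place of the bare stdint.h include shown here.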
2013 May 08
0
[LLVMdev] Predicated Vector Operations
...resumably
%newvalue will be consumed, possibly by another arithmetic operation.
Presumably %oldvalue can similarly come from a previous arithmetic
operation feeding into the add. If that's true, then %oldvalue is
either %x or %y. Otherwise it is some other thing highly
context-dependent.
The gpuocelot project ran into the problem and they talk about it here:
http://code.google.com/p/gpuocelot/source/browse/wiki/LLVM.wiki?r=272
The bottom line is that it is probably easier to set this up before LLVM
IR goes into SSA form.
There is a lot of interest in predication and a lot of recent
discussion...
2009 Oct 12
0
[LLVMdev] Representing SIMT programs in LLVM
...uld like to start by thanking every developer who has contributed to LLVM
for releasing such a high quality project. It has been incredibly valuable
to several projects that I have worked on.
My name is Gregory Diamos, I am a PhD student at Georgia Tech working on
Ocelot (http://code.google.com/p/gpuocelot/). Ocelot is a dynamic binary
translator from PTX (a virtual instruction set used by NVIDIA GPUs) to
multi-core x86. We currently use LLVM's JIT as our x86 code generator. We
have a prototype implementation finished that can execute most CUDA
applications on our google code page using LLVM a...
2013 May 07
6
[LLVMdev] Predicated Vector Operations
I'm trying to understand how predicated/masked instructions can be
generated in llvm, specifically an instruction where a set bit in the mask
will write the new result into the corresponding vector lane in the
destination and a clear bit will cause the lane in the destination to
remain what it was before the instruction executed.
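The semantics described above can be sketched in scalar C++ as a lane-wise blend. This is only an illustration under stated assumptions: a four-lane integer vector, a bitmask with one bit per lane, and the hypothetical names Vec4, select_lanes and masked_add, none of which come from LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed four-lane integer vector for illustration.
using Vec4 = std::array<int, 4>;

// Lane-wise select: take if_set[i] where bit i of mask is 1,
// otherwise keep if_clear[i].
Vec4 select_lanes(unsigned mask, const Vec4& if_set, const Vec4& if_clear) {
    assert(mask < (1u << 4));  // only four lanes are defined
    Vec4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = ((mask >> i) & 1u) ? if_set[i] : if_clear[i];
    }
    return out;
}

// Masked add: the sum is computed for every lane, then blended with the
// previous destination so that lanes with a clear bit are left unchanged.
Vec4 masked_add(unsigned mask, const Vec4& x, const Vec4& y,
                const Vec4& old_dest) {
    Vec4 sum{};
    for (std::size_t i = 0; i < sum.size(); ++i) {
        sum[i] = x[i] + y[i];
    }
    return select_lanes(mask, sum, old_dest);
}
```

Note that the blend needs the old destination value as an explicit input, which is the extra operand a select-based formulation has to carry.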
I've seen a few places that suggest 'select' is the
2013 May 02
8
[LLVMdev] Handling Masked Vector Operations
...tion is to create an intrinsic:
llvm_int_load_masked mask, [addr]
But this unnecessarily shuts down optimization.
Similar problems exist with any trapping instruction (div, mod, etc.).
It gets even worse when you consider that any floating point operation
can trap on a signalling NaN input.
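The trapping concern can be illustrated with integer division. Computing every lane and blending afterwards would divide by zero in an inactive lane, so the operation itself has to be guarded per lane. The sketch below assumes a four-lane integer vector and a one-bit-per-lane mask; Vec4 and masked_div are hypothetical names, not LLVM intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed four-lane integer vector for illustration.
using Vec4 = std::array<int, 4>;

// Masked division: the divide executes only in lanes whose mask bit is
// set, so a zero divisor in an inactive lane is never evaluated.
Vec4 masked_div(unsigned mask, const Vec4& num, const Vec4& den,
                const Vec4& old_dest) {
    assert(mask < (1u << 4));  // only four lanes are defined
    Vec4 out = old_dest;       // inactive lanes keep their old value
    for (std::size_t i = 0; i < out.size(); ++i) {
        if ((mask >> i) & 1u) {
            out[i] = num[i] / den[i];
        }
    }
    return out;
}
```

The per-lane guard is what an opaque masked intrinsic preserves, at the cost of hiding the operation from optimizations that understand an ordinary divide.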
The gpuocelot project is essentially trying to do the same thing but I
haven't dived deep enough into their notes and implementation to see how
they handle this issue. Perhaps because current GPUs don't trap it's a
non-issue. But that will likely change in the future.
So are there any ideas out th...