Wei
2008-Nov-24 15:25 UTC
[LLVMdev] Does current LLVM target-independent code generator supports my strange chip?
> The machines I worked with didn't support any integer ops, but GLSL
> let us get by with "emulated" 16 bit integers (storing and operating
> on them as floating point; divides required truncation after the op -
> that sort of thing).

Although my platform does support integer operations, it only supports integer +, -, and *, not /. The documentation says that to do integer division I have to convert the operands to floating point first. Hence, I have a similar problem.

So... does your method mean that you write code in your 'frontend' to emit LLVM IR that converts the integers to floating point first, performs the operation, and then converts the result back to integer? Or do you write such code in your 'backend'?

Whatever your answer is, I think the 'frontend' approach is cleaner than the 'backend' approach (the 'backend' approach seems more like a hack?). Am I right? Or does putting such a mechanism in the backend have other advantages?

> What I mean is that you can probably get away with LLVM working with
> float literals as f32, then converting them to your 24 bit format
> during code gen.

I think I get you here.

> Integers too: let LLVM work with i32 internally, and convert literals
> during code gen.

Huh... I think I get you here, too. But I don't really know how you would handle integer constants larger than 24 bits. For example, if I see the following instruction during code gen:

int %a, add int %b, int 0x12345678

do I have to emit machine instructions similar to the following?

int %a, add int %b, int 0x5678
int %c, add int %d, int 0x1234
int %e, add int %c, 1   <--- depends on the result of the first addition

However, this means the backend has to remember that register %a now stores the low bytes of the result, and register %c stores the high bytes of the result.
This tracking is not an easy job, I think.

> I assume you'll be starting with the reference GLSL parser (from
> 3DLabs, IIRC - I don't even know if they still exist, actually)

You can find the 3Dlabs frontend here: http://l4.me.uk/static/glsl/
And I don't think anyone has ported this frontend to LLVM before.

> The issue would be that LLVM would want to store register values as 32
> bits - and do all the pointer math that way.

I don't really get you here. Why would LLVM do all the pointer math in 32 bits just because I store register values as 32 bits?

> I haven't had to work with register constraints in LLVM, so I'm not
> sure what would be the best approach if I/O is done through specific GPRs:
> you don't want to reserve those registers for I/O only.... it would
> take some exploration.

Unfortunately, my platform does use GPRs for input/output. My current thought is to compute the number of attributes/varyings used in a shader, and reserve the same number of GPRs for those attributes/varyings ONLY. Since I have NO memory to spill registers to, I think there is not much room for register allocation. The method I might use is to INLINE all functions and then perform register allocation. Is this strategy bad, or can you think of a better solution?

Wei.

On Nov 23, 1:37 am, Daniel M Gessel <ges... at apple.com> wrote:
> On Nov 22, 2008, at 11:03 AM, Wei wrote:
>
> > I have 24-bit integer operations as well as 24-bit floating point
> > (s7.16) operations.
>
> > The H/W supports load/store instructions; however, they do suggest
> > we not use these load/store instructions except for debugging purposes.
> > That is to say, you can imagine we don't have load/store instructions,
> > we don't have memory, we just have registers.
>
> > I will run OpenGL Shading Language programs on this chip.
>
> GLSL doesn't have pointers, so no "generic" load + store simplifying
> things.
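Wei's question at the top of this message, how to handle an integer constant wider than a 24-bit channel, comes down to carrying between two partial sums. A minimal sketch, assuming a 32-bit value is held as two 16-bit halves in separate 24-bit integer channels and that a compare/select is available alongside +, - and * (no shifts or masks). The helper names `split_u32`, `add_u32_halves` and `join_u32` are hypothetical, and, as Dan points out later in the thread, unextended GLSL never needs integers this wide.

```python
HALF = 1 << 16  # each half holds 16 bits; a 24-bit channel leaves room for a carry


def split_u32(value):
    """Split an unsigned 32-bit value into (lo, hi) 16-bit halves."""
    return value % HALF, value // HALF


def join_u32(lo, hi):
    """Recombine two 16-bit halves into one unsigned 32-bit value."""
    return hi * HALF + lo


def add_u32_halves(a_lo, a_hi, b_lo, b_hi):
    """Add two 32-bit values held as 16-bit halves, wrapping modulo 2**32.

    Only add, subtract, multiply and compare are used: the carry out of the
    low half is found by comparing against 2**16 instead of shifting.
    """
    lo = a_lo + b_lo                 # at most 0x1FFFE, still fits in 24 bits
    carry = 1 if lo >= HALF else 0   # compare/select instead of a shift
    lo = lo - carry * HALF           # renormalise the low half
    hi = a_hi + b_hi + carry         # this add depends on the first one
    if hi >= HALF:                   # drop the carry out of bit 31
        hi = hi - HALF
    return lo, hi


# %a = %b + 0x12345678, with %b = 0x0000FFFF
lo, hi = add_u32_halves(*split_u32(0x0000FFFF), *split_u32(0x12345678))
print(hex(join_u32(lo, hi)))  # 0x12355677
```

The only link between the two halves is `carry`, which is the dependency Wei marks in his example. For integer types wider than a target's registers, LLVM's SelectionDAG legalizer performs this kind of split itself (historically as an ADDC/ADDE add-with-carry pair), so the bookkeeping may not need to be written by hand if the wide type is left for the legalizer to expand.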
>
> Unextended GLSL only requires support for integers in the 16 bit
> range, and has no bitwise operations. It also doesn't specify integer
> overflow behavior in any way.
>
> The machines I worked with didn't support any integer ops, but GLSL
> let us get by with "emulated" 16 bit integers (storing and operating
> on them as floating point; divides required truncation after the op -
> that sort of thing).
>
> Since you have 24 bit integer operations, you're in better shape.
>
> > About your comments, I (a new LLVM user) have some more questions:
>
> > 1) You mention "custom handle the conversion of the integer/float
> > constants that LLVM spits out". Does it mean I have to register a
> > callback function that runs when LLVM wants to spit out a constant
> > value to memory? But what about non-constant values?
>
> What I mean is that you can probably get away with LLVM working with
> float literals as f32, then converting them to your 24 bit format
> during code gen. The specifics depend on how you want to handle
> constants in your backend: literals in instructions or a constant pool
> are the options I know of. For now, I'm using special "load literal"
> instructions, but a constant pool may be more appropriate in the long
> run. I'm still learning.
>
> Integers too: let LLVM work with i32 internally, and convert literals
> during code gen.
>
> Since GLSL doesn't require load/store, and it sounds like your HW may
> not be 100% reliable for these ops, you want to make sure your code
> stays in registers.
>
> I assume you'll be starting with the reference GLSL parser (from
> 3DLabs, IIRC - I don't even know if they still exist, actually) and
> having it generate LLVM IR (has anybody done this before?). This will
> give you much more control over the code - Clang is the front end for
> the project I'm working on, and it generates stack based code; most of
> the stack operations get optimized out by inlining and the mem2reg
> pass, but not everything.
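On the integer-division question (Dan's "divides required truncation after the op"): a minimal sketch of unsigned division built from float operations plus an integer fix-up that uses only the multiply and subtract Wei's chip provides. It assumes the hardware divides as reciprocal-then-multiply, that its float results keep 17 significant bits (16 stored plus the implicit bit, as discussed for s7.16) and truncate toward zero, and that operands are non-negative and below 2^16. All of these, and the function names, are assumptions for illustration; Python's float is used only to emulate the reduced precision.

```python
import math

MANTISSA_BITS = 17  # 16 stored bits + 1 implicit bit (assumed s7.16 precision)


def to_chip_float(x, bits=MANTISSA_BITS):
    """Truncate x toward zero to `bits` significant bits (assumed rounding mode)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.trunc(m * scale) / scale, e)


def udiv_via_float(a, b):
    """Unsigned a // b from float RCP + MUL and an integer fix-up.

    Assumes 0 <= a < 2**16 and 0 < b < 2**16, the range GLSL requires.
    """
    rcp = to_chip_float(1.0 / to_chip_float(float(b)))     # reciprocal
    q = int(to_chip_float(to_chip_float(float(a)) * rcp))  # int() truncates
    r = a - q * b                     # integer multiply and subtract
    if r >= b:                        # float quotient came out one too small
        q += 1
    elif r < 0:                       # one too large (possible with round-to-nearest)
        q -= 1
    return q


print(udiv_via_float(65535, 3))  # 21845; the raw float path alone gives 21844
```

Under these assumptions the float quotient is never more than one below the true quotient for 16-bit operands, so a single correction step suffices. Whether the frontend emits this sequence or the backend expands an integer divide into it, as Dan suggests later, it is the same handful of operations.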
>
> > ex:
> > int a;
> > and LLVM wants to put a into memory.
>
> > And I don't really know what "i32/f32 sounds like a good place to
> > start" means...
>
> I mean that having your registers declared as i32 + f32 will probably
> work out well, especially since you don't have pointers in your
> language.
>
> The issue would be that LLVM would want to store register values as 32
> bits - and do all the pointer math that way. Depending on how your HW
> works, this may or may not be okay. Even then, you might be able to
> patch it up if you really needed to store your registers 3 byte aligned.
>
> Fortunately, this is not an issue with GLSL.
>
> > 2) I don't know why you mention "I'd assume you'd have intrinsics for
> > I/O."
>
> For GLSL, you have to have some way of reading attributes and
> uniforms, exporting to/reading from varyings, etc.
>
> Different GPUs do things differently of course: in some cases, it's a
> matter of certain GPRs being initialized by "fixed function" HW with
> input values at the start of the shader and certain GPRs being left
> with output values at the end of the shader. Other GPUs require
> explicit "export" instructions, perhaps just reads/writes to dedicated
> I/O registers. Some have a mix (this is the case for HW I've worked
> with).
>
> If you have export instructions, or even special I/O registers, I was
> thinking that they could be represented or accessed by target specific
> ops - intrinsics. You'd have the GLSL front end generate these
> intrinsic operations.
>
> I haven't had to work with register constraints in LLVM, so I'm not
> sure what would be the best approach if I/O is done through specific GPRs:
> you don't want to reserve those registers for I/O only.... it would
> take some exploration.
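To make the pointer-math point concrete: if each 24-bit channel is an i32 to LLVM, element i of an array sits at byte 4*i, while tightly packed 24-bit storage would put it at byte 3*i. A small Python sketch of both layouts, assuming a flat byte-addressed buffer and little-endian byte order (neither is stated for Wei's chip); the function names are hypothetical.

```python
def byte_offset(index, stride):
    """Byte offset of element `index` when each element occupies `stride` bytes."""
    return index * stride


def store_u24(buf, index, value):
    """Store the low 24 bits of value as 3 little-endian bytes, 3-byte stride."""
    off = byte_offset(index, 3)
    buf[off:off + 3] = (value & 0xFFFFFF).to_bytes(3, "little")


def load_u24(buf, index):
    """Load a 24-bit value stored by store_u24."""
    off = byte_offset(index, 3)
    return int.from_bytes(buf[off:off + 3], "little")


values = [0x000001, 0x123456, 0xABCDEF, 0x7FFFFF]
buf = bytearray(3 * len(values))        # packed: 12 bytes instead of 16
for i, v in enumerate(values):
    store_u24(buf, i, v)

print(byte_offset(2, 4), byte_offset(2, 3))  # 8 6  (i32 layout vs packed)
print(hex(load_u24(buf, 2)))                 # 0xabcdef
```

The mismatch between the two offsets is the "patching up" Dan mentions; with GLSL there are no pointers, so the i32 layout never becomes visible.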
>
> > 3) I don't think I get you about the following statements:
> >> If you want to support memory operations, your integers need to
> >> support the addressing range correctly - you effectively have 17 bits
> >> of mantissa - so it may be a tight squeeze without 24 bit integer ops
> >> (shifts and ands and stuff will also be painful, but that's a more
> >> expansive topic).
> > Can you give an example?
>
> Sorry, I was "thinking out loud".
>
> I made the assumption here that you didn't have 24 bit integer ops,
> and that you might try to represent pointers as integers in a single
> 24 bit float value (maybe with a 1D texture as your addressable
> memory). In that case, you'd have a very limited range.
>
> But GLSL doesn't have pointers, so this isn't an issue (and 24 bit
> integers give you a decent addressing range for debugging).
>
> Dan
>
> > Thank you very much for your comments.
>
> > Wei.
>
> > On Nov 20, 10:24 pm, Daniel M Gessel <ges... at apple.com> wrote:
> >> This is similar to ATI's R300/R420 pixel shaders. I'm familiar with
> >> this hardware, but not really an LLVM expert (working on a code
> >> generator myself, but learning as I go).
>
> >> Do you have 24-bit integer operations, or just floating point?
>
> >> What about load/store?
>
> >> Are you looking to run large C programs with complex data structures,
> >> or just comparatively simple math functions (i.e. a compute
> >> "kernel")?
>
> >> If you only want to support programs that can live entirely within
> >> registers, you can custom handle the conversion of the integer/float
> >> constants that LLVM spits out, and i32/f32 sounds like a good place to
> >> start - LLVM's mem2reg and inlining are very effective at getting rid
> >> of the majority of stack operations, and I'd assume you'd have
> >> intrinsics for I/O.
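Dan suggests letting LLVM keep float literals as f32 and converting them to the chip's 24-bit format during code gen. A minimal sketch of that conversion, assuming s7.16 means 1 sign bit, a 7-bit exponent with bias 63, and 16 stored mantissa bits with an implicit leading 1 (which matches the "17 bits of mantissa" above); the bias, the treatment of the top exponent, and the function names are assumptions, and Wei's chip may differ. The encoder truncates the mantissa, flushes tiny values to zero and clamps huge ones.

```python
import math
import struct

EXP_BITS, MANT_BITS = 7, 16
BIAS = (1 << (EXP_BITS - 1)) - 1   # 63 (assumed exponent bias)
MAX_EXP = (1 << EXP_BITS) - 1      # 127, treated here as an ordinary exponent


def f32_to_s7_16(x):
    """Encode an f32-representable float into the assumed s7.16 layout.

    Truncates the mantissa, flushes tiny values to zero and clamps huge
    values (including inf/NaN) to the largest encoding.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign, exp, mant = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    e = exp - 127 + BIAS            # re-bias the exponent
    if exp == 0 or e <= 0:          # zero, f32 denormal or underflow
        return sign << 23
    if exp == 0xFF or e > MAX_EXP:  # inf, NaN or overflow
        e, mant = MAX_EXP, 0x7FFFFF
    return (sign << 23) | (e << MANT_BITS) | (mant >> (23 - MANT_BITS))


def s7_16_to_float(v):
    """Decode the assumed s7.16 layout back to a Python float."""
    sign = -1.0 if v >> 23 else 1.0
    e = (v >> MANT_BITS) & MAX_EXP
    mant = v & ((1 << MANT_BITS) - 1)
    if e == 0:
        return math.copysign(0.0, sign)
    return sign * math.ldexp(1.0 + mant / (1 << MANT_BITS), e - BIAS)


print(hex(f32_to_s7_16(1.0)), hex(f32_to_s7_16(-2.0)))  # 0x3f0000 0xc00000
```

In a backend this would run wherever an f32 constant is materialized, whether as an immediate in a "load literal" instruction or as a constant-pool entry, the two options Dan lists.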
>
> >> If you want to support memory operations, your integers need to
> >> support the addressing range correctly - you effectively have 17 bits
> >> of mantissa - so it may be a tight squeeze without 24 bit integer ops
> >> (shifts and ands and stuff will also be painful, but that's a more
> >> expansive topic).
>
> >> Dan
>
> >> On Nov 20, 2008, at 7:46 AM, Wei wrote:
>
> >>> Because each channel contains 24 bits, what is the
> >>> llvm::MVT::SimpleValueType I should use for each channel?
> >>> The current llvm::MVT::SimpleValueType contains i1, i8, i16, i32, i64,
> >>> f32, f64, f80; none of them fits one channel (24-bit).
>
> >>> I think I can use i32 or f32 to represent each 24-bit channel; if the
> >>> runtime result of some machine instruction exceeds 23 bits (1 bit is
> >>> for the sign), then it is an overflow.
> >>> Is it correct to claim that the programmer needs to revise his
> >>> program to fix this problem?
> >>> Am I right or wrong about this thought?
>
> >>> If there is a chip whose registers are 24 bits long, and you have to
> >>> compile C/C++ programs on it, how would you represent the following
> >>> statement?
>
> >>> int a = 3;
> >>> (Programmers think sizeof(int) = 4)
>
> >>> Wei.
>
> >>> On Nov 19, 2:01 am, Evan Cheng <evan.ch... at apple.com> wrote:
> >>>> Why not model each channel as a separate physical register?
>
> >>>> Evan
>
> >>>> On Nov 17, 2008, at 6:36 AM, Wei wrote:
>
> >>>>> I have a very strange and complicated H/W platform.
> >>>>> It has many registers in one format.
> >>>>> The register format is:
> >>>>>
> >>>>> -------------------------------------
> >>>>> | 24-bit | 24-bit | 24-bit | 24-bit |
> >>>>> -------------------------------------
> >>>>>     a        b        c        d
> >>>>>
> >>>>> There are 4 channels in a register, and each channel contains 24
> >>>>> bits; hence, there are 96 bits in total in 'one' register.
> >>>>> You can store a 24-bit integer or an s7.16 floating-point value in
> >>>>> each channel.
> >>>>> You can name the channels 'a', 'b', 'c', 'd'.
>
> >>>>> Here is an example of an operation on this H/W platform:
>
> >>>>> ADD R3.ab, R1.abab, R2.bbaa
>
> >>>>> It means:
>
> >>>>> Add the 'abab' channels of R1 and the 'bbaa' channels of R2, and
> >>>>> put the result into the 'ab' channels of R3.
>
> >>>>> It's complicated.
> >>>>> Imagine a nonexistent temp register named 'Rt1' whose 'a', 'b',
> >>>>> 'c', 'd' channels are taken from the 'a', 'b', 'a', 'b' channels of
> >>>>> R1, and another nonexistent temp register named 'Rt2' whose 'a',
> >>>>> 'b', 'c', 'd' channels are taken from the 'b', 'b', 'a', 'a'
> >>>>> channels of R2.
> >>>>> Then add Rt1 and Rt2 and put the result into R3.
> >>>>> This means:
> >>>>> the 'a' channel of R3 will be equal to the 'a' channel of Rt1 plus
> >>>>> the 'a' channel of Rt2 (i.e. 'a' from R1 + 'b' from R2, because
> >>>>> R1.'a'bab and R2.'b'baa);
> >>>>> the 'b' channel of R3 will be equal to the 'b' channel of Rt1 plus
> >>>>> the 'b' channel of Rt2 (i.e. 'b' from R1 + 'b' from R2, because
> >>>>> R1.a'b'ab and R2.b'b'aa);
> >>>>> the 'c' channel of R3 will be untouched; the value of the 'c'
> >>>>> channel of Rt1 plus the 'c' channel of Rt2 (i.e. 'a' from R1 + 'a'
> >>>>> from R2, because R1.ab'a'b and R2.bb'a'a) will be lost;
> >>>>> the 'd' channel of R3 will be untouched, too.
> >>>>> The value of the 'd' channel of Rt1 plus the 'd' channel of Rt2
> >>>>> (i.e. 'b' from R1 + 'a' from R2, because R1.aba'b' and R2.bba'a')
> >>>>> will be lost, too.
>
> >>>>> I don't know whether I can set the 'type' of such a register using
> >>>>> an llvm::MVT::SimpleValueType.
> >>>>> According to the LLVM docs & LLVM source code, I think
> >>>>> llvm::MVT::v8i8, v2f32, etc. are used to represent registers for SIMD
>
> ...
>
> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
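Wei's walk-through of `ADD R3.ab, R1.abab, R2.bbaa` can be modeled directly: build each imaginary temp register from its source swizzle, add channel by channel, and write only the channels named in the destination mask. A small Python sketch under stated assumptions: channels hold signed 24-bit integers and results wrap modulo 2^24 (the chip's real overflow behavior is not given in the thread); the names `swizzle`, `add_masked`, `wrap_s24` and `fits_s24` are hypothetical.

```python
CHANNELS = "abcd"
S24_MIN, S24_MAX = -(1 << 23), (1 << 23) - 1


def wrap_s24(v):
    """Wrap an integer to signed 24 bits (assumed two's-complement overflow)."""
    v &= 0xFFFFFF
    return v - (1 << 24) if v & 0x800000 else v


def fits_s24(v):
    """True if v fits in one signed 24-bit channel without overflow."""
    return S24_MIN <= v <= S24_MAX


def swizzle(reg, pattern):
    """Build the imaginary temp register: channel i comes from reg[pattern[i]]."""
    return [reg[CHANNELS.index(c)] for c in pattern]


def add_masked(dst, mask, src1, pat1, src2, pat2):
    """Model ADD dst.mask, src1.pat1, src2.pat2 and return the new dst."""
    rt1, rt2 = swizzle(src1, pat1), swizzle(src2, pat2)
    out = list(dst)                   # channels not in the mask are untouched
    for i, c in enumerate(CHANNELS):
        if c in mask:
            out[i] = wrap_s24(rt1[i] + rt2[i])
    return out


R1 = [1, 2, 3, 4]          # channels a, b, c, d
R2 = [10, 20, 30, 40]
R3 = [0, 0, 0, 0]
print(add_masked(R3, "ab", R1, "abab", R2, "bbaa"))  # [21, 22, 0, 0]
print(fits_s24(S24_MAX + 1), wrap_s24(S24_MAX + 1))  # False -8388608
```

Because the write mask decides channel by channel what is written, each channel behaves like an independently allocatable value, which is one way to read Evan's suggestion of modeling each channel as its own physical register.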
Daniel M Gessel
2008-Nov-24 16:16 UTC
Let me clarify - I haven't used LLVM for GLSL - I'm also relatively new to LLVM, and am targeting a modern GPU. My GLSL work was back in the timeframe of AMD's R300/R400 series, which was 4 years ago.

On Nov 24, 2008, at 10:25 AM, Wei wrote:

>> The machines I worked with didn't support any integer ops, but GLSL
>> let us get by with "emulated" 16 bit integers (storing and operating
>> on them as floating point; divides required truncation after the op -
>> that sort of thing).
>
> Although my platform does support integer operations, it only supports
> integer +, -, and *, not /. The documentation says that to do integer
> division I have to convert the operands to floating point first.
> Hence, I have a similar problem.
>
> So... does your method mean that you write code in your 'frontend' to
> emit LLVM IR that converts the integers to floating point first,
> performs the operation, and then converts the result back to integer?
> Or do you write such code in your 'backend'?
>
> Whatever your answer is, I think the 'frontend' approach is cleaner
> than the 'backend' approach (the 'backend' approach seems more like a
> hack?). Am I right? Or does putting such a mechanism in the backend
> have other advantages?

IMHO, the backend approach is not a hack: minimizing the dependencies of the frontend on the target is generally a good thing, assuming you may target different HW in the future. The backend approach means that integer division becomes a fairly long code sequence, and that's just fine within LLVM.

>> What I mean is that you can probably get away with LLVM working with
>> float literals as f32, then converting them to your 24 bit format
>> during code gen.
>
> I think I get you here.
>
>> Integers too: let LLVM work with i32 internally, and convert literals
>> during code gen.
>
> Huh... I think I get you here, too.
> But I don't really know how you would handle integer constants larger
> than 24 bits.
> For example, if I see the following instruction during code gen:
>
> int %a, add int %b, int 0x12345678
>
> do I have to emit machine instructions similar to the following?
>
> int %a, add int %b, int 0x5678
> int %c, add int %d, int 0x1234
> int %e, add int %c, 1   <--- depends on the result of the first addition
>
> However, this means the backend has to remember that register %a now
> stores the low bytes of the result, and register %c stores the high
> bytes of the result. This tracking is not an easy job, I think.

Unextended GLSL doesn't require support for integers larger than 16 bits.

>> I assume you'll be starting with the reference GLSL parser (from
>> 3DLabs, IIRC - I don't even know if they still exist, actually)
>
> You can find the 3Dlabs frontend here:
> http://l4.me.uk/static/glsl/
>
> And I don't think anyone has ported this frontend to LLVM before.
>
>> The issue would be that LLVM would want to store register values as
>> 32 bits - and do all the pointer math that way.
>
> I don't really get you here.
> Why would LLVM do all the pointer math in 32 bits just because I store
> register values as 32 bits?

What I mean is that LLVM would think of your registers as taking 4 bytes in memory, and do all the pointer math that way: multiplying array indexes by 4. This may be fine on your machine, but it seems plausible that you would want 3 byte alignment, and, in that case, you would have to patch things up.

>> I haven't had to work with register constraints in LLVM, so I'm not
>> sure what would be the best approach if I/O is done through specific
>> GPRs: you don't want to reserve those registers for I/O only.... it
>> would take some exploration.
>
> Unfortunately, my platform does use GPRs for input/output.
> My current thought is to compute the number of attributes/varyings
> used in a shader, and reserve the same number of GPRs for those
> attributes/varyings ONLY.
> Since I have NO memory to spill registers to, I think there is not
> much room for register allocation. The method I might use is to INLINE
> all functions and then perform register allocation. Is this strategy
> bad, or can you think of a better solution?

This sounds like a good bringup approach to get you started, both for I/O and for inlining all functions. I've been learning LLVM as I go - my suspicion is that LLVM can do better on the I/O question with the right register information - as you learn more, some creative approach will present itself. Similarly for inlining - calls and returns can be custom handled - maybe there's a way to tie this in to a customized register allocator...

As long as your shaders aren't busting out of your instruction limits (or instruction cache size, depending on the HW), inlining is a good thing. In addition to GLSL, Khronos recently announced OpenCL, which also disallows recursion, in part because stack operations are still very slow on GPUs (small dependent loads/stores aren't great for the huge pipeline).

A random non-expert thought: maybe there's some general approach to non-stack based function calling that could be implemented with a global register allocator and an analysis of the call tree?

Dan
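Wei's bring-up plan, which Dan endorses, is to count the attributes and varyings a shader uses and reserve that many GPRs for them. A minimal sketch of the counting step, assuming the compiler is free to choose where each value lives, that each attribute or varying occupies whole channels of 4-channel registers, and that a value never straddles two registers; the function name and the input format are hypothetical. It packs first-fit, largest first, so that, for example, a vec3 and a float can share one register.

```python
def reserve_io_registers(io_sizes, channels_per_reg=4):
    """Pack I/O values (name -> channel count) into 4-channel GPRs.

    Returns ({name: (register, first_channel)}, number_of_registers).
    First-fit, largest first; a value never straddles two registers.
    """
    free = []                                   # free channels per register
    assignment = {}
    for name, size in sorted(io_sizes.items(), key=lambda kv: -kv[1]):
        if not 1 <= size <= channels_per_reg:
            raise ValueError(f"{name}: unsupported size {size}")
        for reg, left in enumerate(free):
            if left >= size:                    # first register with room
                break
        else:                                   # none fits: open a new register
            free.append(channels_per_reg)
            reg = len(free) - 1
        assignment[name] = (reg, channels_per_reg - free[reg])
        free[reg] -= size
    return assignment, len(free)


io = {"position": 4, "normal": 3, "texcoord": 2, "fog": 1, "color": 4}
regs, count = reserve_io_registers(io)
print(count, regs["fog"])   # 4 (2, 3)
```

The registers counted here would then be excluded from allocation for the whole shader, which is the simple reservation Wei describes rather than the finer-grained register constraints Dan suspects LLVM could eventually handle.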
Nico
2008-Nov-25 13:10 UTC
Hi,

perhaps this is a little bit off topic, but I read 'OpenCL': OpenCL is very often mentioned together with LLVM and Clang. Is it possible to use OpenCL with LLVM/Clang (I mean the official repository) yet? Or is there a schedule that shows when we will see OpenCL support in LLVM/Clang?

Thanks,
Nico

On Nov 24, 2008, at 5:16 PM, Daniel M Gessel wrote:

> In addition to GLSL, Khronos recently announced OpenCL, which also
> disallows recursion, in part because stack operations are still very
> slow on GPUs (small dependent loads/stores aren't great for the huge
> pipeline). A random non-expert thought: maybe there's some general
> approach to non-stack based function calling that could be implemented
> with a global register allocator and an analysis of the call tree?
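Dan's closing thought, function calls without a stack driven by an analysis of the call tree, resembles the compile-time overlaying of local storage that some compilers for small microcontrollers perform. A rough sketch of one such scheme, offered only as an illustration and not as anything LLVM provides: because GLSL (and OpenCL) forbid recursion, the call graph is acyclic, so each function can get a fixed base register placed above the registers of every function that can call it, and functions that are never active at the same time may share registers. The call-graph format, register counts and function names are hypothetical; `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def assign_register_bases(calls, regs_needed):
    """Give every function a fixed first register in a recursion-free call graph.

    calls: caller -> list of callees; regs_needed: function -> register count.
    A callee starts above the ranges of all its callers, so a call never
    clobbers a live register of any function on the current call chain.
    Functions that are never active at the same time may overlap.
    """
    callers = {f: set() for f in regs_needed}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    # Callers are visited before callees; a recursive graph raises CycleError.
    base = {}
    for f in TopologicalSorter(callers).static_order():
        base[f] = max((base[c] + regs_needed[c] for c in callers[f]), default=0)
    total = max(base[f] + regs_needed[f] for f in regs_needed)
    return base, total


calls = {"main": ["light", "fog"], "light": ["dot3"], "fog": []}
regs_needed = {"main": 6, "light": 4, "fog": 3, "dot3": 2}
base, total = assign_register_bases(calls, regs_needed)
print(base["light"], base["fog"], base["dot3"], total)  # 6 6 10 12
```

Here `light` and `fog` share registers 6 to 8 because neither calls the other, so 12 registers suffice instead of the 15 a fully disjoint layout would use, and no call ever needs to save registers to memory.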