Tim Northover
2013-Aug-09 12:12 UTC
[LLVMdev] [global-isel] Type-independence of load/store
Hi Jakob,

Sounds like a really exciting topic; I'd love to be involved in implementation. I've not really had time to think about the implications of the larger picture, but one detail did strike me on the first read-through:

> On the other hand, when types are not used to select register banks, it
> becomes really difficult to explain the difference between load i32 and load
> f32. The hardware doesn't care either, it simply knows how to load 32 bits
> into a given register.

> We can use a three-level hierarchical type system to
> better describe this:

That may be something we want to be flexible about. I know we don't support big-endian ARM at the moment, but its NEON load/store instructions do take an interest in more than just the total number of bits, I think.

    vst1.16 {d0}, [r0]
    vst1.64 {d0}, [r0]

would give byte-wise layouts of

    1 0 3 2 5 4 7 6
    7 6 5 4 3 2 1 0

Other big-endian targets may have similar issues, but I know virtually nothing about them.

Of course, if those three categories are just helpful mental models (perhaps with convenience functions) then there's likely no issue. We probably shouldn't go around discarding the "irrelevant" information though.

Cheers.

Tim.
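The two layouts in the message above can be reproduced with a small model. This is only a sketch under stated assumptions: register byte 0 is taken to be the least significant byte of d0, lane 0 is stored at the lowest address, and each element is written most-significant-byte first. The function name `store_big_endian` is hypothetical and is not part of LLVM or any ARM toolchain.

```python
def store_big_endian(reg_bytes, elem_bits):
    """Model a big-endian element-wise vector store.

    reg_bytes: register contents, index 0 = least significant byte (assumption).
    elem_bits: element size encoded in the instruction (16 for vst1.16, etc.).
    Returns the memory image in ascending address order.
    """
    elem_bytes = elem_bits // 8
    memory = []
    # Lanes are written in order; bytes inside each lane are reversed.
    for start in range(0, len(reg_bytes), elem_bytes):
        lane = reg_bytes[start:start + elem_bytes]
        memory.extend(reversed(lane))
    return memory


d0 = list(range(8))  # label each register byte with its own index

layout_16 = store_big_endian(d0, 16)  # models vst1.16 {d0}, [r0]
layout_64 = store_big_endian(d0, 64)  # models vst1.64 {d0}, [r0]

print(" ".join(map(str, layout_16)))
print(" ".join(map(str, layout_64)))
```

Under this model the same 64 bits of register state produce different memory images depending on the element size, which is why the element type cannot simply be dropped from the load/store.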
Jakob Stoklund Olesen
2013-Aug-09 18:50 UTC
[LLVMdev] [global-isel] Type-independence of load/store
On Aug 9, 2013, at 5:12 AM, Tim Northover <t.p.northover at gmail.com> wrote:

> Sounds like a really exciting topic; I'd love to be involved in
> implementation.

We need all the volunteers we can get. ;)

>> On the other hand, when types are not used to select register banks, it
>> becomes really difficult to explain the difference between load i32 and load
>> f32. The hardware doesn't care either, it simply knows how to load 32 bits
>> into a given register.
>
>> We can use a three-level hierarchical type system to
>> better describe this:
>
> That may be something we want to be flexible about. I know we don't
> support big-endian ARM at the moment, but its NEON load/store
> instructions do take an interest in more than just the total number of
> bits I think.
>
> vst1.16 {d0}, [r0]
> vst1.64 {d0}, [r0]
> would give byte-wise layouts of
> 1 0 3 2 5 4 7 6
> 7 6 5 4 3 2 1 0
>
> Other big-endian targets may have similar issues, but I know virtually
> nothing about them.

ARM’s is an interesting implementation of big-endian vectors. AFAIK, other architectures go all in and use both big-endian lanes and elements. That makes the problem go away, and you only need one load instruction.

Note that LLVM IR requires a bitcast to be equivalent to storing one type and loading the other, and it seems that this would turn a bitcast into a kind of shuffle.

I think Dan has opinions on this particular topic.

Thanks,
/jakob
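The remark about bitcasts can be illustrated with the same kind of toy model. This is a sketch, not LLVM's or ARM's actual semantics: it assumes register byte 0 is the least significant byte, lane 0 sits at the lowest address, and a "bitcast" is modelled literally as a store with one element size followed by a load with another. The names `store`, `load` and `bitcast_via_memory` are hypothetical.

```python
def store(reg_bytes, elem_bits, big_endian):
    """Write a register to memory element by element (ascending addresses)."""
    elem_bytes = elem_bits // 8
    memory = []
    for start in range(0, len(reg_bytes), elem_bytes):
        lane = reg_bytes[start:start + elem_bytes]
        # Big-endian reverses the bytes inside each element; little-endian does not.
        memory.extend(reversed(lane) if big_endian else lane)
    return memory


def load(memory, elem_bits, big_endian):
    """Read a register back; per-element byte reversal is its own inverse."""
    return store(memory, elem_bits, big_endian)


def bitcast_via_memory(reg_bytes, from_bits, to_bits, big_endian):
    """Model 'store as one type, load as the other'."""
    return load(store(reg_bytes, from_bits, big_endian), to_bits, big_endian)


d0 = list(range(8))  # label each register byte with its own index

# <4 x i16> to <1 x i64>, modelled for both endiannesses.
little = bitcast_via_memory(d0, 16, 64, big_endian=False)
big = bitcast_via_memory(d0, 16, 64, big_endian=True)

print("little-endian:", little)
print("big-endian:   ", big)
```

In this model the little-endian result leaves the register bytes where they were, while the big-endian result reverses the order of the four 16-bit lanes, which is the "kind of shuffle" described above.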
Tim Northover
2013-Aug-09 20:11 UTC
[LLVMdev] [global-isel] Type-independence of load/store
> ARM’s is an interesting implementation of big-endian vectors.
> AFAIK, other architectures go all in and use both big-endian
> lanes and elements. That makes the problem go away, and you
> only need one load instruction.

Hmm, I suppose the "cost" is that any instruction referring to lanes has to behave differently under big and little endian conditions. Not an issue if you only support one, of course.

> Note that LLVM IR requires a bitcast to be equivalent to storing one
> type and loading the other, and it seems that this would turn a
> bitcast into a kind of shuffle.

Interesting. We'll have some fun if we ever try that, I think!

Tim.
Daniel Sanders
2013-Aug-12 14:06 UTC
[LLVMdev] [global-isel] Type-independence of load/store
> > Other big-endian targets may have similar issues, but I know virtually
> > nothing about them.
>
> ARM's is an interesting implementation of big-endian vectors. AFAIK, other
> architectures go all in and use both big-endian lanes and elements. That
> makes the problem go away, and you only need one load instruction.

The recently published MIPS SIMD Architecture (MSA) has the same issue for big-endian vectors.

There's a small non-functional benefit to accounting for this in little-endian too. For little-endian mode, the emitted code is a bit easier to understand if the 'correct' loads and stores are used.

Daniel Sanders
Leading Software Design Engineer, MIPS Processor IP
Imagination Technologies Limited
www.imgtec.com