Kyle Moffett
2009-Jan-28 01:45 UTC
[LLVMdev] inline asm semantics: output constraint width smaller than input
On Tue, Jan 27, 2009 at 4:25 PM, H. Peter Anvin <hpa at zytor.com> wrote:
> However, things get a bit ugly in the case of different widths that affect
> individually scheduled registers, like 32- and 64-bit types on a 32-bit
> machine. Consider the case above where "bar" is a 64-bit type and "baz" is
> a 32-bit type, then you functionally have, at least on x86:
>
>     uint64_t tmp = bar;
>     asm("foo" : "+r" (tmp));
>     baz = (uint32_t)tmp;
>
> One could possibly argue that the latter case should be
> "baz = (uint32_t)(tmp >> 32);" on a bigendian machine... since this is a gcc
> syntax it probably should be "whatever gcc does" in that case, as opposed to
> what might make sense.
>
> (I'm afraid I don't have a bigendian box readily available at the moment, so
> I can't test it out to see what gcc does. I have a powerpc machine, but
> it's at home and turned off.)

Actually, PPC64 boxes basically don't care... the usable GPRs are all either 32-bit (for PPC32) or 64-bit (for PPC64), the <=32-bit instructions are identical across both, they just truncate/sign-extend/etc based on the lower 32-bits of the register. Also, you would only do a right-shift if you were going all the way out to memory as 64-bit and all the way back in as 32-bit... within a single register it's kept coherent.

Structs are basically irrelevant for inline ASM as you can't pass a struct to one... you can only pass the *address* of a struct, which is always pointer-sized.

I think that really the only sane solution (which is hopefully what GCC does) for integer types is to use a register the same size as the larger of the two integers. Then you copy the value to/from the smaller register (or just mask it on PPC64-alike architectures) before or after the inline ASM.

Cheers,
Kyle Moffett
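A minimal sketch of the two readings discussed in this message, assuming a GCC-compatible compiler. The function names are hypothetical, and an empty asm template stands in for the "foo" instruction from the quoted example. The first function narrows the value while it stays in a register, so no shift is involved on either byte order. The second goes out to memory as 64-bit and comes back in as 32-bit, which is the only place byte order matters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the operand is widened to the larger type,
 * passed through the asm, and truncated afterwards. */
uint32_t narrow_in_register(uint64_t bar)
{
    uint64_t tmp = bar;
#if defined(__GNUC__)
    /* Empty template standing in for "foo"; the value is left unchanged. */
    __asm__ volatile("" : "+r"(tmp));
#endif
    return (uint32_t)tmp;   /* low 32 bits, independent of byte order */
}

/* Hypothetical helper: store as 64-bit, reload the first 4 bytes as 32-bit. */
uint32_t narrow_through_memory(uint64_t bar)
{
    uint32_t first_word;
    memcpy(&first_word, &bar, sizeof first_word);
    return first_word;      /* low half on little-endian, high half on big-endian */
}

/* Returns nonzero when the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Usage example: the in-register result never depends on byte order. */
void demo_narrowing(void)
{
    const uint64_t bar = 0x1122334455667788ULL;
    assert(narrow_in_register(bar) == 0x55667788u);
    assert(narrow_through_memory(bar) ==
           (host_is_little_endian() ? 0x55667788u : 0x11223344u));
}
```

Only `narrow_through_memory` would need the `tmp >> 32` correction on a big-endian host, which matches the point that a value kept in a single register stays coherent.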
H. Peter Anvin
2009-Jan-28 01:56 UTC
[LLVMdev] inline asm semantics: output constraint width smaller than input
Kyle Moffett wrote:
>
> Actually, PPC64 boxes basically don't care... the usable GPRs are all
> either 32-bit (for PPC32) or 64-bit (for PPC64), the <=32-bit
> instructions are identical across both, they just
> truncate/sign-extend/etc based on the lower 32-bits of the register.
> Also, you would only do a right-shift if you were going all the way
> out to memory as 64-bit and all the way back in as 32-bit... within a
> single register it's kept coherent.
>

Think about a 64-bit integer on ppc32. It will by necessity be kept in two registers. On gcc I believe it will always be a consecutive pair of registers (AFAIK that's a hard-coded assumption in gcc, with the result that gcc has a nonstandard internal register numbering for x86 since the commonly used dx:ax pair is actually registers 2:0 in the hardware numbering.)

> Structs are basically irrelevant for inline ASM as you can't pass a
> struct to one... you can only pass the *address* of a struct, which is
> always pointer-sized.

Right, of course.

> I think that really the only sane solution (which is hopefully what
> GCC does) for integer types is to use a register the same size as the
> larger of the two integers. Then you copy the value to/from the
> smaller register (or just mask it on PPC64-alike architectures) before
> or after the inline ASM.

Pretty much. Then you can do conventional copy propagation and elimination after expanding subregisters to get rid of the extra ops in the common case.

-hpa
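The register-pair picture can be modelled in portable C. This is only a sketch: `reg_pair`, `split64`, `join64`, and `narrow32` are hypothetical names, and a struct of two 32-bit fields stands in for the two GPRs. It says nothing about how gcc actually allocates them. Once the 64-bit value is expanded into its two subregisters, narrowing is a plain copy of the low half, which is the kind of extra operation that copy propagation can later remove.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit value held in two 32-bit registers. */
struct reg_pair {
    uint32_t lo;   /* register holding bits 0..31 */
    uint32_t hi;   /* register holding bits 32..63 */
};

/* Expand a 64-bit value into its two halves. The shift here only names
 * the halves in the model; it is not a shift on an already-split pair. */
struct reg_pair split64(uint64_t value)
{
    struct reg_pair pair;
    pair.lo = (uint32_t)value;
    pair.hi = (uint32_t)(value >> 32);
    return pair;
}

/* Recombine the two halves into one 64-bit value. */
uint64_t join64(struct reg_pair pair)
{
    return ((uint64_t)pair.hi << 32) | pair.lo;
}

/* Narrowing to 32 bits is a copy of the low register; hi is unused. */
uint32_t narrow32(struct reg_pair pair)
{
    return pair.lo;
}

/* Usage example. */
void demo_reg_pair(void)
{
    struct reg_pair tmp = split64(0x1122334455667788ULL);
    assert(tmp.lo == 0x55667788u && tmp.hi == 0x11223344u);
    assert(join64(tmp) == 0x1122334455667788ULL);
    assert(narrow32(tmp) == 0x55667788u);
}
```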
Kyle Moffett
2009-Jan-28 13:28 UTC
[LLVMdev] inline asm semantics: output constraint width smaller than input
On Tue, Jan 27, 2009 at 8:56 PM, H. Peter Anvin <hpa at zytor.com> wrote:
> Kyle Moffett wrote:
>> Actually, PPC64 boxes basically don't care... the usable GPRs are all
>> either 32-bit (for PPC32) or 64-bit (for PPC64), the <=32-bit
>> instructions are identical across both, they just
>> truncate/sign-extend/etc based on the lower 32-bits of the register.
>> Also, you would only do a right-shift if you were going all the way
>> out to memory as 64-bit and all the way back in as 32-bit... within a
>> single register it's kept coherent.
>
> Think about a 64-bit integer on ppc32. It will by necessity be kept in two
> registers. On gcc I believe it will always be a consecutive pair of
> registers (AFAIK that's a hard-coded assumption in gcc, with the result that
> gcc has a nonstandard internal register numbering for x86 since the commonly
> used dx:ax pair is actually registers 2:0 in the hardware numbering.)

Even in the 64-bit-integer on 32-bit-CPU case, you still end up with the lower 32-bits in a standard integer GPR, and it's trivial to just ignore the "upper" register. You also would not need to do any kind of bit-shift, so long as your inline assembly initializes both GPRs and puts the halves of the result where they belong.

Cheers,
Kyle Moffett
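One way to make this explicit in source, sketched here under the assumption of a GCC-compatible compiler, is to hand the asm two separate 32-bit operands. This is a different formulation from the mismatched-width constraint the thread is about, and `wide_op_low_half` is a hypothetical name. The empty template stands in for real instructions that would write both halves; the caller then reads only the low one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: both halves are passed as separate register
 * operands, and only the low half is returned to the caller. */
uint32_t wide_op_low_half(uint64_t input)
{
    uint32_t lo = (uint32_t)input;          /* half the caller wants */
    uint32_t hi = (uint32_t)(input >> 32);  /* the "upper" register */
#if defined(__GNUC__)
    /* Empty template: a real body would update both lo and hi. */
    __asm__ volatile("" : "+r"(lo), "+r"(hi));
#endif
    (void)hi;   /* upper half is ignored; no shift is applied to lo */
    return lo;
}

/* Usage example. */
void demo_low_half(void)
{
    assert(wide_op_low_half(0xAABBCCDD01020304ULL) == 0x01020304u);
}
```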