Ramkumar Ramachandra
2013-Jul-14 12:56 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) changed a bunch of btrl/btsl instructions to btr/bts, with the following justification: The inline assembly for the bit operations has been changed to remove explicit sizing hints on the instructions, so the assembler will pick the appropriate instruction forms depending on the architecture and the context. Unfortunately, GNU as does no such thing, and the AT&T syntax manual [1] contains no references to any such inference. As evidenced by the following experiment, gas always disambiguates btr/bts to btrl/btsl. Feed the following input to gas: btrl $1, 0 btr $1, 0 btsl $1, 0 bts $1, 0 Check that btr matches btrl, and bts matches btsl in both cases: $ as --32 -a in.s $ as --64 -a in.s To avoid giving readers the illusion of such an inference, and for clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to disambiguate btr/bts automatically. [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf Cc: Jeremy Fitzhardinge <jeremy at goop.org> Cc: Andi Kleen <ak at linux.intel.com> Cc: Linus Torvalds <torvalds at linux-foundation.org> Cc: Ingo Molnar <mingo at kernel.org> Cc: Thomas Gleixner <tglx at linutronix.de> Cc: Eli Friedman <eli.friedman at gmail.com> Cc: Jim Grosbach <grosbach at apple.com> Cc: Stephen Checkoway <s at pahtak.org> Cc: LLVMdev <llvmdev at cs.uiuc.edu> Signed-off-by: Ramkumar Ramachandra <artagnon at gmail.com> --- We discussed this pretty extensively on LLVMDev, but I'm still not sure that I haven't missed something. arch/x86/include/asm/bitops.h | 16 ++++++++-------- arch/x86/include/asm/percpu.h | 2 +- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h index 6dfd019..6ed3d1e 100644 --- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) : "iq" ((u8)CONST_MASK(nr)) : "memory"); } else { - asm volatile(LOCK_PREFIX "bts %1,%0" + asm volatile(LOCK_PREFIX "btsl %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); } } @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) */ static inline void __set_bit(int nr, volatile unsigned long *addr) { - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory"); } /** @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr) : CONST_MASK_ADDR(nr, addr) : "iq" ((u8)~CONST_MASK(nr))); } else { - asm volatile(LOCK_PREFIX "btr %1,%0" + asm volatile(LOCK_PREFIX "btrl %1,%0" : BITOP_ADDR(addr) : "Ir" (nr)); } @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr) static inline void __clear_bit(int nr, volatile unsigned long *addr) { - asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr)); } /* @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm volatile(LOCK_PREFIX "bts %2,%1\n\t" + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); return oldbit; @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm("bts %2,%1\n\t" + asm("btsl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr)); @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm volatile(LOCK_PREFIX "btr %2,%1\n\t" + asm volatile(LOCK_PREFIX "btrl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) { int oldbit; - asm volatile("btr %2,%1\n\t" + asm volatile("btrl %2,%1\n\t" "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr)); diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index 0da5200..fda54c9 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -490,7 +490,7 @@ do { \ #define x86_test_and_clear_bit_percpu(bit, var) \ ({ \ int old__; \ - asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ + asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ : "=r" (old__), "+m" (var) \ : "dIr" (bit)); \ old__; \ -- 1.8.3.2.736.g869de25
Linus Torvalds
2013-Jul-14 17:19 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
On Sun, Jul 14, 2013 at 5:56 AM, Ramkumar Ramachandra <artagnon at gmail.com> wrote:> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) > changed a bunch of btrl/btsl instructions to btr/bts, with the following > justification: > > The inline assembly for the bit operations has been changed to remove > explicit sizing hints on the instructions, so the assembler will pick > the appropriate instruction forms depending on the architecture and > the context. > > Unfortunately, GNU as does no such thingYes it does.> btrl $1, 0 > btr $1, 0 > btsl $1, 0 > bts $1, 0What the heck is that supposed to show? It shows nothing at all. With an argument of '1', *of*course* gas will use "btsl", since that's the short form. Using the rex-predix and a btsq would be *stupid*. So gas will pick the appropriate form, exactly as claimed. Try some actual relevant test instead: bt %eax,mem bt %rax,mem and notice how they are actually fundamentally different. Test-case: int main(int argc, char **argv) { asm("bt %1,%0":"=m" (**argv): "a" (argc)); asm("bt %1,%0":"=m" (**argv): "a" ((unsigned long)(argc))); } and I get 0f a3 02 bt %eax,(%rdx) 48 0f a3 02 bt %rax,(%rdx) exactly as expected and wanted. Now, there are possible cases where you want to make the size explicit because you are mixing memory operand sizes and there can be nasty performance implications of doing a 32-bit write and then doing a 64-bit read of the result. I'm not actually aware of us having ever worried/cared about it, but it's a possible source of trouble: mixing bitop instructions with non-bitop instructions can have some subtle interactions, and you need to be careful, since the size of the operand affects both the offset *and* the memory access size. The access size generally is meaningless from a semantic standpoint (little-endian being the only sane model), but the access size *can* have performance implications for the write queue forwarding. Linus
Ramkumar Ramachandra
2013-Jul-14 18:26 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
Linus Torvalds wrote:>> btrl $1, 0 >> btr $1, 0 >> btsl $1, 0 >> bts $1, 0 > > What the heck is that supposed to show?I was trying to show a reduced case where gas doesn't complain, but llvm-mc does. Try compiling this with llvm-mc, and you'll get: .text btrl $1, 0 in.s:2:1: error: ambiguous instructions require an explicit suffix (could be 'btrw', 'btrl', or 'btrq') btr $1, 0 ^ btsl $1, 0 in.s:4:1: error: ambiguous instructions require an explicit suffix (could be 'btsw', 'btsl', or 'btsq') bts $1, 0 ^ Obviously, I misunderstood something major and screwed up the commit message.> int main(int argc, char **argv) > { > asm("bt %1,%0":"=m" (**argv): "a" (argc)); > asm("bt %1,%0":"=m" (**argv): "a" ((unsigned long)(argc))); > }Right, so in: int main(int argc, char **argv) { asm("bts %1,%0":"=m" (**argv): "r" (argc)); asm("btsl %1,%0":"=m" (**argv): "r" (argc)); asm("btr %1,%0":"=m" (**argv): "r" ((unsigned long)(argc))); asm("btrq %1,%0":"=m" (**argv): "r" ((unsigned long)(argc))); } bts disambiguates to btsl, and btr disambiguates to btrq, as advertised. Is it dependent on whether I have a 32-bit machine or 64-bit machine, or just on the operand lengths? Either way, this is not a very enlightening example, because clang also compiles this fine, and doesn't complain about any ambiguity. To see the ambiguity I'm talking about, try to compile linux.git with clang; I'll paste one error: arch/x86/include/asm/bitops.h:129:15: error: ambiguous instructions require an explicit suffix (could be 'btrw', 'btrl', or 'btrq') asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); ^ <inline asm>:1:2: note: instantiated into assembly here btr $0,(%rsi) ^ Since nr is an int, and ADDR is *(volatile long *), this should disambiguate to btrl, right? Any clue why clang is complaining?
Tim Northover
2013-Jul-14 18:35 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
Hi, The issue perhaps wasn't explained ideally (and possibly shouldn't have been CCed directly to you either, so apologies, but now that there *is* a discussion...)> Try some actual relevant test instead: > > bt %eax,mem > bt %rax,mem > > and notice how they are actually fundamentally different. Test-case:I'm coming at this from the compiler side, where the register form is unambiguous and not questioned. The discussion we're having involves only the immediate form of the instruction. GNU as interprets: bt $63, mem as btl $63, mem which may or may not be what the user intended, but is not the same as "btq $63, mem". I'm not an official LLVM spokesperson or anything, but our consensus seems to be that "bt $imm, whatever" is ambiguous (the %eax and %rax versions you quoted disambiguate the width) and should be disallowed by the assembler. The patch we're replying to implements that as a NOP fix to the kernel (GNU as always treats "bt" with an immediate as "btl"). I don't believe there's any situation in which it will produce different code, but it will allow Clang to compile (this part of) the kernel. There is, however, a potential optimisation here for someone who knows their inline asm. Currently "set_bit(63, addr)" will use the "r" version of the constraint even on amd64 targets, materialising 63 with a "movl". With sufficiently clever faff, it could use "btsq" instead. Cheers. Tim.
Jeremy Fitzhardinge
2013-Jul-14 19:10 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) > changed a bunch of btrl/btsl instructions to btr/bts, with the following > justification: > > The inline assembly for the bit operations has been changed to remove > explicit sizing hints on the instructions, so the assembler will pick > the appropriate instruction forms depending on the architecture and > the context. > > Unfortunately, GNU as does no such thing, and the AT&T syntax manual > [1] contains no references to any such inference. As evidenced by the > following experiment, gas always disambiguates btr/bts to btrl/btsl. > Feed the following input to gas: > > btrl $1, 0 > btr $1, 0 > btsl $1, 0 > bts $1, 0When I originally did those patches, I was careful make sure that we didn't give implied sizes to operations with only immediate and/or memory operands because - in general - gas can't infer the operation size from such operands. However, in the case of the bit test/set operations, the memory access size is not really derived from the operation size (the SDM is a bit vague), and even if it were it would be an operation rather than semantic difference. So there's no real problem with gas choosing 'l' as a default size in the absence of any explicit override or constraint.> Check that btr matches btrl, and bts matches btsl in both cases: > > $ as --32 -a in.s > $ as --64 -a in.s > > To avoid giving readers the illusion of such an inference, and for > clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to > disambiguate btr/bts automatically.That sounds reasonable for all other operations because it makes a real semantic difference, but overly strict for bit operations. J> > [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf > > Cc: Jeremy Fitzhardinge <jeremy at goop.org> > Cc: Andi Kleen <ak at linux.intel.com> > Cc: Linus Torvalds <torvalds at linux-foundation.org> > Cc: Ingo Molnar <mingo at kernel.org> > Cc: Thomas Gleixner <tglx at linutronix.de> > Cc: Eli Friedman <eli.friedman at gmail.com> > Cc: Jim Grosbach <grosbach at apple.com> > Cc: Stephen Checkoway <s at pahtak.org> > Cc: LLVMdev <llvmdev at cs.uiuc.edu> > Signed-off-by: Ramkumar Ramachandra <artagnon at gmail.com> > --- > We discussed this pretty extensively on LLVMDev, but I'm still not > sure that I haven't missed something. > > arch/x86/include/asm/bitops.h | 16 ++++++++-------- > arch/x86/include/asm/percpu.h | 2 +- > 2 files changed, 9 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h > index 6dfd019..6ed3d1e 100644 > --- a/arch/x86/include/asm/bitops.h > +++ b/arch/x86/include/asm/bitops.h > @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > : "iq" ((u8)CONST_MASK(nr)) > : "memory"); > } else { > - asm volatile(LOCK_PREFIX "bts %1,%0" > + asm volatile(LOCK_PREFIX "btsl %1,%0" > : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); > } > } > @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > */ > static inline void __set_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); > + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory"); > } > > /** > @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr) > : CONST_MASK_ADDR(nr, addr) > : "iq" ((u8)~CONST_MASK(nr))); > } else { > - asm volatile(LOCK_PREFIX "btr %1,%0" > + asm volatile(LOCK_PREFIX "btrl %1,%0" > : BITOP_ADDR(addr) > : "Ir" (nr)); > } > @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr) > > static inline void __clear_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); > + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr)); > } > > /* > @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "bts %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t" > "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > return oldbit; > @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm("bts %2,%1\n\t" > + asm("btsl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "btr %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile("btr %2,%1\n\t" > + asm volatile("btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h > index 0da5200..fda54c9 100644 > --- a/arch/x86/include/asm/percpu.h > +++ b/arch/x86/include/asm/percpu.h > @@ -490,7 +490,7 @@ do { \ > #define x86_test_and_clear_bit_percpu(bit, var) \ > ({ \ > int old__; \ > - asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > + asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > : "=r" (old__), "+m" (var) \ > : "dIr" (bit)); \ > old__; \-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/fd95816e/attachment.html>
Jeremy Fitzhardinge
2013-Jul-14 19:10 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
On 07/14/2013 10:19 AM, Linus Torvalds wrote:> Now, there are possible cases where you want to make the size explicit > because you are mixing memory operand sizes and there can be nasty > performance implications of doing a 32-bit write and then doing a > 64-bit read of the result. I'm not actually aware of us having ever > worried/cared about it, but it's a possible source of trouble: mixing > bitop instructions with non-bitop instructions can have some subtle > interactions, and you need to be careful, since the size of the > operand affects both the offset *and* the memory access size.The SDM entry for BT mentions that the instruction may touch 2 or 4 bytes depending on the operand size, but doesn't specifically mention that a 64 bit operation size touches 8 bytes - and it doesn't mention anything at all about operand size and access size in BTR/BTS/BTC (unless it's implied as part of the discussion about encoding the MSBs of a constant bit offset in the offset of the addressing mode). Is that an oversight?> The > access size generally is meaningless from a semantic standpoint > (little-endian being the only sane model), but the access size *can* > have performance implications for the write queue forwarding.It looks like that if the base address isn't aligned then neither is the generated access, so you could get a protection fault if it overlaps a page boundary, which is a semantic rather than purely operational difference. J -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/22661e3f/attachment.html>
Jeremy Fitzhardinge
2013-Jul-14 19:23 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
(resent without HTML) On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) > changed a bunch of btrl/btsl instructions to btr/bts, with the following > justification: > > The inline assembly for the bit operations has been changed to remove > explicit sizing hints on the instructions, so the assembler will pick > the appropriate instruction forms depending on the architecture and > the context. > > Unfortunately, GNU as does no such thing, and the AT&T syntax manual > [1] contains no references to any such inference. As evidenced by the > following experiment, gas always disambiguates btr/bts to btrl/btsl. > Feed the following input to gas: > > btrl $1, 0 > btr $1, 0 > btsl $1, 0 > bts $1, 0When I originally did those patches, I was careful make sure that we didn't give implied sizes to operations with only immediate and/or memory operands because - in general - gas can't infer the operation size from such operands. However, in the case of the bit test/set operations, the memory access size is not really derived from the operation size (the SDM is a bit vague), and even if it were it would be an operation rather than semantic difference. So there's no real problem with gas choosing 'l' as a default size in the absence of any explicit override or constraint.> Check that btr matches btrl, and bts matches btsl in both cases: > > $ as --32 -a in.s > $ as --64 -a in.s > > To avoid giving readers the illusion of such an inference, and for > clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to > disambiguate btr/bts automatically.That sounds reasonable for all other operations because it makes a real semantic difference, but overly strict for bit operations. J> [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf > > Cc: Jeremy Fitzhardinge <jeremy at goop.org> > Cc: Andi Kleen <ak at linux.intel.com> > Cc: Linus Torvalds <torvalds at linux-foundation.org> > Cc: Ingo Molnar <mingo at kernel.org> > Cc: Thomas Gleixner <tglx at linutronix.de> > Cc: Eli Friedman <eli.friedman at gmail.com> > Cc: Jim Grosbach <grosbach at apple.com> > Cc: Stephen Checkoway <s at pahtak.org> > Cc: LLVMdev <llvmdev at cs.uiuc.edu> > Signed-off-by: Ramkumar Ramachandra <artagnon at gmail.com> > --- > We discussed this pretty extensively on LLVMDev, but I'm still not > sure that I haven't missed something. > > arch/x86/include/asm/bitops.h | 16 ++++++++-------- > arch/x86/include/asm/percpu.h | 2 +- > 2 files changed, 9 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h > index 6dfd019..6ed3d1e 100644 > --- a/arch/x86/include/asm/bitops.h > +++ b/arch/x86/include/asm/bitops.h > @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > : "iq" ((u8)CONST_MASK(nr)) > : "memory"); > } else { > - asm volatile(LOCK_PREFIX "bts %1,%0" > + asm volatile(LOCK_PREFIX "btsl %1,%0" > : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); > } > } > @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr) > */ > static inline void __set_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory"); > + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory"); > } > > /** > @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr) > : CONST_MASK_ADDR(nr, addr) > : "iq" ((u8)~CONST_MASK(nr))); > } else { > - asm volatile(LOCK_PREFIX "btr %1,%0" > + asm volatile(LOCK_PREFIX "btrl %1,%0" > : BITOP_ADDR(addr) > : "Ir" (nr)); > } > @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr) > > static inline void __clear_bit(int nr, volatile unsigned long *addr) > { > - asm volatile("btr %1,%0" : ADDR : "Ir" (nr)); > + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr)); > } > > /* > @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "bts %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t" > "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > return oldbit; > @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm("bts %2,%1\n\t" > + asm("btsl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile(LOCK_PREFIX "btr %2,%1\n\t" > + asm volatile(LOCK_PREFIX "btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR : "Ir" (nr) : "memory"); > > @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr) > { > int oldbit; > > - asm volatile("btr %2,%1\n\t" > + asm volatile("btrl %2,%1\n\t" > "sbb %0,%0" > : "=r" (oldbit), ADDR > : "Ir" (nr)); > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h > index 0da5200..fda54c9 100644 > --- a/arch/x86/include/asm/percpu.h > +++ b/arch/x86/include/asm/percpu.h > @@ -490,7 +490,7 @@ do { \ > #define x86_test_and_clear_bit_percpu(bit, var) \ > ({ \ > int old__; \ > - asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > + asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \ > : "=r" (old__), "+m" (var) \ > : "dIr" (bit)); \ > old__; \
Jeremy Fitzhardinge
2013-Jul-14 19:23 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
(Resent without HTML) On 07/14/2013 10:19 AM, Linus Torvalds wrote:> Now, there are possible cases where you want to make the size explicit > because you are mixing memory operand sizes and there can be nasty > performance implications of doing a 32-bit write and then doing a > 64-bit read of the result. I'm not actually aware of us having ever > worried/cared about it, but it's a possible source of trouble: mixing > bitop instructions with non-bitop instructions can have some subtle > interactions, and you need to be careful, since the size of the > operand affects both the offset *and* the memory access size.The SDM entry for BT mentions that the instruction may touch 2 or 4 bytes depending on the operand size, but doesn't specifically mention that a 64 bit operation size touches 8 bytes - and it doesn't mention anything at all about operand size and access size in BTR/BTS/BTC (unless it's implied as part of the discussion about encoding the MSBs of a constant bit offset in the offset of the addressing mode). Is that an oversight?> The > access size generally is meaningless from a semantic standpoint > (little-endian being the only sane model), but the access size *can* > have performance implications for the write queue forwarding.It looks like that if the base address isn't aligned then neither is the generated access, so you could get a protection fault if it overlaps a page boundary, which is a semantic rather than purely operational difference. J
Andi Kleen
2013-Jul-14 21:14 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
I think best would be to just find some way to implement LOCK prefix patching using atomic compiler intrinsics and then switch to those Then all this inline assembler horror could be ifdef'ed away for old compilers only, and likely the generated code would be better as the compiler could optimize more. Or just give up on LOCK patching, as single CPU systems and VMs are less and less interesting? -Andi -- ak at linux.intel.com -- Speaking for myself only
H. Peter Anvin
2013-Jul-15 18:47 UTC
[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
On 07/14/2013 12:23 PM, Jeremy Fitzhardinge wrote:> (resent without HTML) > > On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote: >> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30) >> changed a bunch of btrl/btsl instructions to btr/bts, with the following >> justification: >> >> The inline assembly for the bit operations has been changed to remove >> explicit sizing hints on the instructions, so the assembler will pick >> the appropriate instruction forms depending on the architecture and >> the context. >> >> Unfortunately, GNU as does no such thing, and the AT&T syntax manual >> [1] contains no references to any such inference. As evidenced by the >> following experiment, gas always disambiguates btr/bts to btrl/btsl. >> Feed the following input to gas: >> >> btrl $1, 0 >> btr $1, 0 >> btsl $1, 0 >> bts $1, 0 > > When I originally did those patches, I was careful make sure that we > didn't give implied sizes to operations with only immediate and/or > memory operands because - in general - gas can't infer the operation > size from such operands. However, in the case of the bit test/set > operations, the memory access size is not really derived from the > operation size (the SDM is a bit vague), and even if it were it would be > an operation rather than semantic difference. So there's no real > problem with gas choosing 'l' as a default size in the absence of any > explicit override or constraint. > >> Check that btr matches btrl, and bts matches btsl in both cases: >> >> $ as --32 -a in.s >> $ as --64 -a in.s >> >> To avoid giving readers the illusion of such an inference, and for >> clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to >> disambiguate btr/bts automatically. > > That sounds reasonable for all other operations because it makes a real > semantic difference, but overly strict for bit operations. >To be fair, we *ought to* be able to do something like: asm volatile(LOCK_PREFIX "bts%z0 %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory"); ... but some older version of gcc are broken and emit "ll" rather than "q". Furthermore, since that would actually result in *worse* code emitted overall (unnecessary REX prefixes), I'm not exactly happy on the idea. -hpa
Possibly Parallel Threads
- [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
- [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
- [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
- [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
- [LLVMdev] [PATCH] x86: disambiguate unqualified btr, bts