thr3ads.net - llvm dev - [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix [Jul 2013]

If this information is useful, please help other people find it:
Share via:

Ramkumar Ramachandra

2013-Jul-14 12:56 UTC

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
changed a bunch of btrl/btsl instructions to btr/bts, with the following
justification:

  The inline assembly for the bit operations has been changed to remove
  explicit sizing hints on the instructions, so the assembler will pick
  the appropriate instruction forms depending on the architecture and
  the context.

Unfortunately, GNU as does no such thing, and the AT&T syntax manual
[1] contains no references to any such inference.  As evidenced by the
following experiment, gas always disambiguates btr/bts to btrl/btsl.
Feed the following input to gas:

  btrl	$1, 0
  btr	$1, 0
  btsl	$1, 0
  bts	$1, 0

Check that btr matches btrl, and bts matches btsl in both cases:

  $ as --32 -a in.s
  $ as --64 -a in.s

To avoid giving readers the illusion of such an inference, and for
clarity, change btr/bts back to btrl/btsl.  Also, llvm-mc refuses to
disambiguate btr/bts automatically.

[1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf

Cc: Jeremy Fitzhardinge <jeremy at goop.org>
Cc: Andi Kleen <ak at linux.intel.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Cc: Ingo Molnar <mingo at kernel.org>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: Eli Friedman <eli.friedman at gmail.com>
Cc: Jim Grosbach <grosbach at apple.com>
Cc: Stephen Checkoway <s at pahtak.org>
Cc: LLVMdev <llvmdev at cs.uiuc.edu>
Signed-off-by: Ramkumar Ramachandra <artagnon at gmail.com>
---
 We discussed this pretty extensively on LLVMDev, but I'm still not
 sure that I haven't missed something.

 arch/x86/include/asm/bitops.h | 16 ++++++++--------
 arch/x86/include/asm/percpu.h |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 6dfd019..6ed3d1e 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
 			: "iq" ((u8)CONST_MASK(nr))
 			: "memory");
 	} else {
-		asm volatile(LOCK_PREFIX "bts %1,%0"
+		asm volatile(LOCK_PREFIX "btsl %1,%0"
 			: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
 	}
 }
@@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
  */
 static inline void __set_bit(int nr, volatile unsigned long *addr)
 {
-	asm volatile("bts %1,%0" : ADDR : "Ir" (nr) :
"memory");
+	asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) :
"memory");
 }
 
 /**
@@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
 			: CONST_MASK_ADDR(nr, addr)
 			: "iq" ((u8)~CONST_MASK(nr)));
 	} else {
-		asm volatile(LOCK_PREFIX "btr %1,%0"
+		asm volatile(LOCK_PREFIX "btrl %1,%0"
 			: BITOP_ADDR(addr)
 			: "Ir" (nr));
 	}
@@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile
unsigned long *addr)
 
 static inline void __clear_bit(int nr, volatile unsigned long *addr)
 {
-	asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
+	asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
 }
 
 /*
@@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned
long *addr)
 {
 	int oldbit;
 
-	asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
+	asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
 		     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir"
(nr) : "memory");
 
 	return oldbit;
@@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile
unsigned long *addr)
 {
 	int oldbit;
 
-	asm("bts %2,%1\n\t"
+	asm("btsl %2,%1\n\t"
 	    "sbb %0,%0"
 	    : "=r" (oldbit), ADDR
 	    : "Ir" (nr));
@@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile
unsigned long *addr)
 {
 	int oldbit;
 
-	asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
+	asm volatile(LOCK_PREFIX "btrl %2,%1\n\t"
 		     "sbb %0,%0"
 		     : "=r" (oldbit), ADDR : "Ir" (nr) :
"memory");
 
@@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile
unsigned long *addr)
 {
 	int oldbit;
 
-	asm volatile("btr %2,%1\n\t"
+	asm volatile("btrl %2,%1\n\t"
 		     "sbb %0,%0"
 		     : "=r" (oldbit), ADDR
 		     : "Ir" (nr));
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0da5200..fda54c9 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -490,7 +490,7 @@ do {									\
 #define x86_test_and_clear_bit_percpu(bit, var)				\
 ({									\
 	int old__;							\
-	asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0"		\
+	asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0"		\
 		     : "=r" (old__), "+m" (var)				\
 		     : "dIr" (bit));					\
 	old__;								\
-- 
1.8.3.2.736.g869de25

Linus Torvalds

2013-Jul-14 17:19 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On Sun, Jul 14, 2013 at 5:56 AM, Ramkumar Ramachandra
<artagnon at gmail.com> wrote:> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
>   The inline assembly for the bit operations has been changed to remove
>   explicit sizing hints on the instructions, so the assembler will pick
>   the appropriate instruction forms depending on the architecture and
>   the context.
>
> Unfortunately, GNU as does no such thing
Yes it does.
>   btrl  $1, 0
>   btr   $1, 0
>   btsl  $1, 0
>   bts   $1, 0
What the heck is that supposed to show? It shows nothing at all. With
an argument of '1', *of*course* gas will use "btsl", since
that's the
short form. Using the rex-predix and a btsq would be *stupid*.

So gas will pick the appropriate form, exactly as claimed.

Try some actual relevant test instead:

   bt %eax,mem
   bt %rax,mem

and notice how they are actually fundamentally different. Test-case:

int main(int argc, char **argv)
{
  asm("bt %1,%0":"=m" (**argv): "a" (argc));
  asm("bt %1,%0":"=m" (**argv): "a" ((unsigned
long)(argc)));
}

and I get

   0f a3 02             bt     %eax,(%rdx)
   48 0f a3 02           bt     %rax,(%rdx)

exactly as expected and wanted.

Now, there are possible cases where you want to make the size explicit
because you are mixing memory operand sizes and there can be nasty
performance implications of doing a 32-bit write and then doing a
64-bit read of the result. I'm not actually aware of us having ever
worried/cared about it, but it's a possible source of trouble: mixing
bitop instructions with non-bitop instructions can have some subtle
interactions, and you need to be careful, since the size of the
operand affects both the offset *and* the memory access size. The
access size generally is meaningless from a semantic standpoint
(little-endian being the only sane model), but the access size *can*
have performance implications for the write queue forwarding.

                      Linus

Ramkumar Ramachandra

2013-Jul-14 18:26 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

Linus Torvalds wrote:>>   btrl  $1, 0
>>   btr   $1, 0
>>   btsl  $1, 0
>>   bts   $1, 0
>
> What the heck is that supposed to show?
I was trying to show a reduced case where gas doesn't complain, but
llvm-mc does.  Try compiling this with llvm-mc, and you'll get:

        .text
        btrl    $1, 0
in.s:2:1: error: ambiguous instructions require an explicit suffix
      (could be 'btrw', 'btrl', or 'btrq')
btr     $1, 0
^
        btsl    $1, 0
in.s:4:1: error: ambiguous instructions require an explicit suffix
      (could be 'btsw', 'btsl', or 'btsq')
bts     $1, 0
^

Obviously, I misunderstood something major and screwed up the commit message.
> int main(int argc, char **argv)
> {
>   asm("bt %1,%0":"=m" (**argv): "a" (argc));
>   asm("bt %1,%0":"=m" (**argv): "a"
((unsigned long)(argc)));
> }
Right, so in:

int main(int argc, char **argv)
{
  asm("bts %1,%0":"=m" (**argv): "r" (argc));
  asm("btsl %1,%0":"=m" (**argv): "r" (argc));
  asm("btr %1,%0":"=m" (**argv): "r" ((unsigned
long)(argc)));
  asm("btrq %1,%0":"=m" (**argv): "r" ((unsigned
long)(argc)));
}

bts disambiguates to btsl, and btr disambiguates to btrq, as
advertised.  Is it dependent on whether I have a 32-bit machine or
64-bit machine, or just on the operand lengths?

Either way, this is not a very enlightening example, because clang
also compiles this fine, and doesn't complain about any ambiguity.  To
see the ambiguity I'm talking about, try to compile linux.git with
clang; I'll paste one error:

arch/x86/include/asm/bitops.h:129:15: error: ambiguous instructions
      require an explicit suffix (could be 'btrw', 'btrl', or
'btrq')
        asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
                     ^
<inline asm>:1:2: note: instantiated into assembly here
        btr $0,(%rsi)
        ^

Since nr is an int, and ADDR is *(volatile long *), this should
disambiguate to btrl, right?  Any clue why clang is complaining?

Tim Northover

2013-Jul-14 18:35 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

Hi,

The issue perhaps wasn't explained ideally (and possibly shouldn't
have been CCed directly to you either, so apologies, but now that
there *is* a discussion...)
> Try some actual relevant test instead:
>
>    bt %eax,mem
>    bt %rax,mem
>
> and notice how they are actually fundamentally different. Test-case:
I'm coming at this from the compiler side, where the register form is
unambiguous and not questioned. The discussion we're having involves
only the immediate form of the instruction. GNU as interprets:

    bt $63, mem

as
    btl $63, mem

which may or may not be what the user intended, but is not the same as
"btq $63, mem".

I'm not an official LLVM spokesperson or anything, but our consensus
seems to be that "bt $imm, whatever" is ambiguous (the %eax and %rax
versions you quoted disambiguate the width) and should be disallowed
by the assembler.

The patch we're replying to implements that as a NOP fix to the kernel
(GNU as always treats "bt" with an immediate as "btl"). I
don't
believe there's any situation in which it will produce different code,
but it will allow Clang to compile (this part of) the kernel.

There is, however, a potential optimisation here for someone who knows
their inline asm. Currently "set_bit(63, addr)" will use the
"r"
version of the constraint even on amd64 targets, materialising 63 with
a "movl". With sufficiently clever faff, it could use "btsq"
instead.

Cheers.

Tim.

Jeremy Fitzhardinge

2013-Jul-14 19:10 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
>   The inline assembly for the bit operations has been changed to remove
>   explicit sizing hints on the instructions, so the assembler will pick
>   the appropriate instruction forms depending on the architecture and
>   the context.
>
> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
> [1] contains no references to any such inference.  As evidenced by the
> following experiment, gas always disambiguates btr/bts to btrl/btsl.
> Feed the following input to gas:
>
>   btrl	$1, 0
>   btr	$1, 0
>   btsl	$1, 0
>   bts	$1, 0
When I originally did those patches, I was careful make sure that we
didn't give implied sizes to operations with only immediate and/or
memory operands because - in general - gas can't infer the operation
size from such operands. However, in the case of the bit test/set
operations, the memory access size is not really derived from the
operation size (the SDM is a bit vague), and even if it were it would be
an operation rather than semantic difference.  So there's no real
problem with gas choosing 'l' as a default size in the absence of any
explicit override or constraint.
> Check that btr matches btrl, and bts matches btsl in both cases:
>
>   $ as --32 -a in.s
>   $ as --64 -a in.s
>
> To avoid giving readers the illusion of such an inference, and for
> clarity, change btr/bts back to btrl/btsl.  Also, llvm-mc refuses to
> disambiguate btr/bts automatically.
That sounds reasonable for all other operations because it makes a real
semantic difference, but overly strict for bit operations.

    J

>
> [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
>
> Cc: Jeremy Fitzhardinge <jeremy at goop.org>
> Cc: Andi Kleen <ak at linux.intel.com>
> Cc: Linus Torvalds <torvalds at linux-foundation.org>
> Cc: Ingo Molnar <mingo at kernel.org>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Eli Friedman <eli.friedman at gmail.com>
> Cc: Jim Grosbach <grosbach at apple.com>
> Cc: Stephen Checkoway <s at pahtak.org>
> Cc: LLVMdev <llvmdev at cs.uiuc.edu>
> Signed-off-by: Ramkumar Ramachandra <artagnon at gmail.com>
> ---
>  We discussed this pretty extensively on LLVMDev, but I'm still not
>  sure that I haven't missed something.
>
>  arch/x86/include/asm/bitops.h | 16 ++++++++--------
>  arch/x86/include/asm/percpu.h |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 6dfd019..6ed3d1e 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>  			: "iq" ((u8)CONST_MASK(nr))
>  			: "memory");
>  	} else {
> -		asm volatile(LOCK_PREFIX "bts %1,%0"
> +		asm volatile(LOCK_PREFIX "btsl %1,%0"
>  			: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
>  	}
>  }
> @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>   */
>  static inline void __set_bit(int nr, volatile unsigned long *addr)
>  {
> -	asm volatile("bts %1,%0" : ADDR : "Ir" (nr) :
"memory");
> +	asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) :
"memory");
>  }
>  
>  /**
> @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
>  			: CONST_MASK_ADDR(nr, addr)
>  			: "iq" ((u8)~CONST_MASK(nr)));
>  	} else {
> -		asm volatile(LOCK_PREFIX "btr %1,%0"
> +		asm volatile(LOCK_PREFIX "btrl %1,%0"
>  			: BITOP_ADDR(addr)
>  			: "Ir" (nr));
>  	}
> @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr,
volatile unsigned long *addr)
>  
>  static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  {
> -	asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
> +	asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
>  }
>  
>  /*
> @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
> +	asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
>  		     "sbb %0,%0" : "=r" (oldbit), ADDR :
"Ir" (nr) : "memory");
>  
>  	return oldbit;
> @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm("bts %2,%1\n\t"
> +	asm("btsl %2,%1\n\t"
>  	    "sbb %0,%0"
>  	    : "=r" (oldbit), ADDR
>  	    : "Ir" (nr));
> @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
> +	asm volatile(LOCK_PREFIX "btrl %2,%1\n\t"
>  		     "sbb %0,%0"
>  		     : "=r" (oldbit), ADDR : "Ir" (nr) :
"memory");
>  
> @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm volatile("btr %2,%1\n\t"
> +	asm volatile("btrl %2,%1\n\t"
>  		     "sbb %0,%0"
>  		     : "=r" (oldbit), ADDR
>  		     : "Ir" (nr));
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 0da5200..fda54c9 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -490,7 +490,7 @@ do {									\
>  #define x86_test_and_clear_bit_percpu(bit, var)				\
>  ({									\
>  	int old__;							\
> -	asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0"
\
> +	asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl
%0,%0"		\
>  		     : "=r" (old__), "+m" (var)				\
>  		     : "dIr" (bit));					\
>  	old__;								\
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/fd95816e/attachment.html>

Jeremy Fitzhardinge

2013-Jul-14 19:10 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 10:19 AM, Linus Torvalds wrote:> Now, there are possible cases where you want to make the size explicit
> because you are mixing memory operand sizes and there can be nasty
> performance implications of doing a 32-bit write and then doing a
> 64-bit read of the result. I'm not actually aware of us having ever
> worried/cared about it, but it's a possible source of trouble: mixing
> bitop instructions with non-bitop instructions can have some subtle
> interactions, and you need to be careful, since the size of the
> operand affects both the offset *and* the memory access size.The SDM entry for BT mentions that the instruction may touch 2 or 4
bytes depending on the operand size, but doesn't specifically mention
that a 64 bit operation size touches 8 bytes - and it doesn't mention
anything at all about operand size and access size in BTR/BTS/BTC
(unless it's implied as part of the discussion about encoding the MSBs
of a constant bit offset in the offset of the addressing mode). Is that
an oversight?
>  The
> access size generally is meaningless from a semantic standpoint
> (little-endian being the only sane model), but the access size *can*
> have performance implications for the write queue forwarding.
It looks like that if the base address isn't aligned then neither is the
generated access, so you could get a protection fault if it overlaps a
page boundary, which is a semantic rather than purely operational
difference.

    J

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130714/22661e3f/attachment.html>

Jeremy Fitzhardinge

2013-Jul-14 19:23 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

(resent without HTML)

On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
>   The inline assembly for the bit operations has been changed to remove
>   explicit sizing hints on the instructions, so the assembler will pick
>   the appropriate instruction forms depending on the architecture and
>   the context.
>
> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
> [1] contains no references to any such inference.  As evidenced by the
> following experiment, gas always disambiguates btr/bts to btrl/btsl.
> Feed the following input to gas:
>
>   btrl	$1, 0
>   btr	$1, 0
>   btsl	$1, 0
>   bts	$1, 0
When I originally did those patches, I was careful make sure that we
didn't give implied sizes to operations with only immediate and/or
memory operands because - in general - gas can't infer the operation
size from such operands. However, in the case of the bit test/set
operations, the memory access size is not really derived from the
operation size (the SDM is a bit vague), and even if it were it would be
an operation rather than semantic difference.  So there's no real
problem with gas choosing 'l' as a default size in the absence of any
explicit override or constraint.
> Check that btr matches btrl, and bts matches btsl in both cases:
>
>   $ as --32 -a in.s
>   $ as --64 -a in.s
>
> To avoid giving readers the illusion of such an inference, and for
> clarity, change btr/bts back to btrl/btsl.  Also, llvm-mc refuses to
> disambiguate btr/bts automatically.
That sounds reasonable for all other operations because it makes a real
semantic difference, but overly strict for bit operations.

    J

> [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
>
> Cc: Jeremy Fitzhardinge <jeremy at goop.org>
> Cc: Andi Kleen <ak at linux.intel.com>
> Cc: Linus Torvalds <torvalds at linux-foundation.org>
> Cc: Ingo Molnar <mingo at kernel.org>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Eli Friedman <eli.friedman at gmail.com>
> Cc: Jim Grosbach <grosbach at apple.com>
> Cc: Stephen Checkoway <s at pahtak.org>
> Cc: LLVMdev <llvmdev at cs.uiuc.edu>
> Signed-off-by: Ramkumar Ramachandra <artagnon at gmail.com>
> ---
>  We discussed this pretty extensively on LLVMDev, but I'm still not
>  sure that I haven't missed something.
>
>  arch/x86/include/asm/bitops.h | 16 ++++++++--------
>  arch/x86/include/asm/percpu.h |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 6dfd019..6ed3d1e 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>  			: "iq" ((u8)CONST_MASK(nr))
>  			: "memory");
>  	} else {
> -		asm volatile(LOCK_PREFIX "bts %1,%0"
> +		asm volatile(LOCK_PREFIX "btsl %1,%0"
>  			: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
>  	}
>  }
> @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>   */
>  static inline void __set_bit(int nr, volatile unsigned long *addr)
>  {
> -	asm volatile("bts %1,%0" : ADDR : "Ir" (nr) :
"memory");
> +	asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) :
"memory");
>  }
>  
>  /**
> @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
>  			: CONST_MASK_ADDR(nr, addr)
>  			: "iq" ((u8)~CONST_MASK(nr)));
>  	} else {
> -		asm volatile(LOCK_PREFIX "btr %1,%0"
> +		asm volatile(LOCK_PREFIX "btrl %1,%0"
>  			: BITOP_ADDR(addr)
>  			: "Ir" (nr));
>  	}
> @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr,
volatile unsigned long *addr)
>  
>  static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  {
> -	asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
> +	asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
>  }
>  
>  /*
> @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
> +	asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
>  		     "sbb %0,%0" : "=r" (oldbit), ADDR :
"Ir" (nr) : "memory");
>  
>  	return oldbit;
> @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm("bts %2,%1\n\t"
> +	asm("btsl %2,%1\n\t"
>  	    "sbb %0,%0"
>  	    : "=r" (oldbit), ADDR
>  	    : "Ir" (nr));
> @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
> +	asm volatile(LOCK_PREFIX "btrl %2,%1\n\t"
>  		     "sbb %0,%0"
>  		     : "=r" (oldbit), ADDR : "Ir" (nr) :
"memory");
>  
> @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile
unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm volatile("btr %2,%1\n\t"
> +	asm volatile("btrl %2,%1\n\t"
>  		     "sbb %0,%0"
>  		     : "=r" (oldbit), ADDR
>  		     : "Ir" (nr));
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 0da5200..fda54c9 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -490,7 +490,7 @@ do {									\
>  #define x86_test_and_clear_bit_percpu(bit, var)				\
>  ({									\
>  	int old__;							\
> -	asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0"
\
> +	asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl
%0,%0"		\
>  		     : "=r" (old__), "+m" (var)				\
>  		     : "dIr" (bit));					\
>  	old__;								\

Jeremy Fitzhardinge

2013-Jul-14 19:23 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

(Resent without HTML)

On 07/14/2013 10:19 AM, Linus Torvalds wrote:> Now, there are possible cases where you want to make the size explicit
> because you are mixing memory operand sizes and there can be nasty
> performance implications of doing a 32-bit write and then doing a
> 64-bit read of the result. I'm not actually aware of us having ever
> worried/cared about it, but it's a possible source of trouble: mixing
> bitop instructions with non-bitop instructions can have some subtle
> interactions, and you need to be careful, since the size of the
> operand affects both the offset *and* the memory access size.The SDM entry for BT mentions that the instruction may touch 2 or 4
bytes depending on the operand size, but doesn't specifically mention
that a 64 bit operation size touches 8 bytes - and it doesn't mention
anything at all about operand size and access size in BTR/BTS/BTC
(unless it's implied as part of the discussion about encoding the MSBs
of a constant bit offset in the offset of the addressing mode). Is that
an oversight?
>  The
> access size generally is meaningless from a semantic standpoint
> (little-endian being the only sane model), but the access size *can*
> have performance implications for the write queue forwarding.
It looks like that if the base address isn't aligned then neither is the
generated access, so you could get a protection fault if it overlaps a
page boundary, which is a semantic rather than purely operational
difference.

    J

Andi Kleen

2013-Jul-14 21:14 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

I think best would be to just find some way to implement LOCK prefix
patching using atomic compiler intrinsics and then switch to those

Then all this inline assembler horror could be ifdef'ed away 
for old compilers only, and likely the generated code would
be better as the compiler could optimize more.

Or just give up on LOCK patching, as single CPU systems
and VMs are less and less interesting?

-Andi
-- 
ak at linux.intel.com -- Speaking for myself only

H. Peter Anvin

2013-Jul-15 18:47 UTC

head link

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 12:23 PM, Jeremy Fitzhardinge wrote:> (resent without HTML)
> 
> On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:
>> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
>> changed a bunch of btrl/btsl instructions to btr/bts, with the
following
>> justification:
>>
>>   The inline assembly for the bit operations has been changed to remove
>>   explicit sizing hints on the instructions, so the assembler will pick
>>   the appropriate instruction forms depending on the architecture and
>>   the context.
>>
>> Unfortunately, GNU as does no such thing, and the AT&T syntax
manual
>> [1] contains no references to any such inference.  As evidenced by the
>> following experiment, gas always disambiguates btr/bts to btrl/btsl.
>> Feed the following input to gas:
>>
>>   btrl	$1, 0
>>   btr	$1, 0
>>   btsl	$1, 0
>>   bts	$1, 0
> 
> When I originally did those patches, I was careful make sure that we
> didn't give implied sizes to operations with only immediate and/or
> memory operands because - in general - gas can't infer the operation
> size from such operands. However, in the case of the bit test/set
> operations, the memory access size is not really derived from the
> operation size (the SDM is a bit vague), and even if it were it would be
> an operation rather than semantic difference.  So there's no real
> problem with gas choosing 'l' as a default size in the absence of
any
> explicit override or constraint.
> 
>> Check that btr matches btrl, and bts matches btsl in both cases:
>>
>>   $ as --32 -a in.s
>>   $ as --64 -a in.s
>>
>> To avoid giving readers the illusion of such an inference, and for
>> clarity, change btr/bts back to btrl/btsl.  Also, llvm-mc refuses to
>> disambiguate btr/bts automatically.
> 
> That sounds reasonable for all other operations because it makes a real
> semantic difference, but overly strict for bit operations.
> 
To be fair, we *ought to* be able to do something like:

	asm volatile(LOCK_PREFIX "bts%z0 %1,%0"
 			: BITOP_ADDR(addr) : "Ir" (nr) : "memory");

... but some older version of gcc are broken and emit "ll" rather than
"q".  Furthermore, since that would actually result in *worse* code
emitted overall (unnecessary REX prefixes), I'm not exactly happy on the
idea.

	-hpa

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Jul 2013 - [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

[LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

Reasonably Related Threads