thr3ads.net - llvm dev - [LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev) [Jan 2013]

If this information is useful, please help other people find it:
Share via:

Arnaud A. de Grandmaison

2013-Jan-18 17:30 UTC

[LLVMdev] Weird volatile propagation ?

Hi All,

Using clang+llvm at head, I noticed a weird behaviour with the following
reduced testcase :

$ cat test.c 

#include <stdint.h>

struct R {
  uint16_t a;
  uint16_t b;
};

volatile struct R * const addr = (volatile struct R *) 416;

void test(uint16_t a)
{
  struct R r = { a, 1 };
  *addr = r;
}

$ clang -O2 -o - -emit-llvm -S -c test.c 
; ModuleID = 'test.c'
target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.R = type { i16, i16 }

@addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8

define void @test(i16 zeroext %a) nounwind uwtable {
  %r.sroa.0 = alloca i16, align 2
  %r.sroa.1 = alloca i16, align 2
  store i16 %a, i16* %r.sroa.0, align 2
  store i16 1, i16* %r.sroa.1, align 2
  %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
  store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align
32
  %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
  store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}


When I would have expected something like :

define void @test(i16 zeroext %a) nounwind uwtable {
  store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2
  store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}

Using LLVM-3.1, I get the expected code. Would this ring a bell to anybody ?

Cheers,

-- 
Arnaud de Grandmaison

Arnaud de Grandmaison

2013-Jan-20 21:42 UTC

head link

[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

As a results of my investigations, the thread is also added to cfe-dev.

The context : while porting my company code from the LLVM/Clang releases
3.1 to 3.2, I stumbled on a code size and performance regression. The
testcase is :

$ cat test.c 
#include <stdint.h>

struct R {
  uint16_t a;
  uint16_t b;
};

volatile struct R * const addr = (volatile struct R *) 416;

void test(uint16_t a)
{
  struct R r = { a, 1 };
  *addr = r;
}

$ clang -O2 -o - -emit-llvm -S -c test.c 
; ModuleID = 'test.c'
target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.R = type { i16, i16 }

@addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8

define void @test(i16 zeroext %a) nounwind uwtable {
  %r.sroa.0 = alloca i16, align 2
  %r.sroa.1 = alloca i16, align 2
  store i16 %a, i16* %r.sroa.0, align 2
  store i16 1, i16* %r.sroa.1, align 2
  %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
  store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*), align
32
  %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
  store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*), align 2
  ret void
}

The problem is in how clang codegen aggregate copies when the source or
the destination are volatile : it uses the memcpy intrinsic, setting the
volatile parameter. However, the LLVM IR reference manual specifies that
"The detailed access behavior is not very cleanly specified and it is
unwise to depend on it". The new SROA implementation is conservatively
correct, but behaves differently.

I believe we should either :
 - rework methods AggExprEmitter::EmitCopy and
CodeGenFunction::EmitAggregateCopy (from
clang/lib/CodeGen/CGExprAgg.cpp) to codegen differently the copy when
either the source or the destination is volatile,
 - specify cleanly the memcpy intrinsic with respect to volatile behaviour

I would prefer option #2 as having a parameter whose effect is
unspecified does not make it very useful and invite more problems later.

Any thoughts ?

Cheers,
--
Arnaud A. de Grandmaison


On 01/18/2013 06:30 PM, Arnaud A. de Grandmaison wrote:> Hi All,
>
> Using clang+llvm at head, I noticed a weird behaviour with the following
> reduced testcase :
>
> $ cat test.c 
>
> #include <stdint.h>
>
> struct R {
>   uint16_t a;
>   uint16_t b;
> };
>
> volatile struct R * const addr = (volatile struct R *) 416;
>
> void test(uint16_t a)
> {
>   struct R r = { a, 1 };
>   *addr = r;
> }
>
> $ clang -O2 -o - -emit-llvm -S -c test.c 
> ; ModuleID = 'test.c'
> target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
> target triple = "x86_64-unknown-linux-gnu"
>
> %struct.R = type { i16, i16 }
>
> @addr = constant %struct.R* inttoptr (i64 416 to %struct.R*), align 8
>
> define void @test(i16 zeroext %a) nounwind uwtable {
>   %r.sroa.0 = alloca i16, align 2
>   %r.sroa.1 = alloca i16, align 2
>   store i16 %a, i16* %r.sroa.0, align 2
>   store i16 1, i16* %r.sroa.1, align 2
>   %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
>   store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*),
align 32
>   %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
>   store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*),
align 2
>   ret void
> }
>
>
> When I would have expected something like :
>
> define void @test(i16 zeroext %a) nounwind uwtable {
>   store volatile i16 %a, i16* inttoptr (i64 416 to i16*), align 2
>   store volatile i16 1, i16* inttoptr (i64 418 to i16*), align 2
>   ret void
> }
>
> Using LLVM-3.1, I get the expected code. Would this ring a bell to anybody
?
>
> Cheers,
>

-- 
Arnaud de Grandmaison
Senior CPU engineer
Business Unit Digital Tuner

Parrot S.A.
174, quai de Jemmapes
75010 Paris - France
Phone: +33 1 48 03 84 59

Chandler Carruth

2013-Jan-20 21:56 UTC

head link

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

I doubt you needed to add cfe-dev here. Sorry I hadn't seen this, this
seems like an easy and simple deficiency in the IR intrinsic for memcpy.
See below.

On Sun, Jan 20, 2013 at 1:42 PM, Arnaud de Grandmaison <
arnaud.allarddegrandmaison at parrot.com> wrote:
> define void @test(i16 zeroext %a) nounwind uwtable {
>   %r.sroa.0 = alloca i16, align 2
>   %r.sroa.1 = alloca i16, align 2
>   store i16 %a, i16* %r.sroa.0, align 2
>   store i16 1, i16* %r.sroa.1, align 2
>   %r.sroa.0.0.load3 = load volatile i16* %r.sroa.0, align 2
>   store volatile i16 %r.sroa.0.0.load3, i16* inttoptr (i64 416 to i16*),
> align 32
>   %r.sroa.1.0.load2 = load volatile i16* %r.sroa.1, align 2
>   store volatile i16 %r.sroa.1.0.load2, i16* inttoptr (i64 418 to i16*),
> align 2
>
Just so we're on the same page, the "regression" (which I claim is
actually
a bug-fix) are the loads here being volatile in addition to the stores. The
stores require it, but the loads get it.

The problem is in how clang codegen aggregate copies when the source
or> the destination are volatile : it uses the memcpy intrinsic, setting the
> volatile parameter. However, the LLVM IR reference manual specifies that
> "The detailed access behavior is not very cleanly specified and it is
> unwise to depend on it". The new SROA implementation is conservatively
> correct, but behaves differently.
>
> I believe we should either :
>  - rework methods AggExprEmitter::EmitCopy and
> CodeGenFunction::EmitAggregateCopy (from
> clang/lib/CodeGen/CGExprAgg.cpp) to codegen differently the copy when
> either the source or the destination is volatile,
>
I think this is bad...

>  - specify cleanly the memcpy intrinsic with respect to volatile behaviour
>
I agree that this is what we should do.

I took a conservative approach when implementing SROA w.r.t. volatile
memcpy, and I think we should codify those semantics: with a single
is-volatile bit both the loads and stores which the memcpy decomposes into
will have volatile semantics.

I think we should extend the memcpy intrinsic to support two volatile flags
and have that overload have the flags refer to the loads and stores
respectively. We should then auto-upgrade the existing memcpy intrinsics
into the more precise form by replicating the single volatile flag. This
will conservatively preserve semantics. As part of adding the new
intrinsic, everywhere in LLVM that examines the two flags should be audited
and should use the individual flag for load vs. store volatility. This will
change SROA so that it can emit the loads as non-volatile, and only the
stores will be volatile. Finally we switch Clang over to emit the new
intrinsic structure with more precise information, and your test case
should go back to normal.

As part of this, we should also clarify that a volatile memcpy guarantees
that each byte loaded is loaded exactly once, and each byte stored is
stored exactly once, but that these operations are free to be batched into
4-byte, 8-byte, or any other width of loads and stores, and there is no
guarantee about the interleaving of these loads and stores even when both
are volatile.

-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130120/ff274d19/attachment.html>

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Jan 2013 - [LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

[LLVMdev] Weird volatile propagation ?

[LLVMdev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

[LLVMdev] [cfe-dev] codegen of volatile aggregate copies (was "Weird volatile propagation" on llvm-dev)

Reasonably Related Threads