Alexey Perevalov
2015-Apr-08  08:21 UTC
[LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
Hello community,
I've run into a bug in a multithreaded environment, in Objective-C code that
uses dispatch_async and blocks.
__sync_add_and_fetch increments a global variable, but with many threads
(more than about 5) the variable appears corrupted after every
__sync_add_and_fetch call.
...
int32_t count = 0;
...
int
main(int argc, char *argv[])
{
   for (i = 1; i < 32; ++i) {
     ...
        char* name;
        asprintf(&name, "test.overcommit.%d", i);
        dispatch_queue_t queue = dispatch_queue_create(name, NULL);
        free(name);
        dispatch_set_target_queue(queue, dispatch_get_global_queue(0,
                    dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0)));
        /* async queue */
        dispatch_async(queue, ^{
           __sync_add_and_fetch(&count, 1); // << here count is corrupted when the number of threads is > ~5
           printf("count addr %p, value %d\n", &count, count);
           fflush(stdout);
        });
   }
   ...
   dispatch_main();
}
When count is a local variable in the scope of the main function and has the
__block attribute, everything works fine.
I'm using
clang version 3.3 (tags/RELEASE_33/final)
Target: armv7l-unknown-linux-gnueabi
Thread model: posix
libBlockRuntime 0.3
libdispatch for linux 1.2
The CPU is ARMv7.
In the disassembly I see a dmb ish instruction, but I don't know whether it is enough.
I understand that my clang is out of date, but moving to a new version could be painful )
Maybe somebody knows whether this bug has been fixed?
BR,
Alexey
Tim Northover
2015-Apr-08  13:53 UTC
[LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
> in disas I see dmb ish instruction, but I don't know is it enough.

There should be 2 dmb instructions: one before the ldrex/strex loop
and one after. But I wouldn't expect dropping one to actually cause a
problem in the code you posted.

In what way is "count" corrupted, and how do you observe it? What
assembly is actually produced for the block?

> I understand, my clang is out of date. Moving to new version could be painful )
> Maybe somebody knows, was that bug fixed?

That area's certainly improved, but I'm not aware of any bugs on that
scale ("can't dispatch 32 threads to atomically increment a single
variable and print it") in clang 3.3 so I think something else is
probably going on.

A self-contained, minimal example we can examine would be useful.

Cheers.

Tim.
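As an illustration of the kind of self-contained test case requested above, here is a minimal sketch that swaps libdispatch and blocks for plain pthreads (an assumption made only to keep the example dependency-free; it is not the code from the original report). The names `worker`, `run_increments`, and `MAX_WORKERS` are hypothetical. The worker prints the value returned by `__sync_add_and_fetch` rather than re-reading the global, since a separate read of `count` after the atomic operation may already include increments made by other threads.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_WORKERS 64          /* hypothetical upper bound for this sketch */

static int32_t count = 0;       /* shared global, as in the report */

/* Each worker atomically increments the counter once. */
static void *worker(void *arg)
{
    (void)arg;
    /* Print the value returned by the atomic operation itself; a later
     * plain read of 'count' could observe other threads' increments. */
    int32_t seen = __sync_add_and_fetch(&count, 1);
    printf("count addr %p, value %d\n", (void *)&count, (int)seen);
    return NULL;
}

/* Hypothetical driver: starts nthreads workers, joins them, and returns
 * the final counter value, or -1 if the threads could not all be started. */
int32_t run_increments(int nthreads)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    if (nthreads < 0 || nthreads > MAX_WORKERS)
        return -1;

    count = 0;
    for (; started < nthreads; ++started) {
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(tids[i], NULL);

    /* All workers have been joined, so a plain read is safe here. */
    return started == nthreads ? count : -1;
}
```

Calling `run_increments(32)` from a `main` function (built with `-pthread`) should return 32 if the atomic increment behaves correctly. The printed values may appear out of order, but each value from 1 to 32 should appear exactly once.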
Alexey Perevalov
2015-Apr-09  07:58 UTC
[LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
Hi Tim

----------------------------------------
> Date: Wed, 8 Apr 2015 06:53:44 -0700
> Subject: Re: [LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
> From: t.p.northover at gmail.com
> To: alexey.perevalov at hotmail.com
> CC: llvmdev at cs.uiuc.edu
>
>> in disas I see dmb ish instruction, but I don't know is it enough.
>
> There should be 2 dmb instructions: one before the ldrex/strex loop
> and one after. But I wouldn't expect dropping one to actually cause a
> problem in the code you posted.

Yes, there are two dmb's:

=> 0x00008ed8 <+224>: dmb ish
   0x00008edc <+228>: movw r1, #10800 ; 0x2a30
   0x00008ee0 <+232>: movt r1, #1
   0x00008ee4 <+236>: str r0, [sp, #44] ; 0x2c
   0x00008ee8 <+240>: str r1, [sp, #40] ; 0x28
   0x00008eec <+244>: ldr r0, [sp, #40] ; 0x28
   0x00008ef0 <+248>: ldrexb r1, [r0]
   0x00008ef4 <+252>: ldr r2, [sp, #44] ; 0x2c
   0x00008ef8 <+256>: add r3, r1, r2
   0x00008efc <+260>: strexb r12, r3, [r0]
   0x00008f00 <+264>: cmp r12, #0
   0x00008f04 <+268>: str r1, [sp, #36] ; 0x24
   0x00008f08 <+272>: bne 0x8eec <__main_block_invoke+244>
   0x00008f0c <+276>: ldr r0, [sp, #36] ; 0x24
   0x00008f10 <+280>: add r0, r0, #1
   0x00008f14 <+284>: dmb ish
   0x00008f18 <+288>: ldr r1, [sp, #40] ; 0x28
   0x00008f1c <+292>: strb r0, [r1]
   0x00008f20 <+296>: bl 0x8aa0 <pthread_self>

> In what way is "count" corrupted, and how do you observe it? What
> assembly is actually produced for the block?

The assembly for the whole block is huge, even for a minimal test case.
I have attached the source code. I used a Cocotron-derived framework, but since
the canaries stayed intact, I don't think the runtime is to blame.
If you undef REPRODUCE_CASE in the example, the problem does not reproduce; I think
that is because it introduces an additional time interval.

The output of the sample is the following (when the problem reproduces):

after -1316199408 count addr 0x12a30, value 1 canary1 77, canary2 88
after -1324588016 count addr 0x12a30, value 2 canary1 77, canary2 88
after -1324588016 count addr 0x12a30, value 3 canary1 77, canary2 88
after -1324588016 count addr 0x12a30, value 33 canary1 77, canary2 88
after -1324588016 count addr 0x12a30, value 34

>> I understand, my clang is out of date. Moving to new version could be painful )
>> Maybe somebody knows, was that bug fixed?
>
> That area's certainly improved, but I'm not aware of any bugs on that
> scale ("can't dispatch 32 threads to atomically increment a single
> variable and print it") in clang 3.3 so I think something else is
> probably going on.
>
> A self-contained, minimal example we can examine would be useful.
>
> Cheers.
>
> Tim.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: disp_async_min.m
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150409/fbe5d0df/attachment.ksh>
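For readers following the disassembly in this message: the ldrexb/strexb pair with the bne back-edge is a load-exclusive/store-exclusive retry loop. The same retry pattern can be sketched in portable C as a compare-exchange loop. This is only an illustration under stated assumptions: the helper name `add_and_fetch_cas` is hypothetical, the `__atomic` builtins are the GCC/Clang ones, and this is not the code that clang 3.3 generates.

```c
#include <stdint.h>

/* Hypothetical helper: add 'delta' to '*ptr' atomically and return the
 * new value, retrying until no other thread has modified '*ptr' between
 * the load and the store (the role ldrex/strex play on ARM). */
int32_t add_and_fetch_cas(int32_t *ptr, int32_t delta)
{
    int32_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    int32_t desired;

    do {
        desired = old + delta;
        /* On failure the builtin refreshes 'old' with the value currently
         * stored in '*ptr', so the next iteration recomputes 'desired'. */
    } while (!__atomic_compare_exchange_n(ptr, &old, desired,
                                          1 /* weak: may fail spuriously */,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
    return desired;
}
```

The value to use after the operation is the one returned by the helper. Writing a separately computed result back to the variable with a plain store would not be part of the atomic update.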