Alexey Perevalov
2015-Apr-08 08:21 UTC
[LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
Hello community,

I ran into a bug in a multithreaded environment, in Objective-C code that uses dispatch_async with a block in which __sync_add_and_fetch increments a global variable. With many threads (more than about 5), the value ends up damaged after every __sync_add_and_fetch.

    ...
    int32_t count = 0;
    ...
    int main(int argc, char *argv[])
    {
        for (i = 1; i < 32; ++i) {
            ...
            char* name;
            asprintf(&name, "test.overcommit.%d", i);
            dispatch_queue_t queue = dispatch_queue_create(name, NULL);
            free(name);
            dispatch_set_target_queue(queue, dispatch_get_global_queue(0, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_LOW, 0)));
            /* async queue */
            dispatch_async(queue, ^{
                __sync_add_and_fetch(&count, 1); //<< Here count is corrupted when the number of threads > ~5
                printf("count addr %p, value %d\n", &count, count);
                fflush(stdout);
            });
        }
        ...
        dispatch_main();
    }

If count is a local variable in the scope of main and has the __block attribute, everything is fine.

I'm using:

    clang version 3.3 (tags/RELEASE_33/final)
    Target: armv7l-unknown-linux-gnueabi
    Thread model: posix
    libBlockRuntime 0.3
    libdispatch for linux 1.2

The CPU is ARMv7. In the disassembly I see a dmb ish instruction, but I don't know whether it is enough.

I understand my clang is out of date. Moving to a new version could be painful ) Maybe somebody knows whether this bug was fixed?

BR,
Alexey
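One detail of the snippet above is that the printf re-reads count in a separate, non-atomic step, so the printed value can already include increments made by other threads. The following is a minimal sketch in plain C that records the value returned by __sync_add_and_fetch instead. It assumes POSIX threads in place of libdispatch and blocks, and the names NUM_WORKERS, seen, worker and run_workers are hypothetical, chosen only for illustration. It is not the attached test case and is not claimed to explain the reported behaviour.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define NUM_WORKERS 31              /* the loop in the post runs i = 1..31 */

static int32_t count = 0;           /* shared global, as in the post */
static int32_t seen[NUM_WORKERS];   /* value each worker got back */

/* Hypothetical worker: increments count once and records the result. */
static void *worker(void *arg)
{
    int32_t *slot = arg;
    assert(slot != NULL);
    /* Keep the builtin's return value. Reading count again afterwards
     * could observe an increment made by another thread. */
    *slot = __sync_add_and_fetch(&count, 1);
    return NULL;
}

/* Hypothetical helper: runs the workers and returns 1 if every value in
 * 1..NUM_WORKERS was returned exactly once, 0 otherwise. */
int run_workers(void)
{
    pthread_t tid[NUM_WORKERS];
    int hits[NUM_WORKERS + 1] = {0};
    int started = 0;

    count = 0;
    while (started < NUM_WORKERS &&
           pthread_create(&tid[started], NULL, worker, &seen[started]) == 0)
        ++started;
    for (int i = 0; i < started; ++i)
        pthread_join(tid[i], NULL);
    if (started != NUM_WORKERS)
        return 0;

    for (int i = 0; i < NUM_WORKERS; ++i) {
        if (seen[i] < 1 || seen[i] > NUM_WORKERS)
            return 0;
        hits[seen[i]]++;
    }
    for (int v = 1; v <= NUM_WORKERS; ++v)
        if (hits[v] != 1)
            return 0;
    return count == NUM_WORKERS;
}
```

If the increment is atomic, the returned values form a permutation of 1..NUM_WORKERS regardless of scheduling, which is a stricter check than eyeballing the printed output. Build with thread support enabled (for example -pthread with clang or gcc).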
Tim Northover
2015-Apr-08 13:53 UTC
[LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
> in disas I see dmb ish instruction, but I don't know is it enough.

There should be 2 dmb instructions: one before the ldrex/strex loop and one after. But I wouldn't expect dropping one to actually cause a problem in the code you posted.

In what way is "count" corrupted, and how do you observe it? What assembly is actually produced for the block?

> I understand, my clang is out of date. Moving to new version could be painful )
> Maybe somebody knows, was that bug fixed?

That area's certainly improved, but I'm not aware of any bugs on that scale ("can't dispatch 32 threads to atomically increment a single variable and print it") in clang 3.3 so I think something else is probably going on.

A self-contained, minimal example we can examine would be useful.

Cheers.

Tim.
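The "dmb, ldrex/strex loop, dmb" shape described above can be sketched at the C level with the GCC/clang __atomic builtins: a full fence, a relaxed compare-and-swap retry loop, then another full fence. This is only an illustration of the structure, under the assumption that a weak compare-exchange stands in for the ldrex/strex pair. The function name add_and_fetch_sketch is hypothetical, and the code is not what clang 3.3 actually emits.

```c
#include <stdint.h>

/* Hypothetical name; mirrors the barrier / retry loop / barrier shape.
 * As with any signed addition in C, overflow of old_val + value is the
 * caller's responsibility. */
int32_t add_and_fetch_sketch(int32_t *ptr, int32_t value)
{
    int32_t old_val;
    int32_t new_val;

    /* Plays the role of the first dmb. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);

    old_val = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    do {
        new_val = old_val + value;
        /* On failure the builtin refreshes old_val with the current
         * contents of *ptr and the loop retries, much like a failed
         * strex sends control back to the ldrex. */
    } while (!__atomic_compare_exchange_n(ptr, &old_val, new_val,
                                          1 /* weak */,
                                          __ATOMIC_RELAXED,
                                          __ATOMIC_RELAXED));

    /* Plays the role of the second dmb. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);

    return new_val;
}
```

A call such as add_and_fetch_sketch(&counter, 1) returns the incremented value, matching the return convention of __sync_add_and_fetch.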
Alexey Perevalov
2015-Apr-09 07:58 UTC
[LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
Hi Tim

----------------------------------------
> Date: Wed, 8 Apr 2015 06:53:44 -0700
> Subject: Re: [LLVMdev] __sync_add_and_fetch in objc block for global variable on ARM
> From: t.p.northover at gmail.com
> To: alexey.perevalov at hotmail.com
> CC: llvmdev at cs.uiuc.edu
>
>> in disas I see dmb ish instruction, but I don't know is it enough.
>
> There should be 2 dmb instructions: one before the ldrex/strex loop
> and one after. But I wouldn't expect dropping one to actually cause a
> problem in the code you posted.

Yes, there are two dmb's:

    => 0x00008ed8 <+224>: dmb ish
       0x00008edc <+228>: movw r1, #10800 ; 0x2a30
       0x00008ee0 <+232>: movt r1, #1
       0x00008ee4 <+236>: str r0, [sp, #44] ; 0x2c
       0x00008ee8 <+240>: str r1, [sp, #40] ; 0x28
       0x00008eec <+244>: ldr r0, [sp, #40] ; 0x28
       0x00008ef0 <+248>: ldrexb r1, [r0]
       0x00008ef4 <+252>: ldr r2, [sp, #44] ; 0x2c
       0x00008ef8 <+256>: add r3, r1, r2
       0x00008efc <+260>: strexb r12, r3, [r0]
       0x00008f00 <+264>: cmp r12, #0
       0x00008f04 <+268>: str r1, [sp, #36] ; 0x24
       0x00008f08 <+272>: bne 0x8eec <__main_block_invoke+244>
       0x00008f0c <+276>: ldr r0, [sp, #36] ; 0x24
       0x00008f10 <+280>: add r0, r0, #1
       0x00008f14 <+284>: dmb ish
       0x00008f18 <+288>: ldr r1, [sp, #40] ; 0x28
       0x00008f1c <+292>: strb r0, [r1]
       0x00008f20 <+296>: bl 0x8aa0 <pthread_self>

> In what way is "count" corrupted, and how do you observe it? What
> assembly is actually produced for the block?

The assembly for the whole block is huge, even for a minimal test case, so I have attached the source code. I used a Cocotron-derived framework, but since the canaries stayed intact I don't think the runtime is to blame. If you undef REPRODUCE_CASE in the example, the problem does not reproduce; I think that is because it introduces an additional time interval.
The output of the sample is the following (when the problem reproduces):

    after -1316199408
    count addr 0x12a30, value 1
    canary1 77, canary2 88
    after -1324588016
    count addr 0x12a30, value 2
    canary1 77, canary2 88
    after -1324588016
    count addr 0x12a30, value 3
    canary1 77, canary2 88
    after -1324588016
    count addr 0x12a30, value 33
    canary1 77, canary2 88
    after -1324588016
    count addr 0x12a30, value 34

>> I understand, my clang is out of date. Moving to new version could be painful )
>> Maybe somebody knows, was that bug fixed?
>
> That area's certainly improved, but I'm not aware of any bugs on that
> scale ("can't dispatch 32 threads to atomically increment a single
> variable and print it") in clang 3.3 so I think something else is
> probably going on.
>
> A self-contained, minimal example we can examine would be useful.
>
> Cheers.
>
> Tim.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: disp_async_min.m
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150409/fbe5d0df/attachment.ksh>