On Thu, Aug 1, 2013 at 12:36 PM, Sam Cristall <cristall at eleveneng.com> wrote:
> I'm working with an experimental backend for an MCU with heavy
> multithreading capabilities but lacks proper acquire/release semantics.
> This is okay, as the programmer can customize __cxa_guard_acquire and
> __cxa_guard_release to lower/raise appropriate semaphores. The issue I'm
> having is that I can't seem to figure out when to lower atomic load into an
> acquire/load pair early enough that the __cxa_guard_acquire is evaluated for
> optimization (most importantly inlining.) First, is this even the proper
> way to do this and further am I going about this the wrong way and is there
> a "best time" to do a pass to catch these guys?

Normally, the code clang generates for a guarded initialization looks like this:

entry:
  %0 = load atomic i8* bitcast (i64* @_ZGVZ3barvE1x to i8*) acquire, align 8
  %guard.uninitialized = icmp eq i8 %0, 0
  br i1 %guard.uninitialized, label %init.check, label %init.end

init.check:                                       ; preds = %entry
  %1 = tail call i32 @__cxa_guard_acquire(i64* @_ZGVZ3barvE1x) #1
  %tobool = icmp eq i32 %1, 0
  br i1 %tobool, label %init.end, label %init

init:                                             ; preds = %init.check
  %call = tail call i32 @_Z3foov() #1
  store i32 %call, i32* @_ZZ3barvE1x, align 4, !tbaa !0
  tail call void @__cxa_guard_release(i64* @_ZGVZ3barvE1x) #1
  br label %init.end

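Given this, there is no reason to inline the call to
__cxa_guard_acquire. It is only reached on the slow path, when the
acquire load of the guard byte reads 0 (initialization has not
completed yet). Once the variable is initialized, every call branches
straight from entry to init.end and never reaches it, so inlining it
would bloat code size for no performance benefit.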
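
For reference, here is a rough C++ sketch of the control flow in that IR. Judging by the mangled names, the source was probably something like "static int x = foo();" inside bar(), but that is my inference. sketch_guard_acquire and sketch_guard_release are hypothetical stand-ins I made up for illustration; the real __cxa_guard_acquire/__cxa_guard_release in the C++ runtime are implemented differently, and the mutex here is just the simplest thing that works on a hosted system.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Assumption: these globals mirror the IR's symbols for illustration only.
static std::atomic<unsigned char> guard{0};  // plays the role of @_ZGVZ3barvE1x
static std::mutex guard_mutex;               // hypothetical slow-path lock
static int x_storage;                        // plays the role of @_ZZ3barvE1x
static int foo_calls = 0;                    // counts initializer runs

int foo() { ++foo_calls; return 42; }

// Hypothetical stand-in for __cxa_guard_acquire.
// Returns 1 if the caller must run the initializer, 0 if it is already done.
static int sketch_guard_acquire() {
    guard_mutex.lock();
    if (guard.load(std::memory_order_relaxed) != 0) {
        guard_mutex.unlock();  // another thread finished the initialization
        return 0;
    }
    return 1;  // lock stays held until sketch_guard_release()
}

// Hypothetical stand-in for __cxa_guard_release.
static void sketch_guard_release() {
    // Only the initializing thread gets here, with the lock held.
    assert(guard.load(std::memory_order_relaxed) == 0);
    guard.store(1, std::memory_order_release);  // publish "initialized"
    guard_mutex.unlock();
}

int bar() {
    // entry: the fast path is a single acquire load plus a compare.
    if (guard.load(std::memory_order_acquire) == 0) {
        // init.check: slow path, taken only until initialization completes.
        if (sketch_guard_acquire() != 0) {
            // init: run the initializer, then release the guard.
            x_storage = foo();
            sketch_guard_release();
        }
    }
    // init.end
    return x_storage;
}
```

Calling bar() repeatedly runs foo() once; every later call only executes the acquire load and the branch, which is why the guard call itself is not worth inlining.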
What does the IR you are working with look like?
-Eli