thr3ads.net - llvm dev - [LLVMdev] "fork" and "sync" for LLVM thread support

If this information is useful, please help other people find it:
Share via:

Duraid Madina

2006-Oct-30 09:56 UTC

[LLVMdev] "fork" and "sync" for LLVM thread support - any comments?

Dear all,

	Recently I've wanted to add support for threads to LLVM (motivated by 
OpenMP, more or less), but before jumping in and implementing anything, 
I thought it might be a good idea to describe what I have in mind and 
ask for comments. Hence this email - if anyone has any comments, I'd be 
very glad to hear them.


WHAT I'M PROPOSING:

	The addition of two instructions - fork and sync - to the basic LLVM 
language.

	1) fork

syntax:     fork void <fn ptr>(<arg list>)

	fork is basically call, and it does what you think: an independent flow 
of execution begins at a given function (with no speed/synchronicity 
guarantees whatsoever). The main difference is that you can *only* fork 
functions that return void. The forking thread continues without 
waiting, while the forked thread either disappears when it reaches a ret 
void, or continues executing forever.

example:    fork void %doom(int %phd, int 666)

	2) sync

syntax:     sync [how] <type> (<token list>)

	sync is the synchronization primitive: when a thread reaches a sync 
instruction, it does not proceed until all tokens in its token list have 
been sync'd by other threads. e.g. If thread A issues:

	sync int (1, 2)

	then it will block until thread B issues, say:

	sync int (2, 1)

	or perhaps:

	sync int (1)
         sync int (2)

at which point both thread A and thread B will resume, or, B could issue:

         sync int (1, 2, 3)

at which point A would resume, but B would block pending a sync(3) from 
somewhere.

	The optional [how] parameter is there for performance reasons: it is 
simply a way to request that LLVM generate code to implement the sync in 
a particular way, e.g.

	AtomicOpBusyWait
	AtomicOpYielding
	Signal
	etc..


	That's it - fork and sync are (I hope) all that's required.


A PATRONIZING EXAMPLE:

     Summing the numbers in an array, using two threads:

%sumB = global int 0

int %sumfun(%base, %offset, %count)
   [simple loop adding numbers from base[offset] to base[offset+count]]

void %sumwrap(int *base, int %offset, int %count)
   %mysum = call int %sumfun(%base, %offset, %count)
   store int %mysum, int %sumB
   sync ubyte(42)
   ret void

int %main(void)
   %array = malloc [1000 x uint]
   %sumA = call int %sumfun(%array, 0, 500)
   fork void %sumwrap(%array, 500, 500)
   sync ubyte (42)
   %sum = add int %sumA, %sumB
   ret int %sum


    .. this example is simple enough, but there are many other 
possibilities of course. e.g. more OpenMP-like would be to use *three* 
threads, and have main() fork two workers, handing them sync tokens:

void %sumwrap(int *base, int %offset, int %count, int* %result, ubyte 
%token)
   %mysum = call int %sumfun(%base, %offset, %count)
   store int %mysum, int* %result
   sync ubyte(%token)
   ret void

int %main(void)
   %array = malloc [1000 x uint]
   fork void %sumwrap(%array,   0, 500, %sumA, 42)
   fork void %sumwrap(%array, 500, 500, %sumB, 43)
   sync ubyte (42, 43)
   %sum = add int %sumA, %sumB
   ret int %sum


    ..etc etc.


ASSORTED COMMENTS OF MY OWN:

	What I see as positive aspects of this proposal:

- It is relatively simple.

- It does not involve making any effort to hand-hold the developer: this 
is *LL*VM after all; e.g. in the example above, %sumfun() needs to be 
re-entrant: if the loop counter was a single global variable, obviously 
things would break down. That's fine, but OpenMP allows variables to be 
explicitly marked shared or thread-private. Supporting that, IMHO, is an 
issue for any OpenMP->LLVM compiler to take care of (e.g. building 
thread-specialized copies of functions/variables as necessary), rather 
than something LLVM should be talking about explicitly: changing LLVM so 
that variables *at the LLVM level* could be shared or private seems too 
high-level to me (not to mention being a very invasive change.)

- It is flexible in terms of the actual synchronization schedules that 
can be implemented. While semantically there is just a "big bag of 
shared, atomic booleans" (sync tokens), one can use any particular token 
type they like. There's no need to worry about a proliferation of 
tokens: massive sync() statements with millions of tokens could be 
replaced by log(n) many at the cost of some intermediate "sync only" 
functions. I expect that in typical use, sync statements will never get 
so large or so frequent that the supply of sync tokens would be a 
concern. Moreover, there's no need that sync tokens be compile-time 
constants: just knowing their type is enough.

- It is flexible in terms of the actual synchronization code that can 
get built. Again, the semantics are just simple barriers, but there are 
many different ways of implementing these, some of which are 
architecture- and/or OS-specific. sync's [how] parameter is intended to 
be a way to "get the right code" in any situation where that is 
important for performance (or whatever other) reasons. For example, on ia64:

   sync AtomicOpBusyWait byte (1,2,3,4,5,....14,15,16)

   could be codegenned to a tight busyloop using ld.acq, spinning to see 
if two 8-byte words become exactly 0, while matching

   sync AtomicOpBusyWait byte (x)

   instructions in the worker threads code be codegenned to tight 
ld.acq/cmpxchg.rel busyloops writing 0 bytes into the appropriate places.

   (Roughly, the idea is that LLVM backends supporting atomic ops would 
offer at least the following sync "styles":

    AtomicOpBusyWait - for minimum latency, busy wait using atomic ops
                       to get/put the sync token
    AtomicOpYielding - for increased politeness, codegen to a busyloop
                       with something like nanosleep(50000) inside.)

- If forks are replaced with calls, and syncs are replaced with NOPs, 
then threaded code without any race conditions should continue to work 
correctly in a single-threaded environment.

- A simple C library wrapping various OS/arch-specific threading 
primitives is all that's required to implement the basic support 
library. LLVM backends can simply lower fork and sync to calls to 
functions in this library for some basic level of support, though 
eventually the backends would probably want to emit synchronization code 
directly, for performance.

	Other random thoughts:

- I don't really have a strong opinion on what to do about sync's that 
are unpaired. We could either do some work and complain at compile-time, 
or simply have such syncs hang. The latter makes more sense to me.

- An important point is that when generating code, how do you know who 
is the "forker" and who is the "forkee" if all you have are
sync
instructions that look alike? There are a few answers:

   a) If all sync tokens are compile-time constants, then it actually 
shouldn't matter (expect maybe for performance.) All tokens (and by 
extension, their respective sync instructions) will be able to be paired 
at compile-time, and questions of "which way around" simply don't
matter
- you'll get the right semantics (since you'll be able to pair things up
correctly), but possibly poor performance.

   b) If not, the (global) CFG can be traversed to see who is forking 
and who is being forked. The only tricky case is where this involves a 
proper graph and not a tree (i.e. two bits of code mutually forking each 
other): at that point you can either again make arbitrary forker/forkee 
decisions, or use the [how] parameter to make the distinction clear 
where it's important.

- One possible gotcha with sync's optional [how] parameter is that 
(apart from having to implement all the different syncs, (possibly) 
across different platforms) one needs to bear in mind that a sync
of one [how] type needs to be mated with a sync of a compatible [how] type.

- It may be useful to add a [how] option to the fork instruction: most 
commonly used operating systems have more than one way of forking a 
thread, and sometimes the choice of which to use can be important for 
performance. (You know who you are. ;) It may be good to expose this to 
LLVM.

    	OK, that's enough of a ramble for one email. Once again, if anyone 
has any comments, I'd be most grateful.


	Thanks in advance,

	Duraid


P.S. I've been told that Misha Brukman implemented something similar to 
this at one point, but have been too busy^Wlazy to chase him up about it 
yet. Misha, if you read this, please yell at me! :)

Chris Lattner

2006-Oct-30 18:43 UTC

head link

[LLVMdev] "fork" and "sync" for LLVM thread support - any comments?

> 	Recently I've wanted to add support for threads to LLVM (motivated by
> OpenMP, more or less), but before jumping in and implementing anything,
> I thought it might be a good idea to describe what I have in mind and
> ask for comments. Hence this email - if anyone has any comments, I'd be
> very glad to hear them.
Hey Duraid,

After a *really* quick look, here are some thoughts:

1. My standard rebuttal: why does this have to be in the LLVM ISA?  Both
    instructions seem that they could be implemented as a call to a
    standard library of some sort.  Being library functions does not
    prevent the code generator from inlining calls or doing other special
    stuff if needed for performance.
2. Your tokens can't be integer constants if you want them to be unique.
    The LLVM linker should not be in the business of having to rename these
    or decide how to handle conflicts when linking LLVM code.
3. You can't depend on being able to walk the interprocedural CFG,
    function pointers are undecidable in general.
4. Related to #3, consider the case when some part of the program is
    compiled with a non-llvm compiler, or when compiled with llvm but not
    linked as LLVM code.  Your proposal still has to work in these cases.
5. The semantics of fork need to be nailed down.  In particular, can the
    implementation use thread pooling? Can it defer starting the thread for
    arbitrary lengths of time (Subject to sync constraints etc).
6. You really should get in touch with Misha/Vikram :)

-Chris

> WHAT I'M PROPOSING:
>
> 	The addition of two instructions - fork and sync - to the basic LLVM
> language.
>
> 	1) fork
>
> syntax:     fork void <fn ptr>(<arg list>)
>
> 	fork is basically call, and it does what you think: an independent flow
> of execution begins at a given function (with no speed/synchronicity
> guarantees whatsoever). The main difference is that you can *only* fork
> functions that return void. The forking thread continues without
> waiting, while the forked thread either disappears when it reaches a ret
> void, or continues executing forever.
>
> example:    fork void %doom(int %phd, int 666)
>
> 	2) sync
>
> syntax:     sync [how] <type> (<token list>)
>
> 	sync is the synchronization primitive: when a thread reaches a sync
> instruction, it does not proceed until all tokens in its token list have
> been sync'd by other threads. e.g. If thread A issues:
>
> 	sync int (1, 2)
>
> 	then it will block until thread B issues, say:
>
> 	sync int (2, 1)
>
> 	or perhaps:
>
> 	sync int (1)
>         sync int (2)
>
> at which point both thread A and thread B will resume, or, B could issue:
>
>         sync int (1, 2, 3)
>
> at which point A would resume, but B would block pending a sync(3) from
> somewhere.
>
> 	The optional [how] parameter is there for performance reasons: it is
> simply a way to request that LLVM generate code to implement the sync in
> a particular way, e.g.
>
> 	AtomicOpBusyWait
> 	AtomicOpYielding
> 	Signal
> 	etc..
>
>
> 	That's it - fork and sync are (I hope) all that's required.
>
>
> A PATRONIZING EXAMPLE:
>
>     Summing the numbers in an array, using two threads:
>
> %sumB = global int 0
>
> int %sumfun(%base, %offset, %count)
>   [simple loop adding numbers from base[offset] to base[offset+count]]
>
> void %sumwrap(int *base, int %offset, int %count)
>   %mysum = call int %sumfun(%base, %offset, %count)
>   store int %mysum, int %sumB
>   sync ubyte(42)
>   ret void
>
> int %main(void)
>   %array = malloc [1000 x uint]
>   %sumA = call int %sumfun(%array, 0, 500)
>   fork void %sumwrap(%array, 500, 500)
>   sync ubyte (42)
>   %sum = add int %sumA, %sumB
>   ret int %sum
>
>
>    .. this example is simple enough, but there are many other
> possibilities of course. e.g. more OpenMP-like would be to use *three*
> threads, and have main() fork two workers, handing them sync tokens:
>
> void %sumwrap(int *base, int %offset, int %count, int* %result, ubyte
> %token)
>   %mysum = call int %sumfun(%base, %offset, %count)
>   store int %mysum, int* %result
>   sync ubyte(%token)
>   ret void
>
> int %main(void)
>   %array = malloc [1000 x uint]
>   fork void %sumwrap(%array,   0, 500, %sumA, 42)
>   fork void %sumwrap(%array, 500, 500, %sumB, 43)
>   sync ubyte (42, 43)
>   %sum = add int %sumA, %sumB
>   ret int %sum
>
>
>    ..etc etc.
>
>
> ASSORTED COMMENTS OF MY OWN:
>
> 	What I see as positive aspects of this proposal:
>
> - It is relatively simple.
>
> - It does not involve making any effort to hand-hold the developer: this
> is *LL*VM after all; e.g. in the example above, %sumfun() needs to be
> re-entrant: if the loop counter was a single global variable, obviously
> things would break down. That's fine, but OpenMP allows variables to be
> explicitly marked shared or thread-private. Supporting that, IMHO, is an
> issue for any OpenMP->LLVM compiler to take care of (e.g. building
> thread-specialized copies of functions/variables as necessary), rather
> than something LLVM should be talking about explicitly: changing LLVM so
> that variables *at the LLVM level* could be shared or private seems too
> high-level to me (not to mention being a very invasive change.)
>
> - It is flexible in terms of the actual synchronization schedules that
> can be implemented. While semantically there is just a "big bag of
> shared, atomic booleans" (sync tokens), one can use any particular
token
> type they like. There's no need to worry about a proliferation of
> tokens: massive sync() statements with millions of tokens could be
> replaced by log(n) many at the cost of some intermediate "sync
only"
> functions. I expect that in typical use, sync statements will never get
> so large or so frequent that the supply of sync tokens would be a
> concern. Moreover, there's no need that sync tokens be compile-time
> constants: just knowing their type is enough.
>
> - It is flexible in terms of the actual synchronization code that can
> get built. Again, the semantics are just simple barriers, but there are
> many different ways of implementing these, some of which are
> architecture- and/or OS-specific. sync's [how] parameter is intended to
> be a way to "get the right code" in any situation where that is
> important for performance (or whatever other) reasons. For example, on
ia64:
>
>   sync AtomicOpBusyWait byte (1,2,3,4,5,....14,15,16)
>
>   could be codegenned to a tight busyloop using ld.acq, spinning to see
> if two 8-byte words become exactly 0, while matching
>
>   sync AtomicOpBusyWait byte (x)
>
>   instructions in the worker threads code be codegenned to tight
> ld.acq/cmpxchg.rel busyloops writing 0 bytes into the appropriate places.
>
>   (Roughly, the idea is that LLVM backends supporting atomic ops would
> offer at least the following sync "styles":
>
>    AtomicOpBusyWait - for minimum latency, busy wait using atomic ops
>                       to get/put the sync token
>    AtomicOpYielding - for increased politeness, codegen to a busyloop
>                       with something like nanosleep(50000) inside.)
>
> - If forks are replaced with calls, and syncs are replaced with NOPs,
> then threaded code without any race conditions should continue to work
> correctly in a single-threaded environment.
>
> - A simple C library wrapping various OS/arch-specific threading
> primitives is all that's required to implement the basic support
> library. LLVM backends can simply lower fork and sync to calls to
> functions in this library for some basic level of support, though
> eventually the backends would probably want to emit synchronization code
> directly, for performance.
>
> 	Other random thoughts:
>
> - I don't really have a strong opinion on what to do about sync's
that
> are unpaired. We could either do some work and complain at compile-time,
> or simply have such syncs hang. The latter makes more sense to me.
>
> - An important point is that when generating code, how do you know who
> is the "forker" and who is the "forkee" if all you have
are sync
> instructions that look alike? There are a few answers:
>
>   a) If all sync tokens are compile-time constants, then it actually
> shouldn't matter (expect maybe for performance.) All tokens (and by
> extension, their respective sync instructions) will be able to be paired
> at compile-time, and questions of "which way around" simply
don't matter
> - you'll get the right semantics (since you'll be able to pair
things up
> correctly), but possibly poor performance.
>
>   b) If not, the (global) CFG can be traversed to see who is forking
> and who is being forked. The only tricky case is where this involves a
> proper graph and not a tree (i.e. two bits of code mutually forking each
> other): at that point you can either again make arbitrary forker/forkee
> decisions, or use the [how] parameter to make the distinction clear
> where it's important.
>
> - One possible gotcha with sync's optional [how] parameter is that
> (apart from having to implement all the different syncs, (possibly)
> across different platforms) one needs to bear in mind that a sync
> of one [how] type needs to be mated with a sync of a compatible [how] type.
>
> - It may be useful to add a [how] option to the fork instruction: most
> commonly used operating systems have more than one way of forking a
> thread, and sometimes the choice of which to use can be important for
> performance. (You know who you are. ;) It may be good to expose this to
> LLVM.
>
>    	OK, that's enough of a ramble for one email. Once again, if anyone
> has any comments, I'd be most grateful.
>
>
> 	Thanks in advance,
>
> 	Duraid
>
>
> P.S. I've been told that Misha Brukman implemented something similar to
> this at one point, but have been too busy^Wlazy to chase him up about it
> yet. Misha, if you read this, please yell at me! :)
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/

Vikram Adve

2006-Oct-30 19:27 UTC

head link

[LLVMdev] "fork" and "sync" for LLVM thread support - any comments?

Duraid,

Adding support for explicit threads in LLVM would be great.  Some  
questions/comments:

1. About sync:

    (a) Optimizations, especially those that move code like PRE or  
LICM, need to do the right thing for sync.  This is because programs  
can use sync operations to ensure apparently unrelated operations  
(i.e., unrelated by dataflow) come before/after.  Code motion  
generally could work with fork() because it appears like a function  
call to either an unknown function (for intra-procedural opts) or a  
known function (for any affected interprocedural transformations).   
Perhaps defining sync to behave like a call to an unknown function  
would work but I'm not sure even that is enough.

    (b) I don't see how the run-time could ever reclaim your sync  
tokens and the underlying representation of them.  There seems to be  
no general way for either the compiler or run-time to known when a  
token is no longer needed.  pthreads defines the explicit operation  
pthread_cond_destroy for exactly this purpose.

    (c) Using int tokens also could involve significant performance  
overhead because you'd have to map from int values to run-time info  
if the space of token numbers is sparse.


2. About some missing features:

    (a) M-T programs often use operations on threads like exit, join,  
cancel, etc.  Wouldn't you need similar operations?

    (b) Similarly, atomic operations are common.  What about those?


3. Relationship to system-specific thread libs like pthreads, win32  
threads, etc.:

    What is the plan here?  Do you see your operations as a portable  
interface to these, or an alternative to be used by language  
implementations (like OpenMP), or something else?  I think there are  
several issues here, especially generality, compatibility (as in co- 
existence of code using the LLVM operations and the library  
operations), and memory model.


--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/


On Oct 30, 2006, at 3:56 AM, Duraid Madina wrote:
> Dear all,
>
> 	Recently I've wanted to add support for threads to LLVM (motivated by
> OpenMP, more or less), but before jumping in and implementing  
> anything,
> I thought it might be a good idea to describe what I have in mind and
> ask for comments. Hence this email - if anyone has any comments,  
> I'd be
> very glad to hear them.
>
>
> WHAT I'M PROPOSING:
>
> 	The addition of two instructions - fork and sync - to the basic LLVM
> language.
>
> 	1) fork
>
> syntax:     fork void <fn ptr>(<arg list>)
>
> 	fork is basically call, and it does what you think: an independent  
> flow
> of execution begins at a given function (with no speed/synchronicity
> guarantees whatsoever). The main difference is that you can *only*  
> fork
> functions that return void. The forking thread continues without
> waiting, while the forked thread either disappears when it reaches  
> a ret
> void, or continues executing forever.
>
> example:    fork void %doom(int %phd, int 666)
>
> 	2) sync
>
> syntax:     sync [how] <type> (<token list>)
>
> 	sync is the synchronization primitive: when a thread reaches a sync
> instruction, it does not proceed until all tokens in its token list  
> have
> been sync'd by other threads. e.g. If thread A issues:
>
> 	sync int (1, 2)
>
> 	then it will block until thread B issues, say:
>
> 	sync int (2, 1)
>
> 	or perhaps:
>
> 	sync int (1)
>          sync int (2)
>
> at which point both thread A and thread B will resume, or, B could  
> issue:
>
>          sync int (1, 2, 3)
>
> at which point A would resume, but B would block pending a sync(3)  
> from
> somewhere.
>
> 	The optional [how] parameter is there for performance reasons: it is
> simply a way to request that LLVM generate code to implement the  
> sync in
> a particular way, e.g.
>
> 	AtomicOpBusyWait
> 	AtomicOpYielding
> 	Signal
> 	etc..
>
>
> 	That's it - fork and sync are (I hope) all that's required.
>
>
> A PATRONIZING EXAMPLE:
>
>      Summing the numbers in an array, using two threads:
>
> %sumB = global int 0
>
> int %sumfun(%base, %offset, %count)
>    [simple loop adding numbers from base[offset] to base[offset 
> +count]]
>
> void %sumwrap(int *base, int %offset, int %count)
>    %mysum = call int %sumfun(%base, %offset, %count)
>    store int %mysum, int %sumB
>    sync ubyte(42)
>    ret void
>
> int %main(void)
>    %array = malloc [1000 x uint]
>    %sumA = call int %sumfun(%array, 0, 500)
>    fork void %sumwrap(%array, 500, 500)
>    sync ubyte (42)
>    %sum = add int %sumA, %sumB
>    ret int %sum
>
>
>     .. this example is simple enough, but there are many other
> possibilities of course. e.g. more OpenMP-like would be to use *three*
> threads, and have main() fork two workers, handing them sync tokens:
>
> void %sumwrap(int *base, int %offset, int %count, int* %result, ubyte
> %token)
>    %mysum = call int %sumfun(%base, %offset, %count)
>    store int %mysum, int* %result
>    sync ubyte(%token)
>    ret void
>
> int %main(void)
>    %array = malloc [1000 x uint]
>    fork void %sumwrap(%array,   0, 500, %sumA, 42)
>    fork void %sumwrap(%array, 500, 500, %sumB, 43)
>    sync ubyte (42, 43)
>    %sum = add int %sumA, %sumB
>    ret int %sum
>
>
>     ..etc etc.
>
>
> ASSORTED COMMENTS OF MY OWN:
>
> 	What I see as positive aspects of this proposal:
>
> - It is relatively simple.
>
> - It does not involve making any effort to hand-hold the developer:  
> this
> is *LL*VM after all; e.g. in the example above, %sumfun() needs to be
> re-entrant: if the loop counter was a single global variable,  
> obviously
> things would break down. That's fine, but OpenMP allows variables  
> to be
> explicitly marked shared or thread-private. Supporting that, IMHO,  
> is an
> issue for any OpenMP->LLVM compiler to take care of (e.g. building
> thread-specialized copies of functions/variables as necessary), rather
> than something LLVM should be talking about explicitly: changing  
> LLVM so
> that variables *at the LLVM level* could be shared or private seems  
> too
> high-level to me (not to mention being a very invasive change.)
>
> - It is flexible in terms of the actual synchronization schedules that
> can be implemented. While semantically there is just a "big bag of
> shared, atomic booleans" (sync tokens), one can use any particular  
> token
> type they like. There's no need to worry about a proliferation of
> tokens: massive sync() statements with millions of tokens could be
> replaced by log(n) many at the cost of some intermediate "sync
only"
> functions. I expect that in typical use, sync statements will never  
> get
> so large or so frequent that the supply of sync tokens would be a
> concern. Moreover, there's no need that sync tokens be compile-time
> constants: just knowing their type is enough.
>
> - It is flexible in terms of the actual synchronization code that can
> get built. Again, the semantics are just simple barriers, but there  
> are
> many different ways of implementing these, some of which are
> architecture- and/or OS-specific. sync's [how] parameter is  
> intended to
> be a way to "get the right code" in any situation where that is
> important for performance (or whatever other) reasons. For example,  
> on ia64:
>
>    sync AtomicOpBusyWait byte (1,2,3,4,5,....14,15,16)
>
>    could be codegenned to a tight busyloop using ld.acq, spinning  
> to see
> if two 8-byte words become exactly 0, while matching
>
>    sync AtomicOpBusyWait byte (x)
>
>    instructions in the worker threads code be codegenned to tight
> ld.acq/cmpxchg.rel busyloops writing 0 bytes into the appropriate  
> places.
>
>    (Roughly, the idea is that LLVM backends supporting atomic ops  
> would
> offer at least the following sync "styles":
>
>     AtomicOpBusyWait - for minimum latency, busy wait using atomic ops
>                        to get/put the sync token
>     AtomicOpYielding - for increased politeness, codegen to a busyloop
>                        with something like nanosleep(50000) inside.)
>
> - If forks are replaced with calls, and syncs are replaced with NOPs,
> then threaded code without any race conditions should continue to work
> correctly in a single-threaded environment.
>
> - A simple C library wrapping various OS/arch-specific threading
> primitives is all that's required to implement the basic support
> library. LLVM backends can simply lower fork and sync to calls to
> functions in this library for some basic level of support, though
> eventually the backends would probably want to emit synchronization  
> code
> directly, for performance.
>
> 	Other random thoughts:
>
> - I don't really have a strong opinion on what to do about sync's
that
> are unpaired. We could either do some work and complain at compile- 
> time,
> or simply have such syncs hang. The latter makes more sense to me.
>
> - An important point is that when generating code, how do you know who
> is the "forker" and who is the "forkee" if all you have
are sync
> instructions that look alike? There are a few answers:
>
>    a) If all sync tokens are compile-time constants, then it actually
> shouldn't matter (expect maybe for performance.) All tokens (and by
> extension, their respective sync instructions) will be able to be  
> paired
> at compile-time, and questions of "which way around" simply
don't
> matter
> - you'll get the right semantics (since you'll be able to pair  
> things up
> correctly), but possibly poor performance.
>
>    b) If not, the (global) CFG can be traversed to see who is forking
> and who is being forked. The only tricky case is where this involves a
> proper graph and not a tree (i.e. two bits of code mutually forking  
> each
> other): at that point you can either again make arbitrary forker/ 
> forkee
> decisions, or use the [how] parameter to make the distinction clear
> where it's important.
>
> - One possible gotcha with sync's optional [how] parameter is that
> (apart from having to implement all the different syncs, (possibly)
> across different platforms) one needs to bear in mind that a sync
> of one [how] type needs to be mated with a sync of a compatible  
> [how] type.
>
> - It may be useful to add a [how] option to the fork instruction: most
> commonly used operating systems have more than one way of forking a
> thread, and sometimes the choice of which to use can be important for
> performance. (You know who you are. ;) It may be good to expose  
> this to
> LLVM.
>
>     	OK, that's enough of a ramble for one email. Once again, if  
> anyone
> has any comments, I'd be most grateful.
>
>
> 	Thanks in advance,
>
> 	Duraid
>
>
> P.S. I've been told that Misha Brukman implemented something  
> similar to
> this at one point, but have been too busy^Wlazy to chase him up  
> about it
> yet. Misha, if you read this, please yell at me! :)
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Reasonably Related Threads

Search for more reasonably related threads

llvm dev - Oct 2006 - [LLVMdev] "fork" and "sync" for LLVM thread support - any comments?

[LLVMdev] "fork" and "sync" for LLVM thread support - any comments?

[LLVMdev] "fork" and "sync" for LLVM thread support - any comments?

[LLVMdev] "fork" and "sync" for LLVM thread support - any comments?

Reasonably Related Threads