On Mon, 2008-11-03 at 23:59 -0800, Chris Lattner wrote:> On Nov 3, 2008, at 3:55 PM, heisenbug wrote: > > What about "inventing" pseudo-constants (which point to the right > > thing) and build the piece of IR with them. When done, grab mutex and > > RAUW it in. Alternatively, submit to a privileged thread that performs > > the RAUW. > > The trick is to prepare the def/use chain(s) to a degree that the > > mutex is only held a minimal time. If only IR-builder threads are > > running concurrently there is no danger that a real constant vanishes, > > leaving behind a stale reference from a pseudo-constant. > > That could work. It would also have to be done for global values as > well, and inline asm objects etc. However, I don't see any show- > stoppers. The implementation could be tricky, but a nice property of > your approach is that the single threaded case could be made > particularly fast (instead of doing atomic ops or locking always).It might work for the IR construction phase, but not for optimization and emitting object code. The locking issue is going to be severe because it will be nearly (completely?) impossible to guarantee a globally consistent lock order for any given def/use chain. Therefore, such a solution would require a kind of high-level contention manager akin to software transactional memory (STM). Even the fastest STMs in research are much slower than locking. I think that there is a better way. I would like to propose a different solution: Lift all internalized objects to be unique at the Module level instead of globally. This will require an initial pass to be performed (called IRFinalize?), and equality of Type objects by pointer comparison will not be valid until after this pass is complete. The Module is already the unit of compilation once LLVM IR has been initially emitted for most cases, and it should be straightfoward to structure the pass such that it can work on single functions if the user is compiling at that level. The IRFinalize pass can allocate the bookkeeping storage for identifying duplicate Constants and Types and then release it so long as none of the optimization and analysis passes 1) Emit new Types or 2) are broken by duplicate Constants. -Jonathan PS: What is RAUW? I'll volunteer the clerical work of adding it to the Lexicon if you'd be kind enough to hand me a small dose of clue :)> -Chris > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Jonathan Brandmeyer wrote:> PS: What is RAUW? I'll volunteer the clerical work of adding it to the > Lexicon if you'd be kind enough to hand me a small dose of clue :)ReplaceAllUsesWith. It can be used to substitute any one Value for another (except that on a Constant, replaceUsesOfWithOnConstant should be used instead). For example, given: %x = add i32 %i, 0 an optimization would call %x->RAUW(%i). Nick
Jonathan Brandmeyer wrote:> It might work for the IR construction phase, but not for optimization > and emitting object code. The locking issue is going to be severe > because it will be nearly (completely?) impossible to guarantee a > globally consistent lock order for any given def/use chain. Therefore, > such a solution would require a kind of high-level contention manager > akin to software transactional memory (STM). Even the fastest STMs in > research are much slower than locking. I think that there is a better > way.We in the TM community are always looking for "real-world" applications that can benefit from TM for benchmarking purposes. In the past we have found that unstructured graph and worklist algorithms are a prime example where TM can really help. If you can provide a reference I would be interested in looking at the code inside LLVM for transformation/codegen passes that result in the problems that you describe. -Luke> I would like to propose a different solution: Lift all internalized > objects to be unique at the Module level instead of globally. This will > require an initial pass to be performed (called IRFinalize?), and > equality of Type objects by pointer comparison will not be valid until > after this pass is complete. The Module is already the unit of > compilation once LLVM IR has been initially emitted for most cases, and > it should be straightfoward to structure the pass such that it can work > on single functions if the user is compiling at that level. > > The IRFinalize pass can allocate the bookkeeping storage for identifying > duplicate Constants and Types and then release it so long as none of the > optimization and analysis passes 1) Emit new Types or 2) are broken by > duplicate Constants. > > -Jonathan > > PS: What is RAUW? I'll volunteer the clerical work of adding it to the > Lexicon if you'd be kind enough to hand me a small dose of clue :) > >> -Chris >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Nov 6, 4:22 am, Jonathan Brandmeyer <jbrandme... at earthlink.net> wrote:> On Mon, 2008-11-03 at 23:59 -0800, Chris Lattner wrote: > > On Nov 3, 2008, at 3:55 PM, heisenbug wrote: > > > What about "inventing" pseudo-constants (which point to the right > > > thing) and build the piece of IR with them. When done, grab mutex and > > > RAUW it in. Alternatively, submit to a privileged thread that performs > > > the RAUW. > > > The trick is to prepare the def/use chain(s) to a degree that the > > > mutex is only held a minimal time. If only IR-builder threads are > > > running concurrently there is no danger that a real constant vanishes, > > > leaving behind a stale reference from a pseudo-constant. > > > That could work. It would also have to be done for global values as > > well, and inline asm objects etc. However, I don't see any show- > > stoppers. The implementation could be tricky, but a nice property of > > your approach is that the single threaded case could be made > > particularly fast (instead of doing atomic ops or locking always). > > It might work for the IR construction phase, but not for optimization > and emitting object code. The locking issue is going to be severeWhen optimization is done, you can certainly go on and emit code in parallel for many functions. Since the IR is not mutated any more, nothing needs to be locked. I believe the SDGraphs can be isolated from each other. But Dan may know better... Cheers, Gabor> because it will be nearly (completely?) impossible to guarantee a > globally consistent lock order for any given def/use chain. Therefore, > such a solution would require a kind of high-level contention manager > akin to software transactional memory (STM). Even the fastest STMs in > research are much slower than locking. I think that there is a better > way. > > I would like to propose a different solution: Lift all internalized > objects to be unique at the Module level instead of globally. This will > require an initial pass to be performed (called IRFinalize?), and > equality of Type objects by pointer comparison will not be valid until > after this pass is complete. The Module is already the unit of > compilation once LLVM IR has been initially emitted for most cases, and > it should be straightfoward to structure the pass such that it can work > on single functions if the user is compiling at that level. > > The IRFinalize pass can allocate the bookkeeping storage for identifying > duplicate Constants and Types and then release it so long as none of the > optimization and analysis passes 1) Emit new Types or 2) are broken by > duplicate Constants. > > -Jonathan > > PS: What is RAUW? I'll volunteer the clerical work of adding it to the > Lexicon if you'd be kind enough to hand me a small dose of clue :) > > > -Chris > > _______________________________________________ > > LLVM Developers mailing list > > LLVM... at cs.uiuc.edu http://llvm.cs.uiuc.edu > >http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > _______________________________________________ > LLVM Developers mailing list > LLVM... at cs.uiuc.edu http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Nov 6, 2008, at 1:15 PM, heisenbug wrote:> On Nov 6, 4:22 am, Jonathan Brandmeyer <jbrandme... at earthlink.net> > wrote: >> On Mon, 2008-11-03 at 23:59 -0800, Chris Lattner wrote: >>> On Nov 3, 2008, at 3:55 PM, heisenbug wrote: >>>> What about "inventing" pseudo-constants (which point to the right >>>> thing) and build the piece of IR with them. When done, grab mutex >>>> and >>>> RAUW it in. Alternatively, submit to a privileged thread that >>>> performs >>>> the RAUW. >>>> The trick is to prepare the def/use chain(s) to a degree that the >>>> mutex is only held a minimal time. If only IR-builder threads are >>>> running concurrently there is no danger that a real constant >>>> vanishes, >>>> leaving behind a stale reference from a pseudo-constant. >> >>> That could work. It would also have to be done for global values as >>> well, and inline asm objects etc. However, I don't see any show- >>> stoppers. The implementation could be tricky, but a nice property >>> of >>> your approach is that the single threaded case could be made >>> particularly fast (instead of doing atomic ops or locking always). >> >> It might work for the IR construction phase, but not for optimization >> and emitting object code. The locking issue is going to be severe > > When optimization is done, you can certainly go on and emit code > in parallel for many functions. Since the IR is not mutated any > more, nothing needs to be locked. I believe the SDGraphs can be > isolated from each other. But Dan may know better...SelectionDAGs have a few things that are globally uniqued as well, such as type lists, but those could be fixed if needed. Also, the pool-allocated memory is currently reused for each SelectionDAG, though that could easily be reorganized in order to allow multiple SelectionDAGs to be processed at the same time. SelectionDAGs do contain references back to the LLVM IR for some things, and there are some situations where new LLVM IR Constants are created during SelectionDAG processing. Dan