thr3ads.net - llvm dev - [LLVMdev] Garbage collection [Feb 2009]

If this information is useful, please help other people find it:
Share via:

Gabor Greif

2009-Feb-26 16:22 UTC

[LLVMdev] Garbage collection

On Feb 26, 2:18 pm, Ralf Schneider <li... at gestaltgeber.com>
wrote:> Hello,
>
> 2009/2/26 Talin <viri... at gmail.com>
>
> > The IR-level intrinsics themselves don't much help you *write* a
GC, so
> > much as to integrate one with LLVM. What is provided is essentially a
> > mechanism for walking the stack, and a means to insert read/write
> > barriers into the generated code, which together form a tiny fraction
of
> > what it would take to design a working collector.
>
> <...>
>
>
>
> > In other words, I think that it should be possible to create a fairly
> > comprehensive and efficient collector implementation in which 80% of
the
> > implementation lives in a library, and the language front end
generates
> > the remaining 20%. Moreover, it should be possible to design this
> > library in a way that makes it possible for people to replace
> > significant parts of the collector, so that it can be used as a basis
> > for developing a diverse selection of collection strategies. And
because
> > all of the parts that are specific to the object model are generated
by
> > the front-end, the collector should work for a large number of
languages
> > with different object semantics.
>
> IMO LLVM should try to keep its <L>ow <L>evel design. Garbage
collection is
> already very high level.
> A library which supports building garbage collectors on the other hand
would
> be very helpful of course.
> Basically such a library boils down to an operatic system abstraction
> library. Functions like "stop the world" are completely managed
by the OS
> and not the CPU. I'm not sure if such functionality is part of LLVMs
goals.
>
> Having that said; functions like :
>
> - Enumerate threads
> - Pause/Resume thread
> - Get root pointers for a thread (including registers)
> - Get a list of modified memory-pages (GetWriteWatch in Windows - used in
> the .net GC)
> - ...
>
> for different platforms - would definitely help building a GC. Just look at
> the source code of the Boehm GC: It's a completely unmaintainable mess
of
> #ifdefs
>
> A little bit off topic: Has anybody tried building a concurrent GC -
running
> in a different _process_, instead of a thread?
Yes, I had a proof of concept implementation of a GC with
- shared memory as the GC arena,
- (C++) throw-catch-based marking
- simple lookup rules for (in-arena) associated
  instance metadata.

I never had the need to finish the implementation, but
the fork approach worked reasonably well, and the mark
and sweep parts ran in the forked process, with the
shared memory communicating back the collection results.

The most amusing thing was to see how the stack unwinding
in the forked process did the marking and the original process
was not harmed.

I hope to extend the concept to multiple threads by (m)protecting
increasing parts of the arena and hoping that all threads get
caught with time. Finally the last running threads must be
stopped and forced to fork. This last question and how to recover
the threads from the protection violations in the original process
are the big open questions to be solved.

Cheers,

    Gabor
> The idea: To perform a collection you do a fork(). The child process
> collects all unreferenced memory regions and reports them back to the
parent
> process. The parrent process waits for the result (in a sperate thread) and
> if it gets the result it frees the memory regions.
> This way you do not have to worry about barriers and all the nasty stuff.
> The maximum pause time is close to zero.
> Of course this is only useful with a working "copy on write"
implementation
> of fork().
>
> - Ralf
>
> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu      
 http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Török Edwin

2009-Feb-26 16:40 UTC

head link

[LLVMdev] Garbage collection

On 2009-02-26 18:22, Gabor Greif wrote:> On Feb 26, 2:18 pm, Ralf Schneider <li... at gestaltgeber.com> wrote:
>   
>> Hello,
>>
>> 2009/2/26 Talin <viri... at gmail.com>
>>
>>     
>>> The IR-level intrinsics themselves don't much help you *write*
a GC, so
>>> much as to integrate one with LLVM. What is provided is essentially
a
>>> mechanism for walking the stack, and a means to insert read/write
>>> barriers into the generated code, which together form a tiny
fraction of
>>> what it would take to design a working collector.
>>>       
>> <...>
>>
>>
>>
>>     
>>> In other words, I think that it should be possible to create a
fairly
>>> comprehensive and efficient collector implementation in which 80%
of the
>>> implementation lives in a library, and the language front end
generates
>>> the remaining 20%. Moreover, it should be possible to design this
>>> library in a way that makes it possible for people to replace
>>> significant parts of the collector, so that it can be used as a
basis
>>> for developing a diverse selection of collection strategies. And
because
>>> all of the parts that are specific to the object model are
generated by
>>> the front-end, the collector should work for a large number of
languages
>>> with different object semantics.
>>>       
>> IMO LLVM should try to keep its <L>ow <L>evel design.
Garbage collection is
>> already very high level.
>> A library which supports building garbage collectors on the other hand
would
>> be very helpful of course.
>> Basically such a library boils down to an operatic system abstraction
>> library. Functions like "stop the world" are completely
managed by the OS
>> and not the CPU. I'm not sure if such functionality is part of
LLVMs goals.
>>
>> Having that said; functions like :
>>
>> - Enumerate threads
>> - Pause/Resume thread
>> - Get root pointers for a thread (including registers)
>> - Get a list of modified memory-pages (GetWriteWatch in Windows - used
in
>> the .net GC)
>> - ...
>>
>> for different platforms - would definitely help building a GC. Just
look at
>> the source code of the Boehm GC: It's a completely unmaintainable
mess of
>> #ifdefs
>>
>> A little bit off topic: Has anybody tried building a concurrent GC -
running
>> in a different _process_, instead of a thread?
>>     
>
> Yes, I had a proof of concept implementation of a GC with
> - shared memory as the GC arena,
> - (C++) throw-catch-based marking
> - simple lookup rules for (in-arena) associated
>   instance metadata.
>
> I never had the need to finish the implementation, but
> the fork approach worked reasonably well, and the mark
> and sweep parts ran in the forked process, with the
> shared memory communicating back the collection results.
>
> The most amusing thing was to see how the stack unwinding
> in the forked process did the marking and the original process
> was not harmed.
>
> I hope to extend the concept to multiple threads by (m)protecting
> increasing parts of the arena and hoping that all threads get
> caught with time. Finally the last running threads must be
> stopped and forced to fork. This last question and how to recover
> the threads from the protection violations in the original process
> are the big open questions to be solved.
What do you do if the program is multi-threaded? fork()-ing a
multithreaded program can lead to problems ...
The functions you can call after forking is very limited according to
POSIX: "If a multi-threaded process calls /fork/(), the new process
shall contain a replica of the calling thread and its entire address
space, possibly including the states of mutexes and other resources.
Consequently, to avoid errors, the child process may only execute
async-signal-safe operations until such time as one of the /exec
<http://www.opengroup.org/onlinepubs/9699919799/functions/exec.html>/
functions is called."

Best regards,
--Edwin

Ralf Schneider

2009-Feb-26 17:42 UTC

head link

[LLVMdev] Garbage collection

2009/2/26 Török Edwin <edwintorok at gmail.com>
> What do you do if the program is multi-threaded? fork()-ing a
> multithreaded program can lead to problems ...
> The functions you can call after forking is very limited according to
> POSIX: "If a multi-threaded process calls /fork/(), the new process
> shall contain a replica of the calling thread and its entire address
> space, possibly including the states of mutexes and other resources.
> Consequently, to avoid errors, the child process may only execute
> async-signal-safe operations until such time as one of the /exec
> <http://www.opengroup.org/onlinepubs/9699919799/functions/exec.html>/
> functions is called."

Well, I have alread thought a little bit about this problem.
If I remember right only the thread calling fork will remain active in the
child process. Thats exactly what we want.
In pseudocode it could look like this.

Garbage collection thread:
    shared_mem = create_shared_memory()
    loop:
        wait_for_garbage_collection_signal()    # wait until a garbage
collection is requeted
        pause_all_other_threads()    # stop the world
        root_pointers = get_root_pointers_from_all_threads()
        resume_all_other_threads()
        child_pid = fork()
        if(!child_pid):
            # child process - all threads except of the current one has been
stopped by fork()
            collect_garbage(root_pointers, shared_memory)    # collect
unreferenced memory blocks, put the result in shared_memory
            exit()
        else:
            wait_for_process_exit(child_pid)
            free_memory_blocks(shared_mem)    # free all memory blocks the
GC has found

=> There is probably no problem with mutexes and other synchronization
primitieves, because we simply do not care in the garbage collection thread

But may be I'm missing something.

Regards,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090226/b067b769/attachment.html>

David Vandevoorde

2009-Feb-26 18:27 UTC

head link

[LLVMdev] Garbage collection

On Feb 26, 2009, at 11:40 AM, Török Edwin wrote:
[...]>>> A little bit off topic: Has anybody tried building a concurrent GC
>>> - running
>>> in a different _process_, instead of a thread?
>>>
>>
>> Yes, I had a proof of concept implementation of a GC with
>> - shared memory as the GC arena,
>> - (C++) throw-catch-based marking
>> - simple lookup rules for (in-arena) associated
>>  instance metadata.
>>
>> I never had the need to finish the implementation, but
>> the fork approach worked reasonably well, and the mark
>> and sweep parts ran in the forked process, with the
>> shared memory communicating back the collection results.
>>
>> The most amusing thing was to see how the stack unwinding
>> in the forked process did the marking and the original process
>> was not harmed.
>>
>> I hope to extend the concept to multiple threads by (m)protecting
>> increasing parts of the arena and hoping that all threads get
>> caught with time. Finally the last running threads must be
>> stopped and forced to fork. This last question and how to recover
>> the threads from the protection violations in the original process
>> are the big open questions to be solved.
>
> What do you do if the program is multi-threaded? fork()-ing a
> multithreaded program can lead to problems ...
> The functions you can call after forking is very limited according to
> POSIX: "If a multi-threaded process calls /fork/(), the new process
> shall contain a replica of the calling thread and its entire address
> space, possibly including the states of mutexes and other resources.
> Consequently, to avoid errors, the child process may only execute
> async-signal-safe operations until such time as one of the /exec
> <http://www.opengroup.org/onlinepubs/9699919799/functions/exec.html>/
> functions is called."
My understanding is that the child process would basically only take  
care of the "mark" phase (and return results via shared memory for a  
"lazy sweep" by the allcotor).  Do you anticipate async-signal-unsafe
operations would be required for that?

	Daveed

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Feb 2009 - [LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

[LLVMdev] Garbage collection

Reasonably Related Threads