thr3ads.net - llvm dev - [LLVMdev] MmapAllocator [Aug 2010]


Török Edwin

2010-Aug-09 17:39 UTC

[LLVMdev] MmapAllocator

On Mon, 9 Aug 2010 10:17:27 -0700
Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> 
> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
> 
> > With mmap() it is always possible to fully release the memory once
> > you are done using it.
> 
> Sure. Is that the goal, though? 
If the goal is to reduce fragmentation, possibly. You don't know
whether you have fragmentation or not; the JITed app may fragment
memory, for example.
> Why isn't malloc doing it already?
Because it can't. sbrk() can only increase/decrease memory usage at the
end (like a stack); you can't release something in the middle.
That's one of the reasons why we wrote a pool-based memory allocator for
ClamAV.
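
To illustrate the point about releasing memory in the middle, here is a
minimal sketch, assuming a POSIX system that supports anonymous mappings.
The function name release_middle_chunk is hypothetical and is not part of
LLVM or ClamAV. It maps three pages, unmaps only the middle one, and
checks that the outer two remain usable, which is something an
sbrk()-managed heap cannot express.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: map three pages as one region, then hand only the
// middle page back to the kernel. Returns true if every step succeeded.
bool release_middle_chunk() {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void *base = mmap(nullptr, 3 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return false;

    char *p = static_cast<char *>(base);
    std::memset(p, 0xAB, 3 * page);  // touch every page so it is resident

    // Release the middle page. With sbrk() only the top of the heap
    // could be given back; munmap() accepts any page-aligned sub-range.
    if (munmap(p + page, page) != 0)
        return false;

    // The first and last pages are still mapped and keep their contents.
    bool ok = static_cast<unsigned char>(p[0]) == 0xAB &&
              static_cast<unsigned char>(p[2 * page]) == 0xAB;

    munmap(p, page);
    munmap(p + 2 * page, page);
    return ok;
}
```

Touching the unmapped middle page after the munmap() call would fault,
which is why the sketch only reads the first and last pages.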
> 
> > With malloc() no, it takes just 1 allocation at the end of the heap
> > to keep all the rest allocated. That wouldn't be a problem if libc
> > would use mmap() as the low-level allocator for malloc but it
> > doesn't. It uses sbrk() mostly for small (<128k) allocations, and
> > even with mmaps it caches them for a while.
> 
> Recommended reading:
> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
If jemalloc provides the same or better memory usage than
MMapAllocator, I think it'd be better to have a JEMallocAllocator
instead.
I think jemalloc is fairly portable (Firefox uses it), isn't it?
> 
> > I think that is because mmap() is slow in multithreaded apps, since
> > it needs to take a process level lock, which also contends with the
> > lock taken by pagefaults from other existing mmaps (in fact that
> > lock is held during disk I/O!).
> 
> Sounds awesome, let's do that ;-)
Multithreaded performance should probably be benchmarked on a real app
with MMapAllocator, and with the MallocAllocator.
> 
> You are also leaving a bunch of 4K holes in your address space. On
> 32-bit systems, address space is a scarce resource.
Doesn't BumpPtrAllocator use a larger chunk size?

Best regards,
--Edwin

Reid Kleckner

2010-Aug-09 18:00 UTC


[LLVMdev] MmapAllocator

2010/8/9 Török Edwin <edwintorok at gmail.com>:
> On Mon, 9 Aug 2010 10:17:27 -0700
> Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
>>
>> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
>>
>> > With mmap() it is always possible to fully release the memory once
>> > you are done using it.
>>
>> Sure. Is that the goal, though?
>
> If goal is to reduce fragmentation, possibly. You
> don't know if you have fragmentation or not, the JITed app may fragment
> memory for example.
Yes, the goal is to fully release the memory back to the OS.
>> Why isn't malloc doing it already?
>
> Because it can't. sbrk() can only increase/decrease memory usage at the
> end (like a stack), you can't release something in the middle.
> Thats one of the reasons why we wrote a pool-based memory allocator for
> ClamAV.
Another thing malloc could do is to use madvise with MADV_DONTNEED to
free the pages in the middle of the heap, but malloc can't read your
mind, so it doesn't know that you aren't about to reallocate that
region of the heap.
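
As a rough illustration of the madvise idea, here is a minimal sketch,
assuming Linux semantics for MADV_DONTNEED on a private anonymous
mapping. The function name drop_middle_pages is hypothetical. The point
it shows is that the address range stays mapped after the call, so the
caller can reuse it later without another mmap().

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: tell the kernel that the two middle pages of a
// four-page region are no longer needed, then reuse one of them.
// Returns true if madvise() succeeded and the region stayed usable.
bool drop_middle_pages() {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void *base = mmap(nullptr, 4 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return false;

    char *p = static_cast<char *>(base);
    std::memset(p, 0x5A, 4 * page);  // make all four pages resident

    // Assumption: Linux behaviour, where the kernel may drop the backing
    // pages immediately. The virtual address range itself stays mapped.
    int rc = madvise(p + page, 2 * page, MADV_DONTNEED);

    // Writing to the range again is legal; fresh pages come in on demand.
    p[page] = 42;
    bool ok = rc == 0 && p[page] == 42 &&
              static_cast<unsigned char>(p[0]) == 0x5A;

    munmap(p, 4 * page);
    return ok;
}
```

The old contents of the dropped pages are gone after the call, so an
allocator would only do this for regions it knows hold no live objects.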
>>
>> > With malloc() no, it takes just 1 allocation at the end of the heap
>> > to keep all the rest allocated. That wouldn't be a problem if libc
>> > would use mmap() as the low-level allocator for malloc but it
>> > doesn't. It uses sbrk() mostly for small (<128k) allocations, and
>> > even with mmaps it caches them for a while.
>>
>> Recommended reading:
>> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
>
> If jemalloc provides same or better memory usage than
> MMapAllocator, I think it'd be better to have a JEMallocAllocator
> instead.
> I think jemalloc is fairly portable (firefox uses it), isn't it?
Reading the abstract, jemalloc seems like it has nothing to do with
keeping the total heap usage low and everything to do with performance
in a multithreaded app.
>> > I think that is because mmap() is slow in multithreaded apps, since
>> > it needs to take a process level lock, which also contends with the
>> > lock taken by pagefaults from other existing mmaps (in fact that
>> > lock is held during disk I/O!).
>>
>> Sounds awesome, let's do that ;-)
>
> Multithreaded performance should probably be benchmarked on a real app
> with MMapAllocator, and with the MallocAllocator.
I predict that mmap will be slower than malloc, for obvious reasons.
The only way in which mmap could be better is that it reduces your
steady state heap usage.
>> You are also leaving a bunch of 4K holes in your address space. On
>> 32-bit systems, address space is a scarce resource.
>
> Doesn't BumpPtrAllocator use a larger chunk size?
Nope, it defaults to 4K.  IMO that should be bumped up (pun wasn't
intended, but then I left it in...).  Especially if we want to use
mmap as the allocator, increasing the slab size will reduce the number
of expensive system calls that grab the process lock.
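
To make the slab-size argument concrete, here is a minimal sketch of a
bump allocator that takes each slab from mmap() and counts how many
slabs it needed. It assumes a POSIX system. The class MmapBumpAllocator
and the helper mmap_calls_for are hypothetical names for illustration and
are not LLVM's BumpPtrAllocator.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <vector>

// Hypothetical bump allocator: each slab is one mmap() call.
class MmapBumpAllocator {
public:
    explicit MmapBumpAllocator(std::size_t slab_size) : slab_size_(slab_size) {}
    ~MmapBumpAllocator() {
        for (void *slab : slabs_)
            munmap(slab, slab_size_);  // every slab goes back to the OS
    }
    MmapBumpAllocator(const MmapBumpAllocator &) = delete;
    MmapBumpAllocator &operator=(const MmapBumpAllocator &) = delete;

    // Returns nullptr if the request exceeds the slab size or mmap() fails.
    void *allocate(std::size_t size) {
        size = (size + 15) & ~std::size_t{15};  // keep 16-byte alignment
        if (size > slab_size_)
            return nullptr;
        if (cur_ == nullptr || size > static_cast<std::size_t>(end_ - cur_)) {
            void *slab = mmap(nullptr, slab_size_, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (slab == MAP_FAILED)
                return nullptr;
            slabs_.push_back(slab);
            cur_ = static_cast<char *>(slab);
            end_ = cur_ + slab_size_;
        }
        void *result = cur_;
        cur_ += size;  // bump the pointer
        return result;
    }

    std::size_t mmap_calls() const { return slabs_.size(); }

private:
    std::size_t slab_size_;
    std::vector<void *> slabs_;
    char *cur_ = nullptr;
    char *end_ = nullptr;
};

// Allocate `total` bytes in `chunk`-sized pieces and report how many
// mmap() calls that took for the given slab size.
std::size_t mmap_calls_for(std::size_t slab_size, std::size_t total,
                           std::size_t chunk) {
    MmapBumpAllocator alloc(slab_size);
    for (std::size_t done = 0; done < total; done += chunk)
        if (alloc.allocate(chunk) == nullptr)
            break;
    return alloc.mmap_calls();
}
```

Allocating 1 MiB in 64-byte pieces takes 256 mmap() calls with 4K slabs
and 16 with 64K slabs, so a 16x larger slab means 16x fewer system calls
in this sketch.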

Reid

Reid Kleckner

2010-Aug-09 18:05 UTC


[LLVMdev] MmapAllocator

So just to try to sum up, the goal of this kind of change is to reduce
long-term heap usage in applications that JIT code once at startup,
and then enter a steady state of doing work.  Think of server
applications written in any JITed language, or the finance
applications I keep seeing pop up on this list.

I would say that LLVM sees the *most* use as a static, ahead-of-time
compiler, where code generation occurs continuously throughout the
process's life until it exits.  In that kind of application, I would
presume that malloc is preferable to mmap, for all of the reasons that
Jakob described, i.e. avoiding address space fragmentation, avoiding
heavy-weight system calls, and (in the future, maybe?) multi-threaded
performance.

What I'm asking is whether it's worth adding code to LLVM to support
reducing memory usage for applications with the first use case.

Reid

Török Edwin

2010-Aug-09 18:09 UTC


[LLVMdev] MmapAllocator

On Mon, 9 Aug 2010 11:00:03 -0700
Reid Kleckner <reid.kleckner at gmail.com> wrote:
> 2010/8/9 Török Edwin <edwintorok at gmail.com>:
> > On Mon, 9 Aug 2010 10:17:27 -0700
> > Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> >
> >>
> >> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
> >>
> >> > With mmap() it is always possible to fully release the memory
> >> > once you are done using it.
> >>
> >> Sure. Is that the goal, though?
> >
> > If goal is to reduce fragmentation, possibly. You
> > don't know if you have fragmentation or not, the JITed app may
> > fragment memory for example.
> 
> Yes, the goal is to fully release the memory back to the OS.
> 
> >> Why isn't malloc doing it already?
> >
> > Because it can't. sbrk() can only increase/decrease memory usage at
> > the end (like a stack), you can't release something in the middle.
> > Thats one of the reasons why we wrote a pool-based memory allocator
> > for ClamAV.
> 
> Another thing malloc could do is to use madvise with MADV_DONTNEED to
> free the pages in the middle of the heap, but malloc can't read your
> mind, so it doesn't know that you aren't about to reallocate that
> region of the heap.
> 
> >>
> >> > With malloc() no, it takes just 1 allocation at the end of the
> >> > heap to keep all the rest allocated. That wouldn't be a problem
> >> > if libc would use mmap() as the low-level allocator for malloc
> >> > but it doesn't. It uses sbrk() mostly for small (<128k)
> >> > allocations, and even with mmaps it caches them for a while.
> >>
> >> Recommended reading:
> >> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
> >
> > If jemalloc provides same or better memory usage than
> > MMapAllocator, I think it'd be better to have a JEMallocAllocator
> > instead.
> > I think jemalloc is fairly portable (firefox uses it), isn't it?
> 
> Reading the abstract, jemalloc seems like it has nothing to do with
> keeping the total heap usage low and everything to do with performance
> in a multithreaded app.
"In late 2007, the Mozilla Project was hard at work improving Firefox's
memory usage for the 3.0 release, and jemalloc was used to solve
fragmentation problems for Firefox on Microsoft Windows platforms. You
can read here about the fruits of that labor."

> 
> >> > I think that is because mmap() is slow in multithreaded apps,
> >> > since it needs to take a process level lock, which also contends
> >> > with the lock taken by pagefaults from other existing mmaps (in
> >> > fact that lock is held during disk I/O!).
> >>
> >> Sounds awesome, let's do that ;-)
> >
> > Multithreaded performance should probably be benchmarked on a real
> > app with MMapAllocator, and with the MallocAllocator.
> 
> I predict that mmap will be slower than malloc, for obvious reasons.
> The only way in which mmap could be better is that it reduces your
> steady state heap usage.
Did you try jemalloc though? AFAIK it can act as a drop-in replacement
for malloc().
> 
> >> You are also leaving a bunch of 4K holes in your address space. On
> >> 32-bit systems, address space is a scarce resource.
> >
> > Doesn't BumpPtrAllocator use a larger chunk size?
> 
> Nope, it defaults to 4K.  IMO that should be bumped up (pun wasn't
> intended, but then I left it in...).  Especially if we want to use
> mmap as the allocator, increasing the slab size will reduce the number
> of expensive system calls that grab the process lock.
Agreed.

Best regards,
--Edwin
