On Mon, 9 Aug 2010 10:17:27 -0700
Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:

> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
>
> > With mmap() it is always possible to fully release the memory once
> > you are done using it.
>
> Sure. Is that the goal, though?

If the goal is to reduce fragmentation, possibly. You don't know whether
you have fragmentation or not; the JITed app may fragment memory, for
example.

> Why isn't malloc doing it already?

Because it can't. sbrk() can only increase/decrease memory usage at the
end (like a stack); you can't release something in the middle.
That's one of the reasons why we wrote a pool-based memory allocator for
ClamAV.

> > With malloc() no, it takes just 1 allocation at the end of the heap
> > to keep all the rest allocated. That wouldn't be a problem if libc
> > would use mmap() as the low-level allocator for malloc but it
> > doesn't. It uses sbrk() mostly for small (<128k) allocations, and
> > even with mmaps it caches them for a while.
>
> Recommended reading:
> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf

If jemalloc provides the same or better memory usage than
MMapAllocator, I think it'd be better to have a JEMallocAllocator
instead.
I think jemalloc is fairly portable (Firefox uses it), isn't it?

> > I think that is because mmap() is slow in multithreaded apps, since
> > it needs to take a process level lock, which also contends with the
> > lock taken by pagefaults from other existing mmaps (in fact that
> > lock is held during disk I/O!).
>
> Sounds awesome, let's do that ;-)

Multithreaded performance should probably be benchmarked on a real app
with MMapAllocator, and with the MallocAllocator.

> You are also leaving a bunch of 4K holes in your address space. On
> 32-bit systems, address space is a scarce resource.

Doesn't BumpPtrAllocator use a larger chunk size?

Best regards,
--Edwin
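[Editor's sketch] The sbrk() versus mmap() point above can be illustrated with a toy model. This is not how any real malloc is implemented; the function names and the heap layout below are hypothetical, chosen only to show why a single live allocation at the end of an sbrk()-style heap pins everything below it, while separately mapped regions can each be returned.

```python
# Toy model (not a real allocator): compares how many bytes could go back to
# the OS under an sbrk()-style heap versus one mmap() region per block.

def releasable_with_sbrk(blocks):
    """blocks: list of (size, is_live) in address order, lowest first.

    The program break can only move down past a trailing run of free
    blocks, so count free bytes from the top until the first live block.
    """
    released = 0
    for size, is_live in reversed(blocks):
        if is_live:
            break
        released += size
    return released


def releasable_with_mmap(blocks):
    """Each block is its own mapping, so every free block can be unmapped."""
    return sum(size for size, is_live in blocks if not is_live)


# Hypothetical layout: three freed 64 KiB blocks below one small live block.
KIB = 1024
heap = [(64 * KIB, False), (64 * KIB, False), (64 * KIB, False), (16, True)]

print(releasable_with_sbrk(heap))  # prints 0
print(releasable_with_mmap(heap))  # prints 196608
```

In this model the 16-byte live block at the top keeps all 192 KiB below it allocated under sbrk(), which is the "1 allocation at the end of the heap" case described in the message.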
2010/8/9 Török Edwin <edwintorok at gmail.com>:
> On Mon, 9 Aug 2010 10:17:27 -0700
> Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
>> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
>>
>> > With mmap() it is always possible to fully release the memory once
>> > you are done using it.
>>
>> Sure. Is that the goal, though?
>
> If the goal is to reduce fragmentation, possibly. You
> don't know whether you have fragmentation or not; the JITed app may
> fragment memory, for example.

Yes, the goal is to fully release the memory back to the OS.

>> Why isn't malloc doing it already?
>
> Because it can't. sbrk() can only increase/decrease memory usage at the
> end (like a stack); you can't release something in the middle.
> That's one of the reasons why we wrote a pool-based memory allocator for
> ClamAV.

Another thing malloc could do is to use madvise with MADV_DONTNEED to
free the pages in the middle of the heap, but malloc can't read your
mind, so it doesn't know that you aren't about to reallocate that
region of the heap.

>> > With malloc() no, it takes just 1 allocation at the end of the heap
>> > to keep all the rest allocated. That wouldn't be a problem if libc
>> > would use mmap() as the low-level allocator for malloc but it
>> > doesn't. It uses sbrk() mostly for small (<128k) allocations, and
>> > even with mmaps it caches them for a while.
>>
>> Recommended reading:
>> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
>
> If jemalloc provides the same or better memory usage than
> MMapAllocator, I think it'd be better to have a JEMallocAllocator
> instead.
> I think jemalloc is fairly portable (Firefox uses it), isn't it?

Reading the abstract, jemalloc seems like it has nothing to do with
keeping the total heap usage low and everything to do with performance
in a multithreaded app.

>> > I think that is because mmap() is slow in multithreaded apps, since
>> > it needs to take a process level lock, which also contends with the
>> > lock taken by pagefaults from other existing mmaps (in fact that
>> > lock is held during disk I/O!).
>>
>> Sounds awesome, let's do that ;-)
>
> Multithreaded performance should probably be benchmarked on a real app
> with MMapAllocator, and with the MallocAllocator.

I predict that mmap will be slower than malloc, for obvious reasons.
The only way in which mmap could be better is that it reduces your
steady-state heap usage.

>> You are also leaving a bunch of 4K holes in your address space. On
>> 32-bit systems, address space is a scarce resource.
>
> Doesn't BumpPtrAllocator use a larger chunk size?

Nope, it defaults to 4K. IMO that should be bumped up (pun wasn't
intended, but then I left it in...). Especially if we want to use
mmap as the allocator, increasing the slab size will reduce the number
of expensive system calls that grab the process lock.

Reid
So just to try to sum up, the goal of this kind of change is to reduce
long-term heap usage in applications that JIT code once at startup, and
then enter a steady state of doing work. Think of server applications
written in any JITed language, or the finance applications I keep
seeing pop up on this list.

I would say that LLVM sees the *most* use as a static, ahead-of-time
compiler, where code generation occurs continuously throughout the
process's life until it exits. In that kind of application, I would
presume that malloc is preferable to mmap, for all of the reasons that
Jakob described, i.e. avoiding address space fragmentation, avoiding
heavy-weight system calls, and (in the future, maybe?) multi-threaded
performance.

What I'm asking is whether it's worth adding code to LLVM to support
reducing memory usage for applications with the first use case.

Reid
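[Editor's sketch] To put rough numbers on the slab-size point raised earlier in the thread, here is a minimal count of how many slabs a simple bump-pointer allocator would request for a given workload. It assumes one mmap() call per slab and ignores alignment, per-slab headers, and oversized allocations, all of which a real allocator such as BumpPtrAllocator has to handle. The function name and the workload are hypothetical.

```python
def slabs_needed(alloc_sizes, slab_size):
    """Count the slabs a simple bump-pointer allocator would request.

    Simplifying assumptions: every allocation fits in one slab, and
    alignment and per-slab bookkeeping are ignored.
    """
    slabs = 0
    remaining = 0  # bytes left in the current slab
    for size in alloc_sizes:
        if size > slab_size:
            raise ValueError("allocation larger than slab")
        if size > remaining:
            slabs += 1  # one new slab, i.e. one mmap() call in this model
            remaining = slab_size
        remaining -= size
    return slabs


# Hypothetical workload: 100,000 objects of 64 bytes each.
allocs = [64] * 100_000

print(slabs_needed(allocs, 4 * 1024))     # prints 1563
print(slabs_needed(allocs, 1024 * 1024))  # prints 7
```

Under these assumptions, moving from 4K slabs to 1 MiB slabs cuts the number of slab requests from 1563 to 7 for the same workload, at the cost of holding up to one mostly empty large slab.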
On Mon, 9 Aug 2010 11:00:03 -0700
Reid Kleckner <reid.kleckner at gmail.com> wrote:

> 2010/8/9 Török Edwin <edwintorok at gmail.com>:
> > On Mon, 9 Aug 2010 10:17:27 -0700
> > Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> >
> >> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
> >>
> >> > With mmap() it is always possible to fully release the memory
> >> > once you are done using it.
> >>
> >> Sure. Is that the goal, though?
> >
> > If the goal is to reduce fragmentation, possibly. You
> > don't know whether you have fragmentation or not; the JITed app may
> > fragment memory, for example.
>
> Yes, the goal is to fully release the memory back to the OS.
>
> >> Why isn't malloc doing it already?
> >
> > Because it can't. sbrk() can only increase/decrease memory usage at
> > the end (like a stack); you can't release something in the middle.
> > That's one of the reasons why we wrote a pool-based memory allocator
> > for ClamAV.
>
> Another thing malloc could do is to use madvise with MADV_DONTNEED to
> free the pages in the middle of the heap, but malloc can't read your
> mind, so it doesn't know that you aren't about to reallocate that
> region of the heap.
>
> >> > With malloc() no, it takes just 1 allocation at the end of the
> >> > heap to keep all the rest allocated. That wouldn't be a problem
> >> > if libc would use mmap() as the low-level allocator for malloc
> >> > but it doesn't. It uses sbrk() mostly for small (<128k)
> >> > allocations, and even with mmaps it caches them for a while.
> >>
> >> Recommended reading:
> >> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
> >
> > If jemalloc provides the same or better memory usage than
> > MMapAllocator, I think it'd be better to have a JEMallocAllocator
> > instead.
> > I think jemalloc is fairly portable (Firefox uses it), isn't it?
>
> Reading the abstract, jemalloc seems like it has nothing to do with
> keeping the total heap usage low and everything to do with performance
> in a multithreaded app.

"In late 2007, the Mozilla Project was hard at work improving Firefox's
memory usage for the 3.0 release, and jemalloc was used to solve
fragmentation problems for Firefox on Microsoft Windows platforms. You
can read here about the fruits of that labor."

> >> > I think that is because mmap() is slow in multithreaded apps,
> >> > since it needs to take a process level lock, which also contends
> >> > with the lock taken by pagefaults from other existing mmaps (in
> >> > fact that lock is held during disk I/O!).
> >>
> >> Sounds awesome, let's do that ;-)
> >
> > Multithreaded performance should probably be benchmarked on a real
> > app with MMapAllocator, and with the MallocAllocator.
>
> I predict that mmap will be slower than malloc, for obvious reasons.
> The only way in which mmap could be better is that it reduces your
> steady-state heap usage.

Did you try jemalloc though? AFAIK it can act as a drop-in replacement
for malloc().

> >> You are also leaving a bunch of 4K holes in your address space. On
> >> 32-bit systems, address space is a scarce resource.
> >
> > Doesn't BumpPtrAllocator use a larger chunk size?
>
> Nope, it defaults to 4K. IMO that should be bumped up (pun wasn't
> intended, but then I left it in...). Especially if we want to use
> mmap as the allocator, increasing the slab size will reduce the number
> of expensive system calls that grab the process lock.

Agreed.

Best regards,
--Edwin