On Mon, 9 Aug 2010 09:36:53 -0700
Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:

> On Aug 8, 2010, at 9:20 PM, Reid Kleckner wrote:
>
> > I thought I dug into the register allocation code, and found the
> > VNInfo::Allocator typedef. I assumed that was getting the traffic
> > we saw in Instruments, but I don't have the data to back that up.
>
> Are you using llvm from trunk? VNInfo is a lot smaller now than it
> was in 2.7. I would guess about a third of the liveness memory usage
> goes through the VNInfo BumpPtrAllocator.
>
> [...]
>
> >> By calling mmap directly, you are throwing all that system
> >> specific knowledge away.
> >
> > So the goal of this particular modification was to find ways to
> > return large, one-time allocations that happen during compilation
> > back to the OS. For unladen-swallow, we have a long-lived Python
> > process where we JIT code every so often. We happen to generate an
> > ungodly amount of code, which we're trying to reduce. However,
> > this means that LLVM allocates a lot of memory for us, and it grows
> > our heap by several MB over what it would normally be. The
> > breakdown was roughly 8 MB allocated for this one compilation in
> > the spam_bayes benchmark, with 2 MB coming from register allocation
> > and 2 MB from SDNodes.
> >
> > We are looking at using mmap/munmap to avoid permanently growing
> > the heap.
>
> Don't try to outsmart malloc, you are going to lose ;-)
>
> This all depends on specific kernel implementation details, but you
> risk badly fragmenting your address space, and chances are the kernel
> is not going to handle that well. You are using the kernel as a
> memory manager, but the kernel wants to be used as a dumb slab
> allocator for malloc.
>
> I assume that LLVM is properly freeing memory after jitting?
> Otherwise, that should be looked at.
>
> So why isn't your malloc returning the memory to the OS?
>
> Is it because malloc thinks you might be needing that memory soon
> anyway? Is it correct?
>
> Does your malloc know that you are running with very little memory,
> and the system badly needs those 8MB? Maybe your malloc needs to be
> tuned for a small device?
>
> Is LLVM leaving a fragmented heap behind?

With mmap() it is always possible to fully release the memory once you
are done using it. With malloc(), no: it takes just one allocation at
the end of the heap to keep all the rest allocated. That wouldn't be a
problem if libc used mmap() as the low-level allocator for malloc, but
it doesn't. It uses sbrk() mostly for small (<128k) allocations, and
even the mmaps it does make are cached for a while.

I think that is because mmap() is slow in multithreaded apps: it needs
to take a process-level lock, which also contends with the lock taken
by page faults from other existing mmaps (in fact that lock is held
during disk I/O!).

Best regards,
--Edwin
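P.S. To make the "fully release" point concrete, here is a minimal,
untested sketch of the kind of one-shot buffer I mean. The MMapBuffer
name is made up for illustration, and error handling is reduced to an
assert:

#include <sys/mman.h>
#include <cassert>
#include <cstddef>

struct MMapBuffer {
  void *Base;
  size_t Size;

  explicit MMapBuffer(size_t NBytes) : Base(0), Size(NBytes) {
    // MAP_ANONYMOUS is spelled MAP_ANON on some BSDs.
    Base = mmap(0, NBytes, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(Base != MAP_FAILED && "mmap failed");
  }
  ~MMapBuffer() {
    // munmap() hands the pages straight back to the kernel; with an
    // sbrk-backed malloc, a single live allocation above this block
    // would pin it in the heap instead.
    munmap(Base, Size);
  }
};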
On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:

> With mmap() it is always possible to fully release the memory once
> you are done using it.

Sure. Is that the goal, though? Why isn't malloc doing it already?

> With malloc(), no: it takes just one allocation at the end of the
> heap to keep all the rest allocated. That wouldn't be a problem if
> libc used mmap() as the low-level allocator for malloc, but it
> doesn't. It uses sbrk() mostly for small (<128k) allocations, and
> even the mmaps it does make are cached for a while.

Recommended reading:
http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf

> I think that is because mmap() is slow in multithreaded apps: it
> needs to take a process-level lock, which also contends with the
> lock taken by page faults from other existing mmaps (in fact that
> lock is held during disk I/O!).

Sounds awesome, let's do that ;-)

You are also leaving a bunch of 4K holes in your address space. On
32-bit systems, address space is a scarce resource.

/jakob
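P.S. On the granularity point: mmap() only deals in whole pages, so a
direct-mapping allocator has to round every request up to a page
multiple, and every free punches a page-sized hole into the address
space. A hypothetical helper, just to show the arithmetic:

#include <unistd.h>
#include <cstddef>

static size_t roundToPages(size_t NBytes) {
  size_t Page = (size_t)sysconf(_SC_PAGESIZE); // typically 4096
  return (NBytes + Page - 1) & ~(Page - 1);    // Page is a power of 2
}

// roundToPages(1) == 4096 on a 4K-page system; unmapping that region
// later leaves a one-page hole that malloc cannot reuse.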
On Mon, 9 Aug 2010 10:17:27 -0700
Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:

> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
>
> > With mmap() it is always possible to fully release the memory once
> > you are done using it.
>
> Sure. Is that the goal, though?

If the goal is to reduce fragmentation, possibly. You don't know
whether you have fragmentation or not; the JITed app may fragment
memory, for example.

> Why isn't malloc doing it already?

Because it can't. sbrk() can only grow or shrink the heap at the end
(like a stack); you can't release something in the middle. That's one
of the reasons why we wrote a pool-based memory allocator for ClamAV.

> > With malloc(), no: it takes just one allocation at the end of the
> > heap to keep all the rest allocated. That wouldn't be a problem if
> > libc used mmap() as the low-level allocator for malloc, but it
> > doesn't. It uses sbrk() mostly for small (<128k) allocations, and
> > even the mmaps it does make are cached for a while.
>
> Recommended reading:
> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf

If jemalloc provides the same or better memory usage than
MMapAllocator, I think it'd be better to have a JEMallocAllocator
instead. I think jemalloc is fairly portable (Firefox uses it), isn't
it?

> > I think that is because mmap() is slow in multithreaded apps: it
> > needs to take a process-level lock, which also contends with the
> > lock taken by page faults from other existing mmaps (in fact that
> > lock is held during disk I/O!).
>
> Sounds awesome, let's do that ;-)

Multithreaded performance should probably be benchmarked on a real
app, once with MMapAllocator and once with MallocAllocator.

> You are also leaving a bunch of 4K holes in your address space. On
> 32-bit systems, address space is a scarce resource.

Doesn't BumpPtrAllocator use a larger chunk size?

Best regards,
--Edwin
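P.S. For reference, a rough, untested sketch of the pool idea (not the
actual ClamAV allocator; the Pool and SlabSize names are invented):
individual allocations are never freed, and destroying the pool
munmaps every slab, returning all of the memory to the OS at once.

#include <sys/mman.h>
#include <cstddef>
#include <vector>

class Pool {
  static const size_t SlabSize = 1 << 20; // 1 MB slabs, not 4K pages
  std::vector<void*> Slabs;
  char *Cur;
  char *End;

public:
  Pool() : Cur(0), End(0) {}

  // Bump allocation; assumes no single request exceeds SlabSize.
  void *alloc(size_t N) {
    N = (N + 15) & ~(size_t)15;          // keep 16-byte alignment
    if (N > (size_t)(End - Cur)) {       // current slab exhausted
      void *S = mmap(0, SlabSize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (S == MAP_FAILED)
        return 0;
      Slabs.push_back(S);
      Cur = (char*)S;
      End = Cur + SlabSize;
    }
    void *P = Cur;
    Cur += N;
    return P;
  }

  ~Pool() {
    for (size_t I = 0; I != Slabs.size(); ++I)
      munmap(Slabs[I], SlabSize);        // one munmap per slab, done
  }
};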