Mark Martinec
2018-Aug-04 18:38 UTC
All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64
2018-08-04 19:01, Mark Johnston wrote:
> I think running "zpool list" is adding a lot of noise to the output.
> Could you retry without doing that?

No, like I said previously, the "zpool list" (with one defunct
zfs pool) *is* the sole culprit of the zfs memory leak.
With each invocation of "zpool list" the "solaris" malloc
jumps up by the same amount, and never ever drops. Without
running it (like repeatedly under 'telegraf' monitoring
of zfs), the machine runs normally and never runs out of
memory, the "solaris" malloc count no longer grows steadily.

This leak was introduced sometime between 10.3 and 11.1R-p11,
and is still there with 11.2.

  Mark

> On Fri, Aug 03, 2018 at 09:11:42PM +0200, Mark Martinec wrote:
>> More attempts at tracking this down. The suggested dtrace command does
>> usually abort with:
>>
>> Assertion failed: (buf->dtbd_timestamp >= first_timestamp),
>> file /usr/src/cddl/contrib/opensolaris/lib/libdtrace/common/dt_consume.c,
>> line 3330.
>
> Hrmm. As a workaround you can add "-x temporal=off" to the dtrace(1)
> invocation.
>
>> but with some luck soon after each machine reboot I can leave the
>> dtrace running for about 10 or 20 seconds (max) before terminating
>> it with a ^C, and succeed in collecting the report. If I miss the
>> opportunity to leave dtrace running just long enough to collect
>> useful info, but not long enough for it to hit the assertion check,
>> then any further attempt to run the dtrace script hits the assertion
>> fault immediately.
>>
>> Btw, (just in case) I have recompiled kernel from source
>> (base/release/11.2.0) with debugging symbols, although the
>> behaviour has not changed:
>>
>> FreeBSD floki.ijs.si 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r337238:
>> Fri Aug 3 17:29:42 CEST 2018
>> mark at xxx.ijs.si:/usr/obj/usr/src/sys/FLOKI amd64
>>
>> Anyway, after several attempts I was able to collect a useful dtrace
>> output from the suggested dtrace script:
>>
>> # dtrace -n 'dtmalloc::solaris:malloc {@allocs[stack(), args[3]] =
>> count()} dtmalloc::solaris:free {@frees[stack(), args[3]] =
>> count()}'
>>
>> while running "zpool list" repeatedly in another terminal screen:
>
> I think running "zpool list" is adding a lot of noise to the output.
> Could you retry without doing that?
Mark Johnston
2018-Aug-04 19:47 UTC
All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64
On Sat, Aug 04, 2018 at 08:38:04PM +0200, Mark Martinec wrote:
> 2018-08-04 19:01, Mark Johnston wrote:
> > I think running "zpool list" is adding a lot of noise to the output.
> > Could you retry without doing that?
>
> No, like I said previously, the "zpool list" (with one defunct
> zfs pool) *is* the sole culprit of the zfs memory leak.
> With each invocation of "zpool list" the "solaris" malloc
> jumps up by the same amount, and never ever drops. Without
> running it (like repeatedly under 'telegraf' monitoring
> of zfs), the machine runs normally and never runs out of
> memory, the "solaris" malloc count no longer grows steadily.

Sorry, I missed that message. Given that information, it would be
useful to see the output of the following script instead:

# dtrace -c "zpool list -Hp" -x temporal=off -n '
  dtmalloc::solaris:malloc
    /pid == $target/{@allocs[stack(), args[3]] = count()}
  dtmalloc::solaris:free
    /pid == $target/{@frees[stack(), args[3]] = count();}'

This will record all allocations and frees from a single instance of
"zpool list".
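
Alongside the dtrace script, the growth per "zpool list" invocation
reported earlier in the thread could be put into numbers with
vmstat -m. The following is only a rough sketch: the helper names
solaris_inuse and measure_growth are made up for illustration, and it
assumes that vmstat -m prints one row per malloc type with the type
name in the first column and the InUse count in the second.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any tool.

# Read "vmstat -m" output on stdin and print the InUse count of the
# "solaris" malloc type (assumes: column 1 = Type, column 2 = InUse).
solaris_inuse() {
    awk '$1 == "solaris" { print $2; exit }'
}

# Run "zpool list" N times (default 5) and print how much the
# "solaris" InUse count changed after each run.
measure_growth() {
    runs=${1:-5}
    prev=$(vmstat -m | solaris_inuse)
    i=1
    while [ "$i" -le "$runs" ]; do
        zpool list -Hp >/dev/null 2>&1
        cur=$(vmstat -m | solaris_inuse)
        echo "run $i: InUse $cur (delta $((cur - prev)))"
        prev=$cur
        i=$((i + 1))
    done
}

# Usage on the affected machine:
#   measure_growth 10
```

If the delta is the same positive number on every run and never goes
back down, that would match the behaviour described above; a delta
that returns to zero would point elsewhere.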