Jürgen Keil
2006-Mar-23 18:15 UTC
[zfs-discuss] Re: Re: zfs/arc: system seems to be spinning in dnlc_reduce_cache()
> > The diff adds lots of entries to the kernel's dnode_t, dmu_buf_impl_t
> > and zfs_znode_cache kmem_caches.
> >
> > And starting the nightly build needs more space in other kmem areas,
> > the ">75% heap full" condition is triggered, some zio_bufs and
> > arc_buf* are returned to the heap arena. As a side effect dnode_t,
> > dmu_buf_impl_t and zfs_znode_cache entries are freed (entries are
> > intended to be reused in the same kmem_cache) but this free space is
> > *not* given back to the heap arena.
>
> The problem is that there are 9 dmu_buf_impl_ts per "slab", and 17
> dnode's per "slab"; a slab cannot be freed until *all* of its nodes
> are free, and not cached in magazines. I've heard this referred to as
> a slab being "held hostage" by an outstanding allocation.

But I've observed cases where mdb "::kmastat" reported a total of
1818 dmu_buf_impl_t in use, and 94290 dmu_buf_impl_t allocated.
The worst case would be that the 1818 entries in use pin 1818 slabs
(with one entry still in use and 8 entries already freed per slab).
So there must be at least (94290/9 - 1818) slabs that are completely
free and could be returned to the heap_arena.

Same for the dnode_t and zfs_znode_cache.

The problem with an unmodified snv_37 on x86 32bit is that kmem_reap()
might not be called at all, from

    usr/src/uts/common/vm/vm_page.c
    usr/src/uts/common/vm/page_retire.c
    usr/src/uts/common/vm/seg_kmem.c
    usr/src/uts/common/os/vm_pageout.c

on systems that have too *much* physical memory installed.

Btw. a workaround to get the kernel out of the "poor nightly build
performance" mode is to start a process that consumes all of the free
physical memory in the system: note the amount of free memory reported
by "vmstat 5", and run "dd if=/dev/zero of=/dev/null bs=${free_mem}k count=2"
(where ${free_mem} is slightly less than the free memory reported by vmstat).

The vm code (usr/src/uts/common/vm/vm_page.c, page_resv()?
page_mem_avail()?) will detect a memory shortage and call kmem_reap(),
which returns a significant amount of free memory from kmem caches to
the heap arena.

> kmem_reap() flushes the magazines to the slabs (except for a 15-second
> working set). Usually, this frees up memory, but there aren't any
> guarantees.

But arc_kmem_reap_now() already implements a "limited kmem_reap()" for
the zio_buf_cache[] caches, the buf_cache and the header cache, by
calling kmem_cache_reap_now():

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/fs/zfs/arc.c#1224

1224 	for (i = 0; i < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT; i++) {
1225 		if (zio_buf_cache[i] != prev_cache) {
1226 			prev_cache = zio_buf_cache[i];
1227 			kmem_cache_reap_now(zio_buf_cache[i]);
1228 		}
1229 	}
1230 	kmem_cache_reap_now(buf_cache);
1231 	kmem_cache_reap_now(hdr_cache);

Why not extend this to *all* kmem_caches that are in use in the system?

> It might make sense to invent a "desperate" reaping mode, which kicks
> in only when the kmem_reap fails to free up any memory. But this is still
> not guaranteed to free up more memory, so it might just slow things down
> more when we're already under heavy load.
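For illustration only, the kind of extension asked about above might look
like the sketch below, living inside arc.c: the existing reap loop plus
explicit kmem_cache_reap_now() calls for the metadata caches discussed in
this thread. The extern cache names are assumptions (the real kmem_cache_t
pointers are private to dnode.c, dbuf.c and zfs_znode.c and may be named
differently), and reaping every cache in the system is essentially what a
full kmem_reap() already does; the open question in the thread is why that
reap is never triggered.

/*
 * Sketch only (not the actual implementation): arc_kmem_reap_now()
 * extended to also reap the ZFS metadata caches.  The extern names
 * below are assumptions for illustration.
 */
extern kmem_cache_t *dnode_cache;	/* assumed name */
extern kmem_cache_t *dbuf_cache;	/* assumed name */
extern kmem_cache_t *znode_cache;	/* assumed name */

static void
arc_kmem_reap_now_extended(void)
{
	size_t		i;
	kmem_cache_t	*prev_cache = NULL;

	/* existing behaviour: reap the zio buffer, arc buf and header caches */
	for (i = 0; i < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT; i++) {
		if (zio_buf_cache[i] != prev_cache) {
			prev_cache = zio_buf_cache[i];
			kmem_cache_reap_now(zio_buf_cache[i]);
		}
	}
	kmem_cache_reap_now(buf_cache);
	kmem_cache_reap_now(hdr_cache);

	/* additionally reap the caches whose slabs are "held hostage" */
	kmem_cache_reap_now(dnode_cache);
	kmem_cache_reap_now(dbuf_cache);
	kmem_cache_reap_now(znode_cache);
}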
Jonathan Adams
2006-Mar-23 18:59 UTC
[zfs-discuss] Re: Re: zfs/arc: system seems to be spinning in dnlc_reduce_cache()
On Thu, Mar 23, 2006 at 10:15:55AM -0800, Jürgen Keil wrote:
> > > The diff adds lots of entries to the kernel's dnode_t, dmu_buf_impl_t
> > > and zfs_znode_cache kmem_caches.
> > >
> > > And starting the nightly build needs more space in other kmem areas,
> > > the ">75% heap full" condition is triggered, some zio_bufs and
> > > arc_buf* are returned to the heap arena. As a side effect dnode_t,
> > > dmu_buf_impl_t and zfs_znode_cache entries are freed (entries are
> > > intended to be reused in the same kmem_cache) but this free space is
> > > *not* given back to the heap arena.
> >
> > The problem is that there are 9 dmu_buf_impl_ts per "slab", and 17
> > dnode's per "slab"; a slab cannot be freed until *all* of its nodes
> > are free, and not cached in magazines. I've heard this referred to as
> > a slab being "held hostage" by an outstanding allocation.
>
> But I've observed cases where mdb "::kmastat" reported a total of
> 1818 dmu_buf_impl_t in use, and 94290 dmu_buf_impl_t allocated.
> The worst case would be that the 1818 entries in use pin 1818 slabs
> (with one entry still in use and 8 entries already freed per slab).
> So there must be at least (94290/9 - 1818) slabs that are completely
> free and could be returned to the heap_arena.
>
> Same for the dnode_t and zfs_znode_cache.

If you've got a dump, could you send the output of the following:

> ::kmastat ! grep dmu_buf
dmu_buf_impl_t            408  188753  188757  85905408   295964     0
> ::kmem_cache ! grep dmu_buf
ffffffff8ebd2008 dmu_buf_impl_t           020f 000000    408 188757
> ffffffff8ebd2008::print -d kmem_cache_t cache_full cache_empty
{
    cache_full.ml_list = 0
    cache_full.ml_total = 0
    cache_full.ml_min = 0
    cache_full.ml_reaplimit = 0
    cache_full.ml_alloc = 0t5046
}
{
    cache_empty.ml_list = 0xfffffe839786edc0
    cache_empty.ml_total = 0t1015
    cache_empty.ml_min = 0t1015
    cache_empty.ml_reaplimit = 0t1015
    cache_empty.ml_alloc = 0t5049
}
>

along with the same for dnode_t and zfs_znode_cache, and the output of:

> kmem_reaping::print

(MIN(ml_min, ml_reaplimit) is the number of magazines which will be freed
up by a reap, ml_total is the total number of magazines in the cache.
cache_full contains full magazines, which are the ones holding down buffers.)

If kmem_reaping is 0, then no reap has occurred in the past 15 seconds.

> The problem with an unmodified snv_37 on x86 32bit is that kmem_reap()
> might not be called at all, from
>
>     usr/src/uts/common/vm/vm_page.c
>     usr/src/uts/common/vm/page_retire.c
>     usr/src/uts/common/vm/seg_kmem.c
>     usr/src/uts/common/os/vm_pageout.c
>
> on systems that have too *much* physical memory installed.

Are you saying this is because the low kernel-VA problem isn't being
noticed? Or because some failing codepath is missing a call to kmem_reap()?

Cheers,
- jonathan

-- 
Jonathan Adams, Solaris Kernel Development
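As a reading aid for the ::print output above and in the follow-ups, the
arithmetic Jonathan describes can be sketched as below. The magazine size
(rounds per magazine) is per-cache and is not shown in the ::print output,
so it is an assumed parameter here, and the sample figures are only
illustrative (of the same order as the cache_full numbers reported later in
this thread).

#include <stdio.h>

/*
 * Sketch of the depot bookkeeping described above.  Full magazines in
 * cache_full hold buffers "hostage"; a kmem_reap() hands back at most
 * MIN(ml_min, ml_reaplimit) magazines.  mag_size (buffers per magazine)
 * is per-cache and not shown in the ::print output -- assumed here.
 */
int
main(void)
{
	unsigned long ml_total = 618;		/* illustrative cache_full values */
	unsigned long ml_min = 612;
	unsigned long ml_reaplimit = 615;
	unsigned long mag_size = 15;		/* assumed rounds per magazine */
	unsigned long reapable = (ml_min < ml_reaplimit ? ml_min : ml_reaplimit);

	printf("buffers held in full magazines: ~%lu\n", ml_total * mag_size);
	printf("buffers a reap could release:   ~%lu\n", reapable * mag_size);
	return (0);
}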
Jürgen Keil
2006-Mar-23 19:32 UTC
[zfs-discuss] Re: Re: Re: zfs/arc: system seems to be spinning in dnlc_reduce_cache()
tiger2 111# /usr/bin/savecore -L
dumping to /dev/dsk/c1d0s1, offset 65536, content: kernel
100% done: 71986 pages dumped, compression ratio 2.91, dump succeeded
System dump time: Thu Mar 23 20:18:02 2006
Constructing namelist /var/crash/tiger2/unix.4
Constructing corefile /var/crash/tiger2/vmcore.4
100% done: 71986 of 71986 pages saved
tiger2 112# cd /var/crash/tiger2/
tiger2 113# mdb -k 4
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip sctp
usba uhci s1394 nca zfs random fcp fctl lofs audiosup md cpc fcip crypto
logindmux ptm sppp nfs ipc ]
> ::kmastat ! grep dmu_buf
mdb: slab e21d8860 isn't in cache d06086f0 (in cache d060d6f0)
dmu_buf_impl_t            192    1605   69027  13463552  1068486     0
> ::kmem_cache ! grep dmu_buf
d061e030 dmu_buf_impl_t           0000 000000    192  69027
> d061e030::print -d kmem_cache_t cache_full cache_empty
{
    cache_full.ml_list = 0xe64ac500
    cache_full.ml_total = 0t618
    cache_full.ml_min = 0t612
    cache_full.ml_reaplimit = 0t615
    cache_full.ml_alloc = 0t77920
}
{
    cache_empty.ml_list = 0xe64ab100
    cache_empty.ml_total = 1
    cache_empty.ml_min = 0
    cache_empty.ml_reaplimit = 1
    cache_empty.ml_alloc = 0t84457
}
> ::kmastat ! grep dnode_t
mdb: slab e21d8860 isn't in cache d06086f0 (in cache d060d6f0)
dnode_t                   388    1500   32890  13471744   372320     0
> ::kmem_cache ! grep dnode_t
d061e6f0 dnode_t                  0000 000000    388  32890
> d061e6f0::print -d kmem_cache_t cache_full cache_empty
{
    cache_full.ml_list = 0xe483ce20
    cache_full.ml_total = 0t1083
    cache_full.ml_min = 0t1079
    cache_full.ml_reaplimit = 0t1081
    cache_full.ml_alloc = 0t85434
}
{
    cache_empty.ml_list = 0xe483da00
    cache_empty.ml_total = 1
    cache_empty.ml_min = 1
    cache_empty.ml_reaplimit = 1
    cache_empty.ml_alloc = 0t91537
}
> ::kmastat ! grep zfs_znode_cache
mdb: slab e21d8860 isn't in cache d06086f0 (in cache d060d6f0)
zfs_znode_cache           104    1281  118560  12451840   306331     0
> ::kmem_cache ! grep zfs_znode_cache
d06206f0 zfs_znode_cache          0000 000000    104 118560
> d06206f0::print -d kmem_cache_t cache_full cache_empty
{
    cache_full.ml_list = 0xda9ae020
    cache_full.ml_total = 0t16751
    cache_full.ml_min = 0t16747
    cache_full.ml_reaplimit = 0t16750
    cache_full.ml_alloc = 0t20763
}
{
    cache_empty.ml_list = 0xda9a4840
    cache_empty.ml_total = 1
    cache_empty.ml_min = 0
    cache_empty.ml_reaplimit = 0
    cache_empty.ml_alloc = 0t38543
}
> kmem_reaping::print
0
> dnlc_nentries::print -d
0t23                            <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> arc::print -d
{
    anon = ARC_anon
    mru = ARC_mru
    mru_ghost = ARC_mru_ghost
    mfu = ARC_mfu
    mfu_ghost = ARC_mfu_ghost
    size = 0t4038144            <<<<<<<<<<<<<<<<<<<<<<<<<<<
    p = 0t4495360
    c = 0t67108864
    c_min = 0t67108864
    c_max = 0t506456064
    hits = 0t8460648
    misses = 0t240886
    deleted = 0t208735
    skipped = 0t20
    hash_elements = 0t11151
    hash_elements_max = 0t26769
    hash_collisions = 0t179583
    hash_chains = 0t1502
    hash_chain_max = 7
    no_grow = 1
}
>

At the same time, during the last 10-15 minutes, I also had this dtrace
script running; it produced *no* output:

# more /rmdisk/flashdisk/dtrace/arc4.d
#!/usr/sbin/dtrace -s

fbt::kmem_reap:,
fbt::kmem_reap_idspace:
{
}

tiger2 112# /rmdisk/flashdisk/dtrace/arc4.d
dtrace: script '/rmdisk/flashdisk/dtrace/arc4.d' matched 4 probes

# vmstat 10
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd f0 s0   in   sy   cs us sy id
 0 0 0 5172376 1284480 319 4341 10 0 0 0 5 14 1 0 0 1155 6155 795 23 12 66
 0 0 0 3292648 1420728 1 5 0 0 0 0 0 3 0 0 0 891 817 818 4 1 95
 0 0 0 3292804 1426056 0 0 0 0 0 0 0 0 0 0 0 940 1106 924 5 1 94
^C
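Reading the ::kmastat columns above as buffer size, buffers in use, and
total buffers allocated to the cache (an interpretation, not something the
output itself states), a rough calculation of how much memory is sitting in
these three caches without being in use, while kmem_reaping is still 0,
looks like this:

#include <stdio.h>

/*
 * Rough estimate of memory allocated to the three caches above but not
 * in use, from the ::kmastat figures in this dump.  Slab and magazine
 * overhead is ignored, so this is only an approximation.
 */
struct cache_stat {
	const char	*name;
	unsigned long	buf_size;	/* bytes per buffer  */
	unsigned long	buf_inuse;	/* buffers in use    */
	unsigned long	buf_total;	/* buffers allocated */
};

int
main(void)
{
	struct cache_stat c[] = {
		{ "dmu_buf_impl_t",  192,  1605,  69027 },
		{ "dnode_t",         388,  1500,  32890 },
		{ "zfs_znode_cache", 104,  1281, 118560 },
	};
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < sizeof (c) / sizeof (c[0]); i++) {
		unsigned long idle = (c[i].buf_total - c[i].buf_inuse) *
		    c[i].buf_size;
		total += idle;
		printf("%-16s ~%lu KB idle\n", c[i].name, idle / 1024);
	}
	/* roughly 12.3 MB + 11.6 MB + 11.6 MB, about 35 MB in total */
	printf("total            ~%lu MB idle\n", total / (1024 * 1024));
	return (0);
}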
Jürgen Keil
2006-Mar-23 19:45 UTC
[zfs-discuss] Re: Re: Re: zfs/arc: system seems to be spinning in dnlc_reduce_cache()
> > The problem with an unmodified snv_37 on x86 32bit is that kmem_reap()
> > might not be called at all, from
> >
> >     usr/src/uts/common/vm/vm_page.c
> >     usr/src/uts/common/vm/page_retire.c
> >     usr/src/uts/common/vm/seg_kmem.c
> >     usr/src/uts/common/os/vm_pageout.c
> >
> > on systems that have too *much* physical memory installed.
>
> Are you saying this is because the low kernel-VA problem isn't being
> noticed? Or because some failing codepath is missing a call to kmem_reap()?

Well, in usr/src/uts/common/vm/vm_page.c, function page_resv(),
kmem_reap() is called if (availrmem < tune.t_minarmem + npages):

> availrmem::print -d
0t448298
> tune::print -d t_minarmem
t_minarmem = 0t25

I have too much free memory, so kmem_reap() isn't called.

Or in page_mem_avail(),
    if (availrmem < swapfs_reserve + npages && --count)

> availrmem::print -d
0t448267
> swapfs_reserve::print -d
0t4079

But it also checks this:

int
page_mem_avail(pgcnt_t npages)
{
	ulong_t count;

#if defined(__i386)
	if (freemem > desfree + npages &&
	    availrmem > swapfs_reserve + npages &&
	    btop(vmem_size(heap_arena, VMEM_FREE)) > tune.t_minarmem +
	    npages)
		return (1);
#else

> freemem::print -d
0t352476
> desfree::print -d
0t4079
> availrmem::print -d
0t448327
> swapfs_reserve::print -d
0t4079
> tune::print -d t_minarmem
t_minarmem = 0t25
>

Note that it doesn't care about the heap unless free heap space is less
than 25*4KB = 100KB. zfs/arc is testing against >75% of used heap space
(25% of 512 MB, ~128 MB)?

usr/src/uts/common/vm/seg_kmem.c calls kmem_reap() when large pages are
used? I'm probably not using large pages.

usr/src/uts/common/os/vm_pageout.c: schedpaging() calls kmem_reap() when
(freemem < lotsfree + needfree + kmem_reapahead):

> freemem ::print -d
0t352773
> lotsfree ::print -d
0t8158
> needfree::print -d
0
> kmem_reapahead::print -d
0
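Plugging the observed values into the three conditions above makes the
point concrete: none of them comes close to firing, so kmem_reap() never
runs. The sketch below is illustrative only; npages and the free-heap
figure are assumptions (the real page_mem_avail() computes the heap term
from vmem_size(heap_arena, VMEM_FREE)), while everything else comes from
the mdb output above.

#include <stdio.h>

/*
 * Plug the mdb values shown above into the conditions under which the
 * VM code calls kmem_reap().  All quantities are in pages.  npages and
 * heap_free_pages are illustrative assumptions; the rest are the
 * ::print values from above.
 */
int
main(void)
{
	unsigned long availrmem = 448298, t_minarmem = 25;
	unsigned long freemem = 352476, desfree = 4079;
	unsigned long swapfs_reserve = 4079;
	unsigned long lotsfree = 8158, needfree = 0, kmem_reapahead = 0;
	unsigned long npages = 1;		/* assumed request size */
	unsigned long heap_free_pages = 25600;	/* assumed ~100 MB free heap VA */

	/* page_resv(): reaps only if availrmem < t_minarmem + npages */
	printf("page_resv() would reap:      %d\n",
	    availrmem < t_minarmem + npages);

	/* page_mem_avail(), i386 fast path: succeeds without ever reaping */
	printf("page_mem_avail() fast path:  %d\n",
	    freemem > desfree + npages &&
	    availrmem > swapfs_reserve + npages &&
	    heap_free_pages > t_minarmem + npages);

	/* schedpaging(): reaps only if freemem < lotsfree + needfree + reapahead */
	printf("schedpaging() would reap:    %d\n",
	    freemem < lotsfree + needfree + kmem_reapahead);

	return (0);
}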
Jürgen Keil
2006-Mar-23 20:19 UTC
[zfs-discuss] Re: Re: Re: zfs/arc: system seems to be spinning in dnlc_reduce_cache()
~ an hour later, after starting a nightly build, there are no big changes:

> ::kmastat ! egrep 'dnode_t|zfs_znode_cache|dmu_buf_impl'
dmu_buf_impl_t            192    2036   69027  13463552  1493017     0
dnode_t                   388    1955   32890  13471744   550720     0
zfs_znode_cache           104    1583  118560  12451840   481819     0
> ::kmem_cache ! egrep 'dnode_t|zfs_znode_cache|dmu_buf_impl'
d061e030 dmu_buf_impl_t           0000 000000    192  69027
d061e6f0 dnode_t                  0000 000000    388  32890
d06206f0 zfs_znode_cache          0000 000000    104 118560
> d061e030::print -d kmem_cache_t cache_full cache_empty
{
    cache_full.ml_list = 0xe64a7c40
    cache_full.ml_total = 0t599
    cache_full.ml_min = 0t585
    cache_full.ml_reaplimit = 0t565
    cache_full.ml_alloc = 0t95201
}
{
    cache_empty.ml_list = 0xe64ace80
    cache_empty.ml_total = 0t20
    cache_empty.ml_min = 0t17
    cache_empty.ml_reaplimit = 0t19
    cache_empty.ml_alloc = 0t101719
}
> d061e6f0::print -d kmem_cache_t cache_full cache_empty
{
    cache_full.ml_list = 0xe483d140
    cache_full.ml_total = 0t1016
    cache_full.ml_min = 0t990
    cache_full.ml_reaplimit = 0t982
    cache_full.ml_alloc = 0t103087
}
{
    cache_empty.ml_list = 0xe48461e0
    cache_empty.ml_total = 0t68
    cache_empty.ml_min = 0t54
    cache_empty.ml_reaplimit = 0t58
    cache_empty.ml_alloc = 0t109123
}
> d06206f0::print -d kmem_cache_t cache_full cache_empty
{
    cache_full.ml_list = 0xecdbf580
    cache_full.ml_total = 0t16692
    cache_full.ml_min = 0t16683
    cache_full.ml_reaplimit = 0t16673
    cache_full.ml_alloc = 0t38205
}
{
    cache_empty.ml_list = 0xda9a50a0
    cache_empty.ml_total = 0t60
    cache_empty.ml_min = 0t29
    cache_empty.ml_reaplimit = 0t34
    cache_empty.ml_alloc = 0t55926
}
> kmem_reaping::print
0

Well, the dtrace script has printed this:

tiger2 112# /rmdisk/flashdisk/dtrace/arc4.d
dtrace: script '/rmdisk/flashdisk/dtrace/arc4.d' matched 4 probes
CPU     ID                    FUNCTION:NAME
  0  13952           kmem_reap_idspace:entry
  0  13953          kmem_reap_idspace:return

But still no kmem_reap() call.

vmstat performance data while compiling is horrible, note the idle column:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd f0 s0   in   sy   cs us sy id
 2 0 0 3177368 1329940 80 1343 0 0 0 0 0 86 0 0 0 2986 4203 7921 38 17 45
 0 0 0 3183468 1335836 24 409 0 0 0 0 0 90 0 0 0 3818 3076 10772 14 12 74
 0 0 0 3185744 1338072 44 689 0 0 0 0 0 84 0 0 0 4559 2712 11824 12 12 76
 0 0 0 3186336 1338620 23 442 0 0 0 0 0 89 0 0 0 3848 2758 12784 13 12 74
 0 0 0 3184832 1338664 57 883 0 0 0 0 0 95 0 0 0 3596 5758 11032 29 14 57
 2 0 0 3182396 1336872 26 477 0 0 0 0 0 90 0 0 0 4044 3956 10315 21 13 66
 0 0 0 3180804 1335520 20 302 0 0 0 0 0 89 0 0 0 4325 2262 11460 12 11 77
 0 0 0 3185996 1340152 125 1818 0 0 0 0 0 89 0 0 0 3672 3151 10929 13 14 72
 0 0 0 3181216 1335656 0 123 0 0 0 0 0 91 0 0 0 4097 1512 10723 8 9 82
 0 0 0 3180800 1336560 67 1070 0 0 0 0 0 89 0 0 0 3681 3367 11688 22 15 63
 0 0 0 3182056 1336252 9 176 0 0 0 0 0 99 0 0 0 3948 2565 11936 15 12 72
 0 0 0 3186884 1340984 45 787 0 0 0 0 0 87 0 0 0 4252 1798 10250 9 10 81
 0 0 0 3180516 1334976 34 548 0 0 0 0 0 98 0 0 0 4080 2002 10664 9 10 80
 0 0 0 3180716 1335328 32 433 0 0 0 0 0 90 0 0 0 4436 1450 10613 8 9 82
 0 0 0 3189224 1343052 39 577 0 0 0 0 0 84 0 0 0 4288 2767 11859 11 12 77
 1 0 0 3192568 1346768 63 1126 0 0 0 0 0 88 0 0 0 4225 2396 10767 9 12 78
Jonathan Adams
2006-Mar-23 20:54 UTC
[zfs-discuss] Re: Re: Re: zfs/arc: system seems to be spinning in dnlc_reduce_cache()
On Thu, Mar 23, 2006 at 11:45:43AM -0800, Jürgen Keil wrote:
> > > The problem with an unmodified snv_37 on x86 32bit is that kmem_reap()
> > > might not be called at all, from
> > >
> > >     usr/src/uts/common/vm/vm_page.c
> > >     usr/src/uts/common/vm/page_retire.c
> > >     usr/src/uts/common/vm/seg_kmem.c
> > >     usr/src/uts/common/os/vm_pageout.c
> > >
> > > on systems that have too *much* physical memory installed.
> >
> > Are you saying this is because the low kernel-VA problem isn't being
> > noticed? Or because some failing codepath is missing a call to kmem_reap()?
>
> Well, in usr/src/uts/common/vm/vm_page.c, function page_resv(),
> kmem_reap() is called if (availrmem < tune.t_minarmem + npages):
>
> > availrmem::print -d
> 0t448298
> > tune::print -d t_minarmem
> t_minarmem = 0t25
>
> I have too much free memory, so kmem_reap() isn't called.
>
> Or in page_mem_avail(),
>     if (availrmem < swapfs_reserve + npages && --count)
>
> > availrmem::print -d
> 0t448267
> > swapfs_reserve::print -d
> 0t4079
>
> But it also checks this:
>
> int
> page_mem_avail(pgcnt_t npages)
> {
> 	ulong_t count;
>
> #if defined(__i386)
> 	if (freemem > desfree + npages &&
> 	    availrmem > swapfs_reserve + npages &&
> 	    btop(vmem_size(heap_arena, VMEM_FREE)) > tune.t_minarmem +
> 	    npages)
> 		return (1);
> #else
>
> > freemem::print -d
> 0t352476
> > desfree::print -d
> 0t4079
> > availrmem::print -d
> 0t448327
> > swapfs_reserve::print -d
> 0t4079
> > tune::print -d t_minarmem
> t_minarmem = 0t25
>
> Note that it doesn't care about the heap unless free heap space is less
> than 25*4KB = 100KB. zfs/arc is testing against >75% of used heap space
> (25% of 512 MB, ~128 MB)?

Hrm.  If we ever *completely* run out of kernel VA, then uts/common/os/vmem.c
will call kmem_reap().  But the above code in page_mem_avail() is trying to
avoid ever reaching that condition.

The problem is that, instead of calling kmem_reap() when VA space is low,
it just fails the fork().  So I agree that we're missing calls to
kmem_reap() in the "low VA" case.

There are a couple different places the call could go.  schedpaging() or
page_mem_avail() seem like likely candidates.

Cheers,
- jonathan

-- 
Jonathan Adams, Solaris Kernel Development
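Purely as a sketch of what such a call site might look like: a periodic
check that starts a reap when free kernel heap VA drops below some limit.
The threshold tunable and the helper are invented for illustration, and
whether the check belongs in schedpaging(), page_mem_avail() or somewhere
else is exactly the open question above.

#include <sys/types.h>
#include <sys/param.h>
#include <sys/kmem.h>
#include <sys/vmem.h>
#include <vm/seg_kmem.h>

/*
 * Sketch only -- not an actual fix.  heap_reap_thresh is an invented
 * tunable: pages of free heap VA below which a reap is started.  A real
 * change would more likely hang off schedpaging() or page_mem_avail(),
 * as discussed above.
 */
static pgcnt_t heap_reap_thresh = 8192;		/* ~32 MB with 4 KB pages */

static void
maybe_reap_for_heap_va(void)
{
	pgcnt_t heap_free = btop(vmem_size(heap_arena, VMEM_FREE));

	if (heap_free < heap_reap_thresh)
		kmem_reap();	/* asynchronous; a no-op while a reap is in progress */
}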