Hello zfs-discuss,
I tried something like:
i=1; while [ $i -lt 50000 ]; do zfs create test/$i; i=$(($i+1)); done
It looks like it hung on 'zfs create test/32545'.
pstack results:
# pstack 3702
3702: zfs create test/32545
00000000 ???????? (0, 0, 0, 0, 0, 0, 0)
#
> ::ps!grep zfs
R   3702    359   3702    327   0 0x4a004000 0000060002818028 zfs
> 0000060002818028::walk thread|::findstack
stack pointer for thread 30000e91960: 2a1004b0641
[ 000002a1004b0641 cv_wait+0x40() ]
000002a1004b06f1 vmem_xalloc+0x6ac()
000002a1004b0861 vmem_alloc+0x214()
000002a1004b0921 kmem_slab_create+0x44()
000002a1004b09f1 kmem_slab_alloc+0x3c()
000002a1004b0aa1 kmem_cache_alloc+0x148()
000002a1004b0b51 segkp_get_internal+0xf8()
000002a1004b0c71 segkp_cache_get+0xd4()
000002a1004b0d31 thread_create+0x40()
000002a1004b0df1 zfs_delete_thread_target+0xdc()
000002a1004b0eb1 zfs_mount+0x63c()
000002a1004b0fa1 domount+0x970()
000002a1004b1121 mount+0x110()
000002a1004b1221 syscall_ap+0x44()
000002a1004b12e1 syscall_trap32+0xcc()
bash-3.00# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch ip sctp usba
fcp fctl md zfs random logindmux ptm cpc fcip crypto nfs ]
> ::memstat
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 342443 2675 66%
Anon 13277 103 3%
Exec and libs 1501 11 0%
Page cache 495 3 0%
Free (cachelist) 1717 13 0%
Free (freelist) 160064 1250 31%
Total 519497 4058
Physical 511381 3995
The system behaves strangely - I can't start any new program for 1-5 minutes,
then it works OK, and then it happens again.
The system is a V240 running snv_29.
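
In case anyone wants to reproduce it, a slightly more careful variant of the
loop (just a sketch - I ran the plain one-liner above) timestamps each create
and stops at the first one that fails:

  i=1
  while [ $i -lt 50000 ]; do
      # sketch: log each create and stop at the first failure
      echo "`date '+%H:%M:%S'` creating test/$i"
      zfs create test/$i || { echo "zfs create test/$i failed" >&2; break; }
      i=$(($i + 1))
  done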
--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
Hi Robert,

Robert Milkowski wrote:
> I tried something like:
> i=1; while [ $i -lt 50000 ]; do zfs create test/$i; i=$(($i+1)); done
> It looks like it hung on 'zfs create test/32545'.
[...]
> The system behaves strangely - I can't start any new program for 1-5 minutes,
> then it works OK, and then it happens again.

I think you've come up against one of several known issues with the way
that zfs allocates and uses kernel memory. Have a look at this bug

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6272062

for somewhere to get started.

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
Hello James,

Friday, December 16, 2005, 11:12:56 AM, you wrote:

JCM> [...]
JCM> I think you've come up against one of several known issues with the
JCM> way that zfs allocates and uses kernel memory. Have a look at this bug
JCM> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6272062
JCM> for somewhere to get started.

Well, I'm not sure that's the case - as you can see, there was over 1GB
of free memory in the system. Later I forced a panic (I'm filing a bug
report with the crash dump right now). I also couldn't boot the system -
I had to go into single user mode and destroy the pool.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
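P.S. For anyone wanting to collect a dump from a hung box like this, forcing
the panic is the usual SPARC console procedure - roughly (a sketch; details
depend on your console and dumpadm setup):

  (send a break on the serial console to drop to the ok prompt)
  ok sync                      (forces a panic and writes the dump to the dump device)
  ...system panics, dumps and reboots...
  # savecore                   (after boot, if the dump isn't saved automatically)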
Hello Robert,

Friday, December 16, 2005, 11:40:51 AM, you wrote:

RM> Well, I'm not sure that's the case - as you can see, there was over 1GB
RM> of free memory in the system. Later I forced a panic (I'm filing a bug
RM> report with the crash dump right now). I also couldn't boot the system -
RM> I had to go into single user mode and destroy the pool.

Additionally, some of these bugs were already fixed in snv_29 - and that
is the release I'm using here.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
Robert Milkowski wrote:
....
[JCM = James.McPherson at Sun.COM]
> JCM>> I think you've come up against one of several known issues with the
> JCM>> way that zfs allocates and uses kernel memory. Have a look at this bug
> JCM>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6272062
> JCM>> for somewhere to get started.
> RM> Well, I'm not sure that's the case - as you can see, there was over 1GB
> RM> of free memory in the system. Later I forced a panic (I'm filing a bug
> RM> report with the crash dump right now). I also couldn't boot the system -
> RM> I had to go into single user mode and destroy the pool.
> Additionally, some of these bugs were already fixed in snv_29 - and that
> is the release I'm using here.

Hi Robert,

Yes, I'm aware of that -- I did suggest that 6272062 was merely somewhere
to start. There are a number of bugs listed in the 'see also' field of
that particular link. The issue as I understand it is that the
memory-usage algorithms can be somewhat aggressive, and some aspects of
these are still being tuned. I don't have a currently referable bugid for
you (ping Eric/Matt/Jeff!).

If you could make your compressed crash dump available to Team ZFS then
I'm sure they'll be happy to have a look at it for you. [I probably owe
them a beer or two now.]

You could also have a look at the memory usage in the crash yourself,
using ::kmastat and ::vmem - the zfs-related kernel caches should stand
out fairly well :)

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
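P.S. The sort of thing I mean, against the saved dump (the unix.0/vmcore.0
file names below are just an example - use whatever savecore wrote for you):

  # cd /var/crash/`uname -n`
  # mdb unix.0 vmcore.0
  > ::kmastat                  (per-cache kernel memory allocation statistics)
  > ::vmem                     (vmem arena usage)
  > $q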