Hello zfs-discuss,

I did try something like:

  i=1; while [ $i -lt 50000 ]; do zfs create test/$i; i=$(($i+1)); done

Looks like it hung on 'zfs create test/32545'.

pstack results:

# pstack 3702
3702:   zfs create test/32545
 00000000 ???????? (0, 0, 0, 0, 0, 0, 0)
#

> ::ps!grep zfs
R   3702    359   3702    327      0 0x4a004000 0000060002818028 zfs
> 0000060002818028::walk thread|::findstack
stack pointer for thread 30000e91960: 2a1004b0641
[ 000002a1004b0641 cv_wait+0x40() ]
  000002a1004b06f1 vmem_xalloc+0x6ac()
  000002a1004b0861 vmem_alloc+0x214()
  000002a1004b0921 kmem_slab_create+0x44()
  000002a1004b09f1 kmem_slab_alloc+0x3c()
  000002a1004b0aa1 kmem_cache_alloc+0x148()
  000002a1004b0b51 segkp_get_internal+0xf8()
  000002a1004b0c71 segkp_cache_get+0xd4()
  000002a1004b0d31 thread_create+0x40()
  000002a1004b0df1 zfs_delete_thread_target+0xdc()
  000002a1004b0eb1 zfs_mount+0x63c()
  000002a1004b0fa1 domount+0x970()
  000002a1004b1121 mount+0x110()
  000002a1004b1221 syscall_ap+0x44()
  000002a1004b12e1 syscall_trap32+0xcc()
>

bash-3.00# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch ip sctp usba fcp fctl md zfs random logindmux ptm cpc fcip crypto nfs ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     342443              2675   66%
Anon                        13277               103    3%
Exec and libs                1501                11    0%
Page cache                    495                 3    0%
Free (cachelist)             1717                13    0%
Free (freelist)            160064              1250   31%

Total                      519497              4058
Physical                   511381              3995
>

The system behaves strangely - I can't start any new program for 1-5
minutes, then it works OK, and then it happens again.

The system is a v240 with snv_29.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
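P.S. In case anyone wants to try to reproduce this, here is a rough
sketch of the kind of thing I was running - the progress echo and the
second window are only suggestions, not exactly what I typed:

  # create the filesystems, printing progress every 1000
  i=1
  while [ $i -lt 50000 ]; do
      zfs create test/$i || break
      [ $(($i % 1000)) -eq 0 ] && echo "created $i"
      i=$(($i+1))
  done

  # in another window, keep an eye on kernel memory and the vmem arenas
  echo '::memstat' | mdb -k
  echo '::vmem' | mdb -k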
Hi Robert,

Robert Milkowski wrote:
> I did try something like:
> i=1; while [ $i -lt 50000 ]; do zfs create test/$i; i=$(($i+1)); done
> Looks like it hung on 'zfs create test/32545'.
> pstack results:
> # pstack 3702
> 3702:   zfs create test/32545
>  00000000 ???????? (0, 0, 0, 0, 0, 0, 0)
> #
>> ::ps!grep zfs
> R   3702    359   3702    327      0 0x4a004000 0000060002818028 zfs
>> 0000060002818028::walk thread|::findstack
> stack pointer for thread 30000e91960: 2a1004b0641
> [ 000002a1004b0641 cv_wait+0x40() ]
>   000002a1004b06f1 vmem_xalloc+0x6ac()
>   000002a1004b0861 vmem_alloc+0x214()
>   000002a1004b0921 kmem_slab_create+0x44()
>   000002a1004b09f1 kmem_slab_alloc+0x3c()
>   000002a1004b0aa1 kmem_cache_alloc+0x148()
>   000002a1004b0b51 segkp_get_internal+0xf8()
>   000002a1004b0c71 segkp_cache_get+0xd4()
>   000002a1004b0d31 thread_create+0x40()
>   000002a1004b0df1 zfs_delete_thread_target+0xdc()
>   000002a1004b0eb1 zfs_mount+0x63c()
>   000002a1004b0fa1 domount+0x970()
>   000002a1004b1121 mount+0x110()
>   000002a1004b1221 syscall_ap+0x44()
>   000002a1004b12e1 syscall_trap32+0xcc()
>
> bash-3.00# mdb -k
> Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch ip sctp usba fcp fctl md zfs random logindmux ptm cpc fcip crypto nfs ]
>> ::memstat
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     342443              2675   66%
...
> The system behaves strangely - I can't start any new program for 1-5
> minutes, then it works OK, and then it happens again.

I think you've come up against one of several known issues with the
way that zfs allocates and uses kernel memory. Have a look at this bug

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6272062

for somewhere to get started.

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
Hello James,

Friday, December 16, 2005, 11:12:56 AM, you wrote:

JCM> Hi Robert,

JCM> Robert Milkowski wrote:
>> I did try something like:
>> i=1; while [ $i -lt 50000 ]; do zfs create test/$i; i=$(($i+1)); done
>> Looks like it hung on 'zfs create test/32545'.
>> pstack results:
>> # pstack 3702
>> 3702:   zfs create test/32545
>>  00000000 ???????? (0, 0, 0, 0, 0, 0, 0)
>> #
>>> ::ps!grep zfs
>> R   3702    359   3702    327      0 0x4a004000 0000060002818028 zfs
>>> 0000060002818028::walk thread|::findstack
>> stack pointer for thread 30000e91960: 2a1004b0641
>> [ 000002a1004b0641 cv_wait+0x40() ]
>>   000002a1004b06f1 vmem_xalloc+0x6ac()
>>   000002a1004b0861 vmem_alloc+0x214()
>>   000002a1004b0921 kmem_slab_create+0x44()
>>   000002a1004b09f1 kmem_slab_alloc+0x3c()
>>   000002a1004b0aa1 kmem_cache_alloc+0x148()
>>   000002a1004b0b51 segkp_get_internal+0xf8()
>>   000002a1004b0c71 segkp_cache_get+0xd4()
>>   000002a1004b0d31 thread_create+0x40()
>>   000002a1004b0df1 zfs_delete_thread_target+0xdc()
>>   000002a1004b0eb1 zfs_mount+0x63c()
>>   000002a1004b0fa1 domount+0x970()
>>   000002a1004b1121 mount+0x110()
>>   000002a1004b1221 syscall_ap+0x44()
>>   000002a1004b12e1 syscall_trap32+0xcc()
>>
>> bash-3.00# mdb -k
>> Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch ip sctp usba fcp fctl md zfs random logindmux ptm cpc fcip crypto nfs ]
>>> ::memstat
>> Page Summary                Pages                MB  %Tot
>> ------------     ----------------  ----------------  ----
>> Kernel                     342443              2675   66%

JCM> ...

>> The system behaves strangely - I can't start any new program for 1-5
>> minutes, then it works OK, and then it happens again.

JCM> I think you've come up against one of several known issues with the
JCM> way that zfs allocates and uses kernel memory. Have a look at this bug
JCM> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6272062
JCM> for somewhere to get started.

Well, I'm not sure that's the case - as you can see there was over 1GB
of free memory in the system. Later I forced a panic (I'm filing a bug
right now, with a crashdump). I also couldn't boot the system - I went
into single user mode and destroyed the pool.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
Hello Robert,

Friday, December 16, 2005, 11:40:51 AM, you wrote:

RM> Hello James,

RM> Friday, December 16, 2005, 11:12:56 AM, you wrote:

JCM>> Hi Robert,

JCM>> Robert Milkowski wrote:
>>> I did try something like:
>>> i=1; while [ $i -lt 50000 ]; do zfs create test/$i; i=$(($i+1)); done
>>> Looks like it hung on 'zfs create test/32545'.
>>> pstack results:
>>> # pstack 3702
>>> 3702:   zfs create test/32545
>>>  00000000 ???????? (0, 0, 0, 0, 0, 0, 0)
>>> #
>>>> ::ps!grep zfs
>>> R   3702    359   3702    327      0 0x4a004000 0000060002818028 zfs
>>>> 0000060002818028::walk thread|::findstack
>>> stack pointer for thread 30000e91960: 2a1004b0641
>>> [ 000002a1004b0641 cv_wait+0x40() ]
>>>   000002a1004b06f1 vmem_xalloc+0x6ac()
>>>   000002a1004b0861 vmem_alloc+0x214()
>>>   000002a1004b0921 kmem_slab_create+0x44()
>>>   000002a1004b09f1 kmem_slab_alloc+0x3c()
>>>   000002a1004b0aa1 kmem_cache_alloc+0x148()
>>>   000002a1004b0b51 segkp_get_internal+0xf8()
>>>   000002a1004b0c71 segkp_cache_get+0xd4()
>>>   000002a1004b0d31 thread_create+0x40()
>>>   000002a1004b0df1 zfs_delete_thread_target+0xdc()
>>>   000002a1004b0eb1 zfs_mount+0x63c()
>>>   000002a1004b0fa1 domount+0x970()
>>>   000002a1004b1121 mount+0x110()
>>>   000002a1004b1221 syscall_ap+0x44()
>>>   000002a1004b12e1 syscall_trap32+0xcc()
>>>
>>> bash-3.00# mdb -k
>>> Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch ip sctp usba fcp fctl md zfs random logindmux ptm cpc fcip crypto nfs ]
>>>> ::memstat
>>> Page Summary                Pages                MB  %Tot
>>> ------------     ----------------  ----------------  ----
>>> Kernel                     342443              2675   66%

JCM>> ...

>>> The system behaves strangely - I can't start any new program for 1-5
>>> minutes, then it works OK, and then it happens again.

JCM>> I think you've come up against one of several known issues with the
JCM>> way that zfs allocates and uses kernel memory. Have a look at this bug
JCM>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6272062
JCM>> for somewhere to get started.

RM> Well, I'm not sure that's the case - as you can see there was over 1GB
RM> of free memory in the system. Later I forced a panic (I'm filing a bug
RM> right now, with a crashdump). I also couldn't boot the system - I went
RM> into single user mode and destroyed the pool.

Additionally, some of these bugs were fixed in snv_29 - and that is the
release I'm using here.

-- 
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
Robert Milkowski wrote:
....
> JCM = "James.McPherson at Sun.COM"
> JCM>> I think you've come up against one of several known issues with the
> JCM>> way that zfs allocates and uses kernel memory. Have a look at this bug
> JCM>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6272062
> JCM>> for somewhere to get started.
> RM> Well, I'm not sure that's the case - as you can see there was over 1GB
> RM> of free memory in the system. Later I forced a panic (I'm filing a bug
> RM> right now, with a crashdump). I also couldn't boot the system - I went
> RM> into single user mode and destroyed the pool.
>
> Additionally, some of these bugs were fixed in snv_29 - and that is the
> release I'm using here.

Hi Robert,

Yes, I'm aware of that -- I did suggest that 6272062 was merely
somewhere to start. There are a number of bugs listed in the 'see also'
field in that particular link.

The issue as I understand it is that the memory-usage algorithms can be
somewhat aggressive, and some aspects of these are still being tuned. I
don't have a currently referable bugid for you (ping Eric/Matt/Jeff!).

If you could make your compressed crash dump available to Team ZFS then
I'm sure they'll be happy to have a look at it for you. [I probably owe
them a beer or two now.]

You could also have a look at the memory usage in the crash yourself,
using ::kmastat and ::vmem - the zfs-related kernel caches should stand
out fairly well :)

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
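P.S. A rough sketch of what that session might look like, assuming the
dump was saved as the usual unix.N/vmcore.N pair under
/var/crash/<hostname> - adjust the paths and the grep patterns to suit,
and note that the exact cache names can vary between builds:

  # open the crash dump rather than the live kernel
  mdb /var/crash/myhost/unix.0 /var/crash/myhost/vmcore.0

  > ::kmastat ! grep zio    # per-cache usage; the zio_buf_* caches often dominate
  > ::kmastat ! grep zfs    # zfs_znode_cache and friends
  > ::vmem                  # arena-level view; heap and segkp are the interesting ones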