Peter Kjellström
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
Thanks for a very detailed answer, see inline comments below.

On Friday 03 March 2006 18:39, Andreas Dilger wrote:
> On Mar 03, 2006 12:13 +0100, Peter Kjellström wrote:
> > * I have OSTs larger than 2 TiB (the whole point with this test), I will
> > soon rerun this with identical config and limited OSTs.

This I have now done; the setup was identical but with 1.98 TiB OSTs.
Results were the same: one OSS dropped to pathetic performance after
~600 gigs written.

> > However, I sent this directly since I don't think it's a 2 TiB problem,
> > because: 1) it stopped after just about 200 gigs (per OST) 2) >2 TiB ext3
> > was tested before lustre.
>
> Even with < 200GB written, it is possible that the files are being written
> beyond the 2TB limit. If you run "lfs getstripe" on the file that was
> causing the slowness and find the objid for the slow OST, then on that
> OST run 'OBJ={objid} debugfs -R "stat O/0/d$((OBJ % 32))/$OBJ" /dev/{ostdev}'
> this will dump the block allocation for that object.

Neat trick, I'll save this one. I have, however, already overwritten that
fs with a new one (see above).

> > 1) A lustre process spins like crazy when the OSS stops:
> >   PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+   COMMAND
> >  2996 root 15  0    0   0   0 R 99.9  0.0 13:44.89 ll_ost_21
>
> It appears from the logs that the "slow" process is stuck allocating
> blocks. The below trace is typical of the stuck processes (all of the
> other ones are blocked waiting on an allocation semaphore that this one
> is holding).
>
> ll_ost_00   R  running task   0  3605  1  3606  3604 (L-TLB)
> 000001000000d780 000001000000db10 0000000000000000 0000000000000246
> 0000000000000046 0000000000000246 0000010075f368c0 00000100610d0240
> 00000100044d9090 00000000ffffffff
> Call Trace:
>   <ffffffff80156c9c>{find_get_page+65}
>   <ffffffff80156c9c>{find_get_page+65}
>   <ffffffff8017721b>{__find_get_block_slow+62}
>   <ffffffff801779be>{__find_get_block+162}
>   <ffffffffa0048852>{:jbd:journal_get_undo_access+258}
>   <ffffffff80179d71>{__getblk+20}
>   <ffffffff80179dae>{__bread+6}
>   <ffffffffa0710d10>{:ldiskfs:read_block_bitmap+50}
>   <ffffffffa0711d8f>{:ldiskfs:ldiskfs_new_block_old+631}
>   <ffffffffa072ad63>{:ldiskfs:ldiskfs_new_block+27}
>   <ffffffffa07145fe>{:ldiskfs:ldiskfs_alloc_block+7}
>   <ffffffffa0716078>{:ldiskfs:ldiskfs_get_block_handle+881}
>   <ffffffffa0717f5f>{:ldiskfs:ldiskfs_map_inode_page+340}
>   <ffffffffa07436cd>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_bm_inode_pages+101}
>   <ffffffffa0743975>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_inode_pages+122}
>   <ffffffffa077de17>{:obdfilter:filter_direct_io+1765}
>   <ffffffffa078018e>{:obdfilter:filter_commitrw_write+6367}
>   <ffffffffa06eb978>{:ost:obd_commitrw+1110}
>   <ffffffffa06f37c8>{:ost:ost_brw_write+14405}
>   <ffffffff80132599>{default_wake_function+0}
>   <ffffffffa06ebace>{:ost:ost_bulk_timeout+0}
>   <ffffffffa06fb75f>{:ost:ost_handle+25399}
>   <ffffffff80131379>{finish_task_switch+55}
>   <ffffffff80302918>{thread_return+42}
>   <ffffffffa0666963>{:ptlrpc:ptlrpc_main+7123}
>
> It would probably be useful to get an oprofile dump

I'll try to get you one during the day. Thanks once again for your detailed
answer; my next plan after oprofile is to try 1.4.6.

Regards,
 Peter

> while the system is so slow to see what is actually consuming so much CPU.
> The routines in question are really part of ext3 itself so it is a bit
> puzzling why this would show up under Lustre and not ext3. The thread is
> allocating 256 blocks at one time, but this should not take 120s.
> It appears that "find_get_page" is always at the top of the stack for the
> process that is actually running, so this is suspicious.
>
> > 2) dmesg is filled with lines like:
> >   ll_ost_29 D 0000000102378fc4 0 3634 1 3635 3633 (L-TLB)
> > and:
> >   LustreError: dumping log to /tmp/lustre-log-d9.1141280676.3634
>
> This is lustre diagnostics dumping the stacks of processes that are
> not responsive (i.e. taking > 100s to process a request). This allows
> debugging, such as we are doing here, instead of "system is hung".
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.

--
------------------------------------------------------------
 Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se
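As a rough sketch of how such a dump can be collected with the opcontrol/opreport
tools of that era; the vmlinux path and the 60-second window are assumptions,
not taken from the thread:

    # assumed path to the uncompressed kernel image with debug symbols
    opcontrol --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
    opcontrol --reset
    opcontrol --start
    sleep 60          # sample while the OSS is in its slow state
    opcontrol --stop
    # kernel-only report, sorted by sample count
    opreport -l
    # include module symbols (ldiskfs, jbd, lustre) by adding the module path
    opreport -l -p /lib/modules/$(uname -r)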
Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
On Mar 03, 2006 12:13 +0100, Peter Kjellström wrote:
> * I have OSTs larger than 2 TiB (the whole point with this test), I will soon
> rerun this with identical config and limited OSTs. However, I sent this
> directly since I don't think it's a 2 TiB problem, because: 1) it stopped
> after just about 200 gigs (per OST) 2) >2 TiB ext3 was tested before lustre.

Even with < 200GB written, it is possible that the files are being written
beyond the 2TB limit. If you run "lfs getstripe" on the file that was
causing the slowness and find the objid for the slow OST, then on that
OST run 'OBJ={objid} debugfs -R "stat O/0/d$((OBJ % 32))/$OBJ" /dev/{ostdev}'
this will dump the block allocation for that object.

> 1) A lustre process spins like crazy when the OSS stops:
>   PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+   COMMAND
>  2996 root 15  0    0   0   0 R 99.9  0.0 13:44.89 ll_ost_21

It appears from the logs that the "slow" process is stuck allocating
blocks. The below trace is typical of the stuck processes (all of the other
ones are blocked waiting on an allocation semaphore that this one is
holding).

ll_ost_00   R  running task   0  3605  1  3606  3604 (L-TLB)
000001000000d780 000001000000db10 0000000000000000 0000000000000246
0000000000000046 0000000000000246 0000010075f368c0 00000100610d0240
00000100044d9090 00000000ffffffff
Call Trace:
  <ffffffff80156c9c>{find_get_page+65}
  <ffffffff80156c9c>{find_get_page+65}
  <ffffffff8017721b>{__find_get_block_slow+62}
  <ffffffff801779be>{__find_get_block+162}
  <ffffffffa0048852>{:jbd:journal_get_undo_access+258}
  <ffffffff80179d71>{__getblk+20}
  <ffffffff80179dae>{__bread+6}
  <ffffffffa0710d10>{:ldiskfs:read_block_bitmap+50}
  <ffffffffa0711d8f>{:ldiskfs:ldiskfs_new_block_old+631}
  <ffffffffa072ad63>{:ldiskfs:ldiskfs_new_block+27}
  <ffffffffa07145fe>{:ldiskfs:ldiskfs_alloc_block+7}
  <ffffffffa0716078>{:ldiskfs:ldiskfs_get_block_handle+881}
  <ffffffffa0717f5f>{:ldiskfs:ldiskfs_map_inode_page+340}
  <ffffffffa07436cd>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_bm_inode_pages+101}
  <ffffffffa0743975>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_inode_pages+122}
  <ffffffffa077de17>{:obdfilter:filter_direct_io+1765}
  <ffffffffa078018e>{:obdfilter:filter_commitrw_write+6367}
  <ffffffffa06eb978>{:ost:obd_commitrw+1110}
  <ffffffffa06f37c8>{:ost:ost_brw_write+14405}
  <ffffffff80132599>{default_wake_function+0}
  <ffffffffa06ebace>{:ost:ost_bulk_timeout+0}
  <ffffffffa06fb75f>{:ost:ost_handle+25399}
  <ffffffff80131379>{finish_task_switch+55}
  <ffffffff80302918>{thread_return+42}
  <ffffffffa0666963>{:ptlrpc:ptlrpc_main+7123}

It would probably be useful to get an oprofile dump while the system is
so slow to see what is actually consuming so much CPU. The routines in
question are really part of ext3 itself so it is a bit puzzling why this
would show up under Lustre and not ext3. The thread is allocating 256
blocks at one time, but this should not take 120s. It appears that
"find_get_page" is always at the top of the stack for the process that
is actually running, so this is suspicious.

> 2) dmesg is filled with lines like:
>   ll_ost_29 D 0000000102378fc4 0 3634 1 3635 3633 (L-TLB)
> and:
>   LustreError: dumping log to /tmp/lustre-log-d9.1141280676.3634

This is lustre diagnostics dumping the stacks of processes that are
not responsive (i.e. taking > 100s to process a request). This allows
debugging, such as we are doing here, instead of "system is hung".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
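Spelled out as a step-by-step sketch of the lookup Andreas describes; the
file name, objid, OST index, and device path below are placeholders for
illustration:

    # on a client: list the objects backing the slow file
    lfs getstripe /mnt/lustre/sob2/testfile.6
    # suppose the slow OST is obdidx 1 and its objid for this file is 9;
    # then on that OSS, dump the object's block allocation from the raw device
    OBJ=9
    debugfs -R "stat O/0/d$((OBJ % 32))/$OBJ" /dev/vg0/lv0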
Peter Kjellström
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
Hello

Short description of problem:

After successful creation of a new lustre filesystem on my testrig I started
writing 100 gig files from the single configured client. After seven files
one OSS dumped ugly stuff on dmesg and almost died (write speed now
<<1 MiB/s). After a restart the write process continued ok until file 35,
when the other OSS did a similar thing.

Things I did that could be part of the problem:

* I have OSTs larger than 2 TiB (the whole point with this test), I will soon
  rerun this with identical config and limited OSTs. However, I sent this
  directly since I don't think it's a 2 TiB problem, because: 1) it stopped
  after just about 200 gigs (per OST) 2) >2 TiB ext3 was tested before lustre.
* I have rebuilt both the lustre kernel and userspace+modules; I had to,
  since otherwise I can't build my 3ware driver (see thread "lustre kernel
  package is misbuilt for RHEL4/centos-4").

Config:

1 client, 1 mds, 2 oss, all four machines with the same software
os: centos-4.2 (rhel4u2 rebuild) x86_64
hw: dual xeon 3.2 GHz, gigabit eth only, 2-4 GiB ram per node
oss1 has:
  (ost1) /dev/vg0/lv0  5.09 TiB  striped over two 3ware 9500 cards
oss2 has:
  (ost2) /dev/vg0/lv0  3.18 TiB  single 3ware 9550-SX card
  (ost3) /dev/vg0/lv1  3.18 TiB  single 3ware 9550-SX card
kernel: direct rebuild of shipped 1.4.5.2 lustre kernel
lustre modules+userspace: rebuilt against my kernel, 1.4.5.2
lfs used to make files striped over all OSTs (see details)

Details (in no particular order):

1) A lustre process spins like crazy when the OSS stops:
  PID USER PR NI VIRT RES SHR S %CPU %MEM   TIME+   COMMAND
 2996 root 15  0    0   0   0 R 99.9  0.0 13:44.89 ll_ost_21

2) dmesg is filled with lines like:
  ll_ost_29 D 0000000102378fc4 0 3634 1 3635 3633 (L-TLB)
and:
  LustreError: dumping log to /tmp/lustre-log-d9.1141280676.3634

3) full dmesg, lustre-log files (and for oss2 vmstat, top, config.xml)
available as tar.gz files at:
oss1/incident1:
  www.nsc.liu.se/~cap/lustre-error-2006-03-02-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix.tar.gz
oss2/incident2:
  www.nsc.liu.se/~cap/lustre-error-2006-03-03-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix.tar.gz

4) lfs getstripe for the directory with the testfiles:
OBDS:
0: ost1_UUID ACTIVE
1: ost2_UUID ACTIVE
2: ost3_UUID ACTIVE
sob2/
default stripe_count: -1 stripe_size: 0 stripe_offset: -1
sob2/testfile.0
        obdidx   objid   objid   group
             2       3     0x3       0
             0       3     0x3       0
             1       3     0x3       0
sob2/testfile.1
        obdidx   objid   objid   group
             0       4     0x4       0
             1       4     0x4       0
             2       4     0x4       0
... and so on for about 35 files

Maybe this makes sense to somebody,
 Peter (who still hopes to be able to run lustre on his new servers and not nfs...)

--
------------------------------------------------------------
 Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se
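The report doesn't include the exact write commands; a minimal sketch of the
kind of workload described (as many 100 gig files as fit, written sequentially
into one striped directory), with the client mount point and the use of dd
being assumptions:

    # write 100 GiB files into one directory on the lustre mount until it fills
    mkdir -p /mnt/lustre/sob2
    i=0
    while dd if=/dev/zero of=/mnt/lustre/sob2/testfile.$i bs=1M count=102400; do
        i=$((i + 1))
    done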
Peter Kjellström
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
On Friday 03 March 2006 18:39, Andreas Dilger wrote:
> On Mar 03, 2006 12:13 +0100, Peter Kjellström wrote:
> ...
> It would probably be useful to get an oprofile dump while the system is
> so slow to see what is actually consuming so much CPU.

Here's an oprofile dump with only kernel symbols (I'll see if I can dig up
lustre and jbd symbols too):

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        app name                                     symbol name
59788    51.5640  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  mwait_idle
15213    13.1204  jbd                                          (no symbols)
14579    12.5736  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  find_get_page
10001     8.6253  ldiskfs                                      (no symbols)
 5160     4.4502  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  __find_get_block
 2934     2.5304  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  __find_get_block_slow
 2798     2.4131  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  __brelse
 2288     1.9733  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  wake_up_buffer
  987     0.8512  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  unlock_buffer
  867     0.7477  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  put_page
  398     0.3433  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  __might_sleep
  181     0.1561  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  __getblk
  131     0.1130  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  __bread
   94     0.0811  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  bh_waitq_head
   76     0.0655  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  kfree
   74     0.0638  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  mark_page_accessed
   49     0.0423  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  buffered_rmqueue
   27     0.0233  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  __kmalloc
   22     0.0190  e1000                                        (no symbols)
   22     0.0190  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  kmem_cache_alloc
   18     0.0155  libcrypto.so.0.9.7a                          (no symbols)
   18     0.0155  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  kmem_cache_free
   17     0.0147  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  copy_user_generic
   15     0.0129  oprofiled                                    (no symbols)
   13     0.0112  vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix  cache_alloc_refill
...

(output from opreport -l after a 60 sec sampling window)

/Peter

> ...

--
------------------------------------------------------------
 Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se
Peter Kjellström
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
On Monday 13 March 2006 18:49, Andreas Dilger wrote:
> On Mar 13, 2006 08:35 +0100, Peter Kjellström wrote:
> > I'm gearing up for a bugreport to redhat and/or lkml and/or ext2-devel
> > but I've yet to decide what to include and such... The two worst things
> > right now are that 1) it's not 100% reproducible, the best I have is 9 out
> > of 10 on a specific setup 2) I can't reproduce at all on a 2.6.15.6
> > kernel.org kernel. (I've been running 2.6.9-22smp, 2.6.9-22.0.2smp with
> > and without lustre patches, all on x86_64)
>
> This sounds just like a bug previously filed against the RHEL kernel,
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156437

It does indeed; too bad I didn't find that bugzilla entry when I did my
initial trawl :-/ I'll verify it asap and report back.

/Peter

> That was fixed in the just-released 2.6.9-34 kernel I now see, with patches
> backported from the vanilla kernel. As a workaround you can also mount the
> filesystem with the "noreservation" option.
>
> > Some things I now know:
> > * I trigger the bug by writing lots of data to one directory (typically
> >   as many 100G files as I can fit)
> > * the bug can bite after as little as 38G written
> > * when the bug hits performance is reduced by ~ a factor of 1000
> > * when in bug mode the oprofile always looks as previously reported
> > * the bug seems to ignore file boundaries but it's hard to tell
> > * when in bug mode there's something wrong with block allocations, they
> >   seem to take a looong time
> > * when in bug mode if you write a 2nd file in that dir then the two files
> >   take every 2nd block on disk (block rsv very broken)
> > * when in bug mode if you write a 2nd file in another dir it runs normally
> > * if you are patient the bug disappears, that is, there is a limited part
> >   of the fs that is "cursed"
> > * if you remove the files and try again the bug hits at the same block
> > * if you remove the files, reboot and try again the bug hits at the same
> >   block
> > * if you recreate the fs the bug hits at another place
> > * depending on where the bug hits you get different performance (I've
> >   seen everything from 50 KiB/s to 500 KiB/s)
> > * there is never any file corruption, never any problems reported to dmesg
>
> Cheers, Andreas

--
------------------------------------------------------------
 Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se
Peter Kjellström
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
On Monday 13 March 2006 19:22, Peter Kjellström wrote:
> On Monday 13 March 2006 18:49, Andreas Dilger wrote:
> > On Mar 13, 2006 08:35 +0100, Peter Kjellström wrote:
> > > I'm gearing up for a bugreport to redhat and/or lkml and/or ext2-devel
> > > but I've yet to decide what to include and such... The two worst things
> > > right now are that 1) it's not 100% reproducible, the best I have is 9
> > > out of 10 on a specific setup 2) I can't reproduce at all on a 2.6.15.6
> > > kernel.org kernel. (I've been running 2.6.9-22smp, 2.6.9-22.0.2smp with
> > > and without lustre patches, all on x86_64)
> >
> > This sounds just like a bug previously filed against the RHEL kernel,
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156437
>
> It does indeed; too bad I didn't find that bugzilla entry when I did my
> initial trawl :-/ I'll verify it asap and report back.

It has been verified that this bug is what makes lustre unusable on these
machines. The follow-up questions are of course:

* when will there be a 1.4.x release for/with the 2.6.9-34 (update3) kernel?
or
* how does one tell lustre to use -o noreservation?

/Peter

> > That was fixed in the just-released 2.6.9-34 kernel I now see, with
> > patches backported from the vanilla kernel. As a workaround you can also
> > mount the filesystem with the "noreservation" option.

--
------------------------------------------------------------
 Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se
Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
On Mar 20, 2006 14:42 +0100, Peter Kjellström wrote:
> * when will there be a 1.4.x release for/with the 2.6.9-34 (update3) kernel?

This is already in progress, and will be released shortly.

> * how does one tell lustre to use -o noreservation?

lmc --add ost ... --mountfsoptions "noreservation" ...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
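A sketch of how that could look in a complete 1.4-style config script for this
setup; apart from --mountfsoptions, the node, device, and lov names (and the
exact spelling of the other flags) are illustrative and should be checked
against the local lmc --help:

    # minimal sketch of adding an OST with reservations disabled (config.sh)
    lmc -m config.xml --add net --node oss2 --nid oss2 --nettype tcp
    lmc -m config.xml --add ost --node oss2 --lov lov1 --fstype ldiskfs \
        --dev /dev/vg0/lv0 --mountfsoptions "noreservation"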
Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
On Mar 13, 2006 08:35 +0100, Peter Kjellström wrote:
> I'm gearing up for a bugreport to redhat and/or lkml and/or ext2-devel but
> I've yet to decide what to include and such... The two worst things right
> now are that 1) it's not 100% reproducible, the best I have is 9 out of 10
> on a specific setup 2) I can't reproduce at all on a 2.6.15.6 kernel.org
> kernel. (I've been running 2.6.9-22smp, 2.6.9-22.0.2smp with and without
> lustre patches, all on x86_64)

This sounds just like a bug previously filed against the RHEL kernel,
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=156437

That was fixed in the just-released 2.6.9-34 kernel I now see, with patches
backported from the vanilla kernel. As a workaround you can also mount the
filesystem with the "noreservation" option.

> Some things I now know:
> * I trigger the bug by writing lots of data to one directory (typically as
>   many 100G files as I can fit)
> * the bug can bite after as little as 38G written
> * when the bug hits performance is reduced by ~ a factor of 1000
> * when in bug mode the oprofile always looks as previously reported
> * the bug seems to ignore file boundaries but it's hard to tell
> * when in bug mode there's something wrong with block allocations, they
>   seem to take a looong time
> * when in bug mode if you write a 2nd file in that dir then the two files
>   take every 2nd block on disk (block rsv very broken)
> * when in bug mode if you write a 2nd file in another dir it runs normally
> * if you are patient the bug disappears, that is, there is a limited part
>   of the fs that is "cursed"
> * if you remove the files and try again the bug hits at the same block
> * if you remove the files, reboot and try again the bug hits at the same block
> * if you recreate the fs the bug hits at another place
> * depending on where the bug hits you get different performance (I've seen
>   everything from 50 KiB/s to 500 KiB/s)
> * there is never any file corruption, never any problems reported to dmesg

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
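For the plain-ext3 reproduction discussed elsewhere in the thread, the same
workaround is just an ext3 mount option; the device and mount point below are
placeholders:

    # mount the ext3 filesystem with block reservations disabled
    mount -t ext3 -o noreservation /dev/vg0/lv0 /mnt/ost1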
Peter Kjellström
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
On Monday 06 March 2006 14:05, Peter Kjellström wrote:
> On Friday 03 March 2006 18:39, Andreas Dilger wrote:
> > On Mar 03, 2006 12:13 +0100, Peter Kjellström wrote:
> > ...
> > It would probably be useful to get an oprofile dump while the system is
> > so slow to see what is actually consuming so much CPU.
>
> Here's an oprofile dump with only kernel symbols (I'll see if I can dig up
> lustre and jbd symbols too):

I did manage to get debug symbols right for the modules too.

comment1: 50% idle is expected since it's an smp system
comment2: I cut the output when it went below 0.01%
comment3: 60 sec sample window, report by "opreport -l -p /lib/mod..."
comment4: vmlinux-lustre is shorthand for
          vmlinux-2.6.9-22.0.2.EL_lustre.1.4.5.2_cfix

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        image name      app name        symbol name
59759    49.8395  vmlinux-lustre  vmlinux-lustre  mwait_idle
15347    12.7995  vmlinux-lustre  vmlinux-lustre  find_get_page
 5638     4.7021  vmlinux-lustre  vmlinux-lustre  __find_get_block
 5379     4.4861  ldiskfs.ko      ldiskfs         ldiskfs_try_to_allocate_with_rsv
 4423     3.6888  jbd.ko          jbd             do_get_write_access
 4170     3.4778  ldiskfs.ko      ldiskfs         ldiskfs_get_group_desc
 3972     3.3127  jbd.ko          jbd             journal_add_journal_head
 3292     2.7456  vmlinux-lustre  vmlinux-lustre  __find_get_block_slow
 3037     2.5329  jbd.ko          jbd             journal_get_undo_access
 2894     2.4136  vmlinux-lustre  vmlinux-lustre  __brelse
 2497     2.0825  jbd.ko          jbd             journal_put_journal_head
 2439     2.0341  vmlinux-lustre  vmlinux-lustre  wake_up_buffer
 2380     1.9849  jbd.ko          jbd             journal_cancel_revoke
 1141     0.9516  vmlinux-lustre  vmlinux-lustre  unlock_buffer
  890     0.7423  vmlinux-lustre  vmlinux-lustre  put_page
  551     0.4595  ldiskfs.ko      ldiskfs         ldiskfs_new_block_old
  423     0.3528  vmlinux-lustre  vmlinux-lustre  __might_sleep
  305     0.2544  ldiskfs.ko      ldiskfs         goal_in_my_reservation
  220     0.1835  ldiskfs.ko      ldiskfs         read_block_bitmap
  184     0.1535  vmlinux-lustre  vmlinux-lustre  __getblk
  150     0.1251  vmlinux-lustre  vmlinux-lustre  __bread
  101     0.0842  vmlinux-lustre  vmlinux-lustre  bh_waitq_head
   80     0.0667  vmlinux-lustre  vmlinux-lustre  kfree
   75     0.0626  vmlinux-lustre  vmlinux-lustre  mark_page_accessed
   50     0.0417  ldiskfs.ko      ldiskfs         ldiskfs_group_sparse
   33     0.0275  vmlinux-lustre  vmlinux-lustre  __kmalloc
   33     0.0275  vmlinux-lustre  vmlinux-lustre  buffered_rmqueue
   26     0.0217  jbd.ko          jbd             journal_commit_transaction
   24     0.0200  vmlinux-lustre  vmlinux-lustre  kmem_cache_free
   23     0.0192  jbd.ko          jbd             __journal_file_buffer
   22     0.0183  vmlinux-lustre  vmlinux-lustre  copy_user_generic
   21     0.0175  jbd.ko          jbd             journal_release_buffer
   21     0.0175  vmlinux-lustre  vmlinux-lustre  cache_alloc_refill
   18     0.0150  vmlinux-lustre  vmlinux-lustre  kmem_cache_alloc
   17     0.0142  vmlinux-lustre  vmlinux-lustre  kmem_getpages
   14     0.0117  jbd.ko          jbd             journal_refile_buffer
   13     0.0108  jbd.ko          jbd             __journal_remove_journal_head
   13     0.0108  jbd.ko          jbd             __journal_unfile_buffer

/Peter

--
------------------------------------------------------------
 Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se
Peter Kjellström
2006-May-19 07:36 UTC
[Lustre-discuss] OSS hangs (almost) after a few hours
Thought it was time for a status update here. I've been hard at work trying
to figure out when I can and when I can't reproduce this. So far I've at
least been able to reproduce it with devices < 2.0 TiB and without lustre
(on plain ext3).

I'm gearing up for a bugreport to redhat and/or lkml and/or ext2-devel but
I've yet to decide what to include and such... The two worst things right
now are that 1) it's not 100% reproducible, the best I have is 9 out of 10
on a specific setup 2) I can't reproduce at all on a 2.6.15.6 kernel.org
kernel. (I've been running 2.6.9-22smp, 2.6.9-22.0.2smp with and without
lustre patches, all on x86_64)

If you have any recommendations on where to send it and what to include I'd
be grateful.

Some things I now know:
* I trigger the bug by writing lots of data to one directory (typically as
  many 100G files as I can fit)
* the bug can bite after as little as 38G written
* when the bug hits performance is reduced by ~ a factor of 1000
* when in bug mode the oprofile always looks as previously reported
* the bug seems to ignore file boundaries but it's hard to tell
* when in bug mode there's something wrong with block allocations, they seem
  to take a looong time
* when in bug mode if you write a 2nd file in that dir then the two files
  take every 2nd block on disk (block rsv very broken); a sketch of how to
  check this follows after this message
* when in bug mode if you write a 2nd file in another dir it runs normally
* if you are patient the bug disappears, that is, there is a limited part of
  the fs that is "cursed"
* if you remove the files and try again the bug hits at the same block
* if you remove the files, reboot and try again the bug hits at the same block
* if you recreate the fs the bug hits at another place
* depending on where the bug hits you get different performance (I've seen
  everything from 50 KiB/s to 500 KiB/s)
* there is never any file corruption, never any problems reported to dmesg

I also have lots of data from lots of experiments (vmstat, debugfs output,
oprofile, timelines, etc.)

Best Regards,
 Peter

--
------------------------------------------------------------
 Peter Kjellström | National Supercomputer Centre | Sweden | http://www.nsc.liu.se
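A minimal sketch of how the every-other-block interleaving could be confirmed
on the plain-ext3 reproducer, using dd plus filefrag from e2fsprogs; the mount
point, directory, and sizes are placeholders, not taken from the report:

    # with the fs already in "bug mode", write two new files into the affected dir
    dd if=/dev/zero of=/mnt/test/dir1/a bs=1M count=1024
    dd if=/dev/zero of=/mnt/test/dir1/b bs=1M count=1024
    # list the block layout of both files; heavily fragmented, alternating
    # allocations would point at broken block reservations
    filefrag -v /mnt/test/dir1/a /mnt/test/dir1/b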