Daniel Leaberry
2007-Aug-17 07:28 UTC
[Lustre-discuss] Sudden ost crashing appears to take > 100s to find free extents
I have an interesting problem. I've made no changes to the IB DDN storage, yet I'm finding OSTs crashing left and right. The thread watchdog gets triggered, and the most relevant part of the dump is the following. It appears that it took > 100s to find a free extent. On the OSS I watch with iostat as the LUN is saturated with small read requests. We've just hit 80% full (we planned on going to 90% full) and we do have a lot of small files (~75 million).

Is there any way to tune the extent searching code? Does my analysis seem likely? Is this fixed in 1.6.1 such that I should upgrade immediately?

Thanks,
Daniel

Call Trace:
 <ffffffffa0024125>{:sd_mod:sd_iostats_bump+147}
 <ffffffffa031429a>{:ib_srp:srp_host_qcommand+399}
 <ffffffff80253ebf>{deadline_next_request+34}
 <ffffffff8024b329>{elv_next_request+238}
 <ffffffff80309843>{io_schedule+38}
 <ffffffff8017843c>{__wait_on_buffer+125}
 <ffffffff801782c2>{bh_wake_function+0}
 <ffffffff801782c2>{bh_wake_function+0}
 <ffffffffa05771d9>{:ldiskfs:ldiskfs_mb_init_cache+469}
 <ffffffff80157ba2>{add_to_page_cache+167}
 <ffffffffa0577792>{:ldiskfs:ldiskfs_mb_load_buddy+257}
 <ffffffffa057a89f>{:ldiskfs:ldiskfs_mb_new_blocks+1946}
 <ffffffffa05b480e>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+729}
 <ffffffffa0574362>{:ldiskfs:ldiskfs_ext_find_extent+205}
 <ffffffffa0575a69>{:ldiskfs:ldiskfs_ext_walk_space+535}
 <ffffffffa05b4535>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0}
 <ffffffffa05b4b56>{:fsfilt_ldiskfs:fsfilt_map_nblocks+236}
 <ffffffffa05b4d50>{:fsfilt_ldiskfs:fsfilt_ldiskfs_map_ext_inode_pages+457}
 <ffffffffa05d59dc>{:obdfilter:filter_direct_io+892}
 <ffffffffa05b36f2>{:fsfilt_ldiskfs:fsfilt_ldiskfs_brw_start+649}
 <ffffffffa05d6fb9>{:obdfilter:filter_commitrw_write+3494}
 <ffffffff80308ecd>{thread_return+0}
 <ffffffff80308f25>{thread_return+88}
 <ffffffffa0378bf2>{:lnet:lnet_send+2251}
 <ffffffffa05d100e>{:obdfilter:filter_commitrw+84}
 <ffffffff8013f23f>{del_timer+107}
 <ffffffff8013f2fc>{del_singleshot_timer_sync+9}
 <ffffffff803099f7>{schedule_timeout+375}
 <ffffffffa05a2c47>{:ost:ost_brw_write+5119}
 <ffffffff801331a5>{default_wake_function+0}
 <ffffffffa059f513>{:ost:ost_bulk_timeout+0}
 <ffffffffa043269f>{:ptlrpc:lustre_msg_get_version+64}
 <ffffffffa05a637e>{:ost:ost_handle+6987}
 <ffffffffa034cf41>{:libcfs:libcfs_debug_vmsg2+1713}
 <ffffffff801e9c83>{vsnprintf+1406}
 <ffffffff801e9d66>{snprintf+131}
 <ffffffffa043906b>{:ptlrpc:ptlrpc_server_handle_request+2336}
 <ffffffff8013f100>{__mod_timer+293}
 <ffffffffa043ad29>{:ptlrpc:ptlrpc_main+2018}
 <ffffffff801331a5>{default_wake_function+0}
 <ffffffffa0439a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
 <ffffffffa0439a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
 <ffffffff80110e23>{child_rip+8}
 <ffffffffa043a547>{:ptlrpc:ptlrpc_main+0}
 <ffffffff80110e1b>{child_rip+0}

--
Daniel Leaberry
Systems Administrator, iArchives
Tel: 801-494-6528  Cell: 801-376-6411
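[Editor's note] The stack above is inside ldiskfs_mb_new_blocks / ldiskfs_mb_load_buddy / ldiskfs_mb_init_cache, i.e. the mballoc allocator building per-group buddy data from on-disk block bitmaps while it hunts for a free extent. Below is a simplified userspace model, not the ldiskfs code, of why that hunt can degenerate into many small synchronous reads on a nearly full, fragmented filesystem; all names in it (load_group_bitmap, NGROUPS, and so on) are hypothetical.

/*
 * Simplified userspace model (NOT the ldiskfs code): an mballoc-style
 * allocator scans block groups looking for a free extent of the
 * requested length.  Groups whose buddy data is not cached need their
 * block bitmap read from disk first; on a nearly full, fragmented
 * filesystem most groups fail the search, so the thread issues many
 * small reads before it finds (or gives up on) a suitable extent.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NGROUPS 8192              /* hypothetical number of block groups */

struct group {
    bool cached;                  /* buddy bitmap already in memory?     */
    int  largest_free_run;        /* longest free extent in this group   */
};

/* Stand-in for the synchronous bitmap read that shows up as
 * io_schedule()/__wait_on_buffer() in the stack trace above. */
static void load_group_bitmap(struct group *g)
{
    g->cached = true;             /* one small read request hits the LUN here */
}

static int find_free_extent(struct group *groups, int want)
{
    int reads = 0;

    for (int i = 0; i < NGROUPS; i++) {
        if (!groups[i].cached) {
            load_group_bitmap(&groups[i]);
            reads++;
        }
        if (groups[i].largest_free_run >= want) {
            printf("found a %d-block extent in group %d after %d bitmap reads\n",
                   want, i, reads);
            return i;
        }
    }
    printf("no extent of %d blocks; %d bitmap reads issued\n", want, reads);
    return -1;
}

int main(void)
{
    struct group *groups = calloc(NGROUPS, sizeof(*groups));

    /* Model an ~80%-full, fragmented filesystem: most groups only have
     * short free runs, so a large request walks nearly every group. */
    for (int i = 0; i < NGROUPS; i++)
        groups[i].largest_free_run = (i == NGROUPS - 5) ? 2048 : 8;

    find_free_extent(groups, 1024);
    free(groups);
    return 0;
}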
Andreas Dilger
2007-Aug-17 12:37 UTC
[Lustre-discuss] Sudden ost crashing appears to take > 100s to find free extents
On Aug 17, 2007 07:28 -0600, Daniel Leaberry wrote:
> I have an interesting problem. I've made no changes to the IB DDN
> storage yet I'm finding OSTs crashing left and right. The thread
> watchdog gets triggered,

Note that a watchdog thread stack dump is NOT a crash, but rather a debugging mechanism so we can see where the thread is stuck for such a long time. It should be able to continue working even after this happens.

> Is there any way to tune the extent searching code? Does my analysis seem
> likely? Is this fixed in 1.6.1 such that I should upgrade immediately?

You could increase the watchdog thread timeout (this is currently a compile-time constant), but that won't remove the fact that it is taking 100s to find free space.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
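[Editor's note] As a rough illustration of the mechanism Andreas describes, here is a minimal userspace sketch of a service-thread watchdog: the handler touches a timestamp around each request, and a separate thread complains (but does nothing destructive) once the handler has gone longer than a fixed, compile-time timeout without touching it. The names and structure are hypothetical, not Lustre's internal watchdog code.

/* Minimal userspace sketch of a per-thread watchdog.  The service
 * thread re-arms the timestamp around each request; the watchdog
 * thread only reports when the timeout is exceeded -- it is a
 * diagnostic, not a crash.  Build with -lpthread. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define WATCHDOG_TIMEOUT 100          /* seconds, fixed at build time */

static time_t last_touch;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void watchdog_touch(void)
{
    pthread_mutex_lock(&lock);
    last_touch = time(NULL);
    pthread_mutex_unlock(&lock);
}

static void *watchdog_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(1);
        pthread_mutex_lock(&lock);
        double idle = difftime(time(NULL), last_touch);
        pthread_mutex_unlock(&lock);
        if (idle > WATCHDOG_TIMEOUT)
            fprintf(stderr,
                    "watchdog: service thread stuck for %.0fs, dumping its stack\n",
                    idle);
        /* the service thread itself is left alone and keeps working */
    }
    return NULL;
}

static void handle_request(void)
{
    watchdog_touch();                 /* re-arm before the work starts   */
    sleep(2);                         /* stand-in for the request handler */
    watchdog_touch();                 /* and again once it completes      */
}

int main(void)
{
    pthread_t wd;

    watchdog_touch();
    pthread_create(&wd, NULL, watchdog_thread, NULL);
    for (int i = 0; i < 3; i++)
        handle_request();
    return 0;
}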
Fegan, Joe
2007-Aug-20 07:19 UTC
[Lustre-discuss] Sudden ost crashing appears to take > 100s to find free extents
Hi Andreas,

When the watchdog fires (prints the stack of a thread), does this mean that the thread has been hogging the CPU for 100 seconds? Or that the thread has been *sleeping* for 100 seconds? Or something else?

The stack Daniel posted shows io_schedule called from __wait_on_buffer, which would be typical of a thread waiting for disk I/O completion.

Call Trace:
 <ffffffffa0024125>{:sd_mod:sd_iostats_bump+147}
 <ffffffffa031429a>{:ib_srp:srp_host_qcommand+399}
 <ffffffff80253ebf>{deadline_next_request+34}
 <ffffffff8024b329>{elv_next_request+238}
 <ffffffff80309843>{io_schedule+38}
 <ffffffff8017843c>{__wait_on_buffer+125}
 <ffffffff801782c2>{bh_wake_function+0}
 <ffffffff801782c2>{bh_wake_function+0}
 <ffffffffa05771d9>{:ldiskfs:ldiskfs_mb_init_cache+469}
 <ffffffff80157ba2>{add_to_page_cache+167}
 <ffffffffa0577792>{:ldiskfs:ldiskfs_mb_load_buddy+257}
 <ffffffffa057a89f>{:ldiskfs:ldiskfs_mb_new_blocks+1946}
 <ffffffffa05b480e>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+729}
 <ffffffffa0574362>{:ldiskfs:ldiskfs_ext_find_extent+205}
 <ffffffffa0575a69>{:ldiskfs:ldiskfs_ext_walk_space+535}
 <ffffffffa05b4535>{:fsfilt_ldiskfs:ldiskfs_ext_new_extent_cb+0}
 <ffffffffa05b4b56>{:fsfilt_ldiskfs:fsfilt_map_nblocks+236}

And yes, ldiskfs_ext_new_extent_cb is higher up the stack, which means that in the bigger picture this thread is searching for free blocks. But did the watchdog appear because the thread has been searching for free blocks for more than 100 seconds, or because *this specific I/O request* has been pending for more than 100 seconds?

Joe.
Daniel Leaberry
2007-Aug-20 08:15 UTC
[Lustre-discuss] Sudden ost crashing appears to take > 100s to find free extents
Thanks Andreas. We hit this problem at 79% full on each OST. After deleting, we got the OSTs down to 77% full and the problem subsided. I haven't found any information or rumors regarding Lustre performance on nearly full filesystems, but I know that for our workload we're setting a 75% hard limit on used space to avoid these issues. The biggest surprise for me was not that it slowed down (all filesystems get slower as they approach 100% full) but how suddenly the wall seemed to be hit.

Joe, I can't definitively answer your question, but I can tell you that what I saw on the LUNs was one I/O thread dominating the LUN for 100s. No other read/write requests would get through. This was with the deadline scheduler. We tried with cfq as well and the same behavior was exhibited. That indicates to me that the thread was *active* for 100 seconds.

Daniel
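[Editor's note] A minimal sketch of the check that a "75% hard limit" policy implies, assuming a hypothetical OST backing-filesystem mount point passed on the command line. In practice one would watch `lfs df` from a client; this only illustrates the arithmetic.

/* Hypothetical monitoring sketch: report percent-used for a mounted
 * filesystem and warn once it crosses the 75% policy limit. */
#include <stdio.h>
#include <sys/statvfs.h>

#define USED_LIMIT_PCT 75.0

int main(int argc, char **argv)
{
    /* "/mnt/ost0" is a made-up default path for illustration only */
    const char *path = argc > 1 ? argv[1] : "/mnt/ost0";
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0) {
        perror("statvfs");
        return 1;
    }

    double total    = (double)vfs.f_blocks;
    double freeblks = (double)vfs.f_bfree;
    double used_pct = total > 0 ? 100.0 * (total - freeblks) / total : 0.0;

    printf("%s: %.1f%% used\n", path, used_pct);
    if (used_pct > USED_LIMIT_PCT)
        fprintf(stderr,
                "warning: above %.0f%% limit, free-extent searches may slow down\n",
                USED_LIMIT_PCT);
    return 0;
}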
Oleg Drokin
2007-Aug-20 08:39 UTC
[Lustre-discuss] Sudden ost crashing appears to take > 100s to find free extents
Hello!

On Mon, Aug 20, 2007 at 02:18:43PM +0100, Fegan, Joe wrote:
> When the watchdog fires (prints the stack of a thread) does this mean
> that the thread has been hogging the cpu for 100 seconds? Or that the
> thread has been *sleeping* for 100 seconds? Or something else?

It can be both. What it actually means is that the process did not re-prime the watchdog timer for 100 seconds after it was last armed; whether it was eating CPU, sleeping, or any combination of those states does not matter.

Bye,
Oleg
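[Editor's note] Since the watchdog only looks at wall-clock time since the last touch, it genuinely cannot distinguish the two cases Joe asked about. The small demo below shows one way to tell them apart after the fact, by comparing wall time against CPU time with getrusage(); the sleep and the spin loop stand in for "waiting in io_schedule" and "hogging the CPU" respectively.

/* Wall time vs. CPU time: a sleeping phase accumulates wall time but
 * almost no CPU time, a spinning phase accumulates both.  The watchdog
 * only ever sees the wall-clock total. */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static double cpu_seconds(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
           ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

int main(void)
{
    time_t wall0 = time(NULL);
    double cpu0  = cpu_seconds();

    sleep(2);                            /* "sleeping in io_schedule" */

    volatile unsigned long x = 0;
    time_t spin_until = time(NULL) + 2;
    while (time(NULL) < spin_until)      /* "hogging the CPU" */
        x++;

    printf("wall: %.0fs  cpu: %.1fs\n",
           difftime(time(NULL), wall0), cpu_seconds() - cpu0);
    /* Roughly 4s of wall time but only ~2s of CPU time: the first half
     * was spent asleep, the second half on the CPU.  The watchdog would
     * have reported both halves the same way. */
    return 0;
}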