I'm new to Lustre and am investigating what would be required for production. One of the things we would require is backup of the MDS and OSS data.

For the OSSs, from what I can see, this can be achieved on a per-volume basis using LVM and snapshotting. However, I feel uncomfortable with this approach and would prefer something where I can restore an individual file from backup if lost. Is there another way other than find with mtime to get the list of modified files in the file system?

For the MDSs, I cannot see how to back up an online MDS. If I do a split mirror without quiesce and back up the ext3 file system of the split-off mirror, I get an inconsistent state of the database.

I've seen various references to Amanda-based systems, but I've not been able to find the code that implements it so I could evaluate implementing the same thing with our TSM system.

Tim
On Fri, 2008-10-31 at 19:38 +0100, Tim Bell wrote:
> For the OSSs, from what I can see, this can be achieved on a per-volume
> basis using LVM and snapshotting.

Yes.

> However, I feel uncomfortable with
> this approach and would prefer something where I can restore an
> individual file from backup if lost.

Then you need to do filesystem backups, not device backups. You do filesystem backups from a Lustre client rather than on individual server nodes. This is no different from how you would back up any other filesystem. I believe there is coverage in the manual about filesystem-level backups. You need to pay attention to being able to back up the EAs used for striping and so on if you wish to restore files with their striping attributes intact.

> Is there another way other than
> find with mtime to find the list of modified files in the file system ?

Some tools, such as GNU tar, have a "find all files newer than" feature built into them.

> For the MDSs, I cannot see how to back up an online MDS. If I do a
> split mirror without quiesce and backup the ext3 file system of the
> split-off mirror, I get an inconsistent state of database.

Right. The closest you can come is to synchronize, as best you can, taking the LVM snapshots of the MDT and all of the OSTs, and then back each of the snapshots up at your leisure.

> I've seen various references to Amanda based systems but I've not been
> able to find the code that implements it so I could evaluate
> implementing the same thing with our TSM system.

Indeed, there was an announcement here a while ago from Zmanda about supporting Lustre, but I have not seen nor heard anything since.

b.
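A minimal sketch of what the file-level approach described above could look like, run from a Lustre client with the filesystem mounted (all paths and backup filenames here are made-up placeholders, GNU tar and the getfattr/setfattr tools are assumed, and the EA backup/restore procedure documented in the Lustre manual should take precedence for your version):

    # run as root on a Lustre client, not on the OSS/MDS nodes
    cd /mnt/lustre

    # dump the extended attributes (including the striping EAs) so that
    # restored files can be recreated with the same layout
    getfattr -R -d -m '.*' -P . > /backup/lustre-ea-$(date +%Y%m%d).bak

    # incremental data backup: GNU tar's "newer than" cutoff picks up files
    # modified since the given date (the date is only an example); TSM or
    # Amanda would use their own incremental logic instead
    tar -czf /backup/lustre-incr-$(date +%Y%m%d).tar.gz \
        --newer-mtime="2008-10-30" .

    # on restore: extract the data first, then re-apply the saved EAs, e.g.
    #   setfattr --restore=/backup/lustre-ea-20081030.bak

One caveat with this kind of backup is that the trusted.* EAs carrying the striping information are only visible to root, so the EA dump generally has to be made with root privileges.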
Steden Klaus
2008-Oct-31 20:02 UTC
[Lustre-discuss] Odd performance issue with 1.4.x OSS ...
Hi folks,

Our Lustre started exhibiting some curious performance issues today ... basically, it slowed down dramatically and reliable I/O performance became impossible. I looked through the output of dmesg and saw a number of kernel 'oops' messages, but not being a Lustre kernel expert, I'm not exactly sure what they indicate. I stopped the OSTs on the node in question and ran e2fsck on the OST drives, but they've come up clean, so I don't think it's a hardware problem. I don't have physical access to the machine right now so it may in fact be something on the back end, but I'm working on verifying that with a technician on site. In the meantime ... can anyone help decipher this for me? There are a couple of messages like it:

-- cut --
ll_ost_215 S 00000100d2141808 0 8584 1 8585 8583 (L-TLB)
00000101184233e8 0000000000000046 000000000000000f ffffffffa059c3b8
00000000005c2616 0000000100000000 0000000000000000 00000100d21418b0
0000000000000013 0000000000000000
Call Trace:<ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
<ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
<ffffffffa0596e84>{:ptlrpc:ptlrpc_queue_wait+6932} <ffffffffa054227d>{:ptlrpc:l_has_lock+77}
<ffffffffa056f76c>{:ptlrpc:ldlm_add_waiting_lock+2156} <ffffffff80137be0>{default_wake_function+0}
<ffffffffa0592620>{:ptlrpc:expired_request+0} <ffffffffa05926e0>{:ptlrpc:interrupted_request+0}
<ffffffffa0574ae8>{:ptlrpc:ldlm_server_blocking_ast+4072} <ffffffffa0547f5a>{:ptlrpc:ldlm_run_ast_work+234}
<ffffffffa05421f8>{:ptlrpc:l_unlock+248} <ffffffffa055ac73>{:ptlrpc:ldlm_process_extent_lock+995}
<ffffffffa054bb4f>{:ptlrpc:ldlm_lock_enqueue+1087} <ffffffffa0574bb0>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa0575b50>{:ptlrpc:ldlm_server_glimpse_ast+0} <ffffffffa0578694>{:ptlrpc:ldlm_handle_enqueue+8836}
<ffffffffa0677485>{:obdfilter:filter_fmd_expire+53} <ffffffffa0611dc0>{:kviblnd:kibnal_cq_callback+0}
<ffffffffa0573b00>{:ptlrpc:ldlm_server_blocking_ast+0} <ffffffffa0574bb0>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa062cbb2>{:ost:ost_handle+20738} <ffffffff801f78d9>{number+233}
<ffffffff801f7f09>{vsnprintf+1321} <ffffffffa048e022>{:libcfs:libcfs_debug_msg+1554}
<ffffffffa05a593a>{:ptlrpc:ptlrpc_server_handle_request+3066} <ffffffff80116d35>{do_gettimeofday+101}
<ffffffffa049014e>{:libcfs:lcw_update_time+30} <ffffffff801454bd>{__mod_timer+317}
<ffffffffa05a6d17>{:ptlrpc:ptlrpc_main+2375} <ffffffff80137be0>{default_wake_function+0}
<ffffffffa05a63c0>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa05a63c0>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff801114ab>{child_rip+8} <ffffffffa05a63d0>{:ptlrpc:ptlrpc_main+0}
<ffffffff801114a3>{child_rip+0}
-- cut --

I can include debug logs as well, if needed. Any help is greatly appreciated.

cheers,
Klaus
Andreas Dilger
2008-Nov-03 17:47 UTC
[Lustre-discuss] Odd performance issue with 1.4.x OSS ...
On Oct 31, 2008 13:02 -0700, Steden Klaus wrote:
> Our Lustre started exhibiting some curious performance issues today
> ... basically, it slowed down dramatically and reliable I/O performance
> became impossible. I looked through the output of dmesg and saw a number
> of kernel 'oops' messages, but not being a Lustre kernel expert, I'm
> not exactly sure what they indicate. I stopped the OSTs on the node in
> question and ran e2fsck on the OST drives, but they've come up clean so
> I don't think it's a hardware problem. I don't have physical access to
> the machine right now so it may in fact be something on the back end,
> but I'm working on verifying that with a technician on site. In the
> meantime ... can anyone help decipher this for me? There are a couple
> of messages like it:

These kinds of messages are of relatively little use unless they include some of the preceding lines. Are you sure this is an oops, and not a watchdog timeout that is dumping the stack?

> -- cut --
> ll_ost_215 S 00000100d2141808 0 8584 1 8585 8583 (L-TLB)
> 00000101184233e8 0000000000000046 000000000000000f ffffffffa059c3b8
> 00000000005c2616 0000000100000000 0000000000000000 00000100d21418b0
> 0000000000000013 0000000000000000
> Call Trace:<ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
> <ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
> <ffffffffa0596e84>{:ptlrpc:ptlrpc_queue_wait+6932}

This looks like a network problem, but it is hard to say without more information. If you are a supported customer, you will have better service by filing a bugzilla bug. This list only gets "as available" replies, and that is doubly true for old 1.4 Lustre installations.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
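Since the reply above points at a possible network problem, a quick sanity check from the OSS can help narrow things down before filing anything. This is only a sketch, assuming an LNET-based 1.4.x install whose lctl supports the list_nids and ping subcommands; the NID shown is just an example taken from the style of the logs:

    # show the NIDs this server is exporting over LNET
    lctl list_nids

    # check raw LNET reachability of a client NID seen in the logs
    lctl ping 10.0.0.249@vib

    # peer connection state as tracked by LNET
    cat /proc/sys/lnet/peers

If the pings hang or fail while the rest of the IB fabric looks healthy, that points at the interconnect (switch, HCA, or the vib LND) rather than at the OSTs themselves.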
Steden Klaus
2008-Nov-03 19:39 UTC
[Lustre-discuss] Odd performance issue with 1.4.x OSS ...
Hello Andreas,

Thanks for the info. Here's some more information (basically cut and pasted from before and after a typical stack dump message):

-- cut --
LustreError: 20438:0:(service.c:648:ptlrpc_server_handle_request()) request 4777659 opc 101 from 12345-10.0.0.249 at vib processed in 118s trans 0 rc 0/0
Lustre: 20438:0:(watchdog.c:302:lcw_update_time()) Expired watchdog for pid 20438 disabled after 118.9719s
LustreError: 20257:0:(ldlm_lockd.c:579:ldlm_server_completion_ast()) ### enqueue wait took 118984007us from 1225493396 ns: filter-ost13_UUID lock: 0000010036031dc0/0x7bef85be5c78b145 lrc: 2/0,0 mode: PW/PW res: 15229751/0 rrc: 3 type: EXT [0->18446744073709551615] (req 8482816->8503295) flags: 20 remote: 0x84d9c08a99ff4b23 expref: 113 pid: 20438
LustreError: 20500:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225493514, 20s ago) req at 00000100cb900e00 x1151/t0 o104->@NET_0xb00000a0000fa_UUID:15 lens 176/64 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: A client on nid 10.0.0.250 at vib was evicted from service ost5.
LustreError: 20500:0:(ldlm_lockd.c:423:ldlm_failed_ast()) ### blocking AST failed (-110): evicting client 36604_tiger_lov_7a82cb6234 at NET_0xb00000a0000fa_UUID NID 10.0.0.250 at vib (10.0.0.250 at vib) ns: filter-ost5_UUID lock: 000001008f840c40/0x7bef85be5c78c80c lrc: 2/0,0 mode: PW/PW res: 15158480/0 rrc: 57 type: EXT [4651483136->4802478079] (req 4651483136->4654473215) flags: 20 remote: 0x789bd2e5e1ada437 expref: 120 pid: 20579
Lustre: 20107:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT from 12345-10.0.0.250 at vib portal 16 match 1151 offset 0 length 64: 2
LustreError: 20269:0:(ldlm_lockd.c:1417:ldlm_cancel_handler()) operation 103 from 12345-10.0.0.250 at vib with bad export cookie 8930503638740276498
LustreError: 20580:0:(ldlm_lib.c:1272:target_send_reply_msg()) @@@ processing error (-107) req at 0000010060cc0000 x7055836/t0 o101-><?>@<?>:-1 lens 176/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 0:0:(ldlm_lockd.c:205:waiting_locks_callback()) ### lock callback timer expired: evicting client 7cb21_tiger_lov_7afb88fa94 at NET_0xb00000a0000f9_UUID nid 10.0.0.249 at vib ns: filter-ost13_UUID lock: 0000010036031dc0/0x7bef85be5c78b145 lrc: 1/0,0 mode: PW/PW res: 15229751/0 rrc: 2 type: EXT [0->18446744073709551615] (req 8482816->8503295) flags: 20 remote: 0x84d9c08a99ff4b23 expref: 117 pid: 20438
LustreError: 20452:0:(ldlm_lib.c:1272:target_send_reply_msg()) @@@ processing error (-107) req at 0000010064e22200 x4778674/t0 o101-><?>@<?>:-1 lens 176/0 ref 0 fl Interpret:/0/0 rc -107/0
Lustre: 0:0:(watchdog.c:121:lcw_cb()) Watchdog triggered for pid 20559: it was inactive for 100000ms
Lustre: 0:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing stack for process 20559
ll_ost_220 S 000001006e854008 0 20559 1 20560 20558 (L-TLB)
000001009739f3e8 0000000000000046 000000000000000f ffffffffa05b33b8
0000000000000548 0000000100000000 0000000000000000 000001006e8540b0
0000000000000013 0000000000000000
Call Trace:<ffffffffa05b33b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
<ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
<ffffffffa05ade84>{:ptlrpc:ptlrpc_queue_wait+6932} <ffffffffa055927d>{:ptlrpc:l_has_lock+77}
<ffffffffa058676c>{:ptlrpc:ldlm_add_waiting_lock+2156} <ffffffff80137be0>{default_wake_function+0}
<ffffffffa05a9620>{:ptlrpc:expired_request+0} <ffffffffa05a96e0>{:ptlrpc:interrupted_request+0}
<ffffffffa058bae8>{:ptlrpc:ldlm_server_blocking_ast+4072} <ffffffffa055ef5a>{:ptlrpc:ldlm_run_ast_work+234}
<ffffffffa05591f8>{:ptlrpc:l_unlock+248} <ffffffffa0571c73>{:ptlrpc:ldlm_process_extent_lock+995}
<ffffffffa0562b4f>{:ptlrpc:ldlm_lock_enqueue+1087} <ffffffffa058bbb0>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa058cb50>{:ptlrpc:ldlm_server_glimpse_ast+0} <ffffffffa058f694>{:ptlrpc:ldlm_handle_enqueue+8836}
<ffffffffa068c485>{:obdfilter:filter_fmd_expire+53} <ffffffffa058ab00>{:ptlrpc:ldlm_server_blocking_ast+0}
<ffffffffa058bbb0>{:ptlrpc:ldlm_server_completion_ast+0} <ffffffffa0641bb2>{:ost:ost_handle+20738}
<ffffffff801f787e>{number+142} <ffffffff801f7f09>{vsnprintf+1321}
<ffffffffa049f022>{:libcfs:libcfs_debug_msg+1554} <ffffffffa05bc93a>{:ptlrpc:ptlrpc_server_handle_request+3066}
<ffffffff80116d35>{do_gettimeofday+101} <ffffffffa04a114e>{:libcfs:lcw_update_time+30}
<ffffffff801454bd>{__mod_timer+317} <ffffffffa05bdd17>{:ptlrpc:ptlrpc_main+2375}
<ffffffff80137be0>{default_wake_function+0} <ffffffffa05bd3c0>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa05bd3c0>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff801114ab>{child_rip+8}
<ffffffffa05bd3d0>{:ptlrpc:ptlrpc_main+0} <ffffffff801114a3>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log-tiger-oss-0-0.local.1225493885.20559
LustreError: 20526:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225493865, 20s ago) req at 000001001986c000 x1346/t0 o106->@NET_0xb00000a0000ed_UUID:15 lens 176/112 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: A client on nid 10.0.0.237 at vib was evicted from service ost1.
LustreError: 20526:0:(ldlm_lockd.c:423:ldlm_failed_ast()) ### glimpse AST failed (-110): evicting client 82ed7_tiger_lov_eea68aa0e0 at NET_0xb00000a0000ed_UUID NID 10.0.0.237 at vib (10.0.0.237 at vib) ns: filter-ost1_UUID lock: 000001010f17dac0/0x7bef85be5c792519 lrc: 2/0,0 mode: PW/PW res: 15168088/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0 remote: 0x3dd1a37046787550 expref: 182 pid: 20464
LustreError: 20702:0:(ost_handler.c:955:ost_brw_write()) @@@ Eviction on bulk GET req at 0000010061543400 x4033002/t0 o4->82ed7_tiger_lov_eea68aa0e0 at NET_0xb00000a0000ed_UUID:-1 lens 328/288 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 20866:0:(ost_handler.c:747:ost_brw_read()) @@@ timeout on bulk PUT req at 0000010055099050 x4032963/t0 o3->82ed7_tiger_lov_eea68aa0e0 at NET_0xb00000a0000ed_UUID:-1 lens 328/280 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 20866:0:(ost_handler.c:747:ost_brw_read()) Skipped 4 previous similar messages
Lustre: 20866:0:(ost_handler.c:813:ost_brw_read()) ost13: ignoring bulk IO comm error with 82ed7_tiger_lov_eea68aa0e0 at NET_0xb00000a0000ed_UUID id 12345-10.0.0.237 at vib - client will retry
Lustre: 20866:0:(ost_handler.c:813:ost_brw_read()) Skipped 4 previous similar messages
Lustre: 20702:0:(ost_handler.c:1050:ost_brw_write()) ost1: ignoring bulk IO comm error with 82ed7_tiger_lov_eea68aa0e0 at NET_0xb00000a0000ed_UUID id 12345-10.0.0.237 at vib - client will retry
Lustre: 20106:0:(lib-move.c:1728:lnet_parse_reply()) 10.0.0.230 at vib: Dropping REPLY from 12345-10.0.0.237 at vib for invalid MD 0x45a932db4532a.0xb4ef9
Lustre: 20106:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT from 12345-10.0.0.237 at vib portal 16 match 1346 offset 0 length 112: 2
LustreError: 20592:0:(ldlm_lib.c:1272:target_send_reply_msg()) @@@ processing error (-107) req at 000001010209d800 x4033039/t0 o101-><?>@<?>:-1 lens 176/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 20559:0:(service.c:648:ptlrpc_server_handle_request()) request 4510790 opc 101 from 12345-10.0.0.232 at vib processed in 109s trans 0 rc 0/0
-- cut --

I'm working on urging the clients to upgrade to 1.6, but it's slow going.

Any insight would be very helpful ... I can probably get on top of the fix if I have an inkling of where to start looking ... at this point, I don't, I just get reports that it's "slow".

thanks,
Klaus

PS Sorry for the massive paste to everyone else on the list.
On Oct 31, 2008 19:38 +0100, Tim Bell wrote:
> For the OSSs, from what I can see, this can be achieved on a per-volume
> basis using LVM and snapshotting. However, I feel uncomfortable with
> this approach and would prefer something where I can restore an
> individual file from backup if lost.

I would strongly recommend filesystem-level backups instead of doing device-level backups. This gives you full control of which files to back up, and allows restoring individual files as needed. It is also easiest to integrate into existing backup solutions.

It is probably also desirable to do a device-level backup of the MDS, because it is the central part of the Lustre filesystem, and if this device fails permanently then the entire filesystem would need to be restored. Lustre can generally handle the MDS backup being slightly out of date w.r.t. the OSTs, though care must be taken when restoring in such a scenario.

> Is there another way other than
> find with mtime to find the list of modified files in the file system ?

There is the "e2scan" program that is included with recent versions of the Lustre e2fsprogs RPM. It is a fast scanner of the MDS filesystem that returns a list of files that have changed, like "find". You can also use "find" or "lfs find" to traverse the filesystem, and this can be very fast if run on a client mounted on the MDS node.

In the 2.0 release there will be a changelog exported from Lustre which will catalog files that have changed in the filesystem since the last time an application processed the log.

> For the MDSs, I cannot see how to back up an online MDS. If I do a
> split mirror without quiesce and backup the ext3 file system of the
> split-off mirror, I get an inconsistent state of database.

You should use LVM mirroring to ensure the filesystem is coherent when the snapshot is made. There are patches going into the upstream kernel by Takashi Sato of NEC to allow a userspace process to freeze the filesystem while doing this sort of hardware mirror split (http://marc.info/?l=linux-fsdevel&m=122511248322724&w=4), but as yet this is not in the upstream kernel or any Lustre kernel.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
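To make the device-level MDS backup and the changed-file scan mentioned above concrete, a rough sketch might look like the following; the volume group, logical volume, snapshot size, and mount point are all invented for illustration, and it assumes the MDT already lives on an LVM logical volume:

    # on the MDS: take an LVM snapshot of the MDT volume, back up the image
    # at leisure, then drop the snapshot
    lvcreate --snapshot --size 2G --name mdt-snap /dev/vg_mds/mdt
    dd if=/dev/vg_mds/mdt-snap bs=4M | gzip > /backup/mdt-$(date +%Y%m%d).img.gz
    lvremove -f /dev/vg_mds/mdt-snap

    # on a client: list files modified within the last day as candidates
    # for an incremental file-level backup (e2scan or lfs find could be
    # used here instead, with options from their respective man pages)
    find /mnt/lustre -mtime -1 -type f > /backup/changed-files.list

The snapshot is a crash-consistent, point-in-time image of the journaled MDT filesystem; the journal is replayed when the backup image is eventually restored and mounted.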
Andreas Dilger
2008-Nov-07 22:46 UTC
[Lustre-discuss] Odd performance issue with 1.4.x OSS ...
On Nov 03, 2008 11:39 -0800, Steden Klaus wrote:
> Hello Andreas,
>
> Thanks for the info. Here's some more information (basically cut and pasted
> from before and after a typical stack dump message):
>
> -- cut --
> Lustre: 0:0:(watchdog.c:121:lcw_cb()) Watchdog triggered for pid 20559: it was inactive for 100000ms
> Lustre: 0:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing stack for process 20559
> ll_ost_220 S 000001006e854008 0 20559 1 20560 20558 (L-TLB)
> 000001009739f3e8 0000000000000046 000000000000000f ffffffffa05b33b8
> 0000000000000548 0000000100000000 0000000000000000 000001006e8540b0
> 0000000000000013 0000000000000000
> Call Trace:<ffffffffa05b33b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
> <ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}

This is a watchdog timeout, and should not be thought of as an oops or panic as you mentioned in your earlier comment. It is debugging output from Lustre.

> LustreError: 20438:0:(service.c:648:ptlrpc_server_handle_request()) request 4777659 opc 101 from 12345-10.0.0.249 at vib processed in 118s trans 0 rc 0/0
> Lustre: 20438:0:(watchdog.c:302:lcw_update_time()) Expired watchdog for pid 20438 disabled after 118.9719s
> LustreError: 20257:0:(ldlm_lockd.c:579:ldlm_server_completion_ast()) ### enqueue wait took 118984007us from 1225493396 ns: filter-ost13_UUID lock: 0000010036031dc0/0x7bef85be5c78b145 lrc: 2/0,0 mode: PW/PW res: 15229751/0 rrc: 3 type:

This looks like slow request processing on the OST. I would suggest increasing the Lustre timeout value to 150s or more. A timeout of 300s is fairly common in larger installations (1000 nodes or more). In the 1.8 release Lustre will automatically tune this value so that it can adapt to larger-scale systems better.

> I'm working on urging the clients to upgrade to 1.6 but it's slow going.
>
> Any insight would be very helpful ... I can probably get on top of the
> fix if I have an inkling of where to start looking ... at this point, I
> don't, I just get reports that it's "slow".
>
> thanks,
> Klaus
>
> PS Sorry for the massive paste to everyone else on the list.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
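For reference, a rough sketch of how the timeout bump suggested above is typically applied on installations of this vintage; the 300s value is only the example from the message, the /proc path is the one used by 1.4/1.6-era Lustre, and the setting generally needs to be raised consistently on the MDS, the OSSes, and the clients:

    # check the current obd timeout, then raise it
    cat /proc/sys/lustre/timeout
    echo 300 > /proc/sys/lustre/timeout

    # the same knob via sysctl, persisted across reboots
    sysctl -w lustre.timeout=300
    echo "lustre.timeout = 300" >> /etc/sysctl.conf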
Klaus Steden
2008-Nov-07 23:03 UTC
[Lustre-discuss] Odd performance issue with 1.4.x OSS ...
On 11/7/08 2:46 PM, "Andreas Dilger" <adilger at sun.com> etched on stone tablets:

> On Nov 03, 2008 11:39 -0800, Steden Klaus wrote:
>> Hello Andreas,
>>
>> Thanks for the info. Here's some more information (basically cut and pasted
>> from before and after a typical stack dump message):
>>
>> -- cut --
>> Lustre: 0:0:(watchdog.c:121:lcw_cb()) Watchdog triggered for pid 20559: it
>> was inactive for 100000ms
>> Lustre: 0:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing stack for
>> process 20559
>> ll_ost_220 S 000001006e854008 0 20559 1 20560 20558
>> (L-TLB)
>> 000001009739f3e8 0000000000000046 000000000000000f ffffffffa05b33b8
>> 0000000000000548 0000000100000000 0000000000000000 000001006e8540b0
>> 0000000000000013 0000000000000000
>> Call Trace:<ffffffffa05b33b8>{:ptlrpc:ptl_send_buf+824}
>> <ffffffff801454bd>{__mod_timer+317}
>> <ffffffff8033860d>{schedule_timeout+381}
>> <ffffffff801460a0>{process_timeout+0}
>
> This is a watchdog timeout, and should not be thought of as an oops or panic
> as you mentioned in your earlier comment. It is debugging output from Lustre.
>
>> LustreError: 20438:0:(service.c:648:ptlrpc_server_handle_request()) request
>> 4777659 opc 101 from 12345-10.0.0.249 at vib processed in 118s trans 0 rc 0/0
>> Lustre: 20438:0:(watchdog.c:302:lcw_update_time()) Expired watchdog for pid
>> 20438 disabled after 118.9719s
>> LustreError: 20257:0:(ldlm_lockd.c:579:ldlm_server_completion_ast()) ###
>> enqueue wait took 118984007us from 1225493396 ns: filter-ost13_UUID lock:
>> 0000010036031dc0/0x7bef85be5c78b145 lrc: 2/0,0 mode: PW/PW res: 15229751/0
>> rrc: 3 type:
>
> This looks like slow request processing on the OST. I would suggest
> increasing the Lustre timeout value to 150s or more. A timeout of
> 300s is fairly common in larger installations (1000 nodes or more). In
> the 1.8 release Lustre will automatically tune this value so that it
> can adapt to larger-scale systems better.

Hi Andreas,

That is good to know. I thought it was an oops message, not normal debugging. I'm used to Lustre "just working" without this kind of output, so I assumed the worst when I saw them.

As far as the issue itself, apparently the Voltaire switch needed a restart, which has cleared whatever condition was causing the problem.

thanks,
Klaus