Christopher Mason
2007-Nov-17 04:43 UTC
[Lustre-discuss] BUG on lustre patchless client 1.6.3; many small files
I've had the following BUG using lustre patchless client 1.6.3 on linux 2.6.20 (Fedora Core 6). This hard-locked the machine; I was unable to tell if there was a subsequent panic. This was while copying approx 2.7 TB from ext3 to lustre; it had copied about 2.6 TB; I haven't verified if the data made it across okay.

There were a ton (> 1 M) of tiny files in this copy (which took about 60 hours over gigE); these cause a tremendous performance hit. This is not at all surprising, I just wonder if it's related to the bug.

I'm trying to get access to the lustre OSTs and MDTs and will post logs if they exist. I'm fairly new to lustre; are issues like this common when using a somewhat odd kernel?

Thanks,
-c

Linux rome.mayo.edu 2.6.20-1.2952.fc6 #1 SMP Wed May 16 18:18:22 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Nov 16 05:28:34 rome kernel: BUG: soft lockup detected on CPU#0!
Nov 16 05:28:34 rome kernel:
Nov 16 05:28:34 rome kernel: Call Trace:
Nov 16 05:28:34 rome kernel:  <IRQ>  [<ffffffff802b0bdb>] softlockup_tick+0xdb/0xf6
Nov 16 05:28:34 rome kernel:  [<ffffffff8028f5d0>] update_process_times+0x42/0x68
Nov 16 05:28:34 rome kernel:  [<ffffffff80271f0c>] smp_local_timer_interrupt+0x34/0x55
Nov 16 05:28:34 rome kernel:  [<ffffffff802725e8>] smp_apic_timer_interrupt+0x51/0x69
Nov 16 05:28:34 rome kernel:  [<ffffffff8025ace6>] apic_timer_interrupt+0x66/0x70
Nov 16 05:28:34 rome kernel:  <EOI>  [<ffffffff8022bde9>] dummy_inode_permission+0x0/0x3
Nov 16 05:28:34 rome kernel:  [<ffffffff8020933c>] __d_lookup+0xdd/0x110
Nov 16 05:28:34 rome kernel:  [<ffffffff8020ca8f>] do_lookup+0x2a/0x1ae
Nov 16 05:28:34 rome kernel:  [<ffffffff80209c72>] __link_path_walk+0x903/0xdb0
Nov 16 05:28:34 rome kernel:  [<ffffffff8020e78d>] link_path_walk+0x55/0xd7
Nov 16 05:28:34 rome kernel:  [<ffffffff8020c8f7>] do_path_lookup+0x1b5/0x217
Nov 16 05:28:34 rome kernel:  [<ffffffff802123d6>] getname+0x152/0x1b8
Nov 16 05:28:34 rome kernel:  [<ffffffff802237fb>] __user_walk_fd+0x37/0x4c
Nov 16 05:28:34 rome kernel:  [<ffffffff8023dc58>] vfs_lstat_fd+0x18/0x47
Nov 16 05:28:34 rome kernel:  [<ffffffff8022a50f>] sys_newlstat+0x19/0x31
Nov 16 05:28:34 rome kernel:  [<ffffffff8025a231>] tracesys+0x71/0xe1
Nov 16 05:28:34 rome kernel:  [<ffffffff8025a29c>] tracesys+0xdc/0xe1
Nov 16 05:28:34 rome kernel:
Nov 16 05:30:04 rome kernel: LustreError: 19267:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195212504, 100s ago) req@ffff8100bd777a00 x66226136/t0 o4->protfs-OST0003_UUID@129.176.249.193@tcp:28 lens 384/352 ref 2 fl Rpc:/0/0 rc 0/-22
Nov 16 05:30:04 rome kernel: Lustre: protfs-OST0003-osc-ffff8100e50a5c00: Connection to service protfs-OST0003 via nid 129.176.249.193@tcp was lost; in progress operations using this service will wait for recovery to complete.
Nov 16 05:30:09 rome kernel: LustreError: 19267:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195212509, 100s ago) req@ffff8100c98afa00 x66226138/t0 o4->protfs-OST0001_UUID@129.176.249.201@tcp:28 lens 384/352 ref 3 fl Rpc:/0/0 rc 0/-22

etc, etc.
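
On the open question of whether the data made it across okay: one possible way to check is to walk the ext3 source tree and compare each regular file against its counterpart on the lustre mount, first by size and then by checksum. The following is only a rough sketch under that assumption; the function names (file_digest, compare_trees) and the paths in the usage note are made up for illustration and are not from any Lustre tool. With over a million tiny files it will be slow, since every file has to be looked up on the lustre side.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src_root, dst_root):
    """Yield (relative_path, reason) for each regular file under src_root
    that is missing, differently sized, or different in content under dst_root.

    Symlinks and special files are skipped; files that exist only under
    dst_root are not reported.
    """
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                yield rel, "missing"
            elif os.path.getsize(src) != os.path.getsize(dst):
                # Cheap check first, so hashing is only done when sizes match.
                yield rel, "size differs"
            elif file_digest(src) != file_digest(dst):
                yield rel, "content differs"
```

Usage would look something like `for rel, reason in compare_trees("/data/ext3copy", "/mnt/lustre/copy"): print(reason, rel)`, with both paths being placeholders. An empty result means every source file has a same-content counterpart on the destination.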