Jakob Goldbach
2009-Jan-06 22:09 UTC
[Lustre-discuss] MDS crash during mount, last_rcvd trick not working
Hi,

My MDS crashed during MDT mount. The last_rcvd trick described in the knowledge base is not working: the kernel still crashes after truncating last_rcvd to 8k. (I have used it successfully before.)

Any ideas (other than upgrading from 1.6.4.3) on getting my MDT running again?

Thanks
/Jakob

[ 344.935438] BUG: scheduling while atomic: mount.lustre/0xffff8101/2024 [ 344.936754] [ 344.936755] Call Trace: [ 344.937738] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769 [ 344.939092] ----------- [cut here ] --------- [please bite here ] --------- [ 344.940751] Kernel BUG at kernel/sched.c:1008 [ 344.941801] invalid opcode: 0000 [1] SMP [ 344.942784] CPU 0 [ 344.943308] Modules linked in: osc mds fsfilt_ldiskfs mgs mgc lustre lov lquota mdc ksocklnd ptlrpc obdclass lnet lvfs libcfs ldiskfs crc16 ipmi_devintf ipmi_si ipmi_msghandler bonding dm_snapshot dm_mirror dm_mod generic serio_raw piix ehci_hcd uhci_hcd ide_core [ 344.949927] Pid: 2024, comm: mount.lustre Not tainted 2.6.18.8-bnx2-1.6.7b-cciss-3.6.18-5-lustre-1.6.4.3 #2 [ 344.951972] RIP: 0010:[<ffffffff80274371>] [<ffffffff80274371>] resched_task+0x24/0x65 [ 344.953893] RSP: 0018:ffffffff804ccdc0 EFLAGS: 00010002 [ 344.955099] RAX: 0000000000000001 RBX: 000000504ff8c8da RCX: ffff810124422000 [ 344.956687] RDX: ffff81012bd3bbc0 RSI: ffff810001023bf8 RDI: ffff81012b06a180 [ 344.958253] RBP: ffffffff804ccdc0 R08: 000000000000000d R09: 000000000000007f [ 344.959865] R10: ffff81012baec420 R11: 0000000000000000 R12: ffff81012b8dd810 [ 344.961259] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8100010232a0 [ 344.962946] FS: 00002ac6e3d176d0(0000) GS:ffffffff8051a000(0000) knlGS:0000000000000000 [ 344.964530] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 344.965871] CR2: 00002b233f140160 CR3: 0000000124240000 CR4: 00000000000006e0 [ 344.967261] Process mount.lustre (pid: 2024, threadinfo ffff810124422000, task ffff81012b06a180) [ 344.968992] Stack: ffffffff804cce20 ffffffff8024232e 0000000000000000
0000000000000001 [ 344.970865] 0000000000000001 0000000000000002 0000000000000082 ffff81012b8dd810 [ 344.972743] 000000000000000e 0000000000000001 ffff810001024d04 0000000000000000 [ 344.974502] Call Trace: [ 344.975203] <IRQ> [<ffffffff8024232e>] try_to_wake_up+0x2e3/0x353 [ 344.976561] [<ffffffff8027fab7>] signal_wake_up+0x1e/0x2d [ 344.977835] [<ffffffff8027fdcc>] __group_send_sig_info+0x89/0x94 [ 344.979030] [<ffffffff802551cf>] group_send_sig_info+0x4e/0x75 [ 344.980414] [<ffffffff80280cf3>] send_group_sig_info+0x28/0x35 [ 344.981591] [<ffffffff8027a99d>] it_real_fn+0x23/0x4f [ 344.982775] [<ffffffff8027a97a>] it_real_fn+0x0/0x4f [ 344.983792] [<ffffffff80249dbb>] hrtimer_run_queues+0x107/0x16d [ 344.984974] [<ffffffff8027e434>] run_timer_softirq+0x21/0x1b0 [ 344.986369] [<ffffffff802101e5>] __do_softirq+0x5e/0xd6 [ 344.987602] [<ffffffff80305e65>] end_msi_irq_w_maskbit+0xf/0x1c [ 344.994691] [<ffffffff80257f58>] call_softirq+0x1c/0x28 [ 344.996209] [<ffffffff802610a6>] do_softirq+0x2c/0x7d [ 344.997383] [<ffffffff80261071>] do_IRQ+0x6a/0x73 [ 344.998472] [<ffffffff8025727d>] ret_from_intr+0x0/0xa [ 344.999537] <EOI> [<ffffffff8027918c>] vprintk+0x29e/0x2ea [ 345.000844] [<ffffffff80286a6c>] autoremove_wake_function+0x9/0x2e [ 345.002332] [<ffffffff80273dbf>] __wake_up_common+0x3e/0x68 [ 345.003612] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769 [ 345.004946] [<ffffffff80279226>] printk+0x4e/0x56 [ 345.006061] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769 [ 345.007403] [<ffffffff8027918c>] vprintk+0x29e/0x2ea [ 345.008607] [<ffffffff8028e1bc>] kallsyms_lookup+0xe7/0x1af [ 345.009948] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769 [ 345.011277] [<ffffffff8025f832>] printk_address+0x9f/0xac [ 345.012519] [<ffffffff80279226>] printk+0x4e/0x56 [ 345.013507] [<ffffffff802f1216>] elv_insert+0xc9/0x192 [ 345.014549] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769 [ 345.015890] [<ffffffff8025fa38>] show_trace+0x1f9/0x21f [ 345.016965] 
[<ffffffff802130a8>] sync_buffer+0x0/0x3f [ 345.018125] [<ffffffff8025fa70>] dump_stack+0x12/0x17 [ 345.019361] [<ffffffff8804a2bf>] :dm_mod:__map_bio+0x47/0x9b [ 345.020664] [<ffffffff8025973a>] __sched_text_start+0x7a/0x769 [ 345.021999] [<ffffffff8023ab95>] lock_timer_base+0x1b/0x3c [ 345.023258] [<ffffffff8022f226>] del_timer+0x4e/0x57 [ 345.024442] [<ffffffff802130a8>] sync_buffer+0x0/0x3f [ 345.025663] [<ffffffff8025a59a>] io_schedule+0x28/0x34 [ 345.026919] [<ffffffff802130e3>] sync_buffer+0x3b/0x3f [ 345.028132] [<ffffffff8025a8f5>] __wait_on_bit+0x40/0x6f [ 345.029198] [<ffffffff802130a8>] sync_buffer+0x0/0x3f [ 345.030400] [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78 [ 345.031660] [<ffffffff80286a91>] wake_bit_function+0x0/0x23 [ 345.032977] [<ffffffff80222c9f>] __bread+0x62/0x77 [ 345.034066] [<ffffffff880a1de2>] :ldiskfs:read_block_bitmap +0xa2/0xf0 [ 345.035359] [<ffffffff880a2695>] :ldiskfs:ldiskfs_free_blocks_sb +0x115/0x510 [ 345.036986] [<ffffffff880a2b21>] :ldiskfs:ldiskfs_free_blocks +0x91/0xe0 [ 345.038504] [<ffffffff880a7d1a>] :ldiskfs:ldiskfs_free_data +0x8a/0x110 [ 345.039828] [<ffffffff880a819c>] :ldiskfs:ldiskfs_truncate +0x20c/0x650 [ 345.041133] [<ffffffff802dbeab>] start_this_handle+0x355/0x405 [ 345.042556] [<ffffffff880a8bb4>] :ldiskfs:ldiskfs_delete_inode +0x84/0xf0 [ 345.044197] [<ffffffff880a8b30>] :ldiskfs:ldiskfs_delete_inode +0x0/0xf0 [ 345.045501] [<ffffffff8022c804>] generic_delete_inode+0x8e/0x10b [ 345.046728] [<ffffffff883ed891>] :mds:mds_obd_destroy+0xa11/0xad0 [ 345.048128] [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b [ 345.049525] [<ffffffff8814961b>] :obdclass:llog_lvfs_close +0x6b/0x130 [ 345.051039] [<ffffffff8814a6c1>] :obdclass:llog_lvfs_destroy +0x841/0xa10 [ 345.052386] [<ffffffff88146a0f>] :obdclass:llog_cat_id2handle +0x4cf/0x5f0 [ 345.053994] [<ffffffff8021557d>] cache_grow+0x2ee/0x343 [ 345.055074] [<ffffffff881509c5>] :obdclass:cat_cancel_cb+0x405/0x630 [ 345.056634] [<ffffffff88146129>] 
:obdclass:llog_process+0xa09/0xe20 [ 345.058192] [<ffffffff8020c894>] dput+0x23/0x152 [ 345.059280] [<ffffffff881505c0>] :obdclass:cat_cancel_cb+0x0/0x630 [ 345.060717] [<ffffffff881503b3>] :obdclass:llog_obd_origin_setup +0x773/0x980 [ 345.062330] [<ffffffff8027486e>] find_busiest_group+0x20d/0x634 [ 345.063694] [<ffffffff8021819f>] vsnprintf+0x55e/0x5a3 [ 345.064967] [<ffffffff8815137d>] :obdclass:llog_setup+0x78d/0x860 [ 345.066364] [<ffffffff8842da94>] :osc:osc_llog_init+0x104/0x390 [ 345.067748] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60 [ 345.069099] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210 [ 345.070579] [<ffffffff882b92ca>] :lov:lov_llog_init+0x2ca/0x400 [ 345.071958] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210 [ 345.073485] [<ffffffff8022a2d7>] mntput_no_expire+0x19/0x8b [ 345.074837] [<ffffffff883b31ad>] :mds:mds_llog_init+0x1ad/0x270 [ 345.076015] [<ffffffff8029abcb>] map_vm_area+0x229/0x2a8 [ 345.077175] [<ffffffff8814e979>] :obdclass:obd_llog_init+0x179/0x210 [ 345.078448] [<ffffffff8029af5b>] __vmalloc_area_node+0x12b/0x153 [ 345.079650] [<ffffffff8814edc5>] :obdclass:llog_cat_initialize +0x3b5/0x670 [ 345.081268] [<ffffffff882cdc61>] :lov:lov_get_info+0x9f1/0xaa0 [ 345.082616] [<ffffffff8025a990>] out_of_line_wait_on_bit+0x6c/0x78 [ 345.083841] [<ffffffff80286a91>] wake_bit_function+0x0/0x23 [ 345.085059] [<ffffffff883bc5ac>] :mds:mds_lov_update_desc +0xbcc/0xd30 [ 345.086619] [<ffffffff883c0e21>] :mds:mds_lov_connect+0x12c1/0x2020 [ 345.088059] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60 [ 345.089271] [<ffffffff8815135e>] :obdclass:llog_setup+0x76e/0x860 [ 345.090497] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60 [ 345.091872] [<ffffffff880f9db8>] :lvfs:upcall_cache_init+0x2f8/0x3a0 [ 345.093153] [<ffffffff883ce381>] :mds:mds_setup+0x10a1/0x1bd0 [ 345.094315] [<ffffffff8021557d>] cache_grow+0x2ee/0x343 [ 345.095371] [<ffffffff802562d7>] cache_alloc_refill+0xde/0x1da [ 345.096740] [<ffffffff880d7f48>] 
:libcfs:cfs_alloc+0x28/0x60 [ 345.098050] [<ffffffff8815a5cd>] :obdclass:class_new_export +0x52d/0x5b0 [ 345.099458] [<ffffffff8816fcdb>] :obdclass:class_setup+0x8bb/0xbe0 [ 345.100697] [<ffffffff8817236a>] :obdclass:class_process_config +0x14ca/0x19f0 [ 345.102340] [<ffffffff881756da>] :obdclass:class_config_llog_handler +0x153a/0x1990 [ 345.104079] [<ffffffff80224869>] do_filp_open+0x2d/0x3d [ 345.105317] [<ffffffff8814bcfc>] :obdclass:llog_lvfs_next_block +0x2ac/0x710 [ 345.106876] [<ffffffff88146129>] :obdclass:llog_process+0xa09/0xe20 [ 345.108321] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60 [ 345.109465] [<ffffffff881741a0>] :obdclass:class_config_llog_handler +0x0/0x1990 [ 345.111169] [<ffffffff8817402f>] :obdclass:class_config_parse_llog +0x43f/0x5b0 [ 345.112828] [<ffffffff8020c8a5>] dput+0x34/0x152 [ 345.113868] [<ffffffff880f9052>] :lvfs:lustre_rename+0x482/0x530 [ 345.115157] [<ffffffff88143fea>] :obdclass:llog_close+0x1aa/0x230 [ 345.116668] [<ffffffff8836fe03>] :mgc:mgc_process_log+0x20f3/0x2640 [ 345.117916] [<ffffffff88370b90>] :mgc:mgc_blocking_ast+0x0/0x450 [ 345.119221] [<ffffffff881ddeb0>] :ptlrpc:ldlm_completion_ast +0x0/0x6a0 [ 345.120556] [<ffffffff8836d85c>] :mgc:config_log_find+0x19c/0x340 [ 345.121954] [<ffffffff88373fc2>] :mgc:mgc_process_config +0xe02/0x1280 [ 345.123472] [<ffffffff881795bc>] :obdclass:lustre_process_log +0xb2c/0xee0 [ 345.125033] [<ffffffff88179a40>] :obdclass:server_find_mount +0x80/0x190 [ 345.126421] [<ffffffff8817f7a6>] :obdclass:server_start_targets +0xb36/0x17e0 [ 345.127819] [<ffffffff8022d4ac>] __up_write+0x21/0x10d [ 345.128871] [<ffffffff88183c27>] :obdclass:server_fill_super +0x18c7/0x1ee0 [ 345.130308] [<ffffffff80208d6d>] __d_lookup+0xb0/0x100 [ 345.131812] [<ffffffff880d7f48>] :libcfs:cfs_alloc+0x28/0x60 [ 345.132994] [<ffffffff881778bf>] :obdclass:lustre_init_lsi +0x29f/0x660 [ 345.134301] [<ffffffff88184240>] :obdclass:lustre_fill_super +0x0/0x1ae0 [ 345.135680] [<ffffffff88185ba3>] 
:obdclass:lustre_fill_super +0x1963/0x1ae0 [ 345.137254] [<ffffffff802a95f5>] set_anon_super+0x3c/0xab [ 345.138372] [<ffffffff802a95b9>] set_anon_super+0x0/0xab [ 345.139609] [<ffffffff88184240>] :obdclass:lustre_fill_super +0x0/0x1ae0 [ 345.141115] [<ffffffff802a9805>] get_sb_nodev+0x4f/0x97 [ 345.142318] [<ffffffff802a910b>] vfs_kern_mount+0x93/0x11a [ 345.143573] [<ffffffff802a91d4>] do_kern_mount+0x36/0x4d [ 345.144754] [<ffffffff802b1982>] do_mount+0x68c/0x6ff [ 345.145930] [<ffffffff802088d3>] __handle_mm_fault+0x530/0x91a [ 345.147288] [<ffffffff80218776>] remove_vma+0x55/0x5c [ 345.148307] [<ffffffff8021f84a>] __up_read+0x13/0x8a [ 345.149455] [<ffffffff8020a6af>] do_page_fault+0x3d1/0x706 [ 345.150715] [<ffffffff8020c2e4>] do_path_lookup+0x268/0x28c [ 345.151992] [<ffffffff80297807>] zone_statistics+0x3e/0x6d [ 345.153145] [<ffffffff8020dcbc>] __alloc_pages+0x5c/0x29b [ 345.154399] [<ffffffff802472dd>] sys_mount+0x8a/0xd7 [ 345.155550] [<ffffffff80256d82>] system_call+0x7e/0x83 [ 345.156591] [ 345.156966] [ 345.156967] Code: 0f 0b 68 aa 6c 3f 80 c2 f0 03 8b 41 10 a8 08 75 2e f0 0f ba [ 345.159822] RIP [<ffffffff80274371>] resched_task+0x24/0x65 [ 345.161214] RSP <ffffffff804ccdc0> [ 345.161948] <0>Kernel panic - not syncing: Aiee, killing interrupt handler! [ 345.163565]
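The message above only says that last_rcvd was truncated to 8k; it does not give the exact commands. As a rough illustration of that step, here is a minimal Python sketch that backs up a file and then truncates it to its first 8192 bytes. The function name, the `.bak` suffix, and the example path are assumptions made for illustration, not taken from the thread or from the Lustre knowledge base. It also assumes Lustre is stopped on the target and the MDT backing device is mounted directly (for example as ldiskfs) so that last_rcvd is visible as an ordinary file.

```python
import os
import shutil


def truncate_with_backup(path, keep_bytes=8192):
    """Copy `path` to `path + '.bak'`, then keep only its first `keep_bytes` bytes.

    Hypothetical helper for illustration; not part of any Lustre tool.
    Returns the path of the backup copy.
    """
    backup = path + ".bak"
    # Keep an untouched copy before modifying anything.
    shutil.copy2(path, backup)
    # Only shrink the file; never extend a file that is already smaller.
    if os.path.getsize(path) > keep_bytes:
        with open(path, "r+b") as f:
            f.truncate(keep_bytes)
    return backup


if __name__ == "__main__":
    # Hypothetical mount point of the MDT backing file system.
    truncate_with_backup("/mnt/mdt/last_rcvd")
```

Keeping the backup matters here: as the thread shows, the truncation does not always get the MDT mounting again, and the original file may be needed for later diagnosis.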
Jakob Goldbach
2009-Jan-07 06:02 UTC
[Lustre-discuss] MDS crash during mount, last_rcvd trick not working
On Tue, 2009-01-06 at 23:09 +0100, Jakob Goldbach wrote:
> Any ideas (other than upgrading from 1.6.4.3) on getting my MDT running
> again?

I tried upgrading to linux 2.6.22.19 with lustre 1.6.5.1 as I'm running this on a different setup. Panic as well.

Any ideas?

Thanks,
/Jakob

Lustre: Enabling user_xattr BUG: scheduling while atomic: mount.lustre/0xffff8101/19019 Call Trace: [<ffffffff804d83b7>] schedule+0x5f/0x8df [<ffffffff8023b08d>] del_timer+0x57/0x62 [<ffffffff80313e72>] __generic_unplug_device+0x25/0x29 [<ffffffff80314236>] generic_unplug_device+0x20/0x32 [<ffffffff804d8d58>] io_schedule+0x2d/0x39 [<ffffffff8029bc3e>] sync_buffer+0x3b/0x3f [<ffffffff804d9349>] __wait_on_bit+0x47/0x79 [<ffffffff8029bc03>] sync_buffer+0x0/0x3f [<ffffffff8029bc03>] sync_buffer+0x0/0x3f [<ffffffff804d93e5>] out_of_line_wait_on_bit+0x6a/0x77 [<ffffffff802451dc>] wake_bit_function+0x0/0x2a [<ffffffff8029c095>] ll_rw_block+0x95/0xbc [<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22 [<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80 [<ffffffff883ce506>] :fsfilt_ldiskfs:fsfilt_ldiskfs_read_record +0x106/0x210 [<ffffffff8025f1bc>] __alloc_pages+0x83/0x2c6 [<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220 [<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee [<ffffffff8027a74b>] __dentry_open+0x111/0x1bd [<ffffffff88111811>] :obdclass:llog_lvfs_read_header+0x1a1/0x440 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff8810d183>] :obdclass:llog_init_handle+0xe3/0x8a0 [<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22 [<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80 [<ffffffff8810dd6f>] :obdclass:llog_cat_id2handle+0x18f/0x630 [<ffffffff8811762b>] :obdclass:cat_cancel_cb+0x5b/0x6a0 [<ffffffff8810c96e>] :obdclass:llog_process+0x69e/0xdd0 [<ffffffff802909d3>] mntput_no_expire+0x20/0x7d [<ffffffff881175d0>] :obdclass:cat_cancel_cb+0x0/0x6a0 [<ffffffff880bc10e>] :lvfs:pop_ctxt+0xae/0x2e0 [<ffffffff88118644>] :obdclass:llog_obd_origin_setup+0x684/0xb00
[<ffffffff88118f8e>] :obdclass:llog_setup+0x4ce/0x840 [<ffffffff8826f84f>] :osc:osc_llog_init+0x12f/0x410 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240 [<ffffffff8029bc03>] sync_buffer+0x0/0x3f [<ffffffff882d8014>] :lov:lov_llog_init+0x274/0x440 [<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240 [<ffffffff883d95b5>] :mds:mds_llog_init+0x1d5/0x280 [<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240 [<ffffffff8026febc>] __vmalloc_node+0x58/0x65 [<ffffffff88116a55>] :obdclass:llog_cat_initialize+0x1c5/0x690 [<ffffffff882f13d8>] :lov:lov_get_info+0x98/0xbf0 [<ffffffff883e2aec>] :mds:mds_lov_update_desc+0x25c/0x9f0 [<ffffffff883ea462>] :mds:mds_lov_connect+0x7e2/0x1b70 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff88118f66>] :obdclass:llog_setup+0x4a6/0x840 [<ffffffff88137ab7>] :obdclass:class_get_profile+0x67/0x1d0 [<ffffffff883f0e5b>] :mds:mds_setup+0x10fb/0x1bf0 [<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff88124766>] :obdclass:class_new_export+0x1f6/0x550 [<ffffffff8813b271>] :obdclass:class_setup+0x7f1/0xcd0 [<ffffffff88122009>] :obdclass:class_name2dev+0x59/0xe0 [<ffffffff8813e1ab>] :obdclass:class_process_config+0x147b/0x1c70 [<ffffffff881406ef>] :obdclass:class_config_llog_handler+0xdef/0x1c50 [<ffffffff8027a8da>] do_filp_open+0x39/0x4b [<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220 [<ffffffff8810c96e>] :obdclass:llog_process+0x69e/0xdd0 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff8813f900>] :obdclass:class_config_llog_handler+0x0/0x1c50 [<ffffffff88136a9d>] :obdclass:class_config_parse_llog+0x18d/0x5e0 [<ffffffff8028b96c>] dput+0x35/0x116 [<ffffffff880bb01c>] :lvfs:lustre_rename+0x16c/0x580 [<ffffffff88394ae8>] :mgc:mgc_process_log+0x278/0x2780 [<ffffffff88397910>] :mgc:mgc_blocking_ast+0x0/0x4b0 [<ffffffff881b0260>] :ptlrpc:ldlm_completion_ast+0x0/0x770 [<ffffffff8031fdaf>] 
vsnprintf+0x54d/0x593 [<ffffffff8839a0a1>] :mgc:mgc_name2resid+0xd1/0x190 [<ffffffff8839455c>] :mgc:config_log_find+0x6c/0x380 [<ffffffff8839adfa>] :mgc:mgc_process_config+0xc2a/0x1130 [<ffffffff88144524>] :obdclass:lustre_process_log+0x3b4/0xfe0 [<ffffffff88145208>] :obdclass:server_find_mount+0x48/0x1c0 [<ffffffff881498c1>] :obdclass:server_start_targets+0xcc1/0x1ab0 [<ffffffff8814f7f2>] :obdclass:server_fill_super+0x14a2/0x22d0 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff881514ee>] :obdclass:lustre_fill_super+0xece/0x17d4 [<ffffffff8027cff2>] set_anon_super+0x4b/0xb4 [<ffffffff8027d898>] sget+0x378/0x38a [<ffffffff8027cfa7>] set_anon_super+0x0/0xb4 [<ffffffff88150620>] :obdclass:lustre_fill_super+0x0/0x17d4 [<ffffffff8027de2a>] get_sb_nodev+0x57/0x97 [<ffffffff88141676>] :obdclass:lustre_get_sb+0x16/0x20 [<ffffffff8027ce77>] vfs_kern_mount+0x52/0x8e [<ffffffff8027cf0c>] do_kern_mount+0x47/0xe2 [<ffffffff80292573>] do_mount+0x671/0x6cb [<ffffffff8031e8eb>] __up_read+0x8f/0x98 [<ffffffff80247b63>] up_read+0x9/0xb [<ffffffff804dc675>] do_page_fault+0x447/0x7a8 [<ffffffff802841d6>] release_open_intent+0x17/0x20 [<ffffffff8025f49d>] __get_free_pages+0x32/0x6b [<ffffffff80290747>] copy_mount_options+0x2f/0x136 [<ffffffff80292656>] sys_mount+0x89/0xd7 [<ffffffff8027a969>] do_sys_open+0x7d/0x8d [<ffffffff802095fe>] system_call+0x7e/0x83 Unable to handle kernel paging request at fffffffff4482d60 RIP: [<ffffffff804d893a>] schedule+0x5e2/0x8df PGD 203067 PUD 529f067 PMD 0 Oops: 0000 [1] SMP CPU 3 Modules linked in: mds fsfilt_ldiskfs mgs mgc lustre lov mdc lquota osc ksocklnd ptlrpc obdclass lnet lvfs libcfs ldiskfs crc16 ipmi_devintf bonding dm_snapshot dm_mirror dm_mod ipmi_si ipmi_msghandler Pid: 19019, comm: mount.lustre Not tainted 2.6.22.19-lustre-1.6.5.1 #2 RIP: 0010:[<ffffffff804d893a>] [<ffffffff804d893a>] schedule +0x5e2/0x8df RSP: 0000:ffff81010e7ba448 EFLAGS: 00010083 RAX: 000000000e7ba028 RBX: ffff81012f4a8b20 RCX: ffff81012f4a8b20 RDX: 
ffffffff806fb0c0 RSI: 0000000000000000 RDI: ffff81012f4a8b20 RBP: ffff81010e7ba528 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000080 R11: ffffffff805ae781 R12: ffff8100052c5a4c R13: ffff81010e7ba5b8 R14: ffff8100052c4800 R15: 00000eb61b0422d7 FS: 00002b7f59abe6d0(0000) GS:ffff81012ff073c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: fffffffff4482d60 CR3: 000000010257a000 CR4: 00000000000006e0 Process mount.lustre (pid: 19019, threadinfo ffff81010e7ba000, task ffff81012f4a8b20) Stack: ffff81012fa1d868 0000000000000000 ffff81010e7ba5b8 ffff81010e7ba4d8 ffff81010e7ba498 ffffffff806f8800 ffff81010e7ba498 0000000000000086 ffff81012fa1d7a8 ffff81012fa1d7a8 0000000000000004 ffff81012f4a8b20 Call Trace: [<ffffffff80314236>] generic_unplug_device+0x20/0x32 [<ffffffff804d8d58>] io_schedule+0x2d/0x39 [<ffffffff8029bc3e>] sync_buffer+0x3b/0x3f [<ffffffff804d9349>] __wait_on_bit+0x47/0x79 [<ffffffff8029bc03>] sync_buffer+0x0/0x3f [<ffffffff8029bc03>] sync_buffer+0x0/0x3f [<ffffffff804d93e5>] out_of_line_wait_on_bit+0x6a/0x77 [<ffffffff802451dc>] wake_bit_function+0x0/0x2a [<ffffffff8029c095>] ll_rw_block+0x95/0xbc [<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22 [<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80 [<ffffffff883ce506>] :fsfilt_ldiskfs:fsfilt_ldiskfs_read_record +0x106/0x210 [<ffffffff8025f1bc>] __alloc_pages+0x83/0x2c6 [<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220 [<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee [<ffffffff8027a74b>] __dentry_open+0x111/0x1bd [<ffffffff88111811>] :obdclass:llog_lvfs_read_header+0x1a1/0x440 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff8810d183>] :obdclass:llog_init_handle+0xe3/0x8a0 [<ffffffff8029bbc1>] __wait_on_buffer+0x20/0x22 [<ffffffff88065fc9>] :ldiskfs:ldiskfs_bread+0x59/0x80 [<ffffffff8810dd6f>] :obdclass:llog_cat_id2handle+0x18f/0x630 [<ffffffff8811762b>] :obdclass:cat_cancel_cb+0x5b/0x6a0 [<ffffffff8810c96e>] 
:obdclass:llog_process+0x69e/0xdd0 [<ffffffff802909d3>] mntput_no_expire+0x20/0x7d [<ffffffff881175d0>] :obdclass:cat_cancel_cb+0x0/0x6a0 [<ffffffff880bc10e>] :lvfs:pop_ctxt+0xae/0x2e0 [<ffffffff88118644>] :obdclass:llog_obd_origin_setup+0x684/0xb00 [<ffffffff88118f8e>] :obdclass:llog_setup+0x4ce/0x840 [<ffffffff8826f84f>] :osc:osc_llog_init+0x12f/0x410 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240 [<ffffffff8029bc03>] sync_buffer+0x0/0x3f [<ffffffff882d8014>] :lov:lov_llog_init+0x274/0x440 [<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240 [<ffffffff883d95b5>] :mds:mds_llog_init+0x1d5/0x280 [<ffffffff881166fe>] :obdclass:obd_llog_init+0xae/0x240 [<ffffffff8026febc>] __vmalloc_node+0x58/0x65 [<ffffffff88116a55>] :obdclass:llog_cat_initialize+0x1c5/0x690 [<ffffffff882f13d8>] :lov:lov_get_info+0x98/0xbf0 [<ffffffff883e2aec>] :mds:mds_lov_update_desc+0x25c/0x9f0 [<ffffffff883ea462>] :mds:mds_lov_connect+0x7e2/0x1b70 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff88118f66>] :obdclass:llog_setup+0x4a6/0x840 [<ffffffff88137ab7>] :obdclass:class_get_profile+0x67/0x1d0 [<ffffffff883f0e5b>] :mds:mds_setup+0x10fb/0x1bf0 [<ffffffff80278b92>] cache_alloc_refill+0x84/0x4ee [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff88124766>] :obdclass:class_new_export+0x1f6/0x550 [<ffffffff8813b271>] :obdclass:class_setup+0x7f1/0xcd0 [<ffffffff88122009>] :obdclass:class_name2dev+0x59/0xe0 [<ffffffff8813e1ab>] :obdclass:class_process_config+0x147b/0x1c70 [<ffffffff881406ef>] :obdclass:class_config_llog_handler+0xdef/0x1c50 [<ffffffff8027a8da>] do_filp_open+0x39/0x4b [<ffffffff88110da8>] :obdclass:llog_lvfs_read_blob+0x58/0x220 [<ffffffff8810c96e>] :obdclass:llog_process+0x69e/0xdd0 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff8813f900>] :obdclass:class_config_llog_handler+0x0/0x1c50 [<ffffffff88136a9d>] :obdclass:class_config_parse_llog+0x18d/0x5e0 [<ffffffff8028b96c>] dput+0x35/0x116 
[<ffffffff880bb01c>] :lvfs:lustre_rename+0x16c/0x580 [<ffffffff88394ae8>] :mgc:mgc_process_log+0x278/0x2780 [<ffffffff88397910>] :mgc:mgc_blocking_ast+0x0/0x4b0 [<ffffffff881b0260>] :ptlrpc:ldlm_completion_ast+0x0/0x770 [<ffffffff8031fdaf>] vsnprintf+0x54d/0x593 [<ffffffff8839a0a1>] :mgc:mgc_name2resid+0xd1/0x190 [<ffffffff8839455c>] :mgc:config_log_find+0x6c/0x380 [<ffffffff8839adfa>] :mgc:mgc_process_config+0xc2a/0x1130 [<ffffffff88144524>] :obdclass:lustre_process_log+0x3b4/0xfe0 [<ffffffff88145208>] :obdclass:server_find_mount+0x48/0x1c0 [<ffffffff881498c1>] :obdclass:server_start_targets+0xcc1/0x1ab0 [<ffffffff8814f7f2>] :obdclass:server_fill_super+0x14a2/0x22d0 [<ffffffff8809a2ee>] :libcfs:cfs_alloc+0x5e/0x90 [<ffffffff881514ee>] :obdclass:lustre_fill_super+0xece/0x17d4 [<ffffffff8027cff2>] set_anon_super+0x4b/0xb4 [<ffffffff8027d898>] sget+0x378/0x38a [<ffffffff8027cfa7>] set_anon_super+0x0/0xb4 [<ffffffff88150620>] :obdclass:lustre_fill_super+0x0/0x17d4 [<ffffffff8027de2a>] get_sb_nodev+0x57/0x97 [<ffffffff88141676>] :obdclass:lustre_get_sb+0x16/0x20 [<ffffffff8027ce77>] vfs_kern_mount+0x52/0x8e [<ffffffff8027cf0c>] do_kern_mount+0x47/0xe2 [<ffffffff80292573>] do_mount+0x671/0x6cb [<ffffffff8031e8eb>] __up_read+0x8f/0x98 [<ffffffff80247b63>] up_read+0x9/0xb [<ffffffff804dc675>] do_page_fault+0x447/0x7a8 [<ffffffff802841d6>] release_open_intent+0x17/0x20 [<ffffffff8025f49d>] __get_free_pages+0x32/0x6b [<ffffffff80290747>] copy_mount_options+0x2f/0x136 [<ffffffff80292656>] sys_mount+0x89/0xd7 [<ffffffff8027a969>] do_sys_open+0x7d/0x8d [<ffffffff802095fe>] system_call+0x7e/0x83 Code: 48 8b 04 c5 20 2c 6b 80 48 8b 40 08 c7 44 02 08 01 00 00 00 RIP [<ffffffff804d893a>] schedule+0x5e2/0x8df RSP <ffff81010e7ba448> CR2: fffffffff4482d60 Kernel panic - not syncing: Aiee, killing interrupt handler!
Jakob Goldbach
2009-Jan-07 21:24 UTC
[Lustre-discuss] Fixed: MDS crash during mount, last_rcvd trick not working
> I tried upgrading to linux 2.6.22.19 with lustre 1.6.5.1 as I'm running
> this on a different setup. Panic as well.

Fixed by upgrading to 2.6.22.19 and lustre-1.6.6

/Jakob
Andreas Dilger
2009-Jan-08 12:18 UTC
[Lustre-discuss] Fixed: MDS crash during mount, last_rcvd trick not working
On Jan 07, 2009 22:24 +0100, Jakob Goldbach wrote:
> > I tried upgrading to linux 2.6.22.19 with lustre 1.6.5.1 as I'm running
> > this on a different setup. Panic as well.
>
> Fixed by upgrading to 2.6.22.19 and lustre-1.6.6

I think this was the MDS stack overflow during recovery problem, which was fixed as you observe.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.