Hello,

Following the procedure outlined in the Lustre Manual Chapter 15 on backup and restore, I have tried two approaches to backing up a single MDT disk. The first was getfattr followed by tar -cvf to capture the data, restoring with tar -xvf and setfattr. The second was an rsync -aXS of a mounted MDT to a recipient disk. I tried each method both keeping OBJECTS/* and CATALOG and removing those files. In every case I can mount the recipient disk but cannot actually use it. The errors from the rsync -aXS method are included below; the errors from the other method were similar.

Am I getting these errors because I have not gone to the OSS machine and, for each OST, run tunefs.lustre --erase-param {params} --writeconf /dev/sdX to clear the old configuration so the OSTs will accept the new MDT disk? I found that step in the Lustre-Discuss archive under the title "problem moving mdt to a new node". Chapter 15 does not indicate that a tunefs.lustre command on the OSTs is part of an MDT restore procedure. (A rough sketch of the commands involved follows the log below.)

Thanks,
megan

-----------------------------------------------------------------------

23 March 2009

This is the LBUG I got every time I mounted the restored LVM disk for crew8-MDT0000. The initial mount command returned the CLI prompt quickly, but then the error messages started piling up in /var/log/messages.

From /var/log/messages:

Mar 20 10:53:57 mds1 kernel: kjournald starting. Commit interval 5 seconds
Mar 20 10:53:57 mds1 kernel: LDISKFS FS on dm-1, internal journal
Mar 20 10:53:57 mds1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Mar 20 10:53:57 mds1 kernel: kjournald starting. Commit interval 5 seconds
Mar 20 10:53:57 mds1 kernel: LDISKFS FS on dm-1, internal journal
Mar 20 10:53:57 mds1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Mar 20 10:53:57 mds1 kernel: Lustre: Enabling user_xattr
Mar 20 10:53:57 mds1 kernel: Lustre: MDT crew8-MDT0000 now serving dev (f8a0e9b5-c2f1-8297-4ead-e34c9680b3cf) with recovery enabled
Mar 20 10:53:57 mds1 kernel: Lustre: Server crew8-MDT0000 on device /dev/METADATA2/LV2 has started
Mar 20 10:53:57 mds1 kernel: LustreError: 23339:0:(llog_lvfs.c:597:llog_lvfs_create()) error looking up logfile 0xa65662:0x9c30d2f6: rc -2
Mar 20 10:53:57 mds1 kernel: LustreError: 23339:0:(osc_request.c:3446:osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
Mar 20 10:53:57 mds1 kernel: LustreError: 23339:0:(osc_request.c:3457:osc_llog_init()) osc 'crew8-OST0000-osc' tgt 'crew8-MDT0000' cnt 1 catid ffffc2000510c000 rc=-2
Mar 20 10:53:57 mds1 kernel: LustreError: 23339:0:(osc_request.c:3459:osc_llog_init()) logid 0xa65662:0x9c30d2f6
Mar 20 10:53:57 mds1 kernel: LustreError: 23339:0:(lov_log.c:214:lov_llog_init()) error osc_llog_init idx 0 osc 'crew8-OST0000-osc' tgt 'crew8-MDT0000' (rc=-2)
Mar 20 10:53:57 mds1 kernel: LustreError: 23339:0:(mds_log.c:207:mds_llog_init()) lov_llog_init err -2
Mar 20 10:53:57 mds1 kernel: LustreError: 23339:0:(llog_obd.c:392:llog_cat_initialize()) rc: -2
Mar 20 10:53:57 mds1 kernel: LustreError: 23341:0:(lustre_log.h:316:llog_get_context()) ASSERTION(atomic_read(&ctxt_->loc_refcount) > 0) failed
Mar 20 10:53:57 mds1 kernel: LustreError: 23341:0:(tracefile.c:431:libcfs_assertion_failed()) LBUG
Mar 20 10:53:57 mds1 kernel: Lustre: 23341:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 23341
Mar 20 10:53:57 mds1 kernel: ll_sync_01  R  running task  0 23341 1 23343 23339 (L-TLB)
Mar 20 10:53:57 mds1 kernel: 0000000000000001 ffffffff800c76d5 80000000c0ffeeaa 0000000000000180
Mar 20 10:53:57 mds1 kernel: ffff810004d1bb48 0000000000000000 000000000000000c ffff810025176178
Mar 20 10:53:57 mds1 kernel: 0000000000000180 ffffc2000510c000 ffff81004ff35d20 ffffffff883a7dc5
Mar 20 10:53:57 mds1 kernel: Call Trace:
Mar 20 10:53:57 mds1 kernel: [<ffffffff800c76d5>] __vmalloc_area_node+0x12b/0x153
Mar 20 10:53:57 mds1 kernel: [<ffffffff883a7dc5>] :obdclass:llog_cat_initialize+0x3b5/0x670
Mar 20 10:53:57 mds1 kernel: [<ffffffff88834d57>] :lov:lov_get_info+0xa57/0xb20
Mar 20 10:53:57 mds1 kernel: [<ffffffff887b162a>] :mds:mds_lov_update_desc+0xc3a/0xe20
Mar 20 10:53:57 mds1 kernel: [<ffffffff887b1cee>] :mds:__mds_lov_synchronize+0x4de/0x2060
Mar 20 10:53:57 mds1 kernel: [<ffffffff8000cead>] dput+0x23/0x10a
Mar 20 10:53:57 mds1 kernel: [<ffffffff887b4858>] :mds:mds_lov_synchronize+0x38/0xb0
Mar 20 10:53:57 mds1 kernel: [<ffffffff800b296c>] audit_syscall_exit+0x2fb/0x319
Mar 20 10:53:57 mds1 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
Mar 20 10:53:57 mds1 kernel: [<ffffffff887b4820>] :mds:mds_lov_synchronize+0x0/0xb0
Mar 20 10:53:57 mds1 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
Mar 20 10:53:57 mds1 kernel:
Mar 20 10:53:57 mds1 kernel: LustreError: dumping log to /tmp/lustre-log.1237560837.23341
Mar 20 10:54:07 mds1 kernel: BUG: soft lockup detected on CPU#0!
Mar 20 10:54:07 mds1 kernel:
Mar 20 10:54:07 mds1 kernel: Call Trace:
Mar 20 10:54:07 mds1 kernel: <IRQ> [<ffffffff800b4f75>] softlockup_tick+0xdb/0xed
Mar 20 10:54:07 mds1 kernel: [<ffffffff8009306a>] update_process_times+0x42/0x68
Mar 20 10:54:07 mds1 kernel: [<ffffffff8007464a>] smp_local_timer_interrupt+0x2c/0x61
Mar 20 10:54:07 mds1 kernel: [<ffffffff80074d12>] smp_apic_timer_interrupt+0x41/0x47
Mar 20 10:54:07 mds1 kernel: [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Mar 20 10:54:07 mds1 kernel: <EOI> [<ffffffff88874f10>] :osc:osc_setinfo_mds_conn_interpret+0x0/0x3f0
Mar 20 10:54:07 mds1 kernel: [<ffffffff80062b1c>] .text.lock.spinlock+0x2/0x30
Mar 20 10:54:07 mds1 kernel: [<ffffffff88874ff4>] :osc:osc_setinfo_mds_conn_interpret+0xe4/0x3f0
Mar 20 10:54:07 mds1 kernel: [<ffffffff886b136a>] :ptlrpc:ptlrpc_check_set+0x9aa/0xb60
Mar 20 10:54:07 mds1 kernel: [<ffffffff80048b8a>] try_to_del_timer_sync+0x51/0x5a
Mar 20 10:54:07 mds1 kernel: [<ffffffff886b3fea>] :ptlrpc:ptlrpc_set_wait+0x36a/0x520
Mar 20 10:54:07 mds1 kernel: [<ffffffff88316fe8>] :libcfs:cfs_alloc+0x28/0x60
Mar 20 10:54:07 mds1 kernel: [<ffffffff80088431>] default_wake_function+0x0/0xe
Mar 20 10:54:07 mds1 kernel: [<ffffffff8882869e>] :lov:lov_set_info_async+0x5ae/0x660
Mar 20 10:54:07 mds1 kernel: [<ffffffff887b2927>] :mds:__mds_lov_synchronize+0x1117/0x2060
Mar 20 10:54:07 mds1 kernel: [<ffffffff8000cead>] dput+0x23/0x10a
Mar 20 10:54:07 mds1 kernel: [<ffffffff887b4858>] :mds:mds_lov_synchronize+0x38/0xb0
Mar 20 10:54:07 mds1 kernel: [<ffffffff800b296c>] audit_syscall_exit+0x2fb/0x319
Mar 20 10:54:07 mds1 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
Mar 20 10:54:07 mds1 kernel: [<ffffffff887b4820>] :mds:mds_lov_synchronize+0x0/0xb0
Mar 20 10:54:07 mds1 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
Mar 20 10:54:07 mds1 kernel:
Mar 20 11:02:01 mds1 kernel: Lustre: Failing over crew8-MDT0000
Mar 20 11:02:01 mds1 kernel: Lustre: Skipped 13 previous similar messages
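
For concreteness, here is roughly what I mean by the two backup/restore methods and the OST-side step I am asking about. The device names, mount points, and backup paths below are placeholders, not my exact commands:

    # Method 1 (Manual Ch. 15 style): file-level backup of the old MDT
    mkdir -p /mnt/mdt_old /mnt/mdt_new
    mount -t ldiskfs /dev/OLD_MDT_DEV /mnt/mdt_old    # placeholder device name
    cd /mnt/mdt_old
    getfattr -R -d -m '.*' -P . > /backups/ea.bak     # save extended attributes (striping info)
    tar --sparse -cvf /backups/mdt.tar .              # save the metadata files
    cd /; umount /mnt/mdt_old

    # ...and restore onto the new MDT device
    mount -t ldiskfs /dev/METADATA2/LV2 /mnt/mdt_new
    cd /mnt/mdt_new
    tar -xvf /backups/mdt.tar                         # restore the files
    setfattr --restore=/backups/ea.bak                # restore the extended attributes
    rm -rf OBJECTS/* CATALOGS                         # the removal I tried both with and without
    cd /; umount /mnt/mdt_new

    # Method 2: copy one mounted ldiskfs MDT straight to another
    rsync -aXS /mnt/mdt_old/ /mnt/mdt_new/            # -a archive, -X xattrs, -S sparse

    # The OST-side step from the "problem moving mdt to a new node" thread
    # (with the filesystem stopped, on each OSS, for each OST device):
    tunefs.lustre --writeconf /dev/sdX                # regenerate the configuration logs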
On Tue, 2009-03-24 at 16:01 -0400, Ms. Megan Larko wrote:
> Hello,
>
> Following the procedure outlined in the Lustre Manual Chapter 15 on
> backup and restore, I have tried two approaches to backing up a single
> MDT disk.

Hrm. Can you file a bug on this? If you can, please provide a transcript (i.e. use the script command) of your entire process so that somebody can see if we have a procedural error in the manual or if there is a regression that has been introduced.

Thanx,
b.
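
P.S. Capturing that kind of transcript is just a matter of wrapping the whole session in script(1); the output file name here is only an example:

    script /tmp/mdt-restore-transcript.txt
    # ...run the complete backup/restore procedure here...
    exit                                              # ends the recording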