We lost our MDS/MGS to a power failure yesterday evening. Just to be safe, we ran e2fsck on the combined MDT/MGT, and there were only a couple of minor complaints about HTREE issues, which it fixed. The MDT/MGT now fsck's cleanly. The problem is that, despite the clean e2fsck, the MGS is crashing in the Lustre mount code when attempting to mount the MDT.

It is a scratch file system, so it is not backed up. Still, it is a pain to lose the data. I'm assuming this is not normal, and there is not much in the manual about doing anything more than e2fsck, but I want to ask whether anyone else has seen something like this before and might have some additional suggestions before I trash the data and reformat the file system.

Thanks,

Charlie Taylor
UF HPC Center
On Mon, Jun 02, 2008 at 11:02:11AM -0400, Charles Taylor wrote:
> We lost our MDS/MGS to a power failure yesterday evening. Just to
> be safe, we ran e2fsck on the combined MDT/MGT and there were only a
> couple of minor complaints about HTREE issues, which it fixed. The
> MDT/MGT now fsck's cleanly. The problem is that, despite the clean
> e2fsck, the MGS is crashing in the Lustre mount code when attempting
> to mount the MDT.

Where is it crashing exactly? Any stack traces, assertion failures ...
on the console?

Johann
Well, I figured someone would ask that. :)  The last messages that make it to syslog prior to the crash are:

Jun 2 10:29:54 hpcmds kernel: LDISKFS FS on md2, internal journal
Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: recovery complete.
Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Jun 2 10:29:54 hpcmds kernel: kjournald starting. Commit interval 5 seconds
Jun 2 10:29:54 hpcmds kernel: LDISKFS FS on md2, internal journal
Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Jun 2 10:29:54 hpcmds kernel: Lustre: MGS MGS started
Jun 2 10:29:54 hpcmds kernel: Lustre: Enabling user_xattr
Jun 2 10:29:54 hpcmds kernel: Lustre: 4540:0:(mds_fs.c:446:mds_init_server_data()) RECOVERY: service ufhpc-MDT0000, 100 recoverable clients, last_transno 9412464331
Jun 2 10:29:54 hpcmds kernel: Lustre: MDT ufhpc-MDT0000 now serving dev (ufhpc-MDT0000/cac99db5-a66a-a6ac-4649-6ec8cc2dc0e7), but will be in recovery until 100 clients reconnect, or if no clients reconnect for 4:10; during that time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/ufhpc-MDT0000/recovery_status.
Jun 2 10:29:55 hpcmds kernel: Lustre: 4540:0:(mds_lov.c:858:mds_notify()) MDS ufhpc-MDT0000: in recovery, not resetting orphans on ufhpc-OST0004_UUID
Jun 2 10:29:55 hpcmds kernel: Lustre: 4540:0:(mds_lov.c:858:mds_notify()) MDS ufhpc-MDT0000: in recovery, not resetting orphans on ufhpc-OST0005_UUID

Note that all of the clients are powered off and the OSSes are currently unmounted (though they appear to be fine). Unfortunately, getting the messages off the console (in the machine room) means using pencil and paper (you'd think we would have something as fancy as an IP-KVM console server, but alas, we do things, ahem, "inexpensively" here).

I'm going to let the md mirrors resync before I try it again (although I don't think that should be an issue). If it crashes a third time, and I suspect it will, I'll include some of the stack trace. Of course, part of the problem is that the trace is deep enough that it goes off screen and we can't see the top of it (which is the part that would be useful). :)

I was hoping for a silver bullet, but...

Thanks,

Charlie Taylor
UF HPC Center

On Jun 2, 2008, at 11:16 AM, Johann Lombardi wrote:

> On Mon, Jun 02, 2008 at 11:02:11AM -0400, Charles Taylor wrote:
>> We lost our MDS/MGS to a power failure yesterday evening. Just to
>> be safe, we ran e2fsck on the combined MDT/MGT and there were only a
>> couple of minor complaints about HTREE issues, which it fixed. The
>> MDT/MGT now fsck's cleanly. The problem is that, despite the clean
>> e2fsck, the MGS is crashing in the Lustre mount code when attempting
>> to mount the MDT.
>
> Where is it crashing exactly? Any stack traces, assertion failures ...
> on the console?
>
> Johann
On Mon, 2008-06-02 at 11:35 -0400, Charles Taylor wrote:
>
> Well, I figured someone would ask that. :)  The last messages that
> make it to syslog prior to the crash are....
>
> Jun 2 10:29:54 hpcmds kernel: LDISKFS FS on md2, internal journal
> Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: recovery complete.
> Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Jun 2 10:29:54 hpcmds kernel: kjournald starting. Commit interval 5 seconds
> Jun 2 10:29:54 hpcmds kernel: LDISKFS FS on md2, internal journal
> Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Jun 2 10:29:54 hpcmds kernel: Lustre: MGS MGS started
> Jun 2 10:29:54 hpcmds kernel: Lustre: Enabling user_xattr
> Jun 2 10:29:54 hpcmds kernel: Lustre: 4540:0:(mds_fs.c:446:mds_init_server_data()) RECOVERY: service ufhpc-MDT0000, 100 recoverable clients, last_transno 9412464331
> Jun 2 10:29:54 hpcmds kernel: Lustre: MDT ufhpc-MDT0000 now serving dev (ufhpc-MDT0000/cac99db5-a66a-a6ac-4649-6ec8cc2dc0e7), but will be in recovery until 100 clients reconnect, or if no clients reconnect for 4:10; during that time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/ufhpc-MDT0000/recovery_status.
> Jun 2 10:29:55 hpcmds kernel: Lustre: 4540:0:(mds_lov.c:858:mds_notify()) MDS ufhpc-MDT0000: in recovery, not resetting orphans on ufhpc-OST0004_UUID
> Jun 2 10:29:55 hpcmds kernel: Lustre: 4540:0:(mds_lov.c:858:mds_notify()) MDS ufhpc-MDT0000: in recovery, not resetting orphans on ufhpc-OST0005_UUID

This is all perfectly normal. Is there anything else or does this amount to all that you are seeing?

> Note that all of the clients are powered off and the OSS's are
> currently unmounted (though they appear to be fine).

Does anything bad happen when you bring up the OSSes? Ideally, OSTs should be brought up before the MDT, but there is no requirement for that.

> If it crashes

Do you have messages from a crash?

> a third time, and I suspect it will, I'll include some
> of the stack trace.

Unless you are getting some kind of kernel panic, that stack trace should be in the syslog.

b.
Todd,

Does this make sense? He is saying that OSTs need to be mounted first? I thought that they should not connect if the MDT is not mounted.

On 6/2/08 10:45 AM, "Brian J. Murrell" <Brian.Murrell at Sun.COM> wrote:

> On Mon, 2008-06-02 at 11:35 -0400, Charles Taylor wrote:
>>
>> Well, I figured someone would ask that. :)  The last messages that
>> make it to syslog prior to the crash are....
>>
>> Jun 2 10:29:54 hpcmds kernel: LDISKFS FS on md2, internal journal
>> Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: recovery complete.
>> Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>> Jun 2 10:29:54 hpcmds kernel: kjournald starting. Commit interval 5 seconds
>> Jun 2 10:29:54 hpcmds kernel: LDISKFS FS on md2, internal journal
>> Jun 2 10:29:54 hpcmds kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>> Jun 2 10:29:54 hpcmds kernel: Lustre: MGS MGS started
>> Jun 2 10:29:54 hpcmds kernel: Lustre: Enabling user_xattr
>> Jun 2 10:29:54 hpcmds kernel: Lustre: 4540:0:(mds_fs.c:446:mds_init_server_data()) RECOVERY: service ufhpc-MDT0000, 100 recoverable clients, last_transno 9412464331
>> Jun 2 10:29:54 hpcmds kernel: Lustre: MDT ufhpc-MDT0000 now serving dev (ufhpc-MDT0000/cac99db5-a66a-a6ac-4649-6ec8cc2dc0e7), but will be in recovery until 100 clients reconnect, or if no clients reconnect for 4:10; during that time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/ufhpc-MDT0000/recovery_status.
>> Jun 2 10:29:55 hpcmds kernel: Lustre: 4540:0:(mds_lov.c:858:mds_notify()) MDS ufhpc-MDT0000: in recovery, not resetting orphans on ufhpc-OST0004_UUID
>> Jun 2 10:29:55 hpcmds kernel: Lustre: 4540:0:(mds_lov.c:858:mds_notify()) MDS ufhpc-MDT0000: in recovery, not resetting orphans on ufhpc-OST0005_UUID
>
> This is all perfectly normal. Is there anything else or does this
> amount to all that you are seeing?
>
>> Note that all of the clients are powered off and the OSS's are
>> currently unmounted (though they appear to be fine).
>
> Does anything bad happen when you bring up the OSSes? Ideally, OSTs
> should be brought up before the MDT, but there is no requirement for
> that.
>
>> If it crashes
>
> Do you have messages from a crash?
>
>> a third time, and I suspect it will, I'll include some
>> of the stack trace.
>
> Unless you are getting some kind of kernel panic, that stack trace
> should be in the syslog.
>
> b.
On Mon, Jun 02, 2008 at 11:35:35AM -0400, Charles Taylor wrote:
> Well, I figured someone would ask that. :)  The last messages that
> make it to syslog prior to the crash are....
[...]

As expected, there is nothing wrong here.

> Unfortunately, getting the messages off the console (in the machine
> room) means using a pencil and paper (you'd think we have something as
> fancy as an IP-KVM console server, but alas, we do things, ahem,
> "inexpensively" here).

Unfortunately, we cannot really help without more information ...
You can still try to abort recovery (-o abort_recov) when mounting the MDS.

Johann
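For reference, a sketch of that mount (the MDT device /dev/md2 is taken from the syslog above; the mount point is only an assumption, adjust to your own layout):

    mount -t lustre -o abort_recov /dev/md2 /mnt/mds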
On Mon, 2008-06-02 at 08:49 -0700, Dennis Nelson wrote:
> Todd,
>
> Does this make sense? He is saying that OSTs need to be mounted
> first?

Not *need*, but rather, ideally, should. The reason is that when the MDS comes up, the opportunity for clients to get object pointers exists. It's better that the OSTs are up to serve the expected object requests when that happens.

Conversely, when shutting down, ideally you shut down the MDS first so that the ability for clients to get object pointers goes away before the OSTs serving them go away.

b.
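To illustrate the ordering described above as a sketch (the OST device, mount points, and MGS NID are placeholders; /dev/md2 and the "ufhpc" fsname come from this thread):

    # Startup: OSTs first, then the MDT/MGS, then clients
    oss1#   mount -t lustre /dev/sdX /mnt/ost0
    mds#    mount -t lustre /dev/md2 /mnt/mds
    client# mount -t lustre <mgs-nid>@tcp0:/ufhpc /lustre

    # Shutdown: unmount the MDT before the OSTs
    mds#    umount /mnt/mds
    oss1#   umount /mnt/ost0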
On Jun 2, 2008, at 11:49 AM, Dennis Nelson wrote:

> > Unless you are getting some kind of kernel panic, that stack trace
> > should be in the syslog.

No, it is going down hard in a kernel panic. All of the stack trace I can see at the moment looks like this (scribbled by hand, so forgive me for leaving off the addresses and offsets):

:libcfs:cfs_alloc
:obdclass:lustre_init_lsi
:obdclass:lustre_fill_super
:obdclass::lustre_fill_super
set_anon_super
set_anon_super
:obd_class:lustre_fill_super
et_sb_nodev
vfs_kern_mount
do_kern_mount
do_mount
__handle_mm_fault
__up_read
do_page_fault
zone_statistics
__alloc_pages
sys_mount
system_call

RIP < ..... > resched_task

I wish I could get the whole trace to you. We might try to get kdump on there, but my luck with kdump has been mixed. It seems to work with some chipsets and not with others.

Anyway, we may just be out of luck. I just hate to give up too easily, because it seems like everything is solid, yet we crash on or just after the mount. This is on an MDS that has been running without a problem for 5 months (Lustre 1.6.4.2).

uname -a
Linux hpcmds 2.6.18-8.1.14.el5.L-1642 #2 SMP Thu Feb 21 15:42:14 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

I don't know if that trace is a lot of help to you since it is not complete (which is why I didn't post it initially), but maybe there is something there of use.

Regards,

Charlie Taylor
UF HPC Center
On Monday 02 June 2008 08:35:35 am Charles Taylor wrote:
> Unfortunately, getting the messages off the console (in the machine
> room) means using a pencil and paper (you'd think we have something
> as fancy as an IP-KVM console server, but alas, we do things, ahem,
> "inexpensively" here).

There are a couple of solutions to help you there:

* Use a serial console connected to a remote machine (costs a serial cable and some configuration).

* Use an IPMI-enabled BMC, or any sort of remote-control card, which should give you easy access to the machine's console remotely. Those cards ain't cheap, but if you already have them in your servers, this is a good occasion to put them to use.

* Maybe the easiest, most inexpensive (no hardware involved) and most convenient option: use netdump [1]. You configure a netdump client on the machine you want to gather logs and traces from, and a netdump server on another host to receive those messages. This solution has proved really efficient for gathering Lustre debug logs and crash dumps.

[1] http://www.redhat.com/support/wpapers/redhat/netdump/
    and http://docs.freevps.com/doku.php?id=how-to:netdump

HTH,
--
Kilian
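A rough sketch of the netdump setup on RHEL/CentOS 4 (the server IP is a placeholder; see the links above for the full procedure):

    # On the crashing machine (netdump client), in /etc/sysconfig/netdump:
    NETDUMPADDR=192.168.1.10       # IP of the netdump server

    service netdump propagate      # push the ssh key to the server
    chkconfig netdump on
    service netdump start

    # On the receiving machine: install the netdump-server package, then
    chkconfig netdump-server on
    service netdump-server start
    # console logs and vmcores land under /var/crash/<client-ip>/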
On Mon, 2008-06-02 at 12:58 -0400, Charles Taylor wrote:
>
> No, it is going down hard in a kernel panic. All of the stack
> trace I can see at the moment looks like (scribbled by hand... so
> forgive me for leaving off the addresses and offsets).
>
> :libcfs:cfs_alloc
> :obdclass:lustre_init_lsi
> :obdclass:lustre_fill_super
> :obdclass::lustre_fill_super
> set_anon_super
> set_anon_super
> :obd_class:lustre_fill_super
> et_sb_nodev
> vfs_kern_mount
> do_kern_mount
> do_mount
> __handle_mm_fault
> __up_read
> do_page_fault
> zone_statistics
> __alloc_pages
> sys_mount
> system_call
>
> RIP < ..... > resched_task

I'm afraid that is too vague. Can you plug the serial port from this crashing machine into another (like a laptop) and use minicom on a serial-directed console to capture it?

b.
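A sketch of that setup (the baud rate, tty names, and log path are assumptions; adjust to your hardware):

    # On the crashing MDS, direct the kernel console to the first serial port
    # by appending to the kernel line in /boot/grub/grub.conf:
    console=tty0 console=ttyS0,115200n8

    # On the capturing machine (laptop), log everything that arrives:
    minicom -D /dev/ttyS0 -C /tmp/mds-console.log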
On Jun 02, 2008 10:05 -0700, Kilian CAVALOTTI wrote:
> On Monday 02 June 2008 08:35:35 am Charles Taylor wrote:
> > Unfortunately, getting the messages off the console (in the machine
> > room) means using a pencil and paper (you'd think we have something
> > as fancy as an IP-KVM console server, but alas, we do things, ahem,
> > "inexpensively" here).
>
> There are a couple of solutions to help you there:
> * Use a serial console connected to a remote machine (costs a serial
> cable and some configuration).

One very practical and low-cost mechanism is to cross-cable the serial console from one machine to its neighbour. Most server-class machines have two serial ports, so you can have an inbound port for the console of the neighbour, and an outbound port configured to be the serial console of that machine.

> * Maybe the easiest, most inexpensive (no hardware involved) and
> most convenient option: use netdump [1]. You configure a netdump client
> on the machine you want to gather logs and traces from, and a
> netdump server on another host to receive those messages. This
> solution has proved really efficient for gathering Lustre debug
> logs and crash dumps.
>
> [1] http://www.redhat.com/support/wpapers/redhat/netdump/
> and http://docs.freevps.com/doku.php?id=how-to:netdump

Yes, LLNL has been using netdump to good effect. It works with the "normal" crashdump utilities like "crash" (modified gdb). It isn't in all kernels, however.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
We appreciate all the suggestions and help. Just for the record, we've used netdump successfully for a long time, up through CentOS/RHEL 4.5. However, it seems that support for it has been deprecated in RHEL/CentOS 5 and above in favor of kdump (as far as we can tell). If we are wrong about that, let us know, since we had more success with netdump than we are having with kdump.

Thanks,

Charlie Taylor
UF HPC Center

On Jun 2, 2008, at 3:30 PM, Andreas Dilger wrote:

>> [1] http://www.redhat.com/support/wpapers/redhat/netdump/
>> and http://docs.freevps.com/doku.php?id=how-to:netdump
>
> Yes, LLNL has been using netdump to good effect. It works with the
> "normal" crashdump utilities like "crash" (modified gdb). It isn't
> in all kernels, however.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
On Jun 02, 2008 12:58 -0400, Charles Taylor wrote:
> No, it is going down hard in a kernel panic. All of the stack trace I
> can see at the moment looks like (scribbled by hand... so forgive me for
> leaving off the addresses and offsets).
>
> :libcfs:cfs_alloc
> :obdclass:lustre_init_lsi
> :obdclass:lustre_fill_super
> :obdclass::lustre_fill_super
> set_anon_super
> set_anon_super
> :obd_class:lustre_fill_super
> et_sb_nodev
> vfs_kern_mount
> do_kern_mount
> do_mount
> __handle_mm_fault
> __up_read
> do_page_fault
> zone_statistics
> __alloc_pages
> sys_mount
> system_call
>
> RIP < ..... > resched_task

Hmm, this doesn't seem very useful. The callpath shown:

    lustre_fill_super->lustre_init_lsi->cfs_alloc()

is _really_ early in the mount, and either memory has been corrupted before this point (causing cfs_alloc() to crash) or you are missing some part of the stack at the top?

> I wish I could get the whole trace to you. We might try to get kdump on
> there but my luck with kdump has been mixed. It seems to work with some
> chipsets and not with others.
>
> Anyway, we may just be out of luck. I just hate to give up too easily
> because it seems like everything is solid yet we crash on or just after
> the mount. This is on an MDS that has been running without a problem for
> 5 months (Lustre 1.6.4.2).
>
> uname -a
> Linux hpcmds 2.6.18-8.1.14.el5.L-1642 #2 SMP Thu Feb 21 15:42:14 EST 2008
> x86_64 x86_64 x86_64 GNU/Linux

If mounting with "-o abort_recov" doesn't solve the problem, are you able to mount the MDT filesystem as "-t ldiskfs" instead of "-t lustre"? Try that, then copy and truncate the last_rcvd file:

    mount -t ldiskfs /dev/MDSDEV /mnt/mds
    cp /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav
    cp /mnt/mds/last_rcvd /tmp/last_rcvd.sav
    dd if=/mnt/mds/last_rcvd.sav of=/mnt/mds/last_rcvd bs=8k count=1
    umount /mnt/mds

    mount -t lustre /dev/MDSDEV /mnt/mds

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
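A quick sanity check after the dd, as a sketch (mount point as in the commands above): the dd keeps only the first 8 KB of the file, so the truncated last_rcvd should show exactly 8192 bytes, while the .sav copies preserve the full original in case you need to roll back:

    ls -l /mnt/mds/last_rcvd        # expect 8192 bytes
    ls -l /mnt/mds/last_rcvd.sav    # the untouched original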
Right, netdump support was dropped in favor of kdump in RHEL 5. As a result, netdump is past tense for us at LLNL.

Jim

On Mon, Jun 02, 2008 at 03:35:04PM -0400, Charles Taylor wrote:
> We appreciate all the suggestions and help. Just for the record,
> we've used netdump successfully for a long time, up through CentOS/RHEL
> 4.5. However, it seems that support for it has been deprecated in
> RHEL/CentOS 5 and above in favor of kdump (as far as we can tell).
> If we are wrong about that, let us know, since we had more success with
> netdump than we are having with kdump.
>
> Thanks,
>
> Charlie Taylor
> UF HPC Center
>
> On Jun 2, 2008, at 3:30 PM, Andreas Dilger wrote:
>
> >> [1] http://www.redhat.com/support/wpapers/redhat/netdump/
> >> and http://docs.freevps.com/doku.php?id=how-to:netdump
> >
> > Yes, LLNL has been using netdump to good effect. It works with the
> > "normal" crashdump utilities like "crash" (modified gdb). It isn't
> > in all kernels, however.
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.
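For anyone in the same boat, a minimal kdump setup on RHEL/CentOS 5 looks roughly like the following (the reserved-memory size and dump target are assumptions; check the distro's kexec/kdump documentation for your hardware):

    # /boot/grub/grub.conf: reserve memory for the capture kernel
    # by appending to the kernel line:
    crashkernel=128M@16M

    # /etc/kdump.conf: where to write the vmcore
    path /var/crash
    core_collector makedumpfile -c -d 31

    chkconfig kdump on
    service kdump start     # or reboot so crashkernel= takes effect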
Wow, you are one powerful witch doctor. So we rebuilt our system disk (just to be sure) and that made no difference; we still panicked as soon as we mounted the MDT. The "-o abort_recov" did not help either. However, your recipe below worked wonders... almost. Now we can mount the MDT, but it does not go into recovery. It just shows as "inactive". We are so close I can taste it, but what are we doing wrong now?

[root at hpcmds lustre]# cat /proc/fs/lustre/mds/ufhpc-MDT0000/recovery_status
status: INACTIVE

Which tire do we kick now? :)

Thanks,

Charlie Taylor
UF HPC Center

On Jun 2, 2008, at 3:36 PM, Andreas Dilger wrote:

> If mounting with "-o abort_recov" doesn't solve the problem,
> are you able to mount the MDT filesystem as "-t ldiskfs" instead of
> "-t lustre"? Try that, then copy and truncate the last_rcvd file:
>
> mount -t ldiskfs /dev/MDSDEV /mnt/mds
> cp /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav
> cp /mnt/mds/last_rcvd /tmp/last_rcvd.sav
> dd if=/mnt/mds/last_rcvd.sav of=/mnt/mds/last_rcvd bs=8k count=1
> umount /mnt/mds
>
> mount -t lustre /dev/MDSDEV /mnt/mds
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
On Jun 02, 2008 19:51 -0400, Charles Taylor wrote:
> Wow, you are one powerful witch doctor. So we rebuilt our system disk
> (just to be sure) and that made no difference; we still panicked as
> soon as we mounted the MDT. The "-o abort_recov" did not help either.
> However, your recipe below worked wonders... almost. Now we can mount
> the MDT, but it does not go into recovery. It just shows as "inactive".
> We are so close I can taste it, but what are we doing wrong now?
>
> [root at hpcmds lustre]# cat /proc/fs/lustre/mds/ufhpc-MDT0000/recovery_status
> status: INACTIVE
>
> Which tire do we kick now? :)

Well, deleting the tail of the last_rcvd file is the "hard" way to tell the MDT/OST it is no longer in recovery... The deleted part of the file is where the per-client state is kept, so when it is removed the MDT decides no recovery is needed.

The "recovery_status" being "INACTIVE" is somewhat misleading. It means "no recovery is currently active", but the MDT is up and you should be able to use it, with the caveat that clients previously doing operations will get an IO error for in-flight operations before they start afresh... However, you said the clients are powered off, so they probably aren't busy doing anything...

If you had a more complete stack trace, it would be useful for determining what is actually going wrong with the mount.

> On Jun 2, 2008, at 3:36 PM, Andreas Dilger wrote:
>> If mounting with "-o abort_recov" doesn't solve the problem,
>> are you able to mount the MDT filesystem as "-t ldiskfs" instead of
>> "-t lustre"? Try that, then copy and truncate the last_rcvd file:
>>
>> mount -t ldiskfs /dev/MDSDEV /mnt/mds
>> cp /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav
>> cp /mnt/mds/last_rcvd /tmp/last_rcvd.sav
>> dd if=/mnt/mds/last_rcvd.sav of=/mnt/mds/last_rcvd bs=8k count=1
>> umount /mnt/mds
>>
>> mount -t lustre /dev/MDSDEV /mnt/mds

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
I'm sorry, I should have updated you. You are right, it was misleading. The MDS/MDT was fine, and after about twenty minutes or so everything became active. We now have a working file system with data that we can access, so we can't *thank you* enough.

BTW, that's a pretty obscure "fix". I was going to ask for an explanation, but we've been pretty busy doing fsck's and lfsck's (which we are still working up to, since it takes a while to generate the db's). It is a pretty slow process, but things are looking relatively good. Of course, when you go from thinking you just lost all your data to having almost all of it, anything looks pretty good. :)

Thanks again for your help,

Charlie Taylor
UF HPC Center

PS - We now refer to your commands to truncate the last_rcvd file as the "Dilger Procedure" (with great reverence). :)

ct

On Jun 3, 2008, at 4:20 PM, Andreas Dilger wrote:

> On Jun 02, 2008 19:51 -0400, Charles Taylor wrote:
>> Wow, you are one powerful witch doctor. So we rebuilt our system disk
>> (just to be sure) and that made no difference; we still panicked as
>> soon as we mounted the MDT. The "-o abort_recov" did not help either.
>> However, your recipe below worked wonders... almost. Now we can mount
>> the MDT, but it does not go into recovery. It just shows as "inactive".
>> We are so close I can taste it, but what are we doing wrong now?
>>
>> [root at hpcmds lustre]# cat /proc/fs/lustre/mds/ufhpc-MDT0000/recovery_status
>> status: INACTIVE
>>
>> Which tire do we kick now? :)
>
> Well, deleting the tail of the last_rcvd file is the "hard" way to tell
> the MDT/OST it is no longer in recovery... The deleted part of the file
> is where the per-client state is kept, so when it is removed the MDT
> decides no recovery is needed.
>
> The "recovery_status" being "INACTIVE" is somewhat misleading. It means
> "no recovery is currently active", but the MDT is up and you should be
> able to use it, with the caveat that clients previously doing operations
> will get an IO error for in-flight operations before they start afresh...
> However, you said the clients are powered off, so they probably aren't
> busy doing anything...
>
> If you had a more complete stack trace, it would be useful for determining
> what is actually going wrong with the mount.
>
>> On Jun 2, 2008, at 3:36 PM, Andreas Dilger wrote:
>>> If mounting with "-o abort_recov" doesn't solve the problem,
>>> are you able to mount the MDT filesystem as "-t ldiskfs" instead of
>>> "-t lustre"? Try that, then copy and truncate the last_rcvd file:
>>>
>>> mount -t ldiskfs /dev/MDSDEV /mnt/mds
>>> cp /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav
>>> cp /mnt/mds/last_rcvd /tmp/last_rcvd.sav
>>> dd if=/mnt/mds/last_rcvd.sav of=/mnt/mds/last_rcvd bs=8k count=1
>>> umount /mnt/mds
>>>
>>> mount -t lustre /dev/MDSDEV /mnt/mds
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Sr. Staff Engineer, Lustre Group
>>> Sun Microsystems of Canada, Inc.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
On Jun 03, 2008 16:37 -0400, Charles Taylor wrote:
> I'm sorry, I should have updated you. You are right, it was
> misleading. The MDS/MDT was fine, and after about twenty minutes or
> so everything became active. We now have a working file system with
> data that we can access, so we can't *thank you* enough.

You're welcome.

> BTW, that's a pretty obscure "fix". I was going to ask for an
> explanation, but we've been pretty busy doing fsck's and lfsck's (which
> we are still working up to, since it takes a while to generate the
> db's). It is a pretty slow process, but things are looking
> relatively good. Of course, when you go from thinking you just lost
> all your data to having almost all of it, anything looks pretty
> good. :)
>
> PS - We now refer to your commands to truncate the last_rcvd file as
> the "Dilger Procedure" (with great reverence). :)

Well, by no means should this be a normal process. If you can spare the time after your system is back in shape, then copying the last_rcvd.sav file to a test MDT and mounting it with a serial console enabled would help track down what the root cause of this is. The fewer people that have to perform the "Dilger Procedure" the better.

> On Jun 3, 2008, at 4:20 PM, Andreas Dilger wrote:
> > On Jun 02, 2008 19:51 -0400, Charles Taylor wrote:
> >> Wow, you are one powerful witch doctor. So we rebuilt our system disk
> >> (just to be sure) and that made no difference; we still panicked as
> >> soon as we mounted the MDT. The "-o abort_recov" did not help either.
> >> However, your recipe below worked wonders... almost. Now we can mount
> >> the MDT, but it does not go into recovery. It just shows as "inactive".
> >> We are so close I can taste it, but what are we doing wrong now?
> >>
> >> [root at hpcmds lustre]# cat /proc/fs/lustre/mds/ufhpc-MDT0000/recovery_status
> >> status: INACTIVE
> >>
> >> Which tire do we kick now? :)
> >
> > Well, deleting the tail of the last_rcvd file is the "hard" way to tell
> > the MDT/OST it is no longer in recovery... The deleted part of the file
> > is where the per-client state is kept, so when it is removed the MDT
> > decides no recovery is needed.
> >
> > The "recovery_status" being "INACTIVE" is somewhat misleading. It means
> > "no recovery is currently active", but the MDT is up and you should be
> > able to use it, with the caveat that clients previously doing operations
> > will get an IO error for in-flight operations before they start afresh...
> > However, you said the clients are powered off, so they probably aren't
> > busy doing anything...
> >
> > If you had a more complete stack trace, it would be useful for determining
> > what is actually going wrong with the mount.
> >
> >> On Jun 2, 2008, at 3:36 PM, Andreas Dilger wrote:
> >>> If mounting with "-o abort_recov" doesn't solve the problem,
> >>> are you able to mount the MDT filesystem as "-t ldiskfs" instead of
> >>> "-t lustre"? Try that, then copy and truncate the last_rcvd file:
> >>>
> >>> mount -t ldiskfs /dev/MDSDEV /mnt/mds
> >>> cp /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav
> >>> cp /mnt/mds/last_rcvd /tmp/last_rcvd.sav
> >>> dd if=/mnt/mds/last_rcvd.sav of=/mnt/mds/last_rcvd bs=8k count=1
> >>> umount /mnt/mds
> >>>
> >>> mount -t lustre /dev/MDSDEV /mnt/mds
> >>>
> >>> Cheers, Andreas
> >>> --
> >>> Andreas Dilger
> >>> Sr. Staff Engineer, Lustre Group
> >>> Sun Microsystems of Canada, Inc.
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.