Dave Johnson
2009-Sep-04 12:45 UTC
[Lustre-discuss] Oops: Lustre mount of MDS causes kernel panic in mds_free_client
Our Lustre filesystem is unable to run because the MDS host
crashes immediately while mounting the metadata file system.
It is accessing an invalid address (deadbeef) in the routine
mds_free_client. The Lustre version is 1.6.0.1. Copying the
crash log from the console by hand (lost the password to the
management processors so we can't do serial console anymore):

mount.lustre Cannot handle kernel paging request mds_client_free+612
Trace:
mds_destroy_export
obdclass:class_export_destroy
obdclass:obd_zombie_impexp_call
obdclass:class_detach
obdclass:class_process_config
obdclass:class_manual_cleanup
obdclass:lustre_fill_super

I found messages in the mailing list about removing CATALOGS and OBJECTS/*
and mounting using -o abort_recov. I tried these things, in addition to
removing PENDING/* (all empty files). This last crash trace was done
(accidentally) without the -o abort_recov mount option, but the outcome
did not improve on the earlier attempts.

Any help in this would be greatly appreciated.

Thanks,

-- ddj

Dave Johnson
Brown University CCV
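For readers following the thread, the recovery attempts described in this post (removing CATALOGS, OBJECTS/* and PENDING/* from the MDT, then mounting with -o abort_recov) can be sketched roughly as the shell function below. This is only an illustration of those steps, not a verified procedure. The device path, the mount points, and the names run and mdt_cleanup_steps are hypothetical, and the sketch assumes the MDT backing filesystem can be mounted directly as ldiskfs. It defaults to a dry run that only prints the commands, because deleting these files on a real MDT is destructive and a backup should be taken first.

```shell
#!/bin/sh
# Sketch of the steps described in the post. All paths are hypothetical
# placeholders; override them through the environment for a real system.
MDT_DEV="${MDT_DEV:-/dev/mdt_device}"          # assumed MDT block device
LDISKFS_MNT="${LDISKFS_MNT:-/mnt/mdt_ldiskfs}" # assumed temporary mount point
LUSTRE_MNT="${LUSTRE_MNT:-/mnt/mdt}"           # assumed Lustre mount point
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mdt_cleanup_steps() {
    # Mount the MDT backing filesystem directly so its files are visible.
    run mount -t ldiskfs "$MDT_DEV" "$LDISKFS_MNT" || return 1
    # Remove CATALOGS and the files under OBJECTS, as the list messages suggested.
    run rm -f "$LDISKFS_MNT/CATALOGS"
    run rm -f "$LDISKFS_MNT"/OBJECTS/*
    # The poster also removed the (empty) files under PENDING.
    run rm -f "$LDISKFS_MNT"/PENDING/*
    run umount "$LDISKFS_MNT" || return 1
    # Mount as Lustre again, skipping client recovery.
    run mount -t lustre -o abort_recov "$MDT_DEV" "$LUSTRE_MNT"
}

mdt_cleanup_steps
```

Running the script as-is prints the six commands in order. Setting DRY_RUN=0 would execute them instead.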
Charles A. Taylor
2009-Sep-04 13:10 UTC
[Lustre-discuss] Oops: Lustre mount of MDS causes kernel panic in mds_free_client
You may want to try "The Dilger Procedure". See

http://wiki.hpc.ufl.edu/index.php/Lustre

This has saved us a number of times.

Charlie Taylor
UF HPC center
Brian J. Murrell
2009-Sep-04 13:17 UTC
[Lustre-discuss] Oops: Lustre mount of MDS causes kernel panic in mds_free_client
On Fri, 2009-09-04 at 08:45 -0400, Dave Johnson wrote:
> The Lustre version is 1.6.0.1.

Wow. I couldn't imagine how really old that release is. You seriously need to consider an upgrade to something recent. 1.8.1 or 1.6.7.2 perhaps.

I'm not even going to try to guess (or search for) which bug this is that you've run across, as we have fixed too many bugs since that release to start trawling through them all. Our bug database is open (for the most part) so feel free to search yourself if you wish.

http://bugzilla.lustre.org/

> Any help in this would be greatly appreciated.

Seriously, if there was only one thing you could do to help yourself out, that would be to upgrade. If the problem persists after that, at least you will be working with code we can all at least remember.

b.
Dave Johnson
2009-Sep-04 15:00 UTC
[Lustre-discuss] Oops: Lustre mount of MDS causes kernel panic in mds_free_client
Thank you very much, this worked perfectly.

-- ddj

Dave Johnson
Brown University CCV

On Fri, Sep 04, 2009 at 09:10:12AM -0400, Charles A. Taylor wrote:
> You may want to try "The Dilger Procedure". See
>
> http://wiki.hpc.ufl.edu/index.php/Lustre
>
> This has saved us a number of times.