swin wang
2007-Mar-20 08:22 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
Hi,

I want Lustre to behave as follows when an OST has failed: if a file has stripe data on the failed OST, any operation on that file should return an IO error without blocking. At the same time, I should still be able to create and read/write new files, and read/write existing files that have no stripe data on the failed OST, without blocking.

What should I do? How do I configure this?

Thanks!
swin
Andreas Dilger
2007-Mar-20 10:53 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
On Mar 20, 2007 23:22 +0800, swin wang wrote:
> I want my lustre do such things during OST failed: if some file
> has stripe data on th failed OST, any operation on the file will
> return IO error without blocking, and also at this moment I can
> create and read/write new file or read/write files which have no stripe
> data on the failed OST without blocking.
> What should I do ? How to configure?

On the clients run:

    lctl dl    # list lustre device configuration
    lctl --device {failed OST device number} deactivate

Same on the MDS, though the device will be different compared to clients.

This will deactivate the failed OST on the clients so they get an IO error when accessing any files on the OST, instead of waiting for it to recover.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
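As a rough illustration of the procedure above, the device number can be picked out of `lctl dl` output by matching on the OST name. The sketch below is not from the thread: the helper name `osc_devno` and the sample listing are made up for illustration, and it assumes the device number is in column 1 and the device name in column 4, which may differ between Lustre versions. It only prints the deactivate command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print the device number(s) whose
# name contains the given OST name, reading `lctl dl`-style output on stdin.
# Assumption: column 1 is the device number, column 4 is the device name.
osc_devno() {
    awk -v ost="$1" 'index($4, ost) { print $1 }'
}

# Illustrative sample only; on a real node use:
#   lctl dl | osc_devno testfs-OST0001
sample='  0 UP mgc MGC10.0.0.1@tcp 6f2a-uuid 5
  1 UP lov testfs-clilov-0001 91c3-uuid 4
  2 UP osc testfs-OST0000-osc-0001 91c3-uuid 5
  3 UP osc testfs-OST0001-osc-0001 91c3-uuid 5'

for dev in $(printf '%s\n' "$sample" | osc_devno testfs-OST0001); do
    # Dry run: show the command instead of executing it.
    echo "lctl --device $dev deactivate"
done
```

The same lookup would have to be repeated on the MDS, where the device number for that OST is different.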
swin wang
2007-Mar-20 19:29 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
I know this configuration, but I hope Lustre can bypass the failed OST automatically, because I can't know in advance when an OST will fail. Maybe my question is really: how can I avoid RPCs blocking on the OST, and get an IO error returned instead, whenever the OST has failed?
Andreas Dilger
2007-Mar-21 03:45 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
On Mar 21, 2007 10:29 +0800, swin wang wrote:
> I know this configuration, but I hope it can automatically bypass the
> failed OST, because I can't make sure when the OST will be failed, mybe
> my question is how to avoid blocking rpc on OST, but instead of return
> IO error, whenever the OST is failed.

If you prefer getting IO errors to your application instead of waiting for recovery, then you can configure the OST with "failout" using the --failout option in 1.4. Lustre clients will never retry failed RPCs in this mode.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
swin wang
2007-Mar-21 03:53 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
We currently use 1.5.97. We tried to set it to failout mode, but it didn't work in this version. What we want is: when reading or writing files on the failed OST, IO errors are returned, but we can still create and read/write new files; and once the failed OST is OK again, we can read/write the files on it.

I'm not sure whether the 1.4.x version with "failout" mode can provide what we want?
Nathaniel Rutman
2007-Mar-21 09:40 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
swin wang wrote:
> We current use 1.5.97, we try to set it to failout mode, but it didn't
> work int this version, what we want is: when read/write the failed OST,
> it return IO errors, but still can create and read/write new files, when
> the failed OST is ok, we can read/write files on the failed OST.

That's what failout mode is. How did you try to set it?
swin wang
2007-Mar-21 20:47 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
In our test, we didn't set failout mode at mkfs time, but set it on the MDT/MGS with lctl:

    lctl conf_param testfs-OST0001.failover.mode=failout

But it didn't seem to work. When OST0001 fails, client operations still block (with 1.5.97).
swin wang
2007-Mar-22 05:35 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
We are not sure whether 1.5.97 really supports failout mode; if it doesn't, we should consider how to support it. We have noticed that if we deactivate the failed OST on both the client and the MDT/MGS, it does what we want.

We found the function ptlrpc_import_delay_req() in client.c. If we set the status to -EIO and delay to 0 when imp_state isn't LUSTRE_IMP_FULL, will that avoid blocking? We also noticed imp->imp_obd->obd_no_recov; if it is set to 1, will that avoid blocking?

Or does anybody have a good idea for solving this problem?
Nathaniel Rutman
2007-Mar-22 14:55 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
Well, your question prompted me to try this out.

There are two issues:

1. failout mode cannot be set on a live filesystem, and can't be set with lctl conf_param.
   The wiki page has instructions for setting failout mode at mkfs time:
   https://mail.clusterfs.com/wikis/lustre/MountConf
   You can also set failout mode with tunefs and writeconf:

       tunefs.lustre --writeconf --param="failover.mode=failout" /dev/sda

   There can be no Lustre servers or clients running when changing the failover mode.

2. failout mode is broken in the 1.6 betas. I have an untested patch in bug 12005:
   https://bugzilla.lustre.org/show_bug.cgi?id=12005
   Using failout mode in the betas without this patch will probably lead to an LBUG on the OST.
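The tunefs step above, together with the requirement that nothing Lustre-related is running, could be wrapped in a small dry-run script along these lines. This is only a sketch under stated assumptions: the helper names `lustre_mounted` and `failout_cmd` are hypothetical, the check assumes that a mounted server target or client shows up with filesystem type "lustre" in a /proc/mounts-style table, and the device path is a placeholder. Nothing is executed against a real device.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Lustre.

# Succeeds if the mounts table on stdin (same layout as /proc/mounts) lists
# any filesystem of type "lustre". Assumption: column 3 is the fs type.
lustre_mounted() {
    awk '$3 == "lustre" { found = 1 } END { exit !found }'
}

# Prints the tunefs.lustre command from the message above for one OST device.
failout_cmd() {
    printf 'tunefs.lustre --writeconf --param="failover.mode=failout" %s\n' "$1"
}

# Illustrative mounts line; on a real node feed in /proc/mounts instead.
sample_mounts='/dev/sda /mnt/ost1 lustre rw 0 0'

if printf '%s\n' "$sample_mounts" | lustre_mounted; then
    echo "Lustre is still mounted here; stop it before changing failover.mode"
else
    failout_cmd /dev/sda   # dry run: prints the command only
fi
```

A local mount check like this says nothing about other servers or clients in the filesystem, which also have to be stopped.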
swin wang
2007-Mar-24 02:55 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
The patch works well. Thanks!
Herb Wartens
2007-Mar-26 14:37 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
Here at LLNL we have developed a little tool called stumpy that also bypasses an OST without blocking. The idea is that we can add in a "stump" OST in place of a damaged OST until we find and fix the problem on the damaged OST. The "stump" OST would be started in a read-only/deactivated state so that no new objects will be written to the device. This saves us from having to go out to our many thousands of clients and deactivate the damaged OST on each one. The data in the client caches should also be safe with some new Lustre fixes to ensure that when an OST goes read-only the client will hold the data, since the state is expected to be transient.

stumpy does require changes to the ldiskfs code (to allow mounting the filesystem in read-only mode) as well as Lustre code changes to allow Lustre to start in read-only mode.

The stumpy tool takes as input an OST name. It will then create a "stump" OST loopback file with certain settings that Lustre expects. It creates the last_rcvd, health_check, CATALOGS, and LAST_ID files along with a base Lustre filesystem. Currently it works for lustre-1.4.8 and all prior lnet versions (since it is reading the Lustre XML file). It should not be difficult to port to 1.6; we are just not there yet.

-Herb
Andreas Dilger
2007-Mar-27 01:03 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
On Mar 26, 2007 14:37 -0700, Herb Wartens wrote:
> stumpy does require changes to the ldiskfs code (to allow mounting the
> filesystem in read-only mode) as well as Lustre code changes to allow
> Lustre to start in read-only mode.

Herb, could you perhaps attach the code to this thread and/or file a bug in bugzilla with a patch and report the bug number here? I think the read-only OST mounting code might be welcome for other reasons also.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Herb Wartens
2007-Mar-28 15:41 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
Sure... I have opened a new enhancement bug with this patch added. This is bug bz12070. I can also add in the stumpy script if people are interested in that tool. -Herb
Brian Behlendorf
2007-Apr-02 13:54 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
Stumpy read-only support patch filed as bug 12070 by Herb.
https://bugzilla.lustre.org/show_bug.cgi?id=12070

--
Thanks,
Brian
Robert Olson
2008-Jan-15 22:03 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
Hi --

I'm setting up a system that has no OST failover, so I would like to set it for failout. Have the issues in the 1.6 betas been worked out in 1.6.4.1?

Thanks,
--bob

On Mar 22, 2007, at 4:55 PM, Nathaniel Rutman wrote:
> 2. failout mode is broken in the 1.6 betas. I have an untested
> patch in bug 12005
> https://bugzilla.lustre.org/show_bug.cgi?id=12005
> Using failout mode in the betas without this patch will probably
> lead to an LBUG on the OST.
Andreas Dilger
2008-Jan-17 07:49 UTC
[Lustre-discuss] How to bypass failed OST without blocking?
On Jan 15, 2008 16:03 -0600, Robert Olson wrote:
> Setting up my system that has no OST failover, so would like to set
> for failout. Have the issues in the 1.6 betas been worked out in
> 1.6.4.1?

Very little testing is done on failout mode, because even with a single OSS node the common behaviour is to just reboot the node and continue using the OSTs thereon. You can set "sysctl -w lnet.panic_on_lbug=1" and "sysctl -w kernel.panic_on_oops=1" and the node will reboot if a bug is hit in Lustre or the kernel. While this does not cover 100% of cases (it won't reboot on a deadlock, for example) it is fairly useful.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
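For reference, the panic settings mentioned above could be collected as in the dry-run sketch below. It assumes they are applied through sysctl, and that lnet.panic_on_lbug is only available once the Lustre/LNET modules are loaded. The helper name `panic_sysctls` is hypothetical. The kernel.panic setting (seconds to wait before rebooting after a panic; 0 means do not reboot automatically) is not mentioned in the thread and is added here as an assumption about what is needed for an unattended reboot; the value 10 is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print the sysctl commands, one per line, without
# executing them. $1 is the reboot delay in seconds after a panic.
panic_sysctls() {
    printf 'sysctl -w %s\n' \
        'lnet.panic_on_lbug=1' \
        'kernel.panic_on_oops=1' \
        "kernel.panic=$1"    # assumption: not part of the original advice
}

# Dry run: review the output, then run the commands as root on the OSS.
panic_sysctls 10
```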