Andreas Dilger
2008-Jan-17 18:35 UTC
[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
Attention to all Lustre users.

There was a serious problem discovered with only the 1.6.4.1 release which
could lead to major data loss on relatively new Lustre filesystems in
certain situations.  The 1.6.4.2 release is being prepared that will fix
the problem, and workarounds are available for existing 1.6.4.1 users, but
in the meantime customers should be aware of the problem and take measures
to avoid it (described at the end of this email).

The problem is described in bug 14631, and while there are no known cases
where this has impacted a production environment, the consequences can be
severe and all users should take note.  The bug can cause objects on newly
formatted OSTs to be deleted if all of the following conditions are true:

OST has had fewer than 20000 objects created on it ever
--------------------------------------------------------
This can be seen on each OST via "cat /proc/fs/lustre/obdfilter/*/last_id",
which reports the highest object ID ever created on that OST.  If this
number is greater than 20000, that OST is not at risk of data loss.

The OST must be in recovery at the time the MDT is first mounted
----------------------------------------------------------------
This would happen if the OSS node crashed, or if the OST filesystem was
unmounted while the MDT or a client was still connected.  Unmounting all
clients and the MDT before the OST is always the correct procedure and
will avoid this problem, but it is also possible to force unmount the OST
with "umount -f /mnt/ost*" (or path as appropriate) to evict all
connections and avoid the problem.

If the OST is in recovery at mount time, it can be mounted before the MDT
and "lctl --device {OST device number} abort_recovery" used to abort
recovery before the MDT is mounted.  Alternatively, the OST will only wait
a specific time for recovery (4 minutes 10 seconds by default, actual
value printed in dmesg) and this can be allowed to expire before mounting
the MDT to avoid the problem.

The MDT is not in recovery when it connects to the OST(s)
---------------------------------------------------------
If the MDT is not in recovery at mount time (i.e. it was shut down
cleanly), but the OST is in recovery, then the MDT will try to get
information from the OST about existing objects, but fail.  Later in the
startup process the MDT would incorrectly signal the OST to delete all
unused objects.  If the MDT is in recovery at startup, then the MDT
recovery period will expire after the OST recovery and the problem will
not be triggered.  If the OSTs are mounted and are not in recovery when
the MDT mounts, then the problem will also not be triggered.


To avoid triggering the problem:
--------------------------------
- unmount the clients and MDT before the OST.  When unmounting the OST,
  use "umount -f /mnt/ost*" to force disconnect all clients.
- mount the OSTs before the MDT, and wait for the recovery to time out
  (or cancel it, as above) before mounting the MDT.
- create at least 20000 objects on each OST.  Specific OSTs can be
  targeted via "lfs setstripe -i {OST index} /path/to/lustre/file".
  These objects do not need to remain on the OST; the OST just needs to
  have created that many objects at some point, to activate a sanity
  check when the 1.6.4.1 MDT connects to the OST.  (A minimal sketch of
  this follows below.)
- upgrade to Lustre 1.6.4.2 when available.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
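As an illustration of the recovery-abort and object pre-creation
workarounds above, a minimal shell sketch follows.  It assumes the
filesystem is mounted on a client at /mnt/lustre, that the OST of
interest is index 0 (named something like fsname-OST0000), and that the
obd device number 7 is a placeholder to be replaced with the one reported
by "lctl dl" on the OSS; adjust all of these for your own setup.

    # On the OSS, if an OST is stuck in recovery before the MDT is mounted:
    # list the obd devices, note the number of the obdfilter (OST) device,
    # and abort recovery.  The device number 7 below is only a placeholder.
    lctl dl | grep obdfilter
    lctl --device 7 abort_recovery

    # On a client, pre-create 20000 objects on OST index 0 so that its
    # last_id passes the 20000 threshold.  The files can be removed
    # afterwards; only the number of objects ever created matters.
    mkdir -p /mnt/lustre/precreate
    for i in $(seq 1 20000); do
        lfs setstripe -i 0 /mnt/lustre/precreate/obj.$i
    done
    rm -rf /mnt/lustre/precreate

    # Back on the OSS, verify the new last_id for that OST:
    cat /proc/fs/lustre/obdfilter/*OST0000*/last_id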
Harald van Pee
2008-Jan-17 19:21 UTC
[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
Hi,

this is not good news!

Just to be sure: what does "relatively new Lustre filesystems" or
"newly formatted OSTs" mean?

Is an upgraded filesystem (from v1.6.2) which is not newly formatted, but
still has fewer than 20000 objects created on it ever, also affected by
this bug?  Or only filesystems first used with 1.6.4.1?

Harald

On Thursday 17 January 2008 07:35 pm, Andreas Dilger wrote:
> Attention to all Lustre users.
> [...]

--
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
Andreas Dilger
2008-Jan-17 20:31 UTC
[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
On Jan 17, 2008  20:21 +0100, Harald van Pee wrote:
> this is not good news!

Definitely not, but it is hoped that by releasing a notification of this
issue any problems with existing systems can be avoided.

> Just to be sure: what does "relatively new Lustre filesystems" or
> "newly formatted OSTs" mean?

This means "any OSTs with < 20000 objects ever created", no matter how
old they actually are.

> Is an upgraded filesystem (from v1.6.2) which is not newly formatted,
> but still has fewer than 20000 objects created on it ever, also
> affected by this bug?  Or only filesystems first used with 1.6.4.1?

It doesn't matter what versions were previously used; the problem exists
only while a 1.6.4.1 MDS is in use, due to a defect introduced while
removing another, far less common problem.

> On Thursday 17 January 2008 07:35 pm, Andreas Dilger wrote:
> > Attention to all Lustre users.
> > [...]

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
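For an upgraded filesystem like the one Harald describes, a minimal shell
sketch run on each OSS node can report which local OSTs are still under
the 20000-object threshold.  The /proc path and the threshold are the
ones from the advisory above; the output wording is only illustrative.

    # For every local OST, report whether it has ever created at least
    # 20000 objects, i.e. whether it is exposed to the bug 14631 deletion.
    for f in /proc/fs/lustre/obdfilter/*/last_id; do
        ost=$(basename $(dirname "$f"))
        last_id=$(cat "$f")
        if [ "$last_id" -lt 20000 ]; then
            echo "$ost: last_id=$last_id -> at risk, pre-create objects"
        else
            echo "$ost: last_id=$last_id -> not at risk"
        fi
    done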
Lundgren, Andrew
2008-Jan-17 22:05 UTC
[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
We are getting ready to deploy a brand new cluster.  Any time frame on
1.6.4.2?

--
Andrew

> -----Original Message-----
> From: lustre-discuss-bounces at clusterfs.com
> [mailto:lustre-discuss-bounces at clusterfs.com] On Behalf Of Andreas Dilger
> Sent: Thursday, January 17, 2008 1:31 PM
> To: Harald van Pee
> Cc: Lustre User Discussion Mailing List
> Subject: Re: [Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
> [...]
Andreas Dilger
2008-Jan-17 22:29 UTC
[Lustre-discuss] [URGENT] Lustre 1.6.4.1 data loss bug
On Jan 17, 2008  15:05 -0700, Lundgren, Andrew wrote:
> We are getting ready to deploy a brand new cluster.  Any time frame on
> 1.6.4.2?

It has been built and is undergoing QA testing now.  We hope to have it
ready for Monday, but I can't promise that.

> [...]

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.