This past weekend, my holiday was ruined by a log device "replacement" gone awry. I posted all about it here:

http://jmlittle.blogspot.com/2008/05/problem-with-slogs-how-i-lost.html

In a nutshell: because a log device can't be removed from a pool once defined, I resilvered a single log log device with itself. ZFS fully resilvered, but then attached the log device as a stripe of the volume, no longer as a log device. The subsequent pool failure was exceptionally bad: the volume could no longer be imported, and I had to mount read-only whatever filesystems I could in order to recover data. It would appear that log resilvers are broken, at least up to B85. I haven't seen code changes in this space, so I presume this is an unaddressed problem.
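For readers following along, the failing sequence is roughly the following; the pool and device names are illustrative, not taken from the affected system:

    # Create a pool with a dedicated log device (slog):
    zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 log c3t0d0

    # Once added, the slog cannot be backed out again; on builds up to
    # at least B85 this is rejected:
    zpool remove tank c3t0d0

    # The only remaining option is an in-place replace, which kicks off
    # the resilver that, as described above, reattached the device as a
    # plain data stripe instead of a log device:
    zpool replace tank c3t0d0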
Yeah, I noticed this the other day while I was working on an unrelated problem. The basic problem is that log devices are kept within the normal vdev tree and are distinguished only by a bit indicating that they are log devices (which is also the source of a number of other inconsistencies that Pawel has encountered).

When doing a replacement, the userland code is responsible for creating the vdev configuration to use for the newly attached vdev. In this case, it doesn't preserve the 'is_log' bit correctly. This should be enforced in the kernel: it never makes sense to replace a log device with a non-log device. I have a workspace with some other random ZFS changes, so I'll try to include this fix as well.

FWIW, removing log devices is significantly easier than removing arbitrary devices, since there is no data to migrate (once the current txg is synced). At one point there were plans to do this as a separate piece of work (the vdev changes are needed for the general case anyway), but I don't know whether that is still the plan.

- Eric

--
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock
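Concretely, this is the shape of the operation the kernel ought to reject, plus the check an administrator can run afterwards (names in angle brackets are placeholders):

    # Replacing a log device with an ordinary disk should never be
    # allowed; today the userland-built vdev config can drop the
    # 'is_log' bit, so the new device can land as a data stripe:
    zpool replace tank <log-device> <plain-disk>

    # After any log-device replacement completes, verify the device
    # came back under the separate 'logs' section of the pool status,
    # not as a top-level data vdev:
    zpool status tank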
On Tue, May 27, 2008 at 1:50 PM, Eric Schrock <eric.schrock at sun.com> wrote:
> FWIW, removing log devices is significantly easier than removing
> arbitrary devices, since there is no data to migrate (once the current
> txg is synced).

Thanks for the reply. As noted, I currently recommend against using a log device, since you can't remove it and, as you've seen, replacement is touchy at best. I know the larger, more general vdev-evacuation work is ongoing, but if log evacuation is simple on its own, it would make logs usable now instead of having to wait.
Joe -

We definitely don't do great accounting of the 'vdev_islog' state here, and it's possible to create a situation where the parent replacing vdev has the state set but the children do not. Still, I have been unable to reproduce the behavior you saw. I have rebooted the system during resilver, manually detached the replacing vdev, and tried a variety of other things, but I've never seen the behavior you describe: in all cases the log state is kept with the replacing vdev and restored when the resilver completes. I have also not observed the resilver failing with a bad log device.

Can you provide more information about how to reproduce this problem? Perhaps without rebooting into B70 in the middle?

Thanks,

- Eric

--
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock
On Tue, May 27, 2008 at 4:50 PM, Eric Schrock <eric.schrock at sun.com> wrote:
> Can you provide more information about how to reproduce this problem?
> Perhaps without rebooting into B70 in the middle?

Well, this happened live on a production system, and I'm still in the process of rebuilding that system (trying to save all the snapshots), so I don't know exactly what triggered it. The pool was trying to resilver in B85; I rebooted into B70, where it did resilver (but it was now using cmdk device naming instead of the full SCSI device names). It was still marked "degraded" even though the resilvering finished. Since the resilver took so long, I suspect the splicing-in of the device took place under B70. Again, it would never work in B85 -- it just kept resetting. I'm wondering if the device path changing from cxtxdx to cxdx could be the trigger point.
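One way to check that suspicion, for what it's worth, is to dump the pool configuration ZFS has cached and inspect the recorded device paths (pool name illustrative; exact output varies by build):

    # Print the cached pool configuration, including the path recorded
    # for each vdev, to see whether it still references the old cxtxdx
    # names after the reboot into B70:
    zdb -C tank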
Joe Little wrote:
> I'm wondering if the device path changing from cxtxdx to cxdx could be
> the trigger point.

Joe,

We're sorry about your problems. My take on how this is best handled is that it would be better to expedite (raise the priority of) fixing the bug

  6574286 removing a slog doesn't work

than to expend too much effort on understanding exactly how it failed on your system. You would not have had this problem if you had been able to remove the log device. Is that reasonable?

Neil.
On Tue, May 27, 2008 at 5:04 PM, Neil Perrin <Neil.Perrin at sun.com> wrote:
> You would not have had this problem if you had been able to remove the
> log device. Is that reasonable?

Yep. I only tried the replace to stop the alarms -- the faults from the degraded pool. If a slog can be easily added and removed, it becomes a rather safe investment.
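Once 6574286 is fixed, that try-it-and-back-out workflow would presumably reduce to something like the following (device name illustrative):

    # Add a dedicated log device to an existing pool to evaluate it:
    zpool add tank log c3t0d0

    # ...and back it out again if it doesn't pay off.  On builds up to
    # at least B85 this second step fails, which is the whole problem:
    zpool remove tank c3t0d0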