This is a re-post of this issue ... I didn't get any replies to the previous post of 12/27 ... I'm hoping someone is back from holiday who may have some insight into this problem ... Bill

When I remove a separate ZIL disk from a pool, the pool continues to function, logging synchronous writes to the disks in the pool. Status shows that the log disk has been removed, and everything seems to work fine until I export the pool.

After the pool has been exported (long after the log disk was removed and gigabytes of synchronous writes were performed successfully), I am no longer able to import the pool. I get an error stating that a pool device cannot be found, and importing the pool cannot succeed until the missing device (the separate ZIL log disk) is replaced in the system.

There is a bug filed by Neil Perrin, 6574286 "removing a slog doesn't work", regarding the problem of not being able to remove a separate ZIL device from a pool, but it has no detail on the ramifications of just taking the device out of the JBOD.

Taking it out does not impact the immediate function of the pool, but the inability to re-import it after this event is a significant issue. Has anyone found a workaround for this problem? I have data in a pool that I cannot import because the separate ZIL is no longer available to me.

This message posted from opensolaris.org
Bill Moloney wrote:
> Taking it out does not impact the immediate function of the pool,
> but the inability to re-import it after this event is a significant issue. Has
> anyone found a workaround for this problem ? I have data in a pool that
> I cannot import because the separate zil is no longer available to me.

Just a guess here. The disk the ZIL was on is no longer available, but do you have another disk available? I would think a zpool replace might help you replace the missing disk with some other disk... But maybe not, if you can't import it to begin with?

-Kyle
Perhaps this is being tracked as 6538021? http://bugs.opensolaris.org/view_bug.do?bug_id=6538021

-- richard
The problem is that the ZIL device is treated just like another toplevel vdev. As part of the import process, we find all vdevs and assemble the config, and verify that the sum of all vdev GUIDs matches the expected sum. Now, each vdev only stores enough configuration to keep track of the toplevel vdev it's a part of, relying on the zpool.cache file or import logic to assemble the complete pool topology. When you go to import the pool and it can't find the log device, it doesn't know about that toplevel vdev at all, notices that the vdev GUID sum doesn't match, and complains in a generic way about "there must be something out there that I don't know about".

Keeping a fully connected graph of toplevel vdevs is expensive and error prone, but there is an open RFE for "neighbor lists" that would allow discovery of other vdevs, even if entire toplevel vdevs are missing. But there would be situations where you could construct pathological failure modes that would have the same result.

A better solution would be making ZFS survive toplevel vdev failure better than it does today. In the world of ditto blocks, it should be possible to import a pool that is missing toplevel vdevs. I have a workspace that implicitly allows you to do this, but there are a bunch of issues in the SPA that need to be addressed before this could be exposed as a first class operation.

Recovering from your current situation is doable, but tricky. The easiest thing to do would be to compile your own ZFS kernel module that doesn't do the vdev GUID sum check and import the pool. This should just cause the log device to be forgotten, but you'd definitely want to try this out on a different pool first. You could also create another pool with a separate log device, export it, and then manually tweak the label on the disk to match the expected pool GUID and vdev GUID. Neither of these is straightforward, and both will require some time with the source code, but if your data is vital then it may be worthwhile.
- Eric

On Mon, Jan 07, 2008 at 08:36:54AM -0800, Bill Moloney wrote:
> Has anyone found a workaround for this problem ? I have data in a pool that
> I cannot import because the separate zil is no longer available to me.

--
Eric Schrock, FishWorks http://blogs.sun.com/eschrock
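The import-time check Eric describes can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the ZFS implementation: the `Label` class, its field names, `make_pool` and `try_import` are hypothetical, and the 64-bit wraparound sum is an assumption made for the illustration. What it tries to show is why the failure message is so generic: each surviving label knows only its own toplevel vdev plus the expected sum, so the import can tell that something is missing but cannot say what.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # assumption: GUIDs are summed with 64-bit wraparound


@dataclass(frozen=True)
class Label:
    """Hypothetical, simplified stand-in for an on-disk vdev label."""
    pool_guid: int
    toplevel_guid: int      # the only piece of topology this device knows
    expected_guid_sum: int  # sum over every toplevel vdev in the pool


def guid_sum(guids):
    """Sum GUIDs modulo 2**64."""
    total = 0
    for g in guids:
        total = (total + g) & MASK64
    return total


def make_pool(pool_guid, toplevel_guids):
    """Write one label per toplevel vdev, each carrying the pool-wide sum."""
    total = guid_sum(toplevel_guids)
    return [Label(pool_guid, g, total) for g in toplevel_guids]


def try_import(labels):
    """Assemble a config from the labels found, then apply the sum check."""
    if not labels:
        return False, "no devices found"
    found = {lbl.toplevel_guid for lbl in labels}
    if guid_sum(found) != labels[0].expected_guid_sum:
        # No surviving label describes the absent vdev, so the error
        # cannot name it.
        return False, "GUID sum mismatch: some device is missing"
    return True, "ok"


if __name__ == "__main__":
    data1, data2, slog = 0x1111, 0x2222, 0x3333  # made-up GUIDs
    labels = make_pool(0xABCD, [data1, data2, slog])
    print(try_import(labels))

    # Pull the log device, then export and try to import again.
    survivors = [lbl for lbl in labels if lbl.toplevel_guid != slog]
    print(try_import(survivors))
```

In this model, both recovery routes from the message above amount to making the two sides of the comparison agree again: either skip the comparison, or present a device whose label carries the missing GUID.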
Thanks to Kyle, Richard and Eric.

In dealing with this problem, I realize now that I could have saved myself a lot of grief if I had simply used the replace command and substituted some other drive for my flash drive before I removed it.

I think that this point is critical for anyone who finds themselves experimenting with separate ZILs ... since a pool will continue to function with no obvious problems after a separate ZIL is removed, it's easy to think that, while the benefit of a separate ZIL is gone, the pool drives have picked up the ZIL function and all is well with the world.

The sad reality comes when a reboot or export of the pool occurs and there is then no way to re-import the pool without re-inserting the missing ZIL device, and if the missing ZIL device is no longer available, the pool is inaccessible ... it's too late now to do a replace, because the pool must be imported to do anything ... all the data in your pool is perfect, but it's perfectly out of reach ...

Bill
Hello Bill,

BM> the sad reality comes when a reboot or export of the pool occurs
BM> and there is then no way to re-import the pool without re-inserting
BM> the missing ZIL device, and if the missing ZIL device is no longer
BM> available, the pool is inaccessible ... it's too late now to do a replace,
BM> because the pool must be imported to do anything ... all the data in
BM> your pool is perfect, but it's perfectly out of reach

To cheer you up: fortunately it's not perfectly out of reach. Unfortunately it's quite hard to reach it.

Sorry, just being in a special mood today :)

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com