Paul B. Henson
2009-Jun-06 03:57 UTC
[zfs-discuss] importing pool with missing slog followup
My research into recovering from a pool whose slog goes MIA while the pool is off-line resulted in two possible methods, one requiring prior preparation and the other requiring a copy of the zpool.cache that includes data for the failed pool.

The first method is to simply dump a copy of the slog device right after you make it (just dd if=/dev/dsk/<slog> of=slog.dump). If the device ever failed, theoretically you could restore the image onto a replacement (dd if=slog.dump of=/dev/dsk/<slog>) and import the pool.

My initial testing of that method was promising, but that testing was performed by intentionally corrupting the slog device and restoring the copy back onto the original device. When I tried restoring the slog dump onto a different device, it didn't work out so well: zpool import recognized the different device as a log device for the pool, but still complained there were unknown missing devices and refused to import the pool. It looks like the device serial number is stored as part of the zfs label, resulting in confusion when that label is restored onto a different device. As such, this method is only usable if the underlying fault is simply corruption and the original device is available to restore onto.

The second method is described at:

http://opensolaris.org/jive/thread.jspa?messageID=377018

Unfortunately, the included binary does not run under S10U6, and after half an hour or so of trying to get the source code to compile under S10U6 I gave up. (I found some of the missing header files in the S10U6 grub source code package, which presumably match the actual data structures in use under S10, but there was additional stuff missing, and as I started copying it out of the opensolaris code it just got messier and messier.) Unless someone with more zfs-fu than me creates a binary for S10, this approach is not going to be viable.

Unofficially I was told that a fix for this issue is expected to be putback into Nevada around July, but whether or not that might be available in U8 wasn't said. So, barring any official release of a fix or unofficial availability of a workaround for S10, in the (admittedly unlikely) failure mode of a slog device failure on an inactive pool, have good backups :).

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768
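
A minimal sketch of the first method from the message above, wrapped in two shell functions. The function names, the cmp verification step, and the device name in the usage comment are illustrative assumptions and not part of the original commands; the dd invocations themselves mirror the ones quoted in the message. As reported there, restoring a dd image this way only led to a successful import when it was written back onto the original device.

```shell
#!/bin/sh
# Sketch only. Device and image paths are placeholders supplied by the caller.

# Dump a device to an image file, then confirm the image is byte-identical.
# (The cmp check is an addition; the original message used dd alone.)
dump_slog() (
    src=$1
    img=$2
    dd if="$src" of="$img" 2>/dev/null || exit 1
    cmp -s "$src" "$img"
)

# Write an image back onto a device.
restore_slog() (
    img=$1
    dst=$2
    dd if="$img" of="$dst" 2>/dev/null
)

# Hypothetical usage (c9t9d9s0 is a made-up device name):
#   dump_slog /dev/dsk/c9t9d9s0 slog.dump
#   restore_slog slog.dump /dev/dsk/c9t9d9s0
```

Both functions return non-zero on failure, so a caller can stop before attempting zpool import if the dump or the comparison did not succeed.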
In my testing, I've seen that trying to duplicate zpool disks with dd often results in a disk that's unreadable. I believe it has something to do with the block sizes of dd.

In order to make my own slog backups, I just used cat instead. I plugged the slog SSD into another system (not a necessary step, but easier in my case), catted the disk to a file, then put the slog SSD back. I imagine this needs to be done with the zpool in a cleanly-exported state; I haven't tested it otherwise.

I've also tested replacing an SSD with my method: just cat the file back to the disk. After replacing a slog this way, the zpool is imported on boot like nothing happened, even though the physical hardware has changed.

A question I have is, does "zpool replace" now work for slog devices as of snv_111b?

-Greg
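
The cat-based backup and restore described in this reply could be sketched roughly as follows. The function names, the sidecar checksum file, and the device name in the usage comment are assumptions made for illustration; the reply itself only says the disk was catted to a file and the file catted back. The checksum is there so that a damaged image is noticed before it is written over a replacement device.

```shell
#!/bin/sh
# Sketch only. Assumes the pool is cleanly exported, as the reply suggests.

# Copy the log device to an image with cat and record the image's checksum.
slog_backup() (
    dev=$1
    img=$2
    cat "$dev" > "$img" || exit 1
    cksum < "$img" > "$img.cksum"
)

# Refuse to restore if the image no longer matches its recorded checksum,
# otherwise cat the image onto the target device.
slog_restore() (
    img=$1
    dev=$2
    [ "$(cksum < "$img")" = "$(cat "$img.cksum")" ] || {
        echo "image $img does not match its recorded checksum" >&2
        exit 1
    }
    cat "$img" > "$dev"
)

# Hypothetical usage (c9t9d9s0 and c8t8d8s0 are made-up device names):
#   slog_backup  /dev/dsk/c9t9d9s0 slog.img
#   slog_restore slog.img /dev/dsk/c8t8d8s0
```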
Paul B. Henson
2009-Jun-10 01:38 UTC
[zfs-discuss] importing pool with missing slog followup
On Tue, 9 Jun 2009, Greg Mason wrote:

> In my testing, I've seen that trying to duplicate zpool disks with dd
> often results in a disk that's unreadable. I believe it has something to
> do with the block sizes of dd.
>
> In order to make my own slog backups, I just used cat instead.

Huh, how about that -- in my case the dd replicated slog was recognized but import still complained about missing devices; however, using cat instead of dd did indeed work correctly. I was able to dump the slog using cat from the SSD, remove the SSD from the system, restore the dump using cat onto a completely different drive in a different bay and it worked perfectly.

Thanks for the tip, now if I can just iron out my issue with fma reporting the Intel SSD having failed self test and marking it faulty I'll be in good shape.

> A question I have is, does "zpool replace" now work for slog devices as
> of snv_111b?

I actually used that a couple of times under S10U6 while testing slog migration, as far as I can tell it worked fine?

Thanks...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768
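
For the "zpool replace" migration mentioned above, a hedged sketch of what the invocation might look like. The pool name tank and the device names c1t0d0 and c2t0d0 are made up, and the dry-run wrapper is an illustrative assumption; only the general zpool replace pool old_device new_device form is relied on here. The thread does not show the exact commands that were used.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Swap the pool's log device for a new one, then show the pool status.
replace_slog() {
    pool=$1
    old=$2
    new=$3
    run zpool replace "$pool" "$old" "$new" &&
    run zpool status "$pool"
}

# Hypothetical usage (pool and device names are made up):
#   replace_slog tank c1t0d0 c2t0d0
#   DRY_RUN=0 replace_slog tank c1t0d0 c2t0d0
```

Defaulting to a dry run is a deliberate choice for a command that rewrites pool configuration: the printed lines can be reviewed before anything touches the pool.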