Sorry for abusing the mailing list, but I don't know how to report bugs
anymore and have no visibility of whether this is a known/resolved
issue. So, just in case it is not...

With Solaris 11 Express, scrubbing a pool with encrypted datasets for
which no key is currently loaded, unrecoverable read errors are
reported. The error count applies to the pool, and not to any specific
device, which is also somewhat at odds with the "helpful message" text
for diagnostic status and suggested action:

          pool: geek
         state: ONLINE
        status: One or more devices has experienced an error resulting in data
                corruption.  Applications may be affected.
        action: Restore the file in question if possible.  Otherwise restore the
                entire pool from backup.
           see: http://www.sun.com/msg/ZFS-8000-8A
          scan: scrub repaired 0 in 3h8m with 280 errors on Tue May 10 17:12:15 2011
        config:

                NAME         STATE     READ WRITE CKSUM
                geek         ONLINE     280     0     0
                  raidz2-0   ONLINE       0     0     0
                    c13t0d0  ONLINE       0     0     0
                    c13t1d0  ONLINE       0     0     0
                    c13t2d0  ONLINE       0     0     0
                    c13t3d0  ONLINE       0     0     0
                    c13t4d0  ONLINE       0     0     0
                    c13t5d0  ONLINE       0     0     0
                    c0t0d0   ONLINE       0     0     0
                    c0t1d0   ONLINE       0     0     0
                    c1t0d0   ONLINE       0     0     0
                    c1t1d0   ONLINE       0     0     0

Using -v lists an error for the same two hex object IDs in each
snapshot, as per the following example:

        geek/crypt@zfs-auto-snap_weekly-2011-03-28-22h39:<0xfffffffffffffffe>
        geek/crypt@zfs-auto-snap_weekly-2011-03-28-22h39:<0xffffffffffffffff>

When this has happened previously (on this and other pools), mounting
the dataset by supplying the key and rerunning the scrub removes the
errors.

For some reason, I can't in this case (it keeps complaining that the
key is wrong). That may be a different issue that has also happened
before, and I will post about it separately, once I'm sure I didn't
just make a typo (twice) when first setting the key.

--
Dan.
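(A sketch of the load-key-and-rescrub sequence described above, using
the Solaris 11 Express "zfs key" syntax; geek/crypt is the dataset from
the status output, and the exact steps may differ on other builds:)

        # load the wrapping key and mount the dataset, then scrub again
        zfs key -l geek/crypt
        zfs mount geek/crypt
        zpool scrub geek
        zpool status -v geek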
On 11/05/2011 01:07, Daniel Carosone wrote:
> Sorry for abusing the mailing list, but I don't know how to report
> bugs anymore and have no visibility of whether this is a
> known/resolved issue. So, just in case it is not...

Log a support call with Oracle if you have a support contract.

> With Solaris 11 Express, scrubbing a pool with encrypted datasets for
> which no key is currently loaded, unrecoverable read errors are
> reported. The error count applies to the pool, and not to any specific
> device, which is also somewhat at odds with the "helpful message" text
> for diagnostic status and suggested action:

Known issue:

6989185 scrubbing a pool with encrypted filesystems and snapshots can
report false positive errors.

If you have a support contract you may be able to request that the fix
be back-ported into an SRU (note I'm not guaranteeing it will be, just
saying that it is technically possible).

> When this has happened previously (on this and other pools), mounting
> the dataset by supplying the key and rerunning the scrub removes the
> errors.
>
> For some reason, I can't in this case (it keeps complaining that the
> key is wrong). That may be a different issue that has also happened
> before, and I will post about it separately, once I'm sure I didn't
> just make a typo (twice) when first setting the key.

Since you are saying "typo" I'm assuming you have
keysource=passphrase,prompt (i.e. the default). Have you ever done a
send|recv of the encrypted datasets? And if so, were there multiple
snapshots recv'd?

--
Darren J Moffat
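(A quick way to check what the datasets actually have set - a sketch,
assuming the Solaris 11 Express property names keysource and keystatus:)

        # show key source and current key status for the pool's datasets
        zfs get -r encryption,keysource,keystatus geek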
Darren,

Firstly, thank you for your great work on zfs-crypto, and your continued
involvement in public fora. It is very much appreciated. I was also very
happy to read about the encrypted zvol swap feature in your blog entry
the other day.. this is useful and I had missed it.

On Wed, May 11, 2011 at 10:12:41AM +0100, Darren J Moffat wrote:
>> Sorry for abusing the mailing list, but I don't know how to report
>> bugs anymore and have no visibility of whether this is a
>> known/resolved issue. So, just in case it is not...
>
> Log a support call with Oracle if you have a support contract.

I don't, or of course I would have.

>> With Solaris 11 Express, scrubbing a pool with encrypted datasets for
>
> Known issue:
>
> 6989185 scrubbing a pool with encrypted filesystems and snapshots can
> report false positive errors.

Great, thanks.

> If you have a support contract you may be able to request that the fix
> be back-ported into an SRU (note I'm not guaranteeing it will be, just
> saying that it is technically possible).

Since I assume from this that it has been fixed, and it seems otherwise
harmless, I can wait for it to make its way into the next general
update / beta / release.

>> For some reason, I can't in this case (it keeps complaining that the
>> key is wrong). That may be a different issue that has also happened
>> before, and I will post about it separately, once I'm sure I didn't
>> just make a typo (twice) when first setting the key.
>
> Since you are saying "typo" I'm assuming you have
> keysource=passphrase,prompt (i.e. the default). Have you ever done a
> send|recv of the encrypted datasets? And if so, were there multiple
> snapshots recv'd?

Yes, indeed. These datasets were sent from another staging pool (where
they were also encrypted, obviously) as part of a reshuffle. And yes,
there were some earlier snapshots of these datasets in the replication
stream, as well as other datasets - it was a whole-pool move. They were
also sent from an ashift=9 to an ashift=12 pool, on the remote chance
that this may be relevant.

The passphrase was entered at a prompt that popped up at the relevant
point in the recv stream.. which I thought was odd for a presumably
non-interactive command. Not sure what the alternative would be,
though.

I'm guessing there's another known issue here, judging by your specific
and prescient question. Was there something odd about the terminal mode
at the time this input was taken that may have garbled the key?
Something related to charset conversions or other crazy unicode
stuff? :)

More importantly, what are the prospects of correctly reproducing that
key so as to get at the data? I still have the original staging pool,
but some additions made since the transfer would be lost otherwise.
It's not especially important data, but it would be annoying to lose or
have to reproduce.

--
Dan.
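(A sketch of the kind of whole-pool replication transfer described
above; the pool and snapshot names here are placeholders rather than
the ones actually used:)

        # recursive snapshot of the staging pool, then a full
        # replication stream into the new pool
        zfs snapshot -r staging@move
        zfs send -R staging@move | zfs recv -d geek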
On Thu, May 12, 2011 at 12:23:55PM +1000, Daniel Carosone wrote:
> They were also sent from an ashift=9 to an ashift=12 pool

This reminded me to post a note describing how I made pools with
different ashift. I do this both for pools on USB flash sticks, and on
disks with an underlying 4k blocksize, such as my 2TB WD EARS drives.
If I had pools on SATA flash SSDs, I'd do it for those too.

The trick comes from noting that stmfadm create-lu has a blk option for
the block size of the iSCSI volume to be presented. Creating a pool
with at least one disk (per top-level vdev) on an iSCSI initiator
pointing at such a target will cause zpool to set ashift for the vdev
accordingly.

This works even when the initiator and target are the same host, over
the loopback interface. Oddly, however, it does not work if the host is
Solaris Express b151 - it does work on OI b148. Something has changed
in zpool create in the interim.

Anyway, my recipe (condensed into a command sketch below) is to:

* boot OI b148 in a VM.
* make a zfs dataset to house the working files (the reason will be
  clear below).
* In that dataset, make sparse files corresponding in size and number
  to the disks that will eventually hold the pool (this makes a pool
  with the same size and number of metaslabs as it would have had
  natively).
* Also make a sparse zvol of the same size.
* stmfadm create-lu -p blk=4096 (or whatever, as desired) on the zvol,
  and make it available.
* get the iSCSI initiator to connect the LU as a new disk device.
* zpool create, using all bar one of the files, and the iSCSI disk, in
  the shape you want your pool (raidz2, etc).
* zpool replace the iSCSI disk with the last unused file (now you can
  tear down the LU and zvol).
* zpool export the pool-on-files.
* zfs send the dataset housing these files to the machine that has the
  actual disks (much faster than rsync, even with the sparse-files
  option, since it doesn't have to scan for holes).
* zpool import the pool from the files.
* zpool upgrade, if you want newer pool features, like crypto.
* zpool set autoexpand=on, if you didn't actually use files of the same
  size.
* zpool replace a file at a time onto the real disks.

Hmm.. when written out like that, it looks a lot more complex than it
really is.. :-)

Note that if you want lots of mirrors, you'll need an iSCSI device per
mirror top-level vdev.

Note also that the image created inside the iSCSI device is not
identical to what you want on a device with 512-byte sector emulation,
since the label is constructed for a 4k logical sector size. zpool
replace takes care of this when labelling the replacement disk/file.

I also played around with another method, using mdb to overwrite the
disk model table to match my disks and make the pool directly on them
with the right ashift.

http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;im=10;i=sd_flash_dev_table

This also no longer works on b151 (though the table still exists), so I
need the VM anyway, and the iSCSI method is easier.

Finally, because this doesn't work on b151, it's also only good for
creating new pools; I don't know how to expand a pool with new vdevs to
have the right ashift in those vdevs.

--
Dan.
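(Condensed into commands, the recipe above looks roughly like the
following. This is a sketch only: the sizes, file paths, remote
hostname, iSCSI device name c2tXXXXd0, and the target/initiator
plumbing are all placeholders, and it assumes the OI b148 VM noted
above.)

        # sparse backing files and a sparse zvol, sized like the real disks
        for i in 0 1 2 3 4 5 6 7 8 9; do mkfile -n 2000g /rpool1/stage/file$i; done
        zfs create -s -V 2000g rpool1/stage/lun0

        # present the zvol as a 4k-blocksize LU, then export it to yourself
        # over loopback iSCSI (itadm create-target, stmfadm add-view,
        # iscsiadm discovery, etc.)
        stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/rpool1/stage/lun0

        # build the pool from all bar one of the files plus the iSCSI disk,
        # then swap the iSCSI disk out for the last file and export
        zpool create -f geek raidz2 /rpool1/stage/file[0-8] c2tXXXXd0
        zpool replace geek c2tXXXXd0 /rpool1/stage/file9
        zpool export geek

        # ship the dataset of files to the machine with the real disks
        zfs snapshot rpool1/stage@xfer
        zfs send rpool1/stage@xfer | ssh otherhost zfs recv -d rpool1

        # on that machine: import from the files and walk the pool onto
        # the physical devices, one replace at a time
        zpool import -d /rpool1/stage geek
        zpool upgrade geek                    # if newer features (e.g. crypto) are wanted
        zpool set autoexpand=on geek
        zpool replace geek /rpool1/stage/file0 c13t0d0
        # ...repeat the replace for each remaining file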
On 12/05/2011 03:23, Daniel Carosone wrote:
> Yes, indeed. These datasets were sent from another staging pool (where
> they were also encrypted, obviously) as part of a reshuffle. And yes,
> there were some earlier snapshots of these datasets in the replication
> stream, as well as other datasets - it was a whole-pool move. They
> were also sent from an ashift=9 to an ashift=12 pool, on the remote
> chance that this may be relevant.
>
> The passphrase was entered at a prompt that popped up at the relevant
> point in the recv stream.. which I thought was odd for a presumably
> non-interactive command. Not sure what the alternative would be,
> though.
>
> I'm guessing there's another known issue here, judging by your specific
> and prescient question. Was there something odd about the terminal mode
> at the time this input was taken that may have garbled the key?
> Something related to charset conversions or other crazy unicode
> stuff? :)

There is a possible bug in that area too, and it is only for the
keysource=passphrase case. It isn't anything to do with the terminal.

> More importantly, what are the prospects of correctly reproducing that
> key so as to get at the data? I still have the original staging pool,
> but some additions made since the transfer would be lost otherwise.
> It's not especially important data, but it would be annoying to lose
> or have to reproduce.

I'm not sure; can you send me the output of 'zpool history' on the pool
that the recv was done to? I'll be able to determine from that whether
I can fix up the problem or not.

--
Darren J Moffat
On Thu, May 12, 2011 at 10:04:19AM +0100, Darren J Moffat wrote:
> There is a possible bug in that area too, and it is only for the
> keysource=passphrase case.

Ok, sounds like it's not yet a known one. If there's anything I can do
to help track it down, let me know.

> It isn't anything to do with the terminal.

Heh, ok.. just a random WAG.

>> More importantly, what are the prospects of correctly reproducing that
>> key so as to get at the data? I still have the original staging pool,
>> but some additions made since the transfer would be lost otherwise.
>> It's not especially important data, but it would be annoying to lose
>> or have to reproduce.
>
> I'm not sure; can you send me the output of 'zpool history' on the pool
> that the recv was done to? I'll be able to determine from that whether
> I can fix up the problem or not.

Can do - but it's odd. Other than the initial create and the most
recent scrub, the history only contains a sequence of auto-snapshot
creations and removals. None of the other commands I'd expect, like the
filesystem creations and recv, the device replacements (as I described
in my other post), previous scrubs, or anything else:

        dan@ventus:~# zpool history geek | grep -v auto-snap
        History for 'geek':
        2011-04-01.08:48:15 zpool create -f geek raidz2 /rpool1/stage/file0 /rpool1/stage/file1 /rpool1/stage/file2 /rpool1/stage/file3 /rpool1/stage/file4 /rpool1/stage/file5 /rpool1/stage/file6 /rpool1/stage/file7 /rpool1/stage/file8 c2t600144F0DED90A0000004D9590440001d0
        2011-05-10.14:03:34 zpool scrub geek

If you want the rest, I'm happy to send it, but I don't expect it will
tell you anything. I do wonder why that is...

--
Dan.
Or you could use one of the binaries provided from several different
sources, which force ashift=12 in the code, e.g.:

http://digitaldj.net/2010/11/03/zfs-zpool-v28-openindiana-b147-4k-drives-and-you/
http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html

Note that you need to use a modified binary any time you use the
command "zpool add", or the added vdev will be of the default ashift.

Cheers,

On 12 May 2011 13:24, Daniel Carosone <dan@geek.com.au> wrote:
> On Thu, May 12, 2011 at 12:23:55PM +1000, Daniel Carosone wrote:
> > They were also sent from an ashift=9 to an ashift=12 pool
>
> This reminded me to post a note describing how I made pools with
> different ashift. I do this both for pools on USB flash sticks, and on
> disks with an underlying 4k blocksize, such as my 2TB WD EARS drives.
>
> The trick comes from noting that stmfadm create-lu has a blk option
> for the block size of the iSCSI volume to be presented. Creating a
> pool with at least one disk (per top-level vdev) on an iSCSI initiator
> pointing at such a target will cause zpool to set ashift for the vdev
> accordingly.
> [...]
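(Whichever route is used, it's worth verifying what ashift the pool
actually ended up with. One way - a sketch only, as zdb's exact output
varies between builds - is to grep the cached pool configuration:)

        # each top-level vdev should report the expected ashift (12 for 4k)
        zdb -C geek | grep ashift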
Well, for the sake of completeness (and perhaps to enable users of
snv_151a), there should also be links to alternative methods:

1) Using a patched-source and recompiled, or an already precompiled,
"zpool" binary, i.e.:

http://www.solarismen.de/archives/4-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-1.html
http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html
http://www.solarismen.de/archives/6-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-3.html
http://www.solarismen.de/archives/9-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-4.html
http://www.kuehnke.de/christian/solaris/zpool-s10u8

2) Making the pool in an alternate OS, such as a FreeBSD LiveCD with
its tricks, and then importing/upgrading in Solaris. See www.zfsguru.org
and numerous posts on the internet by its author "sub_mesa" or
"sub.mesa".

I am not promoting either of these methods. I've used (1) successfully
on my OI_148a by taking a precompiled binary, and I didn't get around
to trying (2).

Just my 2c :)

//Jim
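(For reference, the FreeBSD trick referred to in (2) is usually some
variant of the gnop(8) approach - a sketch from memory, with
placeholder device and pool names, not taken from the zfsguru
documentation:)

        # create a transparent provider that advertises 4k sectors,
        # build the pool on it, then drop the provider so the pool
        # imports from the raw disk afterwards
        gnop create -S 4096 ada0
        zpool create tank ada0.nop
        zpool export tank
        gnop destroy ada0.nop
        zpool import tank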
Just a ping for any further updates, as well as a crosspost to migrate
the thread to zfs-discuss (from -crypto-).

Is there any further information I can provide? What's going on with
that "zpool history", and does it tell you anything about the chances
of recovering the actual key used?

On Thu, May 12, 2011 at 08:52:04PM +1000, Daniel Carosone wrote:
> On Thu, May 12, 2011 at 10:04:19AM +0100, Darren J Moffat wrote:
> > There is a possible bug in that area too, and it is only for the
> > keysource=passphrase case.
>
> Ok, sounds like it's not yet a known one. If there's anything I can do
> to help track it down, let me know.
>
> > It isn't anything to do with the terminal.
>
> Heh, ok.. just a random WAG.
>
> >> More importantly, what are the prospects of correctly reproducing
> >> that key so as to get at the data? I still have the original staging
> >> pool, but some additions made since the transfer would be lost
> >> otherwise. It's not especially important data, but it would be
> >> annoying to lose or have to reproduce.
> >
> > I'm not sure; can you send me the output of 'zpool history' on the
> > pool that the recv was done to? I'll be able to determine from that
> > whether I can fix up the problem or not.
>
> Can do - but it's odd. Other than the initial create and the most
> recent scrub, the history only contains a sequence of auto-snapshot
> creations and removals. None of the other commands I'd expect, like
> the filesystem creations and recv, the device replacements (as I
> described in my other post), previous scrubs, or anything else:
>
>         dan@ventus:~# zpool history geek | grep -v auto-snap
>         History for 'geek':
>         2011-04-01.08:48:15 zpool create -f geek raidz2 /rpool1/stage/file0 /rpool1/stage/file1 /rpool1/stage/file2 /rpool1/stage/file3 /rpool1/stage/file4 /rpool1/stage/file5 /rpool1/stage/file6 /rpool1/stage/file7 /rpool1/stage/file8 c2t600144F0DED90A0000004D9590440001d0
>         2011-05-10.14:03:34 zpool scrub geek
>
> If you want the rest, I'm happy to send it, but I don't expect it will
> tell you anything. I do wonder why that is...
>
> --
> Dan.
On Thu, May 12, 2011 at 08:52:04PM +1000, Daniel Carosone wrote:
> Other than the initial create and the most recent scrub, the history
> only contains a sequence of auto-snapshot creations and removals. None
> of the other commands I'd expect, like the filesystem creations and
> recv, the device replacements (as I described in my other post),
> previous scrubs, or anything else:

We keep a limited amount of history data (up to 32MB of raw data). So
if you have a ton of auto-snapshot activity, earlier operations may
have fallen out of the history. But we always keep the "zpool create"
line.

--matt
On Wed, May 25, 2011 at 11:54:16PM -0700, Matthew Ahrens wrote:
> On Thu, May 12, 2011 at 08:52:04PM +1000, Daniel Carosone wrote:
> > Other than the initial create and the most recent scrub, the history
> > only contains a sequence of auto-snapshot creations and removals.
> > None of the other commands I'd expect, like the filesystem creations
> > and recv, the device replacements (as I described in my other post),
> > previous scrubs, or anything else:
>
> We keep a limited amount of history data (up to 32MB of raw data). So
> if you have a ton of auto-snapshot activity, earlier operations may
> have fallen out of the history. But we always keep the "zpool create"
> line.

That sounds right for the behaviour, but not for the numbers:

        dan@ventus:~# zpool history geek | wc
          50100  247555 4359702

I presume the on-disk format is at least a little more compact than
this, too.

--
Dan.
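(If the builds in question support zpool history's -i and -l flags, the
internally logged events - which, as far as I know, end up in the same
on-disk buffer - can be counted as well, to get a rough idea of how
much is actually being retained. A sketch:)

        # include internal events and long-format records, count the bytes
        zpool history -il geek | wc -c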