Hi everyone,

Sorry if these questions have really obvious answers for people running ZFS already. I promise to try it out myself at some point :)

Is it possible to query the stored checksum for a file? Ideally in some sort of read-only extended attribute accessible from scripts. This would make a tripwire-like tool very fast, and rsync more accurate.

Since everything is copy-on-write, and the checksums are stored, would it be possible to store common blocks just once?

Will encryption be supported at some point? Since there's already compression, I think the only issue would be to get the key in the zfs layer in a secure way.

Is it possible to store a zfs filesystem in an auto-expanding file? Mac OS X has sparseimage files, which contain just the used blocks of your filesystem. This, combined with encryption, gives flexible secure home directories.

Thanks for any answers,

Wout.
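To make the first question concrete, here's a rough sketch of what I have in mind, assuming a hypothetical read-only attribute named "zfs.checksum" (no such attribute exists; this is just to show the shape of it, using the Solaris attropen(3C) interface for named attributes):

/*
 * Hypothetical sketch only -- ZFS does not expose per-file checksums, and
 * the attribute name "zfs.checksum" is invented for illustration.  It uses
 * attropen(3C), the Solaris interface for opening a file's named attributes.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
print_stored_checksum(const char *path)
{
    char buf[129];          /* room for a hex SHA-256 string plus NUL */
    int fd = attropen(path, "zfs.checksum", O_RDONLY);

    if (fd == -1) {
        perror("attropen");
        return (-1);
    }

    ssize_t n = read(fd, buf, sizeof (buf) - 1);
    (void) close(fd);
    if (n < 0)
        return (-1);

    buf[n] = '\0';          /* assume the attribute is a text string */
    (void) printf("%s  %s\n", buf, path);
    return (0);
}

A tripwire-like tool could then compare that string against a previously recorded value instead of rereading and hashing every file.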
> Is it possible to query the stored checksum for a file? Ideally in
> some sort of read-only extended attribute accessible from scripts.
> This would make a tripwire-like tool very fast, and rsync more accurate.

Not that I know of.

> Since everything is copy-on-write, and the checksums are stored,
> would it be possible to store common blocks just once?

It would be, but it would require:
- a master hash table where you can look up <size, hash> -> block
- a bitwise compare of the block to be stored.

> Will encryption be supported at some point? Since there's already
> compression, I think the only issue would be to get the key in the
> zfs layer in a secure way.

It is planned.

> Is it possible to store a zfs filesystem in an auto-expanding file?
> Mac OS X has sparseimage files, which contain just the used blocks of
> your filesystem. This, combined with encryption, gives flexible
> secure home directories.

But in a rather awkward implementation; if ZFS can encrypt at a different level, then surely that is preferred.

Auto-expanding and copy-on-write would seem to give you larger files than you would want.

Casper
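Roughly, that lookup would look something like this (an illustrative sketch only, with invented names; the real work of copy-on-write table updates, reference counting and persistence is left out):

/*
 * Illustrative sketch, not ZFS code: a dedup table keyed by <size, hash>,
 * with a bitwise compare before a block is shared.  I/O and the table
 * lookup itself are stand-ins (extern), and all names are invented.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct dedup_entry {
    uint64_t            de_size;        /* block size in bytes */
    uint8_t             de_hash[32];    /* e.g. SHA-256 of the block */
    uint64_t            de_addr;        /* on-disk location of the stored copy */
    uint32_t            de_refcnt;      /* files referencing this block */
    struct dedup_entry *de_next;        /* hash-bucket collision chain */
} dedup_entry_t;

extern dedup_entry_t *dedup_lookup(uint64_t size, const uint8_t hash[32]);
extern int read_block(uint64_t addr, void *buf, uint64_t size);

/*
 * If an identical block is already stored, bump its refcount and return its
 * address so the caller can reference it instead of writing a new copy.
 */
bool
dedup_try_share(const void *data, uint64_t size, const uint8_t hash[32],
    uint64_t *addrp)
{
    uint8_t *buf = malloc(size);
    bool shared = false;

    if (buf == NULL)
        return (false);

    for (dedup_entry_t *de = dedup_lookup(size, hash); de != NULL;
        de = de->de_next) {
        if (de->de_size != size || memcmp(de->de_hash, hash, 32) != 0)
            continue;
        /* Hash matches: the bitwise compare rules out collisions. */
        if (read_block(de->de_addr, buf, size) == 0 &&
            memcmp(buf, data, size) == 0) {
            de->de_refcnt++;
            *addrp = de->de_addr;
            shared = true;
            break;
        }
    }
    free(buf);
    return (shared);
}

Even this toy version shows where the cost goes: every write that hits the table pays an extra read for the compare.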
On Thu, 2006-01-19 at 16:41, Wout Mertens wrote:
> Is it possible to query the stored checksum for a file? Ideally in
> some sort of read-only extended attribute accessible from scripts.
> This would make a tripwire-like tool very fast, and rsync more accurate.

This is RFE# 6259754. Not implemented yet.

> Will encryption be supported at some point? Since there's already
> compression, I think the only issue would be to get the key in the
> zfs layer in a secure way.

Yes, it is planned. Key management is one of the hard problems in this area. Luckily we have the Solaris Cryptographic Framework to help us out; this includes support for hardware security modules and hardware acceleration. Key management isn't just about getting the key there; that's actually the easy bit. The hard bit is getting the architecture correct so that we can support things like secure delete on demand, secure delete at time N, escrowed keys, keys that allow backup but not restore, etc.

> Is it possible to store a zfs filesystem in an auto-expanding file?
> Mac OS X has sparseimage files, which contain just the used blocks of
> your filesystem. This, combined with encryption, gives flexible
> secure home directories.

If flexible secure home directories are what you want, ZFS encryption will give you that. Just create a separate ZFS filesystem for each user. Filesystems in ZFS are really cheap and easy to manage (since we aren't tied to using /etc/vfstab and /etc/dfs/dfstab to mount and share them).

--
Darren J Moffat
Hello Wout,

Thursday, January 19, 2006, 5:41:09 PM, you wrote:

WM> Is it possible to query the stored checksum for a file? Ideally in
WM> some sort of read-only extended attribute accessible from scripts.
WM> This would make a tripwire-like tool very fast, and rsync more accurate.

I can see how it could work for tripwire-like tools, but I'm not sure it would for rsync-like ones, because generally you don't know what block size will be used after the file is copied to the other filesystem, so the checksums will actually be over different blocks. But maybe I'm overlooking something...

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Hi Robert,

On 20 Jan 2006, at 09:43, Robert Milkowski wrote:
> WM> Is it possible to query the stored checksum for a file? Ideally in
> WM> some sort of read-only extended attribute accessible from scripts.
> WM> This would make a tripwire-like tool very fast, and rsync more accurate.
>
> I can see how it could work for tripwire-like tools, but I'm not sure it
> would for rsync-like ones, because generally you don't know what block
> size will be used after the file is copied to the other filesystem, so
> the checksums will actually be over different blocks. But maybe I'm
> overlooking something...

By "more accurate", I meant that instead of relying on mtime to see if a file changed, you have the checksum available. Normally, the cost of checksumming is prohibitive, but with ZFS you get it for free with the data protection features!

Rsync would still have to do its own block-level checksumming on both sides when synchronizing files, but generally the network is slower than the checksumming, so that's OK.

A really smart rsync could use the ZFS block checksums and ask the remote side for the checksums of the same blocks, though; that would eliminate half of the work. It would come at the cost of teaching rsync about ZFS internals, which seems to me like a very bad thing.

Wout.
On 19 Jan 2006, at 18:32, Darren J Moffat wrote:
> On Thu, 2006-01-19 at 16:41, Wout Mertens wrote:
>> Is it possible to query the stored checksum for a file? Ideally in
>> some sort of read-only extended attribute accessible from scripts.
>> This would make a tripwire-like tool very fast, and rsync more
>> accurate.
>
> This is RFE# 6259754. Not implemented yet.

Is that the actual number? How many are before it? :-(

>> Will encryption be supported at some point? Since there's already
>> compression, I think the only issue would be to get the key in the
>> zfs layer in a secure way.
>
> Yes, it is planned. Key management is one of the hard problems
> in this area. Luckily we have the Solaris Cryptographic Framework
> to help us out; this includes support for hardware security modules
> and hardware acceleration. Key management isn't just about getting
> the key there; that's actually the easy bit. The hard bit is
> getting the architecture correct so that we can support things
> like secure delete on demand, secure delete at time N, escrowed
> keys, keys that allow backup but not restore, etc.

That'll teach me to be an armchair programmer :) Can't wait to see what you guys make of it; it should be an interesting read...

>> Is it possible to store a zfs filesystem in an auto-expanding file?
>> Mac OS X has sparseimage files, which contain just the used blocks of
>> your filesystem. This, combined with encryption, gives flexible
>> secure home directories.
>
> If flexible secure home directories are what you want, ZFS
> encryption will give you that. Just create a separate ZFS filesystem
> for each user. Filesystems in ZFS are really cheap and easy to
> manage (since we aren't tied to using /etc/vfstab and
> /etc/dfs/dfstab to mount and share them).

Hmmm, very good point. The only thing you lose in this situation is the possibility of taking a user's files and putting them on a different system without knowing what they are...

Would encryption be implemented in such a way that you can access the encrypted data? I'm thinking backups here...

Thanks for your answers,

Wout.
On 19 Jan 2006, at 18:28, Casper.Dik at Sun.COM wrote:
>> Since everything is copy-on-write, and the checksums are stored,
>> would it be possible to store common blocks just once?
>
> It would be, but it would require:
> - a master hash table where you can look up <size, hash> -> block
> - a bitwise compare of the block to be stored.

That's what I was thinking... Would that be prohibitive?

Also, I know that several products implementing this kind of behaviour (disk-based backup systems) simply skip the bitwise compare. The assumption is: a hash collision is a very rare occurrence. A hash collision with the constraint that your files must make sense somehow (i.e. not random data but images, documents, programs, etc.) should be rarer still.

>> Is it possible to store a zfs filesystem in an auto-expanding file?
>> Mac OS X has sparseimage files, which contain just the used blocks of
>> your filesystem. This, combined with encryption, gives flexible
>> secure home directories.
>
> But in a rather awkward implementation; if ZFS can encrypt at a
> different level, then surely that is preferred.
>
> Auto-expanding and copy-on-write would seem to give you larger files
> than you would want.

Hmmm, aren't the released blocks reused afterwards? So you'd have a larger footprint, but still less than the amount of space you reserved initially.

But I hadn't considered just making a new filesystem; I still need to get old habits out of my head :) The only disadvantage I can see then is backups, as I mentioned in my reply to Darren.

Thanks,

Wout.
> On 19 Jan 2006, at 18:28, Casper.Dik at Sun.COM wrote:
>
>>> Since everything is copy-on-write, and the checksums are stored,
>>> would it be possible to store common blocks just once?
>>
>> It would be, but it would require:
>> - a master hash table where you can look up <size, hash> -> block
>> - a bitwise compare of the block to be stored.
>
> That's what I was thinking... Would that be prohibitive?

Not sure; but you'd also need a "copy-on-write" method of updating the hash table, reference counting for blocks, and other things. Perhaps not difficult, but certainly more than a minor matter of programming.

On a per-file basis this might be more interesting.

> Also, I know that several products implementing this kind of
> behaviour (disk-based backup systems) simply skip the bitwise
> compare. The assumption is: a hash collision is a very rare
> occurrence. A hash collision with the constraint that your files must
> make sense somehow (i.e. not random data but images, documents,
> programs, etc.) should be rarer still.

Still, that does not make good engineering sense; you know that you've built in data corruption, even though the chances are small.

> But I hadn't considered just making a new filesystem; I still need to
> get old habits out of my head :) The only disadvantage I can see then
> is backups, as I mentioned in my reply to Darren.

ZFS does require some readjustment of your mindset.

Casper
>> Since everything is copy-on-write, and the checksums are stored,
>> would it be possible to store common blocks just once?
[cut]
>> Also, I know that several products implementing this kind of
>> behaviour (disk-based backup systems) simply skip the bitwise
>> compare. The assumption is: a hash collision is a very rare
>> occurrence. A hash collision with the constraint that your files must
>> make sense somehow (i.e. not random data but images, documents,
>> programs, etc.) should be rarer still.
>
> Still, that does not make good engineering sense; you know that you've
> built in data corruption, even though the chances are small.

The systems I know of that depend on the hash as the key for the data (i.e. peer-to-peer systems) use a secure hash, not a checksum. There's a huge difference here in terms of the likelihood of a collision. I don't think we can call a checksum a hash here.

Bill la Forge
Hello Wout,

Friday, January 20, 2006, 9:55:23 AM, you wrote:

WM> Hi Robert,

WM> On 20 Jan 2006, at 09:43, Robert Milkowski wrote:
>> WM> Is it possible to query the stored checksum for a file? Ideally in
>> WM> some sort of read-only extended attribute accessible from scripts.
>> WM> This would make a tripwire-like tool very fast, and rsync more accurate.
>>
>> I can see how it could work for tripwire-like tools, but I'm not sure it
>> would for rsync-like ones, because generally you don't know what block
>> size will be used after the file is copied to the other filesystem, so
>> the checksums will actually be over different blocks. But maybe I'm
>> overlooking something...

WM> By "more accurate", I meant that instead of relying on mtime to see
WM> if a file changed, you have the checksum available. Normally, the
WM> cost of checksumming is prohibitive, but with ZFS you get it for
WM> free with the data protection features!

WM> Rsync would still have to do its own block-level checksumming on both
WM> sides when synchronizing files, but generally the network is slower
WM> than the checksumming, so that's OK.

WM> A really smart rsync could use the ZFS block checksums and ask the
WM> remote side for the checksums of the same blocks, though; that would
WM> eliminate half of the work. It would come at the cost of teaching
WM> rsync about ZFS internals, which seems to me like a very bad thing.

The problem is that when you are trying to keep two filesystems in sync, you expect the mtimes to be the same. However, as ZFS uses DIFFERENT block sizes, you probably can't guarantee that the source file and the (r)synced file use the same block sizes (and number of blocks). You would need to keep your own database of checksums for each block in a given file so that tripwire-like utilities could take advantage of this.

To make myself clear: I believe that if you have a file named A on one ZFS filesystem and then copy it to another ZFS filesystem (file A'), you can't guarantee that A and A' have the same number of blocks and use the same block sizes.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
> The problem is that when you are trying to keep two filesystems
> in sync, you expect the mtimes to be the same. However, as ZFS uses
> DIFFERENT block sizes, you probably can't guarantee that the source
> file and the (r)synced file use the same block sizes (and number of
> blocks). You would need to keep your own database of checksums for
> each block in a given file so that tripwire-like utilities could take
> advantage of this.

Actually, I think we are talking about two different things here. I was proposing using the file-level checksum to detect whether a file had changed versus the remote copy. I wasn't proposing using the block-level checksums, since those would indeed change between filesystems.

Now, this will only work if the checksum stored in the dnode is the same for all block layouts of the file. In fact, thinking about it, that seems unlikely :-(

If that checksum changes as the file layout changes, it is only moderately useful. A resilver would probably change the checksum, no?

Wout.
On Fri, Jan 20, 2006 at 12:28:21PM +0100, Robert Milkowski wrote:
> The problem is that when you are trying to keep two filesystems
> in sync, you expect the mtimes to be the same. However, as ZFS uses
> DIFFERENT block sizes, you probably can't guarantee that the source
> file and the (r)synced file use the same block sizes (and number of
> blocks). You would need to keep your own database of checksums for
> each block in a given file so that tripwire-like utilities could take
> advantage of this.

Also, a file's ZFS checksum is affected by the checksums of its indirect blocks, and therefore by the block locations of its data and indirect blocks. All of which means that ZFS checksums are very strongly tied to the filesystem, or, rather, the dataset, where the objects in question reside.

But ZFS checksums might still be useful if you're trying to detect change locally (as would be the file's dnode block pointer, a/c/mtime, and generation number).

Nico
--
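A simplified structural sketch of why that is so (invented names, not the real ZFS on-disk format): every block pointer carries the checksum of the block it points to, and indirect blocks are themselves arrays of block pointers, so the checksum at the top of the tree covers addresses and tree shape as well as file data.

/*
 * Simplified sketch, not the actual ZFS structures: a block pointer records
 * both where a child block lives and the checksum of its contents.
 */
#include <stdint.h>

typedef struct blkptr_sketch {
    uint64_t    bp_addr;        /* on-disk address of the child block */
    uint64_t    bp_size;        /* size of the child block */
    uint8_t     bp_cksum[32];   /* checksum of the child block's bytes */
} blkptr_sketch_t;

/*
 * An indirect block is just an array of block pointers.  The checksum stored
 * for it (in the pointer one level up) therefore covers the children's
 * checksums *and* their addresses and sizes.  Writing the same file data with
 * a different block size, or to different locations, changes the indirect
 * blocks and with them every checksum above, up to the dnode -- which is why
 * the top-level checksum cannot serve as a layout-independent file digest.
 */
typedef struct indirect_sketch {
    blkptr_sketch_t ib_ptrs[128];   /* fan-out chosen for illustration */
} indirect_sketch_t;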
On 20 Jan 2006, at 19:55, Darren J Moffat wrote:
> On Fri, 2006-01-20 at 09:03, Wout Mertens wrote:
>> Would encryption be implemented in such a way that you can access the
>> encrypted data? I'm thinking backups here...
>
> That is one of the highest priority requirements that we have.
>
> We really want a method where backups can be done without the backup
> operator actually being able to read the data, and hopefully without
> the backup program requiring too much operating system privilege.
>
> This is why key management is the key, pardon the pun, to the
> whole ZFS crypto story and why it is hard to do well.

<Geeking out; sorry if this is redundant information. Consider it a fanboy post if so.>

There is a LUFS filesystem called CryptoFS that implements something akin to this:
http://linux.softpedia.com/get/System/Filesystems/CryptoFS-1474.shtml

Basically, you have a regular directory with encrypted files that is used as the backend storage. When you mount the CryptoFS, it uses the key to decrypt the filenames and the files themselves. The information you're leaking is the rough size of your files and the layout of your filesystem. You can make simple backups of the backend storage. I assume access control is through the access control of the backend storage.

(For those who wonder: if you know rough sizes and layout, you can guess which "standard" files are present, making plaintext attacks to retrieve the encryption key possible.)

Since ZFS is a lower layer and stores everything as objects, it could do one better and encrypt the directory blocks as well. (I've no idea what the directory structure on disk is, sorry.) ZFS would then only be leaking rough sizes and the number of objects, and that information can be salted by adding random junk.

BTW, a thought occurred to me: OS X sparse images are not very backup-friendly. They're multi-gigabyte files that change all the time. We actually have an rsync running on the OS X systems that use sparse images, syncing the unencrypted files, while they're mounted, to the backup server. So apart from CryptoFS (Linux only), there's no real, cheap, easy solution to encrypted storage + backups. ZFS could finally make tape backups secure without weird/expensive workarounds!

</geeking out>

Wout.
On Fri, 2006-01-20 at 09:55 +0100, Wout Mertens wrote:
> A really smart rsync could use the ZFS block checksums and ask the
> remote side for the checksums of the same blocks, though; that would
> eliminate half of the work. It would come at the cost of teaching
> rsync about ZFS internals, which seems to me like a very bad thing.

On the other hand, it would join a number of similar "layer violations" floating about -- there's a patch to gzip which adds an option that makes the files it compresses more amenable to incremental transfer by rsync, as well as a related package called "zsync" which has another way to do incremental transfers of compressed files efficiently.

- Bill
On 20 Jan 2006, at 14:43, Nicolas Williams wrote:
> Also, a file's ZFS checksum is affected by the checksums of its
> indirect blocks, and therefore by the block locations of its data and
> indirect blocks. All of which means that ZFS checksums are very
> strongly tied to the filesystem, or, rather, the dataset, where the
> objects in question reside.

Sigh, I was afraid of that. I still think a static, instant checksum of the file data would be very valuable. I tried to find some way to still have it, but the only way I can think of involves using weak checksums that obey sum(concat(a,b)) == f(sum(a), sum(b)). Not an option, methinks :-(

> But ZFS checksums might still be useful if you're trying to detect
> change locally (as would be the file's dnode block pointer, a/c/mtime,
> and generation number).

Myeah. If you keep a separate list of checksums, you would know that if the checksum is still the same, the file is still the same. If the checksums differ, you would need to use a different checksum on just that file to verify sameness. That should speed up tripwire-like tools at least.

Wout.
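For what it's worth, a checksum with that combining property really is trivially weak; a plain byte sum is the canonical example. Per-block sums can be combined into a whole-file sum without rereading any data, which is exactly the shortcut a cryptographic hash like SHA-256 is designed not to have. A toy illustration:

/*
 * Toy illustration of the sum(concat(a,b)) == f(sum(a), sum(b)) property,
 * using a plain byte sum mod 2^64.  Composable across blocks, but far too
 * weak to detect reorderings or compensating changes.
 */
#include <stddef.h>
#include <stdint.h>

uint64_t
byte_sum(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return (sum);
}

/* The byte sum of a concatenation is just the sum of the parts' sums. */
uint64_t
byte_sum_combine(uint64_t sum_a, uint64_t sum_b)
{
    return (sum_a + sum_b);
}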
On Fri, 2006-01-20 at 09:03, Wout Mertens wrote:
> On 19 Jan 2006, at 18:32, Darren J Moffat wrote:
>
>> On Thu, 2006-01-19 at 16:41, Wout Mertens wrote:
>>> Is it possible to query the stored checksum for a file? Ideally in
>>> some sort of read-only extended attribute accessible from scripts.
>>> This would make a tripwire-like tool very fast, and rsync more
>>> accurate.
>>
>> This is RFE# 6259754. Not implemented yet.
>
> Is that the actual number? How many are before it? :-(

Yes, that is the unique primary key in the database. There aren't actually that many bugs/RFEs (they use the same primary key space, and the data tables have an rfe/bug flag). There are some "holes" due to past migrations between bug databases and front-end tools. The actual number is meaningless anyway, since there is a priority, a severity, and a justification associated with bugs and RFEs as well. So no, there aren't 6,259,753 things to do before someone gets to this :-)

--
Darren J Moffat
On Fri, 2006-01-20 at 09:03, Wout Mertens wrote:
> Would encryption be implemented in such a way that you can access the
> encrypted data? I'm thinking backups here...

That is one of the highest priority requirements that we have.

We really want a method where backups can be done without the backup operator actually being able to read the data, and hopefully without the backup program requiring too much operating system privilege.

This is why key management is the key, pardon the pun, to the whole ZFS crypto story and why it is hard to do well.

--
Darren J Moffat
On Fri, 2006-01-20 at 10:57, Bill la Forge wrote:
> The systems I know of that depend on the hash as the key for the data
> (i.e. peer-to-peer systems) use a secure hash, not a checksum. There's
> a huge difference here in terms of the likelihood of a collision. I
> don't think we can call a checksum a hash here.

What is not cryptographically secure about SHA256, which is one of the checksum options for ZFS filesystems?

    zfs set ...
    checksum    YES    YES    on | off | fletcher2 | fletcher4 | sha256

--
Darren J Moffat
> A resilver would probably change the checksum, no?

No. A resilver just fixes any damaged copies of the data. The valid data doesn't change, so its checksum doesn't change either.

The idea of passing ZFS block checksums up the stack for further upstream (or over-the-wire) validation is tempting, but as folks have noted, it's tricky. It requires either that the checksum function be partitionable (which greatly reduces its strength), or that each layer above us can cope with whatever block size we give it. It's certainly possible, but it's not a cake walk.

Jeff