Bady, Brant RBCM:EX
2006-Sep-14 20:11 UTC
[zfs-discuss] Access to ZFS checksums would be nice and very useful feature
I am working in the area of archiving (in the true sense of the word, e.g. using the OAIS reference model) electronic data for long-term preservation and access. ZFS now makes magnetic disk arrays a bit more suitable for that.

Part of the archiving process is to generate checksums (I happen to use MD5) and store them with other metadata about the digital object, in order to verify data integrity and demonstrate the authenticity of the digital object over time.

Wouldn't it be helpful if there were a utility to access/read the checksum data created by ZFS and use it for those same purposes? Hoping to see something like that in a future release, or a command-line utility that could do the same.

Thanks,
Brant Bady
Manager, Electronic Archives
Access and Information Management
Royal British Columbia Museum
Telephone: (250) 387-4126
Fax: (250) 387-2072
Email: Brant.Bady at royalbcmuseum.bc.ca
BC Archives Web Site: http://www.bcarchives.gov.bc.ca
RBCM Web Site: http://www.royalbcmuseum.bc.ca
Mail: 675 Belleville Street, Victoria, BC V8W 9W2
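A minimal Python sketch of the fixity step Brant describes: stream each file through MD5 and keep the digest, with a timestamp, in a metadata record. The record fields (`path`, `size`, `md5`, `computed_at`) are hypothetical choices for illustration; they are not taken from OAIS or from any ZFS tool.

```python
import hashlib
import os
from datetime import datetime, timezone


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_record(path):
    """Build a hypothetical fixity record to store with other object metadata."""
    return {
        "path": os.path.abspath(path),
        "size": os.path.getsize(path),
        "md5": md5_of_file(path),
        "computed_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(fixity_record(name))
```

Reading in 1 MiB chunks keeps memory use flat for large archival objects; the digest is identical to hashing the whole file in one call.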
Henk Langeveld
2006-Sep-14 20:32 UTC
Bady, Brant RBCM:EX wrote:
> Part of the archiving process is to generate checksums (I happen to use
> MD5), and store them with other metadata about the digital object in
> order to verify data integrity and demonstrate the authenticity of the
> digital object over time.

> Wouldn't it be helpful if there was a utility to access/read the
> checksum data created by ZFS, and use it for those same purposes.

Doesn't ZFS use block-level checksums?

> Hoping to see something like that in a future release, or a command line
> utility that could do the same.

It might be possible to add a user-set property to a file with the md5sum and a timestamp of when it was computed.

But what would this protect against? If you need to avoid tampering, you need the checksums offline anyway; cf. tripwire.

Cheers,
Henk
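Henk's suggestion of attaching the md5sum and a timestamp to the file itself could look roughly like this. Assumptions: Python's `os.setxattr`/`os.getxattr` are Linux-only, and the attribute name `user.fixity.md5` and the `.fixity.json` sidecar fallback are hypothetical. Solaris exposes extended attributes differently (for example through `runat(1)`), so this illustrates the idea only; it is not a Solaris or ZFS interface.

```python
import hashlib
import json
import os
import time

# Hypothetical attribute name; Linux requires the "user." prefix for
# attributes set by ordinary users.
ATTR = "user.fixity.md5"


def md5_of_file(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tag_file(path):
    """Attach {md5, computed_at} to the file: xattr if possible, else sidecar."""
    value = json.dumps({"md5": md5_of_file(path), "computed_at": int(time.time())})
    try:
        os.setxattr(path, ATTR, value.encode())
    except (AttributeError, OSError):
        # os.setxattr is missing off Linux, or the filesystem refused it.
        with open(path + ".fixity.json", "w") as f:
            f.write(value)
    return json.loads(value)


def read_tag(path):
    """Read the tag back from wherever tag_file() managed to store it."""
    try:
        raw = os.getxattr(path, ATTR).decode()
    except (AttributeError, OSError):
        with open(path + ".fixity.json") as f:
            raw = f.read()
    return json.loads(raw)
```

As Henk points out, anyone who can rewrite the file can also rewrite this tag, so it catches accidental change, not tampering; for tamper evidence the digests also have to be kept offline.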
Chad Lewis
2006-Sep-14 20:45 UTC
On Sep 14, 2006, at 1:32 PM, Henk Langeveld wrote:
> It might be possible to add a user set property to a file with the
> md5sum and
> a timestamp when it was computed.
>
> But what would this protect against? If you need to avoid
> tampering, you
> need the checksums offline anyway - cf. tripwire.

Better still would be the forthcoming cryptographic extensions in some kind of digital-signature mode.

ckl
James C. McPherson
2006-Sep-14 20:49 UTC
Bady, Brant RBCM:EX wrote:
> Wouldn't it be helpful if there was a utility to access/read the
> checksum data created by ZFS, and use it for those same purposes.

Would you want a single checksum per file, or the list of every checksum for every block that the file referenced? The second option might get unwieldy. The first option, a meta-checksum if you like, would require some interesting design.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/pub/2/1ab/967
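James's two options can be compared with a small sketch. It assumes fixed 128 KiB blocks and SHA-256 purely for illustration; real ZFS blocks vary in size and ZFS applies its own configurable checksum algorithms, so these values would not match anything ZFS stores.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumption: fixed-size blocks, for illustration only


def block_checksums(data, block_size=BLOCK_SIZE):
    """Option 2: one digest per block (the list grows with file size)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def meta_checksum(data, block_size=BLOCK_SIZE):
    """Option 1: a single digest computed over the per-block digests."""
    h = hashlib.sha256()
    for d in block_checksums(data, block_size):
        h.update(d)
    return h.hexdigest()
```

At this block size a 1 GiB file yields 8192 per-block digests (256 KiB of SHA-256 output), the "unwieldy" case. The meta-checksum reduces that to one 32-byte value but, unlike a plain MD5 of the file, it depends on where the block boundaries fall.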
Bady, Brant RBCM:EX
2006-Sep-14 20:59 UTC
Actually, to clarify: what I want to do is to be able to read the associated checksums ZFS creates for a file and then store them in an external system, most likely an Oracle database. It's just a way of avoiding having to compute MD5s on everything when ZFS is doing checksums as well.

If ZFS does block-level checksums, then I guess they're not so easy to use in that way. I will check out the crypto extensions when they become available.

Thanks,
Brant Bady

-----Original Message-----
From: Chad.Lewis at Sun.COM
Sent: Thursday, September 14, 2006 1:46 PM
To: Henk Langeveld
Cc: Bady, Brant RBCM:EX; zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

> Better still would be the forthcoming cryptographic extensions in some
> kind of digital-signature mode.
>
> ckl
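Brant's plan of keeping checksums in an external database might be sketched as follows, with Python's built-in `sqlite3` standing in for Oracle. The `fixity` table and its columns are hypothetical. Because ZFS checksums are per block (as Brant concludes), the sketch stores one whole-file digest per object and algorithm, which is what an MD5-based workflow produces.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a fixity store; sqlite3 stands in for the Oracle database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fixity (
               object_path TEXT NOT NULL,
               algorithm   TEXT NOT NULL,
               digest      TEXT NOT NULL,
               computed_at TEXT NOT NULL,
               PRIMARY KEY (object_path, algorithm)
           )"""
    )
    return conn


def record(conn, object_path, algorithm, digest, computed_at):
    """Insert or replace the latest digest for one object/algorithm pair."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO fixity VALUES (?, ?, ?, ?)",
            (object_path, algorithm, digest, computed_at),
        )


def verify(conn, object_path, algorithm, digest):
    """True if digest matches the stored one, False if not, None if absent."""
    row = conn.execute(
        "SELECT digest FROM fixity WHERE object_path = ? AND algorithm = ?",
        (object_path, algorithm),
    ).fetchone()
    if row is None:
        return None
    return row[0] == digest
```

Keeping the algorithm as part of the key lets an archive add a second digest (say SHA-256 next to MD5) without changing the schema.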
Nicolas Williams
2006-Sep-14 22:08 UTC
On Thu, Sep 14, 2006 at 10:32:59PM +0200, Henk Langeveld wrote:
> Bady, Brant RBCM:EX wrote:
> >Wouldn't it be helpful if there was a utility to access/read the
> >checksum data created by ZFS, and use it for those same purposes.
>
> Doesn't ZFS use block-level checksums?

Yes, but the checksum is stored with the pointer.

So then, for each file/directory there's a dnode, and that dnode has several block pointers to data blocks or indirect blocks, and indirect blocks have pointers to... and so on.

If a bit of data in a file changes, then a new block will be written, and the pointer to the previous block will be changed in the indirect block that pointed to it (or in the dnode itself if there was no indirect block), and so on, and a new block will be written for each indirect block and dnode so modified. All in one transaction. That's how COW works.

And this will necessarily change any checksum of the dnode itself (assuming there are no collisions in the checksum algorithm).

So, a checksum of a dnode will capture the entire file's contents and metadata. Read from the file, update the atime, and so change its checksum. ZFS could export a dnode checksum that covers only the data, and another that covers both data and metadata.

Of course, a filesystem "scrub" (if one is implemented, but I think it will be necessary) would change all such checksums. So these checksums may not have the desired property.

> >Hoping to see something like that in a future release, or a command line
> >utility that could do the same.
>
> It might be possible to add a user set property to a file with the md5sum
> and a timestamp when it was computed.

That would be slow.

> But what would this protect against? If you need to avoid tampering, you
> need the checksums offline anyway - cf. tripwire.

ZFS can very quickly compute a checksum of a file's data by checksumming all the top-level block pointers in the file's dnode, or of the data and metadata by checksumming the entire dnode. That's O(1), no matter how large the file. That'd be nice indeed!

But because of the semantics for when such checksums can change (see above), ZFS checksums can only be used to detect the possibility of change, so there may be false positives, IMO. This means that for tamper detection one would need to compute a checksum of the file contents and then store it and the ZFS checksum together, using the ZFS checksum only as a way to avoid checksumming the entire file most of the time.

Nico
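The structure Nico describes, where each block pointer carries the checksum of the block it points to, is a hash tree (Merkle tree). The toy model below assumes SHA-256, a fan-out of 4 and no metadata, none of which match real ZFS block pointers or dnodes. It only shows why rewriting one data block changes exactly one node per level up to the root, and why the root digest summarizes the whole file in a fixed-size value.

```python
import hashlib

FANOUT = 4  # assumption: tiny fan-out so a small file has several levels


def h(data):
    return hashlib.sha256(data).digest()


def build_tree(blocks, fanout=FANOUT):
    """Return the levels of the tree: level 0 holds leaf digests, the last the root.

    Each parent digest is the hash of its children's digests, mimicking a
    block pointer that carries the checksum of the block it points to.
    Requires at least one block.
    """
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(b"".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
        levels.append(level)
    return levels


def root_digest(levels):
    """The toy 'dnode checksum': fixed size, O(1) to read once the tree exists."""
    return levels[-1][0].hex()


def changed_nodes(old, new):
    """Count nodes per level whose digest differs (the copy-on-write path)."""
    return [sum(a != b for a, b in zip(lo, ln)) for lo, ln in zip(old, new)]
```

With 16 blocks the tree has levels of 16, 4 and 1 nodes; changing block 5 alters one node in each level, which is the rewrite path Nico walks through.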
Mike Gerdts
2006-Sep-14 23:26 UTC
On 9/14/06, Chad Lewis <Chad.Lewis at sun.com> wrote:
> Better still would be the forthcoming cryptographic extensions in some
> kind of digital-signature mode.

When I first saw extended attributes I thought they would be a great place to store a digital signature of the file. I'm not saying that it is up to ZFS to generate or manage the signature.

The nice thing about it is that, so long as the private key is secret, the signature stays with the file as it is moved, taken to tape, copied to other file systems, etc., provided the file manipulation mechanisms support extended attributes.

Mike
--
Mike Gerdts
http://mgerdts.blogspot.com/
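Python's standard library has no public-key signatures, so the sketch below swaps in HMAC-SHA256, a keyed MAC, to show a tag that could travel with the file (for example in an extended attribute, as Mike suggests). This is a deliberate substitution: unlike a signature, verifying an HMAC requires the same secret key, so it suits an archive that keeps the key offline, not verification by third parties. A real signature scheme would need a library such as `cryptography` or GnuPG.

```python
import hashlib
import hmac


def tag_bytes(key, data):
    """HMAC-SHA256 over the contents (a keyed MAC, not a public-key signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def tag_file(key, path):
    """Same tag, computed by streaming the file in 1 MiB chunks."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_file(key, path, stored_tag):
    """Recompute and compare in constant time; False means changed or wrong key."""
    return hmac.compare_digest(tag_file(key, path), stored_tag)
```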
Nicolas Williams
2006-Sep-14 23:40 UTC
On Thu, Sep 14, 2006 at 06:26:46PM -0500, Mike Gerdts wrote:
> When I first saw extended attributes I thought that would be a great
> place to store a digital signature of the file. I'm not saying that
> it is up to ZFS to generate or manage the signature.
>
> The nice thing about it is that so long as the private key is secret,
> the signature stays with the file as it is moved, taken to tape, other
> file systems, etc. so long as the file manipulation mechanisms support
> extended-attributes.

Hmm. Picture a magic attribute that returns a checksum of the file's contents and which recomputes this checksum only the first time it is read after the file has changed. Internally ZFS could invalidate this checksum whenever the file changes, then recompute and store the attribute when it is next read.

That sounds useful, but if read at unexpected times it would be observed by users as a slowdown. I think I'd rather ZFS export a ZFS checksum (O(1)) instead (also as a magic attribute) and let auditing systems do any additional checksumming explicitly.

Nico
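Nico's "magic attribute" can be modeled in memory to show the trade-off. The class below is hypothetical (ZFS offers no such attribute): it invalidates a cached SHA-256 on every write and recomputes it on the next read, counting the full recomputations.

```python
import hashlib


class CachedChecksumFile:
    """In-memory model of a file with a lazily recomputed checksum attribute."""

    def __init__(self, data=b""):
        self._data = bytearray(data)
        self._checksum = None    # None means "invalid, recompute on next read"
        self.recomputations = 0  # counts the (potentially slow) full hashes

    def write(self, offset, chunk):
        end = offset + len(chunk)
        if end > len(self._data):
            # Grow the file, zero-filling any gap before the write.
            self._data.extend(b"\0" * (end - len(self._data)))
        self._data[offset:end] = chunk
        self._checksum = None    # any change invalidates the cached value

    @property
    def checksum(self):
        if self._checksum is None:
            self._checksum = hashlib.sha256(self._data).hexdigest()
            self.recomputations += 1
        return self._checksum
```

The first read after a write pays for hashing the whole file, which is the unexpected slowdown Nico wants to avoid by exporting only the O(1) ZFS checksum.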
Matthew Ahrens
2006-Sep-14 23:42 UTC
Bady, Brant RBCM:EX wrote:
> Actually to clarify - what I want to do is to be able to read the
> associated checksums ZFS creates for a file and then store them in an
> external system e.g. an oracle database most likely

Rather than storing the checksum externally, you could simply let ZFS verify the integrity of the data. Whenever you want to check it, just run 'zpool scrub'.

Of course, if you don't trust ZFS to do that for you, you probably wouldn't trust it to tell you the checksum either!

--matt
Ceri Davies
2006-Sep-15 08:31 UTC
On Thu, Sep 14, 2006 at 05:08:18PM -0500, Nicolas Williams wrote:
> On Thu, Sep 14, 2006 at 10:32:59PM +0200, Henk Langeveld wrote:
> > Doesn't ZFS use block-level checksums?
>
> Yes, but the checksum is stored with the pointer.
>
> So then, for each file/directory there's a dnode, and that dnode has
> several block pointers to data blocks or indirect blocks, and indirect
> blocks have pointers to... and so on.

Does ZFS have block fragments? If so, then updating an unrelated file would change the checksum.

Ceri
--
That must be wonderful! I don't understand it at all.
-- Moliere
Luke Scharf
2006-Sep-15 15:14 UTC
Matthew Ahrens wrote:
> Rather than storing the checksum externally, you could simply let ZFS
> verify the integrity of the data. Whenever you want to check it, just
> run 'zpool scrub'.
>
> Of course, if you don't trust ZFS to do that for you, you probably
> wouldn't trust it to tell you the checksum either!

It sounded to me like he wanted to implement tripwire, but save some time and CPU power by querying the checksumming work that was already done by ZFS. (Otherwise, the CPU would have to checksum the ZFS files AND then checksum them again for tripwire.)

If that's what he's trying to do, the data integrity provided by ZFS doesn't do you any good, because the changes are going to come through the same system calls that a legitimate user would use.

-Luke
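The tripwire-style workflow Luke describes, using a cheap change indicator to skip rehashing unchanged files, can be sketched as below. Since ZFS exports no per-file checksum, `os.stat` fields (size, mtime in nanoseconds, inode) stand in for the hypothetical O(1) ZFS checksum; the function names are illustrative. As in Nico's proposal, the token is only an optimization: a `force=True` pass that rehashes everything is still needed to catch changes that leave the token intact.

```python
import hashlib
import os


def change_token(path):
    """Cheap stand-in for a hypothetical exported ZFS per-file checksum."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ino)


def sha256_file(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root, baseline=None, force=False):
    """Return (new_baseline, report); rehash only files whose token changed."""
    baseline = baseline or {}
    new, report = {}, {"added": [], "removed": [], "modified": []}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            token = change_token(path)
            old = baseline.get(path)
            if old and old["token"] == token and not force:
                new[path] = old  # trust the token and skip the full hash
                continue
            digest = sha256_file(path)
            new[path] = {"token": token, "sha256": digest}
            if old is None:
                report["added"].append(path)
            elif old["sha256"] != digest:
                report["modified"].append(path)
    report["removed"] = sorted(set(baseline) - set(new))
    return new, report
```

A same-size, in-place edit whose mtime is then restored slips past the token check; only the forced full pass detects it, which is exactly why the token cannot replace the content checksum for tamper detection.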
Luke Scharf
2006-Sep-15 15:28 UTC
Luke Scharf wrote:
> It sounded to me like he wanted to implement tripwire, but save some
> time and CPU power by querying the checksumming-work that was already
> done by ZFS.

Never mind. The e-mail client that I chose to use broke up the thread, and I didn't see that the issue had already been thoroughly discussed.

-Luke
Nicolas Williams
2006-Sep-15 15:55 UTC
On Fri, Sep 15, 2006 at 09:31:04AM +0100, Ceri Davies wrote:
> Does ZFS have block fragments? If so, then updating an unrelated file
> would change the checksum.

No. It has variable-sized blocks.

A block pointer in ZFS is much more than just a block number. Among other things, a block pointer has the checksum of the block it points to. See the on-disk layout document for more info.

There is no way that updating one file could change another's checksum.

What does matter is that the ZFS checksum of a file, to be O(1), depends on the on-disk layout of the file, and anything that would change that layout (today nothing would) would change the ZFS checksum of the file.

So I think that ZFS checksums, if exposed, are best left as a file-change test optimization, not as an actual checksum of the file.

Nico
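Nico's point that an O(1) ZFS file checksum depends on on-disk layout can be illustrated by hashing the same bytes under two block sizes. The block size is an assumed stand-in for any change of layout (Nico notes nothing would change it today), and SHA-256 is used only for illustration: the content hash stays the same while the layout-based hash does not.

```python
import hashlib


def layout_checksum(data, block_size):
    """Hash of per-block hashes: depends on how the data is split into blocks."""
    h = hashlib.sha256()
    for i in range(0, len(data), block_size):
        h.update(hashlib.sha256(data[i:i + block_size]).digest())
    return h.hexdigest()


def content_checksum(data):
    """Hash of the bytes themselves: independent of on-disk layout."""
    return hashlib.sha256(data).hexdigest()
```

That mismatch is why such a value suits a "has anything changed?" test better than an archival fixity value.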
Ceri Davies
2006-Sep-15 16:08 UTC
On Fri, Sep 15, 2006 at 10:55:48AM -0500, Nicolas Williams wrote:
> On Fri, Sep 15, 2006 at 09:31:04AM +0100, Ceri Davies wrote:
> > Does ZFS have block fragments? If so, then updating an unrelated file
> > would change the checksum.
>
> No. It has variable sized blocks.

OK, thanks.

> A block pointer in ZFS is much more than just a block number. Among
> other things a block pointer has the checksum of the block it points to.
> See the on-disk layout document for more info.

I am aware of the block checksum, but haven't got round to reading the on-disk format document yet, hence the question.

> There is no way that updating one file could change another's checksum.

That follows from the non-existence of fragments, sure.

Cheers,
Ceri