For purposes of data deduplication and data synchronisation, it would
be a powerful tool to expose file data checksums.

Since e.g. BTRFS uses the crc32c algorithm [1], it's possible to compute
the file's overall CRC by accumulating the CRCs of all its extents.

For now, exposing this via an IOCTL may be sufficient, though are there
any ideas for introducing it in a more standard way? (It's a pity that
reserved fields weren't added when stat64 was introduced.)

Thanks,
  Daniel

[1] http://www.research.ibm.com/haifa/satran/ips/Vince-Luben-crc32c-01.pdf
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
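To illustrate the accumulation idea, here is a minimal Python sketch of combining two per-extent CRCs into the CRC of the concatenated data without re-reading the data. It assumes the common CRC-32C convention (reflected polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF); whether a given filesystem stores its checksums with exactly these parameters is an assumption here. The names `crc32c` and `crc32c_combine` are illustrative only, not an existing interface.

```python
POLY = 0x82F63B78  # reflected Castagnoli polynomial

# Byte-wise lookup table for the reflected CRC-32C.
TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ POLY if c & 1 else c >> 1
    TABLE.append(c)


def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC-32C of data; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_combine(crc1: int, crc2: int, len2: int) -> int:
    """CRC of A+B given crc1=crc32c(A), crc2=crc32c(B) and len2=len(B).

    Runs crc1 through len2 zero bytes with no initial/final XOR, then XORs
    in crc2. This is O(len2); it touches no file data.
    """
    crc = crc1
    for _ in range(len2):
        crc = TABLE[crc & 0xFF] ^ (crc >> 8)
    return crc ^ crc2


# Standard check value for CRC-32C.
assert crc32c(b"123456789") == 0xE3069283

extent_a, extent_b = b"first extent ", b"second extent"
combined = crc32c_combine(crc32c(extent_a), crc32c(extent_b), len(extent_b))
assert combined == crc32c(extent_a + extent_b)
```

The loop over zero bytes is chosen for clarity. zlib's `crc32_combine` does the equivalent step for CRC-32 in logarithmic time using matrix squaring, and the same trick would apply here.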
Daniel J Blueman <daniel.blueman@gmail.com> writes:

> For purposes of data deduplication and data synchronisation, it would
> be a powerful tool to expose file data checksums.
>
> Since eg BTRFS uses the crc32c algorithm [1], it's possible to compute
> the file's overall CRC from the accumulation of the CRCs from all its
> extents' CRCs.
>
> For now, exposing this via an IOCTL may be sufficient, though any
> ideas for introducing it in a more standard way? (it's a pity that
> when stat64 was introduced, reserved fields weren't added)

The problem with doing it in any "standard way" is that it would
hard-code the way the file system does checksums into the applications.
So the file system could never change it without breaking
user space.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
On Wed, Jan 27, 2010 at 12:30 PM, Andi Kleen <andi@firstfloor.org> wrote:
> The problem of doing it in any "standard way" is that it would
> hard code the way the file system does checksums in the applications.
> So the file system could never change it without breaking
> user space.

I guess the filesystem would need to express this in the resulting
data structure, e.g.:
 - type 1 corresponds to using the crc32c algorithm with starting seed
   N, accumulating in ascending order over the data extents, and padding
   the modulus remainder and sparse holes with 0
 - type 2 etc.

The next question is: does filesystem (e.g. BTRFS) compression come
before or after checksumming?
-- 
Daniel J Blueman
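As a rough sketch of the self-describing result suggested above, the Python below tags the checksum value with a type, seed and block size so that user space can tell how it was computed. Everything named here (`ChecksumType`, `ExportedChecksum`, `export_checksum`, the numeric type id, the default block size of 4096) is hypothetical and only illustrates the "type 1" rule: crc32c from a seed, ascending over extents, zero-filling holes and the remainder of the last block.

```python
from dataclasses import dataclass
from enum import IntEnum

POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


class ChecksumType(IntEnum):
    CRC32C_ASCENDING_ZERO_PAD = 1  # hypothetical id for "type 1"


@dataclass(frozen=True)
class ExportedChecksum:
    type: ChecksumType
    seed: int
    block_size: int
    value: int


def export_checksum(extents, seed=0, block_size=4096):
    """extents: iterable of (file_offset, data) pairs; gaps are sparse holes."""
    crc, pos = seed, 0
    for offset, data in sorted(extents):       # ascending file offset
        if offset < pos:
            raise ValueError("overlapping extents")
        crc = crc32c(bytes(offset - pos), crc)  # sparse hole read as zeros
        crc = crc32c(data, crc)
        pos = offset + len(data)
    crc = crc32c(bytes(-pos % block_size), crc)  # zero-pad the last block
    return ExportedChecksum(ChecksumType.CRC32C_ASCENDING_ZERO_PAD,
                            seed, block_size, crc)


result = export_checksum([(8, b"xy"), (0, b"abc")], block_size=4)
assert result.value == crc32c(b"abc" + bytes(5) + b"xy" + bytes(2))
```

An application would compare `type`, `seed` and `block_size` before comparing `value`, and treat an unknown type as "no checksum available" rather than guess.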
On Wed, Jan 27, 2010 at 01:23:28PM +0000, Daniel J Blueman wrote:
> On Wed, Jan 27, 2010 at 12:30 PM, Andi Kleen <andi@firstfloor.org> wrote:
> > The problem of doing it in any "standard way" is that it would
> > hard code the way the file system does checksums in the applications.
> > So the file system could never change it without breaking
> > user space.

At the end of the day the checksums are also hard coded on disk. We
can't add a new way without continuing to support the old one.

> I guess the filesystem would need to express this in the resulting
> data-structure, eg:
> - type 1 corresponds to using the crc32c algorithm with starting seed
> N and accumulating ascending over data extents, padding with modulus
> remainder or sparse holes with 0
> - type 2 etc

Yes, if they were exported to userland we'd need to export version info.

> The next question, is does filesystem (eg BTRFS) compression come
> before or after checksumming?

The checksums are based on what is on disk, so they are done on the
compressed data.

-chris
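A small Python sketch of what this means for a consumer of exported checksums: if the checksum covers the compressed bytes, it will not match a CRC computed over the file contents as read by an application. The use of `zlib.compress` and the sample data are assumptions for illustration only; the real on-disk encoding is up to the filesystem.

```python
import zlib


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (reflected polynomial 0x82F63B78)."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


logical = b"btrfs extent payload " * 200  # what read() would return
on_disk = zlib.compress(logical)          # stand-in for a compressed extent

stored_csum = crc32c(on_disk)    # checksum of what is on disk
reader_csum = crc32c(logical)    # checksum an application would compute

assert zlib.decompress(on_disk) == logical
assert stored_csum != reader_csum
```

So two files with identical contents could still report different checksums if one is compressed and the other is not, which matters for the deduplication and synchronisation use cases raised at the start of the thread.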