thr3ads.net - Btrfs devel - New disk format pushed out [Dec 2008]


Chris Mason

2008-Dec-02 15:29 UTC

New disk format pushed out

Hello everyone,

I've pushed out Josef's updates to make the checksum selectable at mkfs
time (right now there's only one choice, but we now have the disk format
bits for more).

This means both kernel and progs have a new disk format.  The format
will change a few more times this week as we try to hammer out the 1.0
format.

-chris



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Mason

2008-Dec-09 00:13 UTC

New disk format pushed out

Hello everyone,

I've pushed out most of the pending patches, including a few big disk
format changes.  It includes Yan Zheng's super block duplication code
(with a few small mods for performance) and all new data checksumming
code.

The data checksumming is a big change, so I'll paste in the changeset
description here.

Btrfs stores checksums for each data block.  Until now, they have
been stored in the subvolume trees, indexed by the inode that is
referencing the data block.  This means that when we read the inode,
we've probably read in at least some checksums as well.

But, this has a few problems:

* The checksums are indexed by logical offset in the file.  When
compression is on, this means we have to do the expensive checksumming
on the uncompressed data.  It would be faster if we could checksum
the compressed data instead.

* If we implement encryption, we'll be checksumming the plain text and
storing that checksum on disk.  This is significantly less secure.

* For either compression or encryption, we have to get the plain text
back before we can verify the checksum as correct.  This makes the raid
layer balancing and extent moving much more expensive.

* It makes the front end caching code more complex, as we have to touch
the subvolume and inodes as we cache extents.

* There is potentially one copy of the checksum in each subvolume
referencing an extent.

The solution used here is to store the extent checksums in a dedicated
tree.  This allows us to index the checksums by physical extent
start and length.  It means:

* The checksum is against the data stored on disk, after any compression
or encryption is done.

* The checksum is stored in a central location, and can be verified without
following back references, or reading inodes.

This makes compression significantly faster by reducing the amount of
data that needs to be checksummed.  It will also allow much faster
raid management code in general.
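
As a rough illustration of the point above (not the Btrfs code itself),
the sketch below checksums the bytes that would land on disk after
compression.  Python's zlib.crc32 is only a stand-in for the filesystem's
checksum, and write_extent/verify_extent are hypothetical names.  It shows
that fewer bytes get checksummed and that verification needs no
decompression.

```python
import zlib
from typing import Tuple

def write_extent(plain: bytes) -> Tuple[bytes, int]:
    """Compress the data, then checksum the bytes that would go to disk."""
    on_disk = zlib.compress(plain)
    return on_disk, zlib.crc32(on_disk)

def verify_extent(on_disk: bytes, stored_csum: int) -> bool:
    """Verify using only the on-disk bytes; no decompression is needed."""
    return zlib.crc32(on_disk) == stored_csum

plain = b"btrfs " * 4096             # highly compressible sample data
on_disk, csum = write_extent(plain)
print(len(plain), len(on_disk))      # bytes covered before vs. after compression
print(verify_extent(on_disk, csum))
```

A raid or balancing layer that moves the extent could call verify_extent
on the raw bytes it just read, without knowing how they were compressed.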

The checksums are indexed by a key with a fixed objectid (a magic value
in ctree.h) and offset set to the starting byte of the extent.  This
allows us to copy the checksum items into the fsync log tree directly (or
any other tree), without having to invent a second format for them.
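
A minimal sketch of that key layout, assuming keys sort as
(objectid, type, offset) tuples.  The constant values and the CsumTree
class are placeholders for illustration, not what ctree.h defines, and a
sorted list stands in for the b-tree.

```python
import bisect
import zlib

CSUM_OBJECTID = 2**64 - 1   # placeholder; the real magic value is in ctree.h
CSUM_KEY_TYPE = 1           # placeholder key type

class CsumTree:
    """Sorted (key, checksum) pairs standing in for a b-tree."""

    def __init__(self):
        self.keys = []
        self.csums = []

    def _key(self, extent_start):
        # Fixed objectid; only the offset (extent start byte) varies.
        return (CSUM_OBJECTID, CSUM_KEY_TYPE, extent_start)

    def insert(self, extent_start, on_disk_bytes):
        key = self._key(extent_start)
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.csums.insert(i, zlib.crc32(on_disk_bytes))

    def lookup(self, extent_start):
        key = self._key(extent_start)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.csums[i]
        return None

    def items(self):
        return list(zip(self.keys, self.csums))

tree = CsumTree()
tree.insert(1 << 20, b"extent at 1 MiB")
tree.insert(4096, b"extent at 4 KiB")

# The key names no inode or subvolume, so items can be copied into
# another tree (for example a log) unchanged.
log_tree = CsumTree()
for key, csum in tree.items():
    log_tree.keys.append(key)
    log_tree.csums.append(csum)
```

Looking up a checksum then needs only the extent's starting byte, e.g.
tree.lookup(4096), with no inode or back reference involved.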

-chris



Gabor MICSKO

2008-Dec-09 07:52 UTC

Re: New disk format pushed out

As far as I can see, the standalone kernel module git repo has not been
updated. Will it be updated too?

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable-standalone.git;a=summary



Chris Mason

2008-Dec-09 11:43 UTC

Re: New disk format pushed out

On Tue, 2008-12-09 at 08:52 +0100, Gabor MICSKO wrote:
> As far as I can see, the standalone kernel module git repo has not been
> updated. Will it be updated too?
>
> http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable-standalone.git;a=summary

Thanks, I've updated the standalone tree as well.

-chris



Chris Mason

2008-Dec-09 12:23 UTC

Re: New disk format pushed out

On Mon, 2008-12-08 at 19:13 -0500, Chris Mason wrote:
> Hello everyone,
>
> I've pushed out most of the pending patches, including a few big disk
> format changes.  It includes Yan Zheng's super block duplication code
> (with a few small mods for performance) and all new data checksumming
> code.

Yan Zheng noticed that I've broken the space balancing code, so people
using raid should not use this yet.

-chris



Chris Mason

2008-Dec-11 02:01 UTC

Re: New disk format pushed out

On Tue, 2008-12-09 at 07:23 -0500, Chris Mason wrote:
> On Mon, 2008-12-08 at 19:13 -0500, Chris Mason wrote:
> > Hello everyone,
> >
> > I've pushed out most of the pending patches, including a few big disk
> > format changes.  It includes Yan Zheng's super block duplication code
> > (with a few small mods for performance) and all new data checksumming
> > code.
>
> Yan Zheng noticed that I've broken the space balancing code, so people
> using raid should not use this yet.

This has been fixed now, happy testing.

-chris


