Hello,

If I write a large sequential file on a snapshot, then create another
snapshot, overwrite the file with a small amount of data and delete the
first snapshot, the second snapshot has a very large data extent of which
only a small part is used. For example, if I use the following sequence:

  mkfs.btrfs /dev/sdn
  mount -o noatime,nodatacow,nospace_cache /dev/sdn /mnt/b
  btrfs sub snap /mnt/b /mnt/b/snap1
  dd if=/dev/zero of=/mnt/b/snap1/t count=15000 bs=65535
  sync
  btrfs sub snap /mnt/b/snap1 /mnt/b/snap2
  dd if=/dev/zero of=/mnt/b/snap2/t seek=3 count=1 bs=2048
  sync
  btrfs sub delete /mnt/b/snap1
  btrfs-debug-tree /dev/sdn

I see the following data extents:

  item 6 key (257 EXTENT_DATA 0) itemoff 3537 itemsize 53
          extent data disk byte 1103101952 nr 194641920
          extent data offset 0 nr 4096 ram 194641920
          extent compression 0
  item 7 key (257 EXTENT_DATA 4096) itemoff 3484 itemsize 53
          extent data disk byte 2086129664 nr 4096
          extent data offset 0 nr 4096 ram 4096
          extent compression 0

In item 6, only 4096 bytes of 194641920 are in use. The rest of the space
is wasted.

If I defragment with "btrfs filesystem defragment /mnt/b/snap2/t", it
releases the wasted space. But I can't use defragment, because if I have
several snapshots I need to run defragment on each snapshot, and that
breaks the relation between snapshots and creates multiple copies of the
same data.

In our test, which creates and deletes snapshots while writing data, we
end up with a few GBs of disk space wasted.

Is it possible to limit the size of allocated data extents?
Is it possible to defragment a subvolume without breaking snapshot relations?
Any other idea how to recover the wasted space?

Thanks,
Moshe Melnikov
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
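For reference, the waste implied by item 6 above can be computed directly
from the two `nr` fields: the first (on the "disk byte" line) is the size of
the on-disk extent, the second (on the "offset" line) is the portion this
file extent item actually references. A quick sketch:

```python
# Wasted space implied by item 6 in the btrfs-debug-tree output above:
# the on-disk extent is 194641920 bytes ("disk byte ... nr 194641920"),
# but the file extent item references only 4096 of them ("offset 0 nr 4096").

disk_extent_bytes = 194641920
referenced_bytes = 4096

wasted = disk_extent_bytes - referenced_bytes
print(wasted)                      # 194637824 bytes
print(round(wasted / 2**20, 1))    # 185.6 MiB tied up by a 4k reference
```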
On Mon, Feb 04, 2013 at 02:08:01AM -0700, Moshe wrote:
> [...]
> Is it possible to limit size of allocated data extents?
> Is it possible to defragment subvolume without breaking snapshots relations?
> Any other idea how to recover wasted space?

This is all by design, to try and limit the size of the extent tree.
Instead of splitting references in the extent tree to account for the
split extent, we do it in the file tree. In your case it results in a lot
of wasted space.
This is on the list of things to fix; we will just split the references in
the extent tree and deal with the larger extent tree, but it's on the back
burner while we get things a bit more stable. Thanks,

Josef
Thanks for your reply Josef.
I want to experiment with extent sizes, to see how they influence the size
of the extent tree. Can you point me to code that I can change to limit
the size of data extents?

Thanks,
Moshe Melnikov

-----Original Message-----
From: Josef Bacik
Sent: Monday, February 04, 2013 5:56 PM
To: Moshe
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs wastes disk space after snapshot deletion.

[...]

This is all by design to try and limit the size of the extent tree.
Instead of splitting references in the extent tree to account for the
split extent we do it in the file tree. In your case it results in a lot
of wasted space. This is on the list of things to fix, we will just split
the references in the extent tree and deal with the larger extent tree,
but it's on the back burner while we get things a bit more stable.

Thanks,

Josef
On Tue, Feb 05, 2013 at 02:09:02AM -0700, Moshe wrote:
> Thanks for your reply Josef.
> I want to experiment with extent sizes, to see how they influence the
> size of the extent tree. Can you point me to code that I can change to
> limit the size of data extents?

So it's not the size of the data extents, it's how we deal with references
to them. Let me map out what happens now:

1) we do a write and create a 1 gig data extent.
2) create a file extent item in the fs tree pointing to the extent
3) create a reference with a count of 1 for the entire extent
4) create a snapshot of the data extent
5) write 4k to the middle of the extent
6a) we cow down to the file extent item we need to split and add a ref to
    the original 1 gig extent because of the snapshot.
6b) split the file extent item in the fs tree into 3 extents:
    - one from 0 to the random offset
    - one from random offset to random offset + 4k
    - one from random offset + 4k to the end of the original extent;
      this points to an offset within the original 1 gig extent
6c) in the split we increase the refcount of the original 1 gig extent by 1
7) add an extent reference for the 4k extent we wrote.

So at the end of this our original 1 gig extent has 3 references: 1 for the
original snapshot with its unmodified extent, 2 for the snapshot, which
includes a reference for each chunk of the split extent. In order to free
up this space you would have to overwrite the entirety of the remaining
chunks of the original extent in the snapshot and free up the extent in
the original fs by some means.

So say you delete the file in the original file system, and then do
something horrible like overwrite every other 4k block in the file: you'd
end up with around 1.5 gig of data in use for logically 1 gig of actual
space. The way to fix this is in 6c.

In file.c you have __btrfs_drop_extents, which does this
btrfs_inc_extent_ref on an extent it has to split on two sides. Instead of
doing this we would probably add another delayed extent operation for
splitting the extent reference. So instead of having file extents that
span large areas and stick around forever, we just fix the extent
references to account for the actual file extents, so when you drop a part
you actually recover the space. There is no code for this yet because this
is kind of an overhaul of how things are done, and I'm still getting "if I
do blah it panics the box" emails, so I want to spend time stabilizing. If
this is something you want to tackle go for it, but be prepared to spend a
few months on it. Thanks,

Josef
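To make that "around 1.5 gig for 1 gig of data" figure concrete, here is a
toy model of the accounting described above (plain Python, not btrfs code;
all names are made up). It models only the one rule that matters here: an
on-disk extent stays fully allocated as long as any file extent item still
references any part of it.

```python
# Toy model (NOT btrfs code) of the "overwrite every other 4k block"
# scenario: the original 1 gig extent stays fully allocated as long as
# a single 4k block of the snapshot's file still points into it.

GIB = 1 << 30
K4 = 4096
nblocks = GIB // K4

# Start: a 1 gig extent, fully referenced by the file (steps 1-4 above),
# and the copy in the original fs already deleted.
refs_into_orig = nblocks      # 4k blocks of the file still pointing into it
new_extent_bytes = 0          # space taken by freshly written 4k extents

# Overwrite every other 4k block in the snapshot (steps 5-7, repeated).
for block in range(0, nblocks, 2):
    refs_into_orig -= 1       # this block now points at its own 4k extent
    new_extent_bytes += K4

# The original extent is freed only when its last reference goes away.
orig_alloc = GIB if refs_into_orig > 0 else 0
allocated = orig_alloc + new_extent_bytes

print(allocated / GIB)        # 1.5 -- "around 1.5 gig" for 1 gig of data
```

Half of the blocks still reference the original extent, so it never gets
freed, and the 0.5 gig of new 4k extents comes on top of it.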
Moshe Melnikov
2013-Feb-05 15:27 UTC
Re: btrfs wastes disk space after snapshot deletion.
Is it possible in step 1) to create a few smaller extents instead of a
1 gig data extent?

Moshe

-----Original Message-----
From: Josef Bacik
Sent: Tuesday, February 05, 2013 4:41 PM
To: Moshe
Cc: Josef Bacik; linux-btrfs@vger.kernel.org
Subject: Re: btrfs wastes disk space after snapshot deletion.

[...]
On Tue, Feb 05, 2013 at 05:27:45PM +0200, Moshe Melnikov wrote:
> Is it possible in step 1) to create a few smaller extents instead of a
> 1 gig data extent?

DIO or O_SYNC can help to create extents whose size is your 'bs=xxx', but
you know, this is not expected to be as fast as buffered write.

thanks,
liubo
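A sketch of the write pattern liubo suggests: open with O_SYNC and write in
fixed-size chunks, so each chunk is flushed on its own and tends to get its
own, bounded data extent. Whether btrfs actually bounds extent sizes this
way depends on the kernel version and mount options; the snippet below only
demonstrates the I/O pattern itself, with made-up sizes and paths.

```python
import os
import tempfile

# O_SYNC writes in fixed-size chunks: each write is flushed separately,
# which is the pattern that tends to produce extents of roughly CHUNK
# bytes instead of one large allocation. Filesystem behavior may vary.

CHUNK = 1 << 20               # 1 MiB per write, i.e. dd's bs=1M
path = os.path.join(tempfile.mkdtemp(), "t")

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    buf = b"\0" * CHUNK
    for _ in range(4):        # 4 MiB written as 1 MiB synchronous chunks
        os.write(fd, buf)
finally:
    os.close(fd)

print(os.path.getsize(path))  # 4194304
```

The dd equivalents would be oflag=sync (O_SYNC) or oflag=direct (DIO)
combined with a bs= of the desired extent size, at the cost of slower
writes than the buffered path.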
On Mon, Feb 04, 2013 at 11:08:01AM +0200, Moshe wrote:
> If I defragment like: btrfs filesystem defragment /mnt/b/snap2/t it
> release wasted space. But I can't use defragment because if I have
> few snapshots I need to run defragment on each snapshot and it
> disconnect relation between snapshot and create multiple copies of
> same data.

Well, just for this case, you can try our experimental feature,
'snapshot-aware defrag', which is designed for exactly this kind of
problem.

It's still floating on the ML, and I've no idea when it'll land in
upstream. Currently the latest patch is V6, and NOTE: if you want to use
autodefrag (which is recommended), you should apply the v6 patch along
with another patch for autodefrag, otherwise it may crash your box.

FYI,
- snapshot-aware defrag
  https://patchwork.kernel.org/patch/2058911/
- autodefrag fix
  https://patchwork.kernel.org/patch/2058921/

thanks,
liubo