Alexander Block
2012-May-25 15:35 UTC
atime and filesystems with snapshots (especially Btrfs)
Hello,

(this is a resend with proper CC for linux-fsdevel and linux-kernel)

I would like to start a discussion on atime in Btrfs (and other filesystems with snapshot support).

As atime is updated on every access of a file or directory, we get many changes to the trees in btrfs, and these always trigger CoW operations. This is no problem as long as the changed tree blocks are not shared by other subvolumes. Performance is also not a problem, whether shared or not (thanks to relatime, which is the default).

The problems start when someone starts to use snapshots. If you for example snapshot your root and continue working on your root, after some time big parts of the tree will be CoWed and unshared. In the worst case, the whole tree gets unshared and thus takes up double the space. Normally, a user would expect a tree to use extra space only if he changes something. A worst-case scenario would be someone taking regular snapshots for backup purposes and later grepping the contents of all snapshots to find a specific file. This would touch all inodes in all trees and thus make big parts of the trees unshared.

relatime (which is the default) reduces this problem a little bit, as it by default only updates atime once a day. This means that if anyone wants to test this problem, they should mount with relatime disabled or change the system date before trying to update atime (that's the way I tested it).

As a solution, I would suggest making noatime the default for btrfs. I'm however not sure whether it is allowed in Linux to have different default mount options for different filesystem types. I know this discussion pops up every few years (last time it resulted in making relatime the default). But this is a special case for btrfs: atime is already bad on other filesystems, but it's much, much worse in btrfs.

Alex.
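For anyone who wants to reproduce the behaviour described above without changing the system date, relatime can be overridden per mount with the strictatime option, so every read really does update atime. A minimal sketch (device and mount point are illustrative):

  # mount -o strictatime /dev/sdb1 /mnt      # force an atime update on every access
  # mount -o remount,strictatime /mnt        # or remount an already-mounted filesystem
  # grep -R foobar /mnt > /dev/null          # any recursive read now dirties metadata

With the default relatime, the same grep would only touch inodes whose atime is older than a day or older than their mtime/ctime.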
Josef Bacik
2012-May-25 15:42 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Fri, May 25, 2012 at 05:35:37PM +0200, Alexander Block wrote:
> As a solution, I would suggest making noatime the default for btrfs.
> [...]

Just mount with -o noatime, there's no chance of turning something like that on by default since it will break some applications (notably mutt). Thanks,

Josef
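The workaround Josef describes is a per-mount option; it can be given on the command line or made permanent in /etc/fstab. A minimal sketch (device and mount point are illustrative):

  # mount -o noatime /dev/sdb1 /mnt
  # mount -o remount,noatime /mnt            # for an already-mounted filesystem

  # /etc/fstab entry, so the option survives reboots:
  /dev/sdb1  /mnt  btrfs  noatime  0  0

Applications such as mutt that rely on atime to detect new mail would then see stale access times, which is the compatibility concern raised here.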
Alexander Block
2012-May-25 15:59 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Fri, May 25, 2012 at 5:42 PM, Josef Bacik <josef@redhat.com> wrote:
> Just mount with -o noatime, there's no chance of turning something like that on
> by default since it will break some applications (notably mutt). Thanks,
>
> Josef

I know about the discussions regarding compatibility with existing applications. The problem here is that it is not only a compatibility problem. Having atime enabled by default may give you ENOSPC for reasons that a normal user does not understand or expect. As a normal user, I would think: if I never change something, why does it take up more space just because I read it?
Andreas Dilger
2012-May-25 16:28 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On 2012-05-25, at 9:59, Alexander Block <ablock84@googlemail.com> wrote:
>>> A worst-case scenario would be someone taking regular snapshots for
>>> backup purposes and later grepping the contents of all snapshots to
>>> find a specific file. This would touch all inodes in all trees and
>>> thus make big parts of the trees unshared.

Are you talking about the atime for the primary copy, or the atime for the snapshots? IMHO, the atime should not be updated for a snapshot unless it is explicitly mounted r/w, or it isn't really a good snapshot.
Alexander Block
2012-May-25 16:38 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Fri, May 25, 2012 at 6:28 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> Are you talking about the atime for the primary copy, or the atime for
> the snapshots? IMHO, the atime should not be updated for a snapshot
> unless it is explicitly mounted r/w, or it isn't really a good snapshot.

Snapshots are by default r/w but can be created r/o explicitly. That doesn't matter for the normal use case, though, where you snapshot / and continue working on /. After snapshotting, all metadata is shared between the two subvolumes, but when a metadata block in either subvolume changes (no matter which one), that metadata block gets CoWed and unshared and uses up more space.
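Both kinds of snapshot mentioned here can be created with btrfs-progs; the source and destination paths below are illustrative:

  # btrfs subvolume snapshot /mnt /mnt/snap-rw       # default: writable snapshot
  # btrfs subvolume snapshot -r /mnt /mnt/snap-ro    # -r creates a read-only snapshot

A writable snapshot behaves like the source subvolume, so atime updates in either one will CoW and unshare the affected metadata blocks.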
Alexander Block
2012-May-25 16:48 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Fri, May 25, 2012 at 6:32 PM, Freddie Cash <fjwcash@gmail.com> wrote:
> Atime is metadata. Thus, by reading a file, only the metadata block for that
> file is CoW'd... not the actual file data blocks. IOW, your snapshots won't
> change and suddenly balloon in size from reading files (metadata blocks are
> tiny).
>
> And, if they do, then something is horribly wrong with the snapshot system.
> Fixing that would be more important than changing the default mount options. :)

That's true, metadata blocks are tiny. But they still cost space, and if you run through the whole tree and access all files/directories (e.g. with grep, rsync, diff, or whatever), a lot of (probably all) metadata blocks are affected, which can be megabytes or even gigabytes. All those metadata blocks get CoWed and unshared, and thus use up more and more space. If you use snapshots and get to a point where nearly no space is left, a simple search for files that one could delete may already result in no space left. If you use hundreds (or millions... there is no limit on the number of snapshots) of snapshots, the problem gets worse and worse.
Alexander Block
2012-May-25 19:10 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Fri, May 25, 2012 at 5:35 PM, Alexander Block <ablock84@googlemail.com> wrote:
> As a solution, I would suggest making noatime the default for btrfs.
> [...]

Just to show some numbers, I made a simple test on a fresh btrfs fs. I copied my host's /usr folder (4 GB) to that fs and checked metadata usage with "btrfs fi df /mnt", which was around 300 MB. Then I created 10 snapshots and checked metadata usage again, which didn't change much. Then I ran "grep foobar /mnt -R" to update the atime of all files. After this had finished, metadata usage was 2.59 GB. So I lost 2.2 GB just because I searched for something. If someone already has nearly no space left, he probably won't be able to move some data to another disk, as he may get ENOSPC while copying the data.

Here is the output of the final "btrfs fi df":

# btrfs fi df /mnt
Data: total=6.01GB, used=4.19GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=3.25GB, used=2.59GB
Metadata: total=8.00MB, used=0.00

I don't know much about other filesystems that support snapshots, but I have the feeling that most of them would have the same problem. Other filesystems in combination with LVM snapshots may also cause problems (I'm not very familiar with LVM). Filesystem image formats like qcow, vmdk, vbox and so on may also have problems with atime.
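For anyone who wants to repeat this measurement, a rough sketch of the steps described above (device, mount point and snapshot count are illustrative, and strictatime is used here instead of changing the system date):

  mkfs.btrfs /dev/sdb1
  mount -o strictatime /dev/sdb1 /mnt
  cp -a /usr /mnt/                            # seed the fs with a few GB of data
  btrfs fi df /mnt                            # note the initial metadata usage

  for i in $(seq 1 10); do
      btrfs subvolume snapshot /mnt "/mnt/snap$i"   # writable snapshots, as in the test
  done
  btrfs fi df /mnt                            # metadata barely changes after snapshotting

  grep -R foobar /mnt > /dev/null 2>&1        # touch the atime of every file in every tree
  sync
  btrfs fi df /mnt                            # metadata usage now reflects the unsharing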
Peter Maloney
2012-May-25 20:27 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On 05/25/2012 09:10 PM, Alexander Block wrote:
> Then I created 10 snapshots and checked metadata usage again, which didn't
> change much. Then I ran "grep foobar /mnt -R" to update the atime of all
> files. After this had finished, metadata usage was 2.59 GB.

Did you run the recursive grep after each snapshot (which I would expect would result in 11 times as many metadata blocks, max 3.3 GB), or just once after all 10 snapshots (which I think would mean only 2x as many metadata blocks, max 600 MB)?
Alexander Block
2012-May-25 20:42 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Fri, May 25, 2012 at 10:27 PM, Peter Maloney <peter.maloney@brockmann-consult.de> wrote:
> Did you run the recursive grep after each snapshot (which I would expect
> would result in 11 times as many metadata blocks, max 3.3 GB), or just
> once after all 10 snapshots (which I think would mean only 2x as many
> metadata blocks, max 600 MB)?

I ran it only once, after creating all snapshots. My expectation is that in both cases the result is the same. If all snapshots contain the file /foo/bar, then each individual snapshotted copy of it gets a different atime and thus its own metadata block. As this happens with all files, no matter in which order I iterate over them, nearly all metadata blocks end up with their own copy.
Alexander Block
2012-May-25 20:48 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Fri, May 25, 2012 at 10:42 PM, Alexander Block <ablock84@googlemail.com> wrote:
> I ran it only once, after creating all snapshots. My expectation is that
> in both cases the result is the same. If all snapshots contain the file
> /foo/bar, then each individual snapshotted copy of it gets a different
> atime and thus its own metadata block. As this happens with all files,
> no matter in which order I iterate over them, nearly all metadata blocks
> end up with their own copy.

Hmm, maybe you assumed the snapshots were r/o. In my test, the snapshots were all r/w. In the r/o case, I would have to run the recursive grep after each snapshot creation to get the same result.
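Whether a given snapshot actually persists atime updates can be checked from user space. A small sketch, assuming the snap-rw and snap-ro snapshots from the earlier example, an illustrative file path, and that a read-only subvolume refuses the atime update entirely:

  stat -c 'before: %x' /mnt/snap-rw/somefile
  cat /mnt/snap-rw/somefile > /dev/null     # read should bump atime (with strictatime, or when relatime deems it due)
  stat -c 'after:  %x' /mnt/snap-rw/somefile

  stat -c 'before: %x' /mnt/snap-ro/somefile
  cat /mnt/snap-ro/somefile > /dev/null     # read-only snapshot: atime is expected to stay unchanged
  stat -c 'after:  %x' /mnt/snap-ro/somefile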
Xavier Nicollet
2012-May-26 09:52 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On 25 May 2012 at 11:42, Josef Bacik wrote:
> Just mount with -o noatime, there's no chance of turning something like that on
> by default since it will break some applications (notably mutt). Thanks,

I've just updated the wiki: https://btrfs.wiki.kernel.org/index.php/Mount_options#Performance

--
Xavier Nicollet
Boaz Harrosh
2012-May-29 08:14 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On 05/25/2012 06:35 PM, Alexander Block wrote:
> As a solution, I would suggest making noatime the default for btrfs.
> [...]

Sounds like a real problem. I would suggest a few remedies.

1. Make a persistent filesystem parameter that says noatime/relatime/atime, so that the default, if not specified on mount, is taken as a property of the FS (mkfs can set it).

2. The snapshot program should check and complain if atime is on, and recommend turning it off, since the problem only starts with a snapshot.

3. If space availability drops under some threshold, disable atime. As you said, this is catastrophic in that case, so the user can always search for and delete files. In fact, if the IO was only because of atime, it should be a soft error: warned about and ignored.

But perhaps the true solution is to put atime in a side table, so only the atime info gets CoWed and not all the metadata.

Just my $0.017
Boaz
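Remedy 3 could in principle be approximated from user space while the in-kernel behaviour is being discussed. A rough sketch, assuming a periodically run script and an arbitrary usage threshold (the mount point and threshold are illustrative, not part of the proposal):

  #!/bin/sh
  # Remount a btrfs filesystem with noatime once free space gets scarce.
  MNT=/mnt
  THRESHOLD=90   # remount once usage exceeds 90%

  usage=$(df -P "$MNT" | awk 'NR==2 { sub("%", "", $5); print $5 }')
  if [ "$usage" -ge "$THRESHOLD" ]; then
      mount -o remount,noatime "$MNT"
      logger "disabled atime updates on $MNT: ${usage}% used"
  fi

Note that plain df can be misleading on btrfs (data and metadata are allocated separately, as the "btrfs fi df" output earlier in the thread shows), so a real version would want to look at metadata allocation instead.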
Alexander Block
2012-May-29 14:03 UTC
Re: atime and filesystems with snapshots (especially Btrfs)
On Tue, May 29, 2012 at 10:14 AM, Boaz Harrosh <bharrosh@panasas.com> wrote:
> 1. Make a persistent filesystem parameter that says noatime/relatime/atime,
> so that the default, if not specified on mount, is taken as a property of
> the FS (mkfs can set it).

That would be possible. But again, I'm not sure if it is allowed for one fs type to differ from all the other filesystems in its default behavior.

> 2. The snapshot program should check and complain if atime is on, and
> recommend turning it off, since the problem only starts with a snapshot.

That would definitely cause awareness of the problem, and many people would probably switch to noatime on mount.

> 3. If space availability drops under some threshold, disable atime.

It would be hard to determine a good threshold. This really depends on the way snapshots are used.

> But perhaps the true solution is to put atime in a side table, so only the
> atime info gets CoWed and not all the metadata.

This would definitely reduce the problem to a minimum, but it may be harder to implement than it sounds. You would either have to keep two trees per subvolume (one for the fs and one for atime) or share one tree for all subvolumes. I don't think two trees per subvolume would be acceptable, but I'm not sure. A shared tree would require implementing some kind of custom refcounting for the items, as changes to one fs tree should not change the atime of the other, so new items would have to be created on demand. It would probably also require snapshot origin tracking, because a freshly snapshotted subvolume would have no atime entries at all, and they would have to be read from the parent/origin.