thr3ads.net - Btrfs devel - BTRFS fsck apparent errors [Jul 2012]

Swâmi Petaramesh

2012-Jul-03 15:10 UTC

BTRFS fsck apparent errors

Hi there,

A couple days ago, I have converted my Ubuntu Precise machine from ext4 
to BTRFS using btrfs-convert.

I currently use kernel:
Linux fnix 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 
x86_64 x86_64 x86_64 GNU/Linux

...and a btrfs-tools package more recent than the old one that came with 
Ubuntu Precise:
Version: 0.19+20120328-4ubuntu1

After I had shifted, I tried to defragment and compress my FS using 
commands such as :

find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
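A minimal sketch of a somewhat more defensive variant of the command above. It assumes that only regular files need recompressing, that the walk should stay on a single filesystem, and that the installed "btrfs filesystem defragment" accepts several paths per call. The function name is made up for illustration.

```shell
# Hypothetical helper, not part of btrfs-progs.
# $1: a directory on a mounted btrfs filesystem.
defrag_compress_tree() {
    # -xdev    stay on the filesystem that contains "$1"
    # -type f  pass only regular files (assumption: directories and
    #          special files do not need recompressing)
    # {} +     batch many paths into each btrfs invocation
    find "$1" -xdev -type f -exec btrfs filesystem defragment -clzo -v {} +
}
```

It would be called as: defrag_compress_tree /mnt/STORAGEFS/STORAGE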

During execution of such commands, my kernel oopsed, so I restarted.

Afterwards, I noticed that, during the execution of such a command, my 
FS free space was quickly dropping, where I would have expected it to 
increase...

Once finished, I checked a couple of BTRFS FSes using btrfsck, but I 
interpret the results as having some errors :

root@fnix:/# btrfsck /dev/VG1/DEBMINT
checking extents
checking fs roots
root 256 inode 257 errors 800
found 7814565888 bytes used err is 1
total csum bytes: 6264636
total tree bytes: 394928128
total fs tree bytes: 365121536
btree space waste bytes: 101451531
file data blocks allocated: 20067590144
  referenced 13270241280
Btrfs Btrfs v0.19


root@fnix:/# btrfsck /dev/VG1/STORAGE
checking extents
checking fs roots
root 301 inode 10644 errors 1000
root 301 inode 10687 errors 1000
root 301 inode 10688 errors 1000
root 301 inode 10749 errors 1000
found 55683117056 bytes used err is 1
total csum bytes: 54188580
total tree bytes: 191500288
total fs tree bytes: 103596032
btree space waste bytes: 49730472
file data blocks allocated: 55640522752
  referenced 56466059264
Btrfs Btrfs v0.19

It doesn't seem that btrfsck attempts to fix these errors in any way...
It just displays them.
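One way to read the "errors 800" and "errors 1000" values is sketched below, on the assumption that btrfsck prints them as a hexadecimal bitmask of per-inode error flags. The helper name is hypothetical; it only reports which bit positions are set, and the meaning of each bit has to be looked up in the btrfsck source for the version in use.

```shell
# Hypothetical helper: list the bit positions set in a btrfsck
# "errors" value, assuming the value is printed in hexadecimal.
decode_errors() {
    value=$((0x$1))
    bit=0
    out=""
    while [ "$value" -gt 0 ]; do
        if [ $((value & 1)) -eq 1 ]; then
            out="${out:+$out }$bit"   # append, space-separated
        fi
        value=$((value >> 1))
        bit=$((bit + 1))
    done
    echo "$out"
}

decode_errors 800     # prints 11
decode_errors 1000    # prints 12
```

Under that assumption, the DEBMINT and STORAGE filesystems report two different single-bit conditions rather than a count of errors.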

Besides this, my filesystem looks sane, but I suspect that at least my 
"STORAGE" FS uses much more space than it should...

Are these errors serious? Is there a way to get rid of them without
reformatting?

TIA for any clue.

Best regards.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hugo Mills

2012-Jul-03 15:22 UTC

Re: BTRFS fsck apparent errors

On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote:
> A couple days ago, I have converted my Ubuntu Precise machine from
> ext4 to BTRFS using btrfs-convert.
[snip]
> After I had shifted, I tried to defragment and compress my FS using
> commands such as :
> 
> find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
> 
> During execution of such commands, my kernel oopsed, so I restarted.
> 
> Afterwards, I noticed that, during the execution of such a command,
> my FS free space was quickly dropping, where I would have expected
> it to increase...
   What you're seeing is the fact that you've still got the complete
ext4 filesystem and all of its data sitting untouched on the disk as
well. The defrag will have taken a complete new copy of the data but
not removed the ext4 copy.

   If you delete the conversion recovery directory (ext2_subvol), then
you'll see the space usage drop again. Of course, doing that will also
mean that you won't be able to roll back to ext4 without reformatting
and restoring from your backups. (You have got backups, right?)
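A rough sketch of how the conversion image could be checked for and removed. It assumes the filesystem is mounted at the given path and that btrfs-convert named the saved subvolume ext2_saved (the name used later in this thread); the function name is hypothetical. Deleting the subvolume is irreversible and removes the rollback path described above.

```shell
# Hypothetical helper, not part of btrfs-progs.
# $1: mount point of the converted filesystem
# $2: name of the saved-image subvolume (assumed default: ext2_saved)
drop_conversion_image() {
    mnt=$1
    name=${2:-ext2_saved}
    # Show which subvolumes exist before touching anything.
    btrfs subvolume list "$mnt" || return 1
    if [ -d "$mnt/$name" ]; then
        btrfs subvolume delete "$mnt/$name"
    else
        echo "no $name subvolume under $mnt" >&2
        return 1
    fi
}
```

Note that btrfs reclaims the space of a deleted subvolume in the background, so free space may not rise immediately after the command returns.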
> Once finished, I checked a couple of BTRFS FSes using btrfsck, but I
> interpret the results as having some errors :
> 
> root@fnix:/# btrfsck /dev/VG1/DEBMINT
> checking extents
> checking fs roots
> root 256 inode 257 errors 800
> found 7814565888 bytes used err is 1
> total csum bytes: 6264636
> total tree bytes: 394928128
> total fs tree bytes: 365121536
> btree space waste bytes: 101451531
> file data blocks allocated: 20067590144
>  referenced 13270241280
> Btrfs Btrfs v0.19
> 
> root@fnix:/# btrfsck /dev/VG1/STORAGE
> checking extents
> checking fs roots
> root 301 inode 10644 errors 1000
> root 301 inode 10687 errors 1000
> root 301 inode 10688 errors 1000
> root 301 inode 10749 errors 1000
> found 55683117056 bytes used err is 1
> total csum bytes: 54188580
> total tree bytes: 191500288
> total fs tree bytes: 103596032
> btree space waste bytes: 49730472
> file data blocks allocated: 55640522752
>  referenced 56466059264
> Btrfs Btrfs v0.19
> 
> It doesn't seem that btrfsck attempts to fix these errors in any
> way... It just displays them.
   Correct, by default it just checks the filesystem. Just to be sure:
the filesystems in question weren't mounted, were they?

   I would also suggest using a 3.4 kernel. There's at least one FS
corruption bug known to exist in 3.2 that's been fixed in 3.4.
(Probably not what's happened in this case, but it's best to try to
avoid these kinds of issues).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
                 --- emacs: Eats Memory and Crashes. ---

David Sterba

2012-Jul-03 15:52 UTC

Re: BTRFS fsck apparent errors

On Tue, Jul 03, 2012 at 04:22:08PM +0100, Hugo Mills wrote:
>    Correct, by default it just checks the filesystem. Just to be sure:
> the filesystems in question weren't mounted, were they?
fsck will refuse to run on a mounted filesystem, though in the case of a
read-only mount it might be useful during debugging. I'm using this
patch:

--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3474,6 +3474,7 @@ static struct option long_options[] = {
        { "repair", 0, NULL, 0 },
        { "init-csum-tree", 0, NULL, 0 },
        { "init-extent-tree", 0, NULL, 0 },
+       { "force", 0, NULL, 0 },
        { 0, 0, 0, 0}
 };

@@ -3484,12 +3485,13 @@ int main(int ac, char **av)
        struct btrfs_fs_info *info;
        struct btrfs_trans_handle *trans = NULL;
        u64 bytenr = 0;
-       int ret;
+       int ret = 0;
        int num;
        int repair = 0;
        int option_index = 0;
        int init_csum_tree = 0;
        int rw = 0;
+       int force = 0;

        while(1) {
                int c;
@@ -3516,6 +3518,9 @@ int main(int ac, char **av)
                        printf("Creating a new CRC tree\n");
                        init_csum_tree = 1;
                        rw = 1;
+               } else if (option_index == 4) {
+                       printf("Skip mount checks\n");
+                       force = 1;
                }

        }
@@ -3527,7 +3532,7 @@ int main(int ac, char **av)
        radix_tree_init();
        cache_tree_init(&root_cache);

-       if((ret = check_mounted(av[optind])) < 0) {
+       if(!force && (ret = check_mounted(av[optind])) < 0) {
                fprintf(stderr, "Could not check mount status: %s\n", strerror(-ret));
                return ret;
        } else if(ret) {


Zach Brown

2012-Jul-03 16:26 UTC

Re: BTRFS fsck apparent errors

On 07/03/2012 08:52 AM, David Sterba wrote:
> On Tue, Jul 03, 2012 at 04:22:08PM +0100, Hugo Mills wrote:
>>     Correct, by default it just checks the filesystem. Just to be sure:
>> the filesystems in question weren't mounted, were they?
>
> fsck will refuse to run on a mounted filesystem, though in case of a
> read-only mount it might be useful during debugging, I'm using this
> patch
>
> --- a/btrfsck.c
> +++ b/btrfsck.c
> @@ -3474,6 +3474,7 @@ static struct option long_options[] = {
>          { "repair", 0, NULL, 0 },
>          { "init-csum-tree", 0, NULL, 0 },
>          { "init-extent-tree", 0, NULL, 0 },
> +       { "force", 0, NULL, 0 },
If we were to run with this, I think it should be called something other
than force.  fsck.ext* has trained people to think that 'forcing' a fsck
means doing a full repair pass even if the fs thinks that it was shut
down cleanly.

--read-only would be good if fsck was taught to not even try to write in
this mode.

- z

David Sterba

2012-Jul-03 17:37 UTC

Re: BTRFS fsck apparent errors

On Tue, Jul 03, 2012 at 09:26:41AM -0700, Zach Brown wrote:
> On 07/03/2012 08:52 AM, David Sterba wrote:
> >--- a/btrfsck.c
> >+++ b/btrfsck.c
> >@@ -3474,6 +3474,7 @@ static struct option long_options[] = {
> >         { "repair", 0, NULL, 0 },
> >         { "init-csum-tree", 0, NULL, 0 },
> >         { "init-extent-tree", 0, NULL, 0 },
> >+       { "force", 0, NULL, 0 },
> 
> If we were to run with this, I think it should be called something other
> than force.  fsck.ext* has trained people to think that 'forcing' a fsck
> means doing a full repair pass even if the fs thinks that it was shut
> down cleanly.
Agreed, it's not a good name and was rather a quick aid to myself, I
didn't put much thinking into the user interface as I usually do :)
> --read-only would be good if fsck was taught to not even try to write in
> this mode.
read-only mode is default and (hopefully) does no writes to the device,
this would require the --repair option so what you propose is sort of a
sanity check, right?


david

Zach Brown

2012-Jul-03 17:42 UTC

Re: BTRFS fsck apparent errors

> read-only mode is default and (hopefully) does no writes to the device,
> this would require the --repair option so what you propose is sort of a
> sanity check, right?
Ah, I didn't realize that it didn't write without --repair.  Yeah,
making sure that people don't try to combine the repair and
read-from-mounted-devices options seems reasonable.
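The sanity check agreed on here can be illustrated with a small sketch. This is shell standing in for the logic only, not the btrfsck C code; the function name is hypothetical, and --force is simply the working name from the patch earlier in the thread for "skip mount checks".

```shell
# Hypothetical argument check: refuse to combine a writing mode
# (--repair) with the option that skips the mounted-filesystem test.
check_fsck_options() {
    repair=0
    skip_mount_check=0
    for arg in "$@"; do
        case $arg in
            --repair) repair=1 ;;
            --force)  skip_mount_check=1 ;;
        esac
    done
    if [ "$repair" -eq 1 ] && [ "$skip_mount_check" -eq 1 ]; then
        echo "refusing to repair with mount checks skipped" >&2
        return 1
    fi
    return 0
}
```

With this check, "--force" alone stays a read-only inspection, while "--repair --force" is rejected before anything is written.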

- z

Swâmi Petaramesh

2012-Jul-03 19:17 UTC

Re: BTRFS fsck apparent errors

On 03/07/2012 17:22, Hugo Mills wrote:
> What you're seeing is the fact that you've still got the complete ext4
> filesystem and all of its data sitting untouched on the disk as well.
> The defrag will have taken a complete new copy of the data but not
> removed the ext4 copy.

I thought about that... However, I had "btrfs su del" the ext2_saved
subvolume, so it is expected to have been deleted...

If not, how could I possibly delete it, now that I can't see it anymore?

>> It doesn't seem that btrfsck attempts to fix these errors in any
>> way... It just displays them.
>    Correct, by default it just checks the filesystem. Just to be sure:
> the filesystems in question weren't mounted, were they?

No, the filesystems weren't mounted... If, by default, btrfsck doesn't
fix, how could I ask it to fix? "man btrfsck" or "btrfsck -h" do not
show any option, only a device name...

TIA.

Kind regards.

-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E


Fajar A. Nugraha

2012-Jul-04 00:40 UTC

Re: BTRFS fsck apparent errors

On Tue, Jul 3, 2012 at 10:22 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote:
>> After I had shifted, I tried to defragment and compress my FS using
>> commands such as :
>>
>> find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
>>
>> During execution of such commands, my kernel oopsed, so I restarted.
>    I would also suggest using a 3.4 kernel. There's at least one FS
> corruption bug known to exist in 3.2 that's been fixed in 3.4.

Are there any known btrfs regressions in 3.4? I'm using 3.4.0-3-generic
from a ppa, but a normal mount - umount cycle seems MUCH longer
compared to how it was on 3.2, and iostat shows the disk is
read-IOPS-bound:

# time mount LABEL=WD-root

real	0m10.400s
user	0m0.000s
sys	0m0.060s

# time umount /media/WD-root/

real	0m22.419s
user	0m0.000s
sys	0m0.064s

# /proc/10142/stack  <--- the PID of umount process
[<ffffffff8111dd1e>] sleep_on_page+0xe/0x20
[<ffffffff8111de88>] wait_on_page_bit+0x78/0x80
[<ffffffff8111e08c>] filemap_fdatawait_range+0x10c/0x1a0
[<ffffffffa00744eb>] btrfs_wait_marked_extents+0x6b/0xc0 [btrfs]
[<ffffffffa007457b>] btrfs_write_and_wait_marked_extents+0x3b/0x60 [btrfs]
[<ffffffffa00745cb>] btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[<ffffffffa0074e69>] btrfs_commit_transaction+0x759/0x960 [btrfs]
[<ffffffffa00700db>] btrfs_commit_super+0xbb/0x110 [btrfs]
[<ffffffffa0071490>] close_ctree+0x2a0/0x310 [btrfs]
[<ffffffffa004b6c9>] btrfs_put_super+0x19/0x20 [btrfs]
[<ffffffff811810b2>] generic_shutdown_super+0x62/0xf0
[<ffffffff811811d6>] kill_anon_super+0x16/0x30
[<ffffffffa004df3a>] btrfs_kill_super+0x1a/0x90 [btrfs]
[<ffffffff811816ac>] deactivate_locked_super+0x3c/0xa0
[<ffffffff81181f9e>] deactivate_super+0x4e/0x70
[<ffffffff8119df9c>] mntput_no_expire+0xdc/0x130
[<ffffffff8119f296>] sys_umount+0x66/0xe0
[<ffffffff8169e129>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
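To pick the btrfs frames out of a trace like the one above, a small filter can help. This is a sketch that assumes the /proc/PID/stack line format shown here (address, symbol+offset/size, optional module tag); the function name is made up.

```shell
# Hypothetical filter: read a kernel stack trace on stdin and print the
# symbol names of the frames that belong to the btrfs module.
btrfs_frames() {
    awk '/\[btrfs\]$/ {
        sub(/\+.*/, "", $2)   # strip the +offset/size suffix
        print $2
    }'
}
```

Reading another process's stack normally needs root, for example: cat /proc/10142/stack | btrfs_frames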

-- 
Fajar

Dave Chinner

2012-Jul-04 03:25 UTC

Re: BTRFS fsck apparent errors

On Tue, Jul 03, 2012 at 07:37:42PM +0200, David Sterba wrote:
> On Tue, Jul 03, 2012 at 09:26:41AM -0700, Zach Brown wrote:
> > On 07/03/2012 08:52 AM, David Sterba wrote:
> > >--- a/btrfsck.c
> > >+++ b/btrfsck.c
> > >@@ -3474,6 +3474,7 @@ static struct option long_options[] = {
> > >         { "repair", 0, NULL, 0 },
> > >         { "init-csum-tree", 0, NULL, 0 },
> > >         { "init-extent-tree", 0, NULL, 0 },
> > >+       { "force", 0, NULL, 0 },
> > 
> > If we were to run with this, I think it should be called something other
> > than force.  fsck.ext* has trained people to think that 'forcing' a fsck
> > means doing a full repair pass even if the fs thinks that it was shut
> > down cleanly.
> 
> Agreed, it's not a good name and was rather a quick aid to myself, I
> didn't put much thinking into the user interface as I usually do :)
xfs_repair uses:

       -d     Repair  dangerously.  Allow  xfs_repair  to  repair an
	      XFS filesystem mounted read only. This is typically
	      done on a root filesystem from single user mode,
	      immediately followed by a reboot.
> > --read-only would be good if fsck was taught to not even try to write in
> > this mode.
> 
> read-only mode is default and (hopefully) does no writes to the device,
> this would require the --repair option so what you propose is sort of a
> sanity check, right?
If you run fsck/repair on a mounted filesystem, and it changes the
block device (i.e. fixes something) the mounted filesystem does not
know about it and so may use stale metadata and bad things will
happen. That's why it's called "dangerous". ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

David Sterba

2012-Jul-04 13:42 UTC

Re: BTRFS fsck apparent errors

On Wed, Jul 04, 2012 at 07:40:05AM +0700, Fajar A. Nugraha wrote:
> Are there any known btrfs regression in 3.4? I'm using 3.4.0-3-generic
> from a ppa, but a normal mount - umount cycle seems MUCH longer
> compared to how it was on 3.2, and iostat shows the disk is
> read-IOPS-bound
Is it just mount/umount without any other activity? Is the fs
fragmented (or aged), almost full, has lots of files?
> 
> # time mount LABEL=WD-root
> 
> real	0m10.400s
> user	0m0.000s
> sys	0m0.060s
> 
> # time umount /media/WD-root/
> 
> real	0m22.419s
> user	0m0.000s
> sys	0m0.064s
> 
> # /proc/10142/stack  <--- the PID of umount process
The process(es) actually doing the work are the btrfs workers; the usual
suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
that are writing the cache states back to disk.
I'm using iotop to observe such things.


david

Fajar A. Nugraha

2012-Jul-04 15:46 UTC

Re: BTRFS fsck apparent errors

On Wed, Jul 4, 2012 at 8:42 PM, David Sterba <dave@jikos.cz> wrote:
> On Wed, Jul 04, 2012 at 07:40:05AM +0700, Fajar A. Nugraha wrote:
>> Are there any known btrfs regression in 3.4? I'm using 3.4.0-3-generic
>> from a ppa, but a normal mount - umount cycle seems MUCH longer
>> compared to how it was on 3.2, and iostat shows the disk is
>> read-IOPS-bound
>
> Is it just mount/umount without any other activity?
Yes
> Is the fs
> fragmented
Not sure how to check that quickly
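One quick, approximate way to look for fragmentation is to count extents per file with filefrag (from e2fsprogs). The sketch below assumes filefrag is installed and prints its usual "PATH: N extents found" summary line; the function name is hypothetical.

```shell
# Hypothetical helper: list the files with the most extents.
# $1: directory to scan, $2: how many files to show (default 10)
most_fragmented() {
    find "$1" -xdev -type f -exec filefrag {} + 2>/dev/null |
        awk -F': ' '{ n = $NF + 0; print n "\t" $1 }' |
        sort -rn |
        head -n "${2:-10}"
}
```

Treat the counts as a rough indicator only; compressed btrfs files in particular tend to report many small extents.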
> (or aged),
Over 1 year, so yes
> almost full,
df says 83% used, so probably yes (depending on how you define "almost")

~ $ df -h /media/WD-root
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc2       922G  733G  155G  83% /media/WD-root

~ $ sudo btrfs fi df /media/WD-root/
Data: total=883.95GB, used=729.68GB
System, DUP: total=8.00MB, used=104.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=18.75GB, used=1.49GB
Metadata: total=8.00MB, used=0.00
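The "btrfs fi df" figures above can be turned into a used/allocated percentage per chunk type, which separates "83% of the disk" from how full the allocated chunks are. This sketch assumes the output format shown here and that KB/MB/GB are powers of 1024; the function name is made up.

```shell
# Hypothetical filter: read "btrfs fi df" output on stdin and print
# used/total as a percentage for each line.
chunk_usage() {
    awk '
    function to_bytes(s,    n, unit) {
        n = s + 0                          # leading numeric part
        unit = s
        gsub(/[0-9.]/, "", unit)           # what is left is the suffix
        if (unit == "KB") return n * 1024
        if (unit == "MB") return n * 1024 ^ 2
        if (unit == "GB") return n * 1024 ^ 3
        if (unit == "TB") return n * 1024 ^ 4
        return n
    }
    /total=/ {
        label = $0; sub(/: total=.*/, "", label)
        total = $0; sub(/.*total=/, "", total); sub(/,.*/, "", total)
        used = $0;  sub(/.*used=/, "", used)
        t = to_bytes(total)
        if (t > 0) printf "%s: %.1f%%\n", label, 100 * to_bytes(used) / t
    }'
}
```

For the output above this gives about 82.5% for Data and 7.9% for "Metadata, DUP", so the data chunks are fairly full while the metadata allocation is mostly empty.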
> has lots of files?
it's a "normal" 1 TB usb disk, with docs, movies, vm images, etc. No
particular lots-of-small-files like maildir or anything like that.

>> # time umount /media/WD-root/
>>
>> real  0m22.419s
>> user  0m0.000s
>> sys   0m0.064s
>>
>> # /proc/10142/stack  <--- the PID of umount process
>
> The process(es) actually doing the work are the btrfs workers, usual
> suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
> that are writing the cache states back to disk.
Not sure about that, since iostat shows it's mostly read, not write.
Will try iotop later.
I tested also with Chris' for-linus on top of 3.4, same result (really
long time to umount).

Reverting back to ubuntu's 3.2.0-26-generic, umount took less than
1 s :P
So I guess I'm switching back to 3.2 for now.

-- 
Fajar

David Sterba

2012-Jul-05 00:32 UTC

Re: BTRFS fsck apparent errors

On Wed, Jul 04, 2012 at 10:46:21PM +0700, Fajar A. Nugraha wrote:
> > Is it just mount/umount without any other activity?
> Yes
> 
> > Is the fs
> > fragmented
> Not sure how to check that quickly
> 
> > (or aged),
> Over 1 year, so yes
> 
> > almost full,
> df says 83% used, so probably yes (depending on how you define "almost")
That matches my expectation; it could lead to the mount/umount
slowness due to fragmentation.
> > has lots of files?
> 
> it's a "normal" 1 TB usb disk, with docs, movies, vm images, etc. No
> particular lots-of-small-files like maildir or anything like that.
So it's probably not an issue with inode_cache.
> >> # time umount /media/WD-root/
> >>
> >> real  0m22.419s
> >> user  0m0.000s
> >> sys   0m0.064s
> >>
> >> # /proc/10142/stack  <--- the PID of umount process
> >
> > The process(es) actually doing the work are the btrfs workers, usual
> > suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
> > that are writing the cache states back to disk.
> 
> Not sure about that, since iostat shows it's mostly read, not write.
> Will try iotop later.
> I tested also with Chris' for-linus on top of 3.4, same result (really
> long time to umount).
It would be good to verify whether it's the btrfs-cache worker or not. IIRC
there were more writes than reads, so I'm not sure this is the right
direction.

The 3.5 series or 3.4+for-linus has some changes wrt free space cache
(removed the 'ideal caching mode') that caused slow mounts, but this has
been fixed.

I've looked again at the umount process call stack, and it's waiting
for writing the btree_inode, which is the representation of the b-tree
nodes. It's quite possible that changes to the generic writeback code
are causing this. AFAIK the btree_inode does not behave as a normal file
inode regarding writeback.  A good reference point is 3.2; there were
non-trivial writeback changes merged since.

Guessing now: if the mount causes e.g. an atime update, then this triggers
cow, dirties the btree_inode and needs to read data from disk, and
fragmentation slows this down. The number of cowed blocks is small compared
to the reads (and maybe generic readahead reads more than what's
actually needed for the cow operation ...).

david
