Hi there,

A couple days ago, I have converted my Ubuntu Precise machine from ext4 to BTRFS using btrfs-convert.

I currently use kernel:

    Linux fnix 3.2.0-26-generic #41-Ubuntu SMP Thu Jun 14 17:49:24 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

...and a btrfs-tools package more recent than the old one that came with Ubuntu Precise:

    Version: 0.19+20120328-4ubuntu1

After I had shifted, I tried to defragment and compress my FS using commands such as:

    find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;

During execution of such commands, my kernel oopsed, so I restarted.

Afterwards, I noticed that, during the execution of such a command, my FS free space was quickly dropping, where I would have expected it to increase...

Once finished, I checked a couple of BTRFS FSes using btrfsck, but I interpret the results as having some errors:

    root@fnix:/# btrfsck /dev/VG1/DEBMINT
    checking extents
    checking fs roots
    root 256 inode 257 errors 800
    found 7814565888 bytes used err is 1
    total csum bytes: 6264636
    total tree bytes: 394928128
    total fs tree bytes: 365121536
    btree space waste bytes: 101451531
    file data blocks allocated: 20067590144
     referenced 13270241280
    Btrfs Btrfs v0.19

    root@fnix:/# btrfsck /dev/VG1/STORAGE
    checking extents
    checking fs roots
    root 301 inode 10644 errors 1000
    root 301 inode 10687 errors 1000
    root 301 inode 10688 errors 1000
    root 301 inode 10749 errors 1000
    found 55683117056 bytes used err is 1
    total csum bytes: 54188580
    total tree bytes: 191500288
    total fs tree bytes: 103596032
    btree space waste bytes: 49730472
    file data blocks allocated: 55640522752
     referenced 56466059264
    Btrfs Btrfs v0.19

It doesn't seem that btrfsck attempts to fix these errors in any way... It just displays them.

Besides this, my filesystem looks sane, but I suspect that at least my "STORAGE" FS uses much more space than it should...

Are these errors serious? Is there a way to get rid of them without reformatting?

TIA for any clue.

Best regards.
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote:
> A couple days ago, I have converted my Ubuntu Precise machine from
> ext4 to BTRFS using btrfs-convert.

[snip]

> After I had shifted, I tried to defragment and compress my FS using
> commands such as:
>
> find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
>
> During execution of such commands, my kernel oopsed, so I restarted.
>
> Afterwards, I noticed that, during the execution of such a command,
> my FS free space was quickly dropping, where I would have expected
> it to increase...

What you're seeing is the fact that you've still got the complete ext4 filesystem and all of its data sitting untouched on the disk as well. The defrag will have taken a complete new copy of the data but not removed the ext4 copy.

If you delete the conversion recovery directory (ext2_subvol), then you'll see the space usage drop again. Of course, doing that will also mean that you won't be able to roll back to ext4 without reformatting and restoring from your backups.

(You have got backups, right?)

> Once finished, I checked a couple of BTRFS FSes using btrfsck, but I
> interpret the results as having some errors:
>
> root@fnix:/# btrfsck /dev/VG1/DEBMINT
> checking extents
> checking fs roots
> root 256 inode 257 errors 800
> found 7814565888 bytes used err is 1
> total csum bytes: 6264636
> total tree bytes: 394928128
> total fs tree bytes: 365121536
> btree space waste bytes: 101451531
> file data blocks allocated: 20067590144
>  referenced 13270241280
> Btrfs Btrfs v0.19
>
> root@fnix:/# btrfsck /dev/VG1/STORAGE
> checking extents
> checking fs roots
> root 301 inode 10644 errors 1000
> root 301 inode 10687 errors 1000
> root 301 inode 10688 errors 1000
> root 301 inode 10749 errors 1000
> found 55683117056 bytes used err is 1
> total csum bytes: 54188580
> total tree bytes: 191500288
> total fs tree bytes: 103596032
> btree space waste bytes: 49730472
> file data blocks allocated: 55640522752
>  referenced 56466059264
> Btrfs Btrfs v0.19
>
> It doesn't seem that btrfsck attempts to fix these errors in any
> way... It just displays them.

Correct, by default it just checks the filesystem. Just to be sure: the filesystems in question weren't mounted, were they?

I would also suggest using a 3.4 kernel. There's at least one FS corruption bug known to exist in 3.2 that's been fixed in 3.4. (Probably not what's happened in this case, but it's best to try to avoid these kinds of issues).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- emacs: Eats Memory and Crashes. ---
On Tue, Jul 03, 2012 at 04:22:08PM +0100, Hugo Mills wrote:
> Correct, by default it just checks the filesystem. Just to be sure:
> the filesystems in question weren't mounted, were they?

fsck will refuse to run on a mounted filesystem, though in case of a read-only mount it might be useful during debugging. I'm using this patch:

--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3474,6 +3474,7 @@ static struct option long_options[] = {
 	{ "repair", 0, NULL, 0 },
 	{ "init-csum-tree", 0, NULL, 0 },
 	{ "init-extent-tree", 0, NULL, 0 },
+	{ "force", 0, NULL, 0 },
 	{ 0, 0, 0, 0}
 };

@@ -3484,12 +3485,13 @@ int main(int ac, char **av)
 	struct btrfs_fs_info *info;
 	struct btrfs_trans_handle *trans = NULL;
 	u64 bytenr = 0;
-	int ret;
+	int ret = 0;
 	int num;
 	int repair = 0;
 	int option_index = 0;
 	int init_csum_tree = 0;
 	int rw = 0;
+	int force = 0;

 	while(1) {
 		int c;
@@ -3516,6 +3518,9 @@ int main(int ac, char **av)
 				printf("Creating a new CRC tree\n");
 				init_csum_tree = 1;
 				rw = 1;
+			} else if (option_index == 4) {
+				printf("Skip mount checks\n");
+				force = 1;
 			}

 		}
@@ -3527,7 +3532,7 @@ int main(int ac, char **av)
 	radix_tree_init();
 	cache_tree_init(&root_cache);

-	if((ret = check_mounted(av[optind])) < 0) {
+	if(!force && (ret = check_mounted(av[optind])) < 0) {
 		fprintf(stderr, "Could not check mount status: %s\n", strerror(-ret));
 		return ret;
 	} else if(ret) {
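The patch above relies on getopt_long() returning 0 for every long option (flag is NULL and val is 0) and then tells the options apart by the numeric option_index. A minimal sketch of the same dispatch, matching on the option name instead of the table position, might look as follows. The struct check_opts type and the parse_check_opts() helper are hypothetical names used only for illustration and are not part of btrfsck; the option table is a shortened stand-in for the real one.

```c
#include <assert.h>
#include <getopt.h>
#include <string.h>

/* Hypothetical container for the parsed flags; not part of btrfsck. */
struct check_opts {
	int repair;
	int init_csum_tree;
	int allow_mounted;	/* what the patch calls "force" */
};

/*
 * Parse long options that all make getopt_long() return 0
 * (flag == NULL, val == 0) and tell them apart by name rather than
 * by their position in the table.
 * Returns 0 on success, -1 on an unknown or malformed option.
 */
int parse_check_opts(int argc, char **argv, struct check_opts *opts)
{
	static const struct option long_options[] = {
		{ "repair", no_argument, NULL, 0 },
		{ "init-csum-tree", no_argument, NULL, 0 },
		{ "force", no_argument, NULL, 0 },
		{ NULL, 0, NULL, 0 }
	};
	int option_index = 0;
	int c;

	assert(opts != NULL);
	memset(opts, 0, sizeof(*opts));
	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* report errors through the return value only */

	while ((c = getopt_long(argc, argv, "", long_options,
				&option_index)) != -1) {
		const char *name;

		if (c != 0)
			return -1;	/* '?': option not in the table */

		name = long_options[option_index].name;
		if (strcmp(name, "repair") == 0)
			opts->repair = 1;
		else if (strcmp(name, "init-csum-tree") == 0)
			opts->init_csum_tree = 1;
		else if (strcmp(name, "force") == 0)
			opts->allow_mounted = 1;
	}
	return 0;
}
```

Matching on the name keeps the dispatch correct if entries are later added to or reordered in the table, which a comparison such as option_index == 4 does not.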
On 07/03/2012 08:52 AM, David Sterba wrote:
> On Tue, Jul 03, 2012 at 04:22:08PM +0100, Hugo Mills wrote:
>> Correct, by default it just checks the filesystem. Just to be sure:
>> the filesystems in question weren't mounted, were they?
>
> fsck will refuse to run on a mounted filesystem, though in case of a
> read-only mount it might be useful during debugging, I'm using this
> patch
>
> --- a/btrfsck.c
> +++ b/btrfsck.c
> @@ -3474,6 +3474,7 @@ static struct option long_options[] = {
>  	{ "repair", 0, NULL, 0 },
>  	{ "init-csum-tree", 0, NULL, 0 },
>  	{ "init-extent-tree", 0, NULL, 0 },
> +	{ "force", 0, NULL, 0 },

If we were to run with this, I think it should be called something other than force. fsck.ext* has trained people to think that 'forcing' a fsck means doing a full repair pass even if the fs thinks that it was shut down cleanly.

--read-only would be good if fsck was taught to not even try to write in this mode.

- z
On Tue, Jul 03, 2012 at 09:26:41AM -0700, Zach Brown wrote:
> On 07/03/2012 08:52 AM, David Sterba wrote:
> >--- a/btrfsck.c
> >+++ b/btrfsck.c
> >@@ -3474,6 +3474,7 @@ static struct option long_options[] = {
> > 	{ "repair", 0, NULL, 0 },
> > 	{ "init-csum-tree", 0, NULL, 0 },
> > 	{ "init-extent-tree", 0, NULL, 0 },
> >+	{ "force", 0, NULL, 0 },
>
> If we were to run with this, I think it should be called something other
> than force. fsck.ext* has trained people to think that 'forcing' a fsck
> means doing a full repair pass even if the fs thinks that it was shut
> down cleanly.

Agreed, it's not a good name and was rather a quick aid to myself, I didn't put much thinking into the user interface as I usually do :)

> --read-only would be good if fsck was taught to not even try to write in
> this mode.

read-only mode is default and (hopefully) does no writes to the device, this would require the --repair option so what you propose is sort of a sanity check, right?

david
> read-only mode is default and (hopefully) does no writes to the device,
> this would require the --repair option so what you propose is sort of a
> sanity check, right?

Ah, I didn't realize that it didn't write without --repair.

Yeah, making sure that people don't try to combine the repair and read-from-mounted-devices options seems reasonable.

- z
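The sanity check agreed on here can be sketched as a small pure function. This is only an illustration under stated assumptions: validate_check_mode() is a hypothetical helper, not btrfsck code, and mount_status is assumed to follow the convention that check_mounted() shows in the patch (negative when the probe fails, zero when the device is not mounted, positive when it is).

```c
#include <assert.h>
#include <errno.h>

/*
 * Decide whether a check run may proceed.
 *
 * repair        - non-zero if the user asked for --repair (writes)
 * allow_mounted - non-zero if the user asked to skip the mount check
 * mount_status  - assumed result of a mount probe: <0 probe failed,
 *                 0 not mounted, >0 mounted
 *
 * Returns 0 if the run is allowed, or a negative errno-style code.
 */
int validate_check_mode(int repair, int allow_mounted, int mount_status)
{
	/* Flags are expected to be normalised to 0 or 1 by the caller. */
	assert(repair == 0 || repair == 1);
	assert(allow_mounted == 0 || allow_mounted == 1);

	if (repair && allow_mounted)
		return -EINVAL;		/* never write to a possibly mounted device */
	if (allow_mounted)
		return 0;		/* read-only check, mount state ignored */
	if (mount_status < 0)
		return mount_status;	/* could not determine the mount state */
	if (mount_status > 0)
		return -EBUSY;		/* mounted and no override requested */
	return 0;
}
```

Rejecting the combination before any device is opened means the read-from-mounted path can only ever be reached in read-only mode.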
On 03/07/2012 17:22, Hugo Mills wrote:
> What you're seeing is the fact that you've still got the complete ext4
> filesystem and all of its data sitting untouched on the disk as well.
> The defrag will have taken a complete new copy of the data but not
> removed the ext4 copy.

I thought about that... However, I had already run "btrfs su del" on the ext2_saved subvolume, so it is expected to have been deleted... If not, how could I possibly delete it, now that I can't see it anymore?

>> It doesn't seem that btrfsck attempts to fix these errors in any
>> way... It just displays them.
> Correct, by default it just checks the filesystem. Just to be sure:
> the filesystems in question weren't mounted, were they?

No, the filesystems weren't mounted...

If by default btrfsck doesn't fix, how could I ask it to fix? "man btrfsck" or "btrfsck -h" do not show any option, only a device name...

TIA.

Kind regards.

--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
On Tue, Jul 3, 2012 at 10:22 PM, Hugo Mills <hugo@carfax.org.uk> wrote:
> On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote:
>> After I had shifted, I tried to defragment and compress my FS using
>> commands such as:
>>
>> find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
>>
>> During execution of such commands, my kernel oopsed, so I restarted.

> I would also suggest using a 3.4 kernel. There's at least one FS
> corruption bug known to exist in 3.2 that's been fixed in 3.4.

Are there any known btrfs regressions in 3.4? I'm using 3.4.0-3-generic from a ppa, but a normal mount - umount cycle seems MUCH longer compared to how it was on 3.2, and iostat shows the disk is read-IOPS-bound.

# time mount LABEL=WD-root

real    0m10.400s
user    0m0.000s
sys     0m0.060s

# time umount /media/WD-root/

real    0m22.419s
user    0m0.000s
sys     0m0.064s

# /proc/10142/stack <--- the PID of umount process
[<ffffffff8111dd1e>] sleep_on_page+0xe/0x20
[<ffffffff8111de88>] wait_on_page_bit+0x78/0x80
[<ffffffff8111e08c>] filemap_fdatawait_range+0x10c/0x1a0
[<ffffffffa00744eb>] btrfs_wait_marked_extents+0x6b/0xc0 [btrfs]
[<ffffffffa007457b>] btrfs_write_and_wait_marked_extents+0x3b/0x60 [btrfs]
[<ffffffffa00745cb>] btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[<ffffffffa0074e69>] btrfs_commit_transaction+0x759/0x960 [btrfs]
[<ffffffffa00700db>] btrfs_commit_super+0xbb/0x110 [btrfs]
[<ffffffffa0071490>] close_ctree+0x2a0/0x310 [btrfs]
[<ffffffffa004b6c9>] btrfs_put_super+0x19/0x20 [btrfs]
[<ffffffff811810b2>] generic_shutdown_super+0x62/0xf0
[<ffffffff811811d6>] kill_anon_super+0x16/0x30
[<ffffffffa004df3a>] btrfs_kill_super+0x1a/0x90 [btrfs]
[<ffffffff811816ac>] deactivate_locked_super+0x3c/0xa0
[<ffffffff81181f9e>] deactivate_super+0x4e/0x70
[<ffffffff8119df9c>] mntput_no_expire+0xdc/0x130
[<ffffffff8119f296>] sys_umount+0x66/0xe0
[<ffffffff8169e129>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

--
Fajar
On Tue, Jul 03, 2012 at 07:37:42PM +0200, David Sterba wrote:
> On Tue, Jul 03, 2012 at 09:26:41AM -0700, Zach Brown wrote:
> > On 07/03/2012 08:52 AM, David Sterba wrote:
> > >--- a/btrfsck.c
> > >+++ b/btrfsck.c
> > >@@ -3474,6 +3474,7 @@ static struct option long_options[] = {
> > > 	{ "repair", 0, NULL, 0 },
> > > 	{ "init-csum-tree", 0, NULL, 0 },
> > > 	{ "init-extent-tree", 0, NULL, 0 },
> > >+	{ "force", 0, NULL, 0 },
> >
> > If we were to run with this, I think it should be called something other
> > than force. fsck.ext* has trained people to think that 'forcing' a fsck
> > means doing a full repair pass even if the fs thinks that it was shut
> > down cleanly.
>
> Agreed, it's not a good name and was rather a quick aid to myself, I
> didn't put much thinking into the user interface as I usually do :)

xfs_repair uses:

    -d     Repair dangerously. Allow xfs_repair to repair an XFS
           filesystem mounted read only. This is typically done on a
           root filesystem from single user mode, immediately followed
           by a reboot.

> > --read-only would be good if fsck was taught to not even try to write in
> > this mode.
>
> read-only mode is default and (hopefully) does no writes to the device,
> this would require the --repair option so what you propose is sort of a
> sanity check, right?

If you run fsck/repair on a mounted filesystem, and it changes the block device (i.e. fixes something), the mounted filesystem does not know about it and so may use stale metadata, and bad things will happen. That's why it's called "dangerous". ;)

Cheers,

Dave.
--
Dave Chinner david@fromorbit.com
On Wed, Jul 04, 2012 at 07:40:05AM +0700, Fajar A. Nugraha wrote:
> Are there any known btrfs regression in 3.4? I'm using 3.4.0-3-generic
> from a ppa, but a normal mount - umount cycle seems MUCH longer
> compared to how it was on 3.2, and iostat shows the disk is
> read-IOPS-bound

Is it just mount/umount without any other activity? Is the fs fragmented (or aged), almost full, or does it have lots of files?

> # time mount LABEL=WD-root
>
> real 0m10.400s
> user 0m0.000s
> sys 0m0.060s
>
> # time umount /media/WD-root/
>
> real 0m22.419s
> user 0m0.000s
> sys 0m0.064s
>
> # /proc/10142/stack <--- the PID of umount process

The process(es) actually doing the work are the btrfs workers. The usual suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache), which write the cache states back to disk. I'm using iotop to observe such things.

david
On Wed, Jul 4, 2012 at 8:42 PM, David Sterba <dave@jikos.cz> wrote:
> On Wed, Jul 04, 2012 at 07:40:05AM +0700, Fajar A. Nugraha wrote:
>> Are there any known btrfs regression in 3.4? I'm using 3.4.0-3-generic
>> from a ppa, but a normal mount - umount cycle seems MUCH longer
>> compared to how it was on 3.2, and iostat shows the disk is
>> read-IOPS-bound
>
> Is it just mount/umount without any other activity?

Yes

> Is the fs
> fragmented

Not sure how to check that quickly

> (or aged),

Over 1 year, so yes

> almost full,

df says 83% used, so probably yes (depending on how you define "almost")

~ $ df -h /media/WD-root
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc2       922G  733G  155G  83% /media/WD-root

~ $ sudo btrfs fi df /media/WD-root/
Data: total=883.95GB, used=729.68GB
System, DUP: total=8.00MB, used=104.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=18.75GB, used=1.49GB
Metadata: total=8.00MB, used=0.00

> has lots of files?

it's a "normal" 1 TB usb disk, with docs, movies, vm images, etc. No particular lots-of-small-files like maildir or anything like that.

>> # time umount /media/WD-root/
>>
>> real 0m22.419s
>> user 0m0.000s
>> sys 0m0.064s
>>
>> # /proc/10142/stack <--- the PID of umount process
>
> The process(es) actually doing the work are the btrfs workers, usual
> suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
> that are writing the cache states back to disk.

Not sure about that, since iostat shows it's mostly read, not write. Will try iotop later.

I tested also with Chris' for-linus on top of 3.4, same result (really long time to umount). Reverting back to ubuntu's 3.2.0-26-generic, umount only took less than 1 s :P

So I guess I'm switching back to 3.2 for now.

--
Fajar
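A note on the "almost full" question: the "total=" figures printed by btrfs fi df are space already allocated to chunks, and a DUP profile stores two copies on disk. The sketch below adds up the raw allocation from the output above. It assumes that "GB" and "MB" in this output are 1024-based units, and the type and function names (struct chunk_group, raw_allocated_gib) are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One line of `btrfs fi df` output: allocated size plus on-disk copies. */
struct chunk_group {
	const char *name;
	double total_gib;	/* "total=" value, expressed in GiB */
	int copies;		/* 2 for DUP, 1 for a single profile */
};

/* Sum the raw disk space consumed by all chunk groups, in GiB. */
double raw_allocated_gib(const struct chunk_group *groups, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(groups != NULL || n == 0);
	for (i = 0; i < n; i++)
		sum += groups[i].total_gib * groups[i].copies;
	return sum;
}

/* Values copied from the `btrfs fi df /media/WD-root/` output above. */
const struct chunk_group wd_root[] = {
	{ "Data",          883.95,       1 },
	{ "System, DUP",   8.0 / 1024.0, 2 },
	{ "System",        4.0 / 1024.0, 1 },
	{ "Metadata, DUP", 18.75,        2 },
	{ "Metadata",      8.0 / 1024.0, 1 },
};
```

Calling raw_allocated_gib(wd_root, 5) gives roughly 921.5, against the 922G size that df reports. Under the assumptions above, nearly all raw space on this disk is already allocated to chunks, even though only 729.68GB of the data chunks is in use.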
On Wed, Jul 04, 2012 at 10:46:21PM +0700, Fajar A. Nugraha wrote:
> > Is it just mount/umount without any other activity?
> Yes
>
> > Is the fs
> > fragmented
> Not sure how to check that quickly
>
> > (or aged),
> Over 1 year, so yes
>
> > almost full,
> df says 83% used, so probably yes (depending on how you define "almost")

That matches my expectation: an aged, fairly full filesystem could lead to the mount/umount slowness due to fragmentation.

> > has lots of files?
>
> it's a "normal" 1 TB usb disk, with docs, movies, vm images, etc. No
> particular lots-of-small-files like maildir or anything like that.

So it's probably not an issue with inode_cache.

> >> # time umount /media/WD-root/
> >>
> >> real 0m22.419s
> >> user 0m0.000s
> >> sys 0m0.064s
> >>
> >> # /proc/10142/stack <--- the PID of umount process
> >
> > The process(es) actually doing the work are the btrfs workers, usual
> > suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
> > that are writing the cache states back to disk.
>
> Not sure about that, since iostat shows it's mostly read, not write.
> Will try iotop later.
> I tested also with Chris' for-linus on top of 3.4, same result (really
> long time to umount).

It would be good to verify whether it's the btrfs-cache worker or not. IIRC there were more writes than reads, so I'm not sure this is the right direction. The 3.5 series or 3.4+for-linus has some changes wrt free space cache (removed the 'ideal caching mode') that caused slow mounts, but that has been fixed.

I've looked again at the umount process call stack, and it's waiting for writing the btree_inode, which is the representation of the b-tree nodes. It's quite possible that changes to the generic writeback code are causing this. AFAIK the btree_inode does not behave as a normal file inode regarding writeback. The good reference point is 3.2; there were non-trivial writeback changes merged since.

Guessing now: if the mount causes e.g. an atime update, then this triggers cow, dirties the btree_inode and needs to read data from disk, and fragmentation slows this down. The number of cowed blocks is small compared to the reads (and maybe generic readahead reads more than what's actually needed for the cow operation ...).

david