Hello once more... I''m on Ubuntu 13.04 with Ubuntu kernel 3.8.0-19-lowlatency and Btrfs v0.20-rc1. (Did I say that before...? *G*) Okay, ATM I''m writing a script for creating snapshots for "backups" of all my btrfs filesystems. I come from a FreeBSD / Solaris background with heavy use of ZFS. In ZFS, it''s very easy to create snapshots of all "zpools" ("filesystems") of the system, like so: SnapName=backup.`date +%Y%m%d--%H%M%S` zpool list -Ho name | while read zpool; do zfs snapshot -r $zpool@$SnapName & done ; wait Well… How to do something like this with btrfs? I do _not_ want to have to manually list the filesystems/subvolumes I''ve got in some script or configuration file. Where I''m hanging right now, is that I can''t seem to figure out a "bullet proof" way to find all the subvolumes of the filesystems I might have. I''ve got this: root@ask-home:~# btrfs fi show failed to read /dev/sr0 Label: ''Data'' uuid: 7d2eb10f-aced-4d41-bb7f-7badbf075b6a Total devices 1 FS bytes used 27.96GB devid 1 size 35.00GB used 35.00GB path /dev/dm-0 Label: ''KernBtrfs'' uuid: 8b16bc2a-43e3-40da-89f5-7b333c6682f3 Total devices 1 FS bytes used 655.05MB devid 1 size 11.85GB used 6.04GB path /dev/sda11 Btrfs v0.20-rc1 root@ask-home:~# mount -t btrfs /dev/mapper/ssd-Data on /data type btrfs (rw,noatime,ssd,discard) /dev/mapper/ssd-Data on /home type btrfs (rw,noatime,ssd,discard,subvol=home) /dev/sda11 on /data/Kernel/KernBtrfs type btrfs (rw,noatime) Now, how would I find all the subvolumes of the btrfs with the label "Data" (or uuid 7d...) or /dev/dm-0? "btrfs subvolume list" only seems to operate on where a btrfs is mounted. It works on "path". I could do "btrfs subv list -a /data", but how would I figure out, that "/data" is the "root" of the filesystem with uuid 7d...? For now, I do something like this (-> http://pastebin.com/u08ub1i8): SnapName=backup.`date +%Y%m%d--%H%M%S` btrfs fi show 2>/dev/null | awk ''/ path / {print $NF}'' | while read path; do SafePath=`echo "$path" | tr / .` TmpMountDir=`mktemp -d /tmp/.btrfs.mount.$SafePath.XXXXXX` mount -t btrfs $path $TmpMountDir (btrfs subv list -ar $TmpMountDir; btrfs subv list -a $TmpMountDir) | sort | uniq -u | while read _id Id _gen Gen _top _level Toplevel _path Path; do btrfs subv snaps -r "$TmpMountDir/$Path" "$TmpMountDir/$Path.$SnapName" done umount $TmpMountDir rmdir $TmpMountDir done Now, this works, but seems "somewhat" complicated... But maybe I''m just spoiled by ZFS ;) Is there an easier way to achieve what I want? I want to achieve: Creating recursive snapshots for all filesystems ;) Thanks, Alexander -- => Google+ => http://plus.skwar.me <==> Chat (Jabber/Google Talk) => a.skwar@gmail.com <=-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexander Skwar wrote (ao):> Where I''m hanging right now, is that I can''t seem to figure out a > "bullet proof" way to find all the subvolumes of the filesystems I > might have.> Is there an easier way to achieve what I want? I want to achieve: > > Creating recursive snapshots for all filesystemsNot sure if this helps, but I have subvolid=0, which contains all my subvolumes, mounted under /.root/ /etc/fstab: LABEL=panda / btrfs subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd 0 0 LABEL=panda /home btrfs subvol=home 0 0 LABEL=panda /root btrfs subvol=root 0 0 LABEL=panda /var btrfs subvol=var 0 0 LABEL=panda /holding btrfs subvol=.holding 0 0 LABEL=panda /.root btrfs subvolid=0 0 0 LABEL=panda /.backupadmin btrfs subvol=backupadmin 0 0 /Varlib /var/lib none bind 0 0 panda:~# ls -l /.root/ total 0 drwxr-xr-x. 1 root root 580800 Jan 30 17:46 backupadmin drwxr-xr-x. 1 root root 24 Mar 27 2012 home drwx------. 1 root root 742 Mar 19 15:50 root drwxr-xr-x. 1 root root 226 May 16 2012 rootvolume drwxr-xr-x. 1 root root 96 Apr 3 2012 var In my snapshots script: ... yyyymmddhhmm=`date +%Y%m%d_%H.%M` ... for subvolume in `ls /.root/` do ... /sbin/btrfs subvolume snapshot ${filesystem}/${subvolume}/ \ /.root/.snapshot_${yyyymmddhhmm}_${hostname}_${subvolume}/ || result=2 ... done ... This creates timestamped snapshots for all subvolumes. Sander -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexander Skwar
2013-May-03 11:46 UTC
Re: Creating recursive snapshots for all filesystems
Hi Sander <sander <at> humilis.net> writes:> > Alexander Skwar wrote (ao): > > Where I''m hanging right now, is that I can''t seem to figure out a > > "bullet proof" way to find all the subvolumes of the filesystems I > > might have. > > > Is there an easier way to achieve what I want? I want to achieve: > > > > Creating recursive snapshots for all filesystems > > Not sure if this helps, but I have subvolid=0, which contains all my > subvolumes, mounted under /.root/Hm, not quite what I''m after and not nearly as easy as ZFS... "Problem" with your approach: The admin has to maintain this. I was looking for something, which "maints itself", so to say. And your approach also wouldn''t scale if there are sub-subvolumes. ZFS really is so much easier (at least regarding that). Thanks a lot, though. It''s a worthwhile idea. Regards, Alexander -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexander Skwar <alexanders.mailinglists+nospam@gmail.com> schrieb:> Where I''m hanging right now, is that I can''t seem to figure out a > "bullet proof" way to find all the subvolumes of the filesystems I > might have.What about this: # btrfs sub list -a / ID 256 gen 1487089 top level 5 path <FS_TREE>/root64 ID 258 gen 1487089 top level 5 path <FS_TREE>/home ID 259 gen 1486932 top level 5 path <FS_TREE>/usr-src ID 260 gen 1486885 top level 5 path <FS_TREE>/usr-portage ID 261 gen 1487089 top level 5 path <FS_TREE>/var-tmp You still need to iterate through the mounted btrfs filesystems if you are using more than one: # btrfs fi show | fgrep uuid Label: ''usb-backup'' uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca Label: ''system'' uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536 Then translate the uuid back to an fspath somehow. Another option would be to use blkid: # blkid -t TYPE=btrfs /dev/sda3: LABEL="system" UUID="d2bb232a-2e8f-4951-8bcc-97e237f1b536" UUID_SUB="7e0a2d93-86cc-4421-9e2c-b5c405075ff3" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="7807f64f-3d0b-4e99-81a4-975128ed2918" /dev/sdb3: LABEL="system" UUID="d2bb232a-2e8f-4951-8bcc-97e237f1b536" UUID_SUB="caa3a27b-8546-4519-9a13-8f50cd1caf70" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="1330dcf5-f1d3-462b-af08-e7cad839b3dc" /dev/sdc3: LABEL="system" UUID="d2bb232a-2e8f-4951-8bcc-97e237f1b536" UUID_SUB="e919b066-db86-4818-975c-bae0548c822a" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="847fd503-8dca-4c6b-b199-20a47d62aa55" /dev/sdd1: LABEL="usb-backup" UUID="7038c8fa-4293-49e9-b493-a9c46e5663ca" UUID_SUB="f2eac9b0-22e3-44a2-bdc5-c98ee86da71f" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="275ba328-9d5a-494b-bf19-a995596a4b6b" Still needs translation back to fspathes. But that could be done with grep/head/lsblk trickery... HTH, Kai -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexander Skwar
2013-May-05 12:59 UTC
Re: Creating recursive snapshots for all filesystems
Hello On Sun, May 5, 2013 at 1:05 PM, Kai Krakow <hurikhan77+btrfs@gmail.com> wrote:> Alexander Skwar <alexanders.mailinglists+nospam@gmail.com> schrieb: > >> Where I''m hanging right now, is that I can''t seem to figure out a >> "bullet proof" way to find all the subvolumes of the filesystems I >> might have. > > What about this: > > # btrfs sub list -a /How do I know that "/" is a btrfs? How do I know, that there are not also other btrfs filesystems? :)> You still need to iterate through the mounted btrfs filesystems if you are > using more than one: > > # btrfs fi show | fgrep uuid > Label: ''usb-backup'' uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca > Label: ''system'' uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536 > > Then translate the uuid back to an fspath somehow.Yep. That''s basically what I''m doing ATM.> > Another option would be to use blkid: > > # blkid -t TYPE=btrfsAh, that''s cool! Didn''t know that blkid had this option.> Still needs translation back to fspathes. But that could be done with > grep/head/lsblk trickery...Hm, I didn''t know lsblk until now, but it seems that it doesn''t handle LVM well, does it? (See http://pastebin.com/fuZ8HHQi for better readable version.) a@ask-home:~$ blkid -t TYPE=btrfs | grep mapper /dev/mapper/ssd-Data: LABEL="Data" UUID="7d2eb10f-aced-4d41-bb7f-7badbf075b6a" UUID_SUB="582b64f5-edd5-48f2-978e-24df9a839b5b" TYPE="btrfs" a@ask-home:~$ lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 167.7G 0 disk ├─sda1 8:1 0 100M 0 part ├─sda2 8:2 0 10.9G 0 part /dell-recovery ├─sda3 8:3 0 39.1G 0 part /windows ├─sda4 8:4 0 1K 0 part ├─sda5 8:5 0 102M 0 part /boot ├─sda6 8:6 0 8.1G 0 part [SWAP] ├─sda7 8:7 0 12.2G 0 part │ ├─ssd-Data (dm-0) 252:0 0 35G 0 lvm │ └─ssd-UbRoot1304 (dm-1) 252:1 0 10G 0 lvm / ├─sda8 8:8 0 12.2G 0 part │ └─ssd-Data (dm-0) 252:0 0 35G 0 lvm ├─sda9 8:9 0 12.2G 0 part │ └─ssd-Data (dm-0) 252:0 0 35G 0 lvm ├─sda10 8:10 0 12.2G 0 part │ └─ssd-Data (dm-0) 252:0 0 35G 0 lvm ├─sda11 8:11 0 11.9G 0 part ├─sda12 8:12 0 11.9G 0 part /data/Kernel/KernExt4 ├─sda13 8:13 0 11.9G 0 part ├─sda14 8:14 0 11.9G 0 part /data/Kernel/KernReiserfs └─sda15 8:15 0 11.9G 0 part /data/Kernel/KernXfs sr0 11:0 1 1024M 0 rom a@ask-home:~$ lsblk -P|grep Data NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0" TYPE="lvm" MOUNTPOINT="" NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0" TYPE="lvm" MOUNTPOINT="" NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0" TYPE="lvm" MOUNTPOINT="" NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0" TYPE="lvm" MOUNTPOINT="" So I guess I''d still need to mount the root volume temporarily somewhere to do the translation. ZFS really wipes the floor with btrfs regarding ease of use as far as that''s concerned… That blkid trick was quite useful, though. Thanks! Cheers, Alexander -- => Google+ => http://plus.skwar.me <==> Chat (Jabber/Google Talk) => a.skwar@gmail.com <=-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexander Skwar <alexanders.mailinglists+nospam@gmail.com> schrieb:> So I guess I''d still need to mount the root volume temporarily > somewhere to do the translation.That brings in the idea how bedup seems to handle this. Maybe you want to take one or the other idea from there as it also has to enumerate all btrfs filesystems and snapshots: https://github.com/g2p/bedup Regards, Kai -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexander Skwar
2013-May-05 16:19 UTC
Re: Creating recursive snapshots for all filesystems
Hi Kai On Sun, May 5, 2013 at 6:03 PM, Kai Krakow <hurikhan77+btrfs@gmail.com> wrote:> That brings in the idea how bedup seems to handle this. Maybe you want to > take one or the other idea from there as it also has to enumerate all btrfs > filesystems and snapshots:Sure, will have a look. There are _certainly_ more clever ways to do that, than what I came up with ☺ BR, Alexander -- => Google+ => http://plus.skwar.me <==> Chat (Jabber/Google Talk) => a.skwar@gmail.com <=-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I''ll preface that I''m running Ubuntu 13.04 with the standard 3.8 series kernel so please disregard if this has been fixed in higher versions. This is on a btrfs RAID1 with 3 then 4 disks. My use case is to set the nocow ''C'' flag on a directory and copy in some files, then make lots of writes (same file sizes) and note that the number of extents stays the same, good. Then run a balance (I added a disk) and start making writes again, now the number of extents starts climbing, boo. Is this standard behavior? I realize a balance will cow the files. Are they also being checksummed thereby breaking the nocow flag? I have made no snapshots and made no writes to said files while the balance was running. Thanks, Kyle -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:> I''ll preface that I''m running Ubuntu 13.04 with the standard 3.8 > series kernel so please disregard if this has been fixed in higher > versions. This is on a btrfs RAID1 with 3 then 4 disks. > > My use case is to set the nocow ''C'' flag on a directory and copy in > some files, then make lots of writes (same file sizes) and note that > the number of extents stays the same, good. > Then run a balance (I added a disk) and start making writes again, > now the number of extents starts climbing, boo. > Is this standard behavior? I realize a balance will cow the files. > Are they also being checksummed thereby breaking the nocow flag? > > I have made no snapshots and made no writes to said files while the > balance was running.Hi Kyle, It''s hard to say if it''s standard, it is a side effect casued by balance. During balance, our reloc root works like a snapshot, so we set last_snapshot on the fs root, and this makes new nocow writes think that we have to do cow as the extent is created before taking snapshot. But the nocow ''C'' flag on the file is still there, if you make new writes on the new extent after balance, you still get the same number of extents. thanks, liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 10, 2013 Liu Bo wrote:> On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote: >> I''ll preface that I''m running Ubuntu 13.04 with the standard 3.8 >> series kernel so please disregard if this has been fixed in higher >> versions. This is on a btrfs RAID1 with 3 then 4 disks. >> >> My use case is to set the nocow ''C'' flag on a directory and copy in >> some files, then make lots of writes (same file sizes) and note that >> the number of extents stays the same, good. >> Then run a balance (I added a disk) and start making writes again, >> now the number of extents starts climbing, boo. >> Is this standard behavior? I realize a balance will cow the files. >> Are they also being checksummed thereby breaking the nocow flag? >> >> I have made no snapshots and made no writes to said files while the >> balance was running. > > Hi Kyle, > > It''s hard to say if it''s standard, it is a side effect casued by balance. > > During balance, our reloc root works like a snapshot, so we set > last_snapshot on the fs root, and this makes new nocow writes think that > we have to do cow as the extent is created before taking snapshot. > > But the nocow ''C'' flag on the file is still there, if you make new > writes on the new extent after balance, you still get the same number of > extents. > > thanks, > liuboThank you for the explanation. On my machine this didn''t happen however. IIRC one 10GiB file had 24 extents before balance, 26 extents after balance, and 1000+ and growing when I checked the following day. I''ll add that I am running a relatively recent version of btrfs-tools from a ppa. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Fri, May 10, 2013 Liu Bo wrote: >> On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote: >>> I''ll preface that I''m running Ubuntu 13.04 with the standard 3.8 >>> series kernel so please disregard if this has been fixed in higher >>> versions. This is on a btrfs RAID1 with 3 then 4 disks. >>> >>> My use case is to set the nocow ''C'' flag on a directory and copy in >>> some files, then make lots of writes (same file sizes) and note that >>> the number of extents stays the same, good. >>> Then run a balance (I added a disk) and start making writes again, >>> now the number of extents starts climbing, boo. >>> Is this standard behavior? I realize a balance will cow the files. >>> Are they also being checksummed thereby breaking the nocow flag? >>> >>> I have made no snapshots and made no writes to said files while the >>> balance was running. >> >> Hi Kyle, >> >> It''s hard to say if it''s standard, it is a side effect casued by balance. >> >> During balance, our reloc root works like a snapshot, so we set >> last_snapshot on the fs root, and this makes new nocow writes think that >> we have to do cow as the extent is created before taking snapshot. >> >> But the nocow ''C'' flag on the file is still there, if you make new >> writes on the new extent after balance, you still get the same number of >> extents. >> >> thanks, >> liubo > > Thank you for the explanation. > On my machine this didn''t happen however. IIRC one ~10GiB file had 24 > extents before balance, 26 extents after balance, and 1000+ and growing > when I checked the following day. > I''ll add that I am running a relatively recent version of btrfs-tools from > a ppa.and mounted with autodefrag Am I actually just seeing large ranges getting split while remaining contiguous on disk? This would imply crc calculation on the two outside ranges. Or perhaps there is some data being inlined for each write. I believe writes on this file are 32KiB each. Does the balance produce persistent crc values in the metadata even though the files are nocow which implies nocrc? ... I ran this test again and here''s filefrag -v after about a day of use: Filesystem type is: 9123683e File size of /blah/blah/file is 10213265920 (2493474 blocks, blocksize 4096) ext logical physical expected length flags 0 0 675625629 9 1 9 675621279 675625638 55 2 64 674410131 675621334 886 3 950 675558303 674411017 9 4 959 675583473 675558312 55 5 1014 674411081 675583528 708 6 1722 675456318 674411789 9 7 1731 675710934 675456327 55 8 1786 674411853 675710989 521 9 2307 675424433 674412374 9 10 2316 675471062 675424442 55 11 2371 674412438 675471117 984 12 3355 676012018 674413422 9 13 3364 676024295 676012027 55 14 3419 674413486 676024350 871 15 4290 675681138 674414357 9 16 4299 675618500 675681147 55 ... 13986 2486955 671627059 675876382 627 13987 2487582 675677542 671627686 9 13988 2487591 675700351 675677551 55 13989 2487646 671627750 675700406 1212 13990 2488858 675932037 671628962 9 13991 2488867 675990025 675932046 55 13992 2488922 671629026 675990080 220 13993 2489142 675674447 671629246 9 13994 2489151 675687864 675674456 55 13995 2489206 671629310 675687919 1821 13996 2491027 676209288 671631131 9 13997 2491036 676260767 676209297 55 13998 2491091 671631195 676260822 285 13999 2491376 675650278 671631480 9 14000 2491385 675678822 675650287 55 14001 2491440 671631544 675678877 1464 14002 2492904 675534255 671633008 9 14003 2492913 675503514 675534264 55 14004 2492968 671633072 675503569 506 eof /blah/blah/file: 14005 extents found As you can see the 32KiB writes fit in the extents of size 9 and 55. Are those 9 block extents inlined? If I understand correctly, new extents are created for these nocow writes, then the old extents are basically hole punched producing three (four? because of inlining) separate extents. Something here begs for optimization. Perhaps balance should treat nocow files a little differently. That would be the time to remove the extra bits that prevent inplace overwrites. After the fact it becomes much more difficult, although removing a crc for the extent being written seems a little easier then iterating over the entire file. Thanks for taking the time to read, Kyle P.S. I''m CCing David as I believe he wrote the patch to get the ''C'' flag working on empty files and directories. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote:> and mounted with autodefrag > Am I actually just seeing large ranges getting split while remaining > contiguous on disk? This would imply crc calculation on the two > outside ranges. Or perhaps there is some data being inlined for each > write. I believe writes on this file are 32KiB each. > Does the balance produce persistent crc values in the metadata even > though the files are nocow which implies nocrc? > ... > I ran this test again and here''s filefrag -v after about a day of use: > >[...] > As you can see the 32KiB writes fit in the extents of size 9 and 55. > Are those 9 block extents inlined? > If I understand correctly, new extents are created for these nocow > writes, then the old extents are basically hole punched producing > three (four? because of inlining) separate extents. > Something here begs for optimization. Perhaps balance should treat > nocow files a little differently. That would be the time to remove > the extra bits that prevent inplace overwrites. After the fact it > becomes much more difficult, although removing a crc for the extent > being written seems a little easier then iterating over the entire > file. > > Thanks for taking the time to read, > Kyle > > P.S. I''m CCing David as I believe he wrote the patch to get the ''C'' > flag working on empty files and directories.Hi Kyle, Can you please apply this patch and see if it helps? thanks, liubo From: Liu Bo <bo.li.liu@oracle.com> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance Balance will create reloc_root for each fs root, and it''s going to record last_snapshot to filter shared blocks. The side effect of setting last_snapshot is to break nocow attributes of files. So here we update file extent''s generation while walking relocated file extents in data reloc root, and use file extent''s generation instead for checking if we have cross refs for the file extent. That way we can make nocow happy again and have no impact on others. Reported-by: Kyle Gates <kylegates@hotmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- fs/btrfs/ctree.h | 2 +- fs/btrfs/extent-tree.c | 18 +++++++++++++----- fs/btrfs/inode.c | 10 ++++++++-- fs/btrfs/relocation.c | 1 + 4 files changed, 23 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 4560052..eb2e782 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root, u64 bytenr, u64 num_bytes); int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, struct btrfs_root *root, - u64 objectid, u64 offset, u64 bytenr); + u64 objectid, u64 offset, u64 bytenr, u64 gen); struct btrfs_block_group_cache *btrfs_lookup_block_group( struct btrfs_fs_info *info, u64 bytenr); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 1e84c74..f3b3616 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2816,7 +2816,8 @@ out: static noinline int check_committed_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, - u64 objectid, u64 offset, u64 bytenr) + u64 objectid, u64 offset, u64 bytenr, + u64 fi_gen) { struct btrfs_root *extent_root = root->fs_info->extent_root; struct extent_buffer *leaf; @@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans, btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY)) goto out; - if (btrfs_extent_generation(leaf, ei) <- btrfs_root_last_snapshot(&root->root_item)) + /* + * Usually generation in extent item is larger than that in file extent + * item because of delay refs. But we don''t want balance to break + * file''s nocow behaviour, so use file_extent''s generation which has + * been updates when we update fs root to point to relocated file + * extents in data reloc root. + */ + fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen); + if (fi_gen <= btrfs_root_last_snapshot(&root->root_item)) goto out; iref = (struct btrfs_extent_inline_ref *)(ei + 1); @@ -2886,7 +2894,7 @@ out: int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, struct btrfs_root *root, - u64 objectid, u64 offset, u64 bytenr) + u64 objectid, u64 offset, u64 bytenr, u64 gen) { struct btrfs_path *path; int ret; @@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, do { ret = check_committed_ref(trans, root, path, objectid, - offset, bytenr); + offset, bytenr, gen); if (ret && ret != -ENOENT) goto out; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2cfdd33..976b045 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1727,6 +1727,8 @@ next_slot: ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi); if (extent_type == BTRFS_FILE_EXTENT_REG || extent_type == BTRFS_FILE_EXTENT_PREALLOC) { + u64 gen; + gen = btrfs_file_extent_generation(leaf, fi); disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi); extent_offset = btrfs_file_extent_offset(leaf, fi); extent_end = found_key.offset + @@ -1749,7 +1751,8 @@ next_slot: goto out_check; if (btrfs_cross_ref_exist(trans, root, ino, found_key.offset - - extent_offset, disk_bytenr)) + extent_offset, disk_bytenr, + gen)) goto out_check; disk_bytenr += extent_offset; disk_bytenr += cur_offset - found_key.offset; @@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans, struct btrfs_key key; u64 disk_bytenr; u64 backref_offset; + u64 fi_gen; u64 extent_end; u64 num_bytes; int slot; @@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans, } disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi); backref_offset = btrfs_file_extent_offset(leaf, fi); + fi_gen = btrfs_file_extent_generation(leaf, fi); *orig_start = key.offset - backref_offset; *orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi); @@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle *trans, * find any we must cow */ if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode), - key.offset - backref_offset, disk_bytenr)) + key.offset - backref_offset, disk_bytenr, + fi_gen)) goto out; /* diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 704a1b8..07faabf 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1637,6 +1637,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans, BUG_ON(ret < 0); btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr); + btrfs_set_file_extent_generation(leaf, fi, trans->transid); dirty = 1; key.offset -= btrfs_file_extent_offset(leaf, fi); -- 1.7.7 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 17 May 2013 15:04:45 +0800, Liu Bo wrote:> On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote: >> and mounted with autodefrag >> Am I actually just seeing large ranges getting split while remaining >> contiguous on disk? This would imply crc calculation on the two >> outside ranges. Or perhaps there is some data being inlined for each >> write. I believe writes on this file are 32KiB each. >> Does the balance produce persistent crc values in the metadata even >> though the files are nocow which implies nocrc? >> ... >> I ran this test again and here''s filefrag -v after about a day of use: >> >>[...] >> As you can see the 32KiB writes fit in the extents of size 9 and 55. >> Are those 9 block extents inlined? >> If I understand correctly, new extents are created for these nocow >> writes, then the old extents are basically hole punched producing >> three (four? because of inlining) separate extents. >> Something here begs for optimization. Perhaps balance should treat >> nocow files a little differently. That would be the time to remove >> the extra bits that prevent inplace overwrites. After the fact it >> becomes much more difficult, although removing a crc for the extent >> being written seems a little easier then iterating over the entire >> file. >> >> Thanks for taking the time to read, >> Kyle >> >> P.S. I''m CCing David as I believe he wrote the patch to get the ''C'' >> flag working on empty files and directories. > > Hi Kyle, > > Can you please apply this patch and see if it helps? > > thanks, > liubo > > > From: Liu Bo <bo.li.liu@oracle.com> > > Subject: [PATCH] Btrfs: fix broken nocow after a normal balance > > Balance will create reloc_root for each fs root, and it''s going to > record last_snapshot to filter shared blocks. The side effect of > setting last_snapshot is to break nocow attributes of files. > > So here we update file extent''s generation while walking relocated > file extents in data reloc root, and use file extent''s generation > instead for checking if we have cross refs for the file extent. > > That way we can make nocow happy again and have no impact on others. > > Reported-by: Kyle Gates <kylegates@hotmail.com> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com> > --- > fs/btrfs/ctree.h | 2 +- > fs/btrfs/extent-tree.c | 18 +++++++++++++----- > fs/btrfs/inode.c | 10 ++++++++-- > fs/btrfs/relocation.c | 1 + > 4 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 4560052..eb2e782 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct > btrfs_root *root, > u64 bytenr, u64 num_bytes); > int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > - u64 objectid, u64 offset, u64 bytenr); > + u64 objectid, u64 offset, u64 bytenr, u64 gen); > struct btrfs_block_group_cache *btrfs_lookup_block_group( > struct btrfs_fs_info *info, > u64 bytenr); > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 1e84c74..f3b3616 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2816,7 +2816,8 @@ out: > static noinline int check_committed_ref(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > struct btrfs_path *path, > - u64 objectid, u64 offset, u64 bytenr) > + u64 objectid, u64 offset, u64 bytenr, > + u64 fi_gen) > { > struct btrfs_root *extent_root = root->fs_info->extent_root; > struct extent_buffer *leaf; > @@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct > btrfs_trans_handle > *trans, > btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY)) > goto out; > > - if (btrfs_extent_generation(leaf, ei) <> - btrfs_root_last_snapshot(&root->root_item)) > + /* > + * Usually generation in extent item is larger than that in file extent > + * item because of delay refs. But we don''t want balance to break > + * file''s nocow behaviour, so use file_extent''s generation which has > + * been updates when we update fs root to point to relocated file > + * extents in data reloc root. > + */ > + fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen); > + if (fi_gen <= btrfs_root_last_snapshot(&root->root_item)) > goto out; > > iref = (struct btrfs_extent_inline_ref *)(ei + 1); > @@ -2886,7 +2894,7 @@ out: > > int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > - u64 objectid, u64 offset, u64 bytenr) > + u64 objectid, u64 offset, u64 bytenr, u64 gen) > { > struct btrfs_path *path; > int ret; > @@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle > *trans, > > do { > ret = check_committed_ref(trans, root, path, objectid, > - offset, bytenr); > + offset, bytenr, gen); > if (ret && ret != -ENOENT) > goto out; > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 2cfdd33..976b045 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1727,6 +1727,8 @@ next_slot: > ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi); > if (extent_type == BTRFS_FILE_EXTENT_REG || > extent_type == BTRFS_FILE_EXTENT_PREALLOC) { > + u64 gen; > + gen = btrfs_file_extent_generation(leaf, fi); > disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi); > extent_offset = btrfs_file_extent_offset(leaf, fi); > extent_end = found_key.offset + > @@ -1749,7 +1751,8 @@ next_slot: > goto out_check; > if (btrfs_cross_ref_exist(trans, root, ino, > found_key.offset - > - extent_offset, disk_bytenr)) > + extent_offset, disk_bytenr, > + gen)) > goto out_check; > disk_bytenr += extent_offset; > disk_bytenr += cur_offset - found_key.offset; > @@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct > btrfs_trans_handle > *trans, > struct btrfs_key key; > u64 disk_bytenr; > u64 backref_offset; > + u64 fi_gen; > u64 extent_end; > u64 num_bytes; > int slot; > @@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct > btrfs_trans_handle > *trans, > } > disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi); > backref_offset = btrfs_file_extent_offset(leaf, fi); > + fi_gen = btrfs_file_extent_generation(leaf, fi); > > *orig_start = key.offset - backref_offset; > *orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi); > @@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct > btrfs_trans_handle > *trans, > * find any we must cow > */ > if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode), > - key.offset - backref_offset, disk_bytenr)) > + key.offset - backref_offset, disk_bytenr, > + fi_gen)) > goto out; > > /* > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 704a1b8..07faabf 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -1637,6 +1637,7 @@ int replace_file_extents(struct btrfs_trans_handle > *trans, > BUG_ON(ret < 0); > > btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr); > + btrfs_set_file_extent_generation(leaf, fi, trans->transid); > dirty = 1; > > key.offset -= btrfs_file_extent_offset(leaf, fi); > -- > 1.7.7 >Liu Bo, Thank you for taking the time to produce this patch. I''m not in a position to try a kernel compile and will be on vacation next week. I hope someone else would like to give it a try and use a little dd magic to test it. Thanks, Kyle -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Liu Bo <bo.li.liu@oracle.com> > > Subject: [PATCH] Btrfs: fix broken nocow after a normal balance > > Balance will create reloc_root for each fs root, and it''s going to > record last_snapshot to filter shared blocks. The side effect of > setting last_snapshot is to break nocow attributes of files. > > So here we update file extent''s generation while walking relocated > file extents in data reloc root, and use file extent''s generation > instead for checking if we have cross refs for the file extent. > > That way we can make nocow happy again and have no impact on others. > > Reported-by: Kyle Gates <kylegates@hotmail.com> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com> > --- > fs/btrfs/ctree.h | 2 +- > fs/btrfs/extent-tree.c | 18 +++++++++++++----- > fs/btrfs/inode.c | 10 ++++++++-- > fs/btrfs/relocation.c | 1 + > 4 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 4560052..eb2e782 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct > btrfs_root *root, > u64 bytenr, u64 num_bytes); > int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > - u64 objectid, u64 offset, u64 bytenr); > + u64 objectid, u64 offset, u64 bytenr, u64 gen); > struct btrfs_block_group_cache *btrfs_lookup_block_group( > struct btrfs_fs_info *info, > u64 bytenr); > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 1e84c74..f3b3616 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2816,7 +2816,8 @@ out: > static noinline int check_committed_ref(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > struct btrfs_path *path, > - u64 objectid, u64 offset, u64 bytenr) > + u64 objectid, u64 offset, u64 bytenr, > + u64 fi_gen) > { > struct btrfs_root *extent_root = root->fs_info->extent_root; > struct extent_buffer *leaf; > @@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct > btrfs_trans_handle > *trans, > btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY)) > goto out; > > - if (btrfs_extent_generation(leaf, ei) <> - btrfs_root_last_snapshot(&root->root_item)) > + /* > + * Usually generation in extent item is larger than that in file extent > + * item because of delay refs. But we don''t want balance to break > + * file''s nocow behaviour, so use file_extent''s generation which has > + * been updates when we update fs root to point to relocated file > + * extents in data reloc root. > + */ > + fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen); > + if (fi_gen <= btrfs_root_last_snapshot(&root->root_item)) > goto out; > > iref = (struct btrfs_extent_inline_ref *)(ei + 1); > @@ -2886,7 +2894,7 @@ out: > > int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > - u64 objectid, u64 offset, u64 bytenr) > + u64 objectid, u64 offset, u64 bytenr, u64 gen) > { > struct btrfs_path *path; > int ret; > @@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle > *trans, > > do { > ret = check_committed_ref(trans, root, path, objectid, > - offset, bytenr); > + offset, bytenr, gen); > if (ret && ret != -ENOENT) > goto out; > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 2cfdd33..976b045 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1727,6 +1727,8 @@ next_slot: > ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi); > if (extent_type == BTRFS_FILE_EXTENT_REG || > extent_type == BTRFS_FILE_EXTENT_PREALLOC) { > + u64 gen; > + gen = btrfs_file_extent_generation(leaf, fi); > disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi); > extent_offset = btrfs_file_extent_offset(leaf, fi); > extent_end = found_key.offset + > @@ -1749,7 +1751,8 @@ next_slot: > goto out_check; > if (btrfs_cross_ref_exist(trans, root, ino, > found_key.offset - > - extent_offset, disk_bytenr)) > + extent_offset, disk_bytenr, > + gen)) > goto out_check; > disk_bytenr += extent_offset; > disk_bytenr += cur_offset - found_key.offset; > @@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct > btrfs_trans_handle > *trans, > struct btrfs_key key; > u64 disk_bytenr; > u64 backref_offset; > + u64 fi_gen; > u64 extent_end; > u64 num_bytes; > int slot; > @@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct > btrfs_trans_handle > *trans, > } > disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi); > backref_offset = btrfs_file_extent_offset(leaf, fi); > + fi_gen = btrfs_file_extent_generation(leaf, fi); > > *orig_start = key.offset - backref_offset; > *orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi); > @@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct > btrfs_trans_handle > *trans, > * find any we must cow > */ > if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode), > - key.offset - backref_offset, disk_bytenr)) > + key.offset - backref_offset, disk_bytenr, > + fi_gen)) > goto out; > > /* > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 704a1b8..07faabf 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -1637,6 +1637,7 @@ int replace_file_extents(struct btrfs_trans_handle > *trans, > BUG_ON(ret < 0); > > btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr); > + btrfs_set_file_extent_generation(leaf, fi, trans->transid); > dirty = 1; > > key.offset -= btrfs_file_extent_offset(leaf, fi); > -- > 1.7.7 >Sorry for the long wait in replying. This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu Raring kernel). I can probably try again on a newer version if you think it will help. This was my first kernel compile so I patched by hand and waited (10 hours on my old 32 bit single core machine). I did move some of the files off and back on to the filesystem to start fresh and compare but all seem to exhibit the same behavior after a balance. Thanks, Kyle -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:> >From: Liu Bo <bo.li.liu@oracle.com> > > > >Subject: [PATCH] Btrfs: fix broken nocow after a normal balance > > >[...] > > Sorry for the long wait in replying. > This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu > Raring kernel). I can probably try again on a newer version if you > think it will help. > This was my first kernel compile so I patched by hand and waited (10 > hours on my old 32 bit single core machine). > > I did move some of the files off and back on to the filesystem to > start fresh and compare but all seem to exhibit the same behavior > after a balance. >Thanks for testing the patch although it didn''t help you. Actually I tested it to be sure that it fixed the problems in my reproducer. So anyway can you please apply this debug patch in order to nail it down? thanks, liubo diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index df472ab..c12a11c 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans, goto out; if (btrfs_extent_generation(leaf, ei) <- btrfs_root_last_snapshot(&root->root_item)) + btrfs_root_last_snapshot(&root->root_item)) { + printk("extent gen %llu last_snap %llu\n", + btrfs_extent_generation(leaf, ei), + btrfs_root_last_snapshot(&root->root_item)); goto out; + } iref = (struct btrfs_extent_inline_ref *)(ei + 1); if (btrfs_extent_inline_ref_type(leaf, iref) !diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 23c596c..8cad6ee 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1317,16 +1317,24 @@ next_slot: goto out_check; if (btrfs_file_extent_compression(leaf, fi) || btrfs_file_extent_encryption(leaf, fi) || - btrfs_file_extent_other_encoding(leaf, fi)) + btrfs_file_extent_other_encoding(leaf, fi)) { + printk("special encoding\n"); goto out_check; - if (extent_type == BTRFS_FILE_EXTENT_REG && !force) + } + if (extent_type == BTRFS_FILE_EXTENT_REG && !force) { + printk("BTRFS_FILE_EXTENT_REF\n"); goto out_check; - if (btrfs_extent_readonly(root, disk_bytenr)) + } + if (btrfs_extent_readonly(root, disk_bytenr)) { + printk("ro\n"); goto out_check; + } if (btrfs_cross_ref_exist(trans, root, ino, found_key.offset - - extent_offset, disk_bytenr)) + extent_offset, disk_bytenr)) { + printk("cross ref\n"); goto out_check; + } disk_bytenr += extent_offset; disk_bytenr += cur_offset - found_key.offset; num_bytes = min(end + 1, extent_end) - cur_offset; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:> On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote: >>> From: Liu Bo <bo.li.liu@oracle.com> >>> >>> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance >>> >> [...] >> >> Sorry for the long wait in replying. >> This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu >> Raring kernel). I can probably try again on a newer version if you >> think it will help. >> This was my first kernel compile so I patched by hand and waited (10 >> hours on my old 32 bit single core machine). >> >> I did move some of the files off and back on to the filesystem to >> start fresh and compare but all seem to exhibit the same behavior >> after a balance. >> > > Thanks for testing the patch although it didn''t help you. > Actually I tested it to be sure that it fixed the problems in my reproducer. > > So anyway can you please apply this debug patch in order to nail it down?Your patch can not fix the above problem is because we may update ->last_snapshot after we relocate the file data extent. For example, there are two block groups which will be relocated, One is data block group, the other is metadata block group. Then we relocate the data block group firstly, and set the new generation for the file data extent item/the relative extent item and set (new_generation - 1) for ->last_snapshot. After the relocation of this block group, we will end the transaction and drop the relocation tree. If we end the space balance now, we won''t break the nocow rule because ->last_snapshot is less than the generation of the file data extent item/the relative extent item. But there is still one block group which will be relocated, when relocating the second block group, we will also start a new transaction, and update ->last_snapshot if need. So, ->last_snapshot is greater than the generation of the file data extent item we set before. And the nocow rule is broken. Back to this above problem. I don''t think it is a serious problem, we only do COW once after the relocation, then we will still honour the nocow rule. The behaviour is similar to snapshot. So maybe it needn''t be fixed. If we must fix this problem, I think the only way is that get the generation at the beginning of the space balance, and then set it to ->last_snapshot if ->last_snapshot is less than it, don''t use (current_generation - 1) to update the ->last_snapshot. Besides that, don''t forget to store the generation into btrfs_balance_item, or the problem will happen after we resume the balance. Thanks Miao> > thanks, > liubo > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index df472ab..c12a11c 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans, > goto out; > > if (btrfs_extent_generation(leaf, ei) <> - btrfs_root_last_snapshot(&root->root_item)) > + btrfs_root_last_snapshot(&root->root_item)) { > + printk("extent gen %llu last_snap %llu\n", > + btrfs_extent_generation(leaf, ei), > + btrfs_root_last_snapshot(&root->root_item)); > goto out; > + } > > iref = (struct btrfs_extent_inline_ref *)(ei + 1); > if (btrfs_extent_inline_ref_type(leaf, iref) !> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 23c596c..8cad6ee 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1317,16 +1317,24 @@ next_slot: > goto out_check; > if (btrfs_file_extent_compression(leaf, fi) || > btrfs_file_extent_encryption(leaf, fi) || > - btrfs_file_extent_other_encoding(leaf, fi)) > + btrfs_file_extent_other_encoding(leaf, fi)) { > + printk("special encoding\n"); > goto out_check; > - if (extent_type == BTRFS_FILE_EXTENT_REG && !force) > + } > + if (extent_type == BTRFS_FILE_EXTENT_REG && !force) { > + printk("BTRFS_FILE_EXTENT_REF\n"); > goto out_check; > - if (btrfs_extent_readonly(root, disk_bytenr)) > + } > + if (btrfs_extent_readonly(root, disk_bytenr)) { > + printk("ro\n"); > goto out_check; > + } > if (btrfs_cross_ref_exist(trans, root, ino, > found_key.offset - > - extent_offset, disk_bytenr)) > + extent_offset, disk_bytenr)) { > + printk("cross ref\n"); > goto out_check; > + } > disk_bytenr += extent_offset; > disk_bytenr += cur_offset - found_key.offset; > num_bytes = min(end + 1, extent_end) - cur_offset; > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 28, 2013, Liu Bo wrote:> On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote: >> >From: Liu Bo <bo.li.liu@oracle.com> >> > >> >Subject: [PATCH] Btrfs: fix broken nocow after a normal balance >> > >>[...] >> >> Sorry for the long wait in replying. >> This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu >> Raring kernel). I can probably try again on a newer version if you >> think it will help. >> This was my first kernel compile so I patched by hand and waited (10 >> hours on my old 32 bit single core machine). >> >> I did move some of the files off and back on to the filesystem to >> start fresh and compare but all seem to exhibit the same behavior >> after a balance. >> > > Thanks for testing the patch although it didn''t help you. > Actually I tested it to be sure that it fixed the problems in my > reproducer. > > So anyway can you please apply this debug patch in order to nail it down? > > thanks, > liubo > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index df472ab..c12a11c 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct > btrfs_trans_handle *trans, > goto out; > > if (btrfs_extent_generation(leaf, ei) <> - btrfs_root_last_snapshot(&root->root_item)) > + btrfs_root_last_snapshot(&root->root_item)) { > + printk("extent gen %llu last_snap %llu\n", > + btrfs_extent_generation(leaf, ei), > + btrfs_root_last_snapshot(&root->root_item)); > goto out; > + } > > iref = (struct btrfs_extent_inline_ref *)(ei + 1); > if (btrfs_extent_inline_ref_type(leaf, iref) !> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 23c596c..8cad6ee 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1317,16 +1317,24 @@ next_slot: > goto out_check; > if (btrfs_file_extent_compression(leaf, fi) || > btrfs_file_extent_encryption(leaf, fi) || > - btrfs_file_extent_other_encoding(leaf, fi)) > + btrfs_file_extent_other_encoding(leaf, fi)) { > + printk("special encoding\n"); > goto out_check; > - if (extent_type == BTRFS_FILE_EXTENT_REG && !force) > + } > + if (extent_type == BTRFS_FILE_EXTENT_REG && !force) { > + printk("BTRFS_FILE_EXTENT_REF\n"); > goto out_check; > - if (btrfs_extent_readonly(root, disk_bytenr)) > + } > + if (btrfs_extent_readonly(root, disk_bytenr)) { > + printk("ro\n"); > goto out_check; > + } > if (btrfs_cross_ref_exist(trans, root, ino, > found_key.offset - > - extent_offset, disk_bytenr)) > + extent_offset, disk_bytenr)) { > + printk("cross ref\n"); > goto out_check; > + } > disk_bytenr += extent_offset; > disk_bytenr += cur_offset - found_key.offset; > num_bytes = min(end + 1, extent_end) - cur_offset; >In another email Miao Xie suggests this patch won''t help, so I''ll wait for more comments/suggestions. Thanks, Kyle -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 29, 2013 Miao Xie wrote:> On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote: >> On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote: >>>> From: Liu Bo <bo.li.liu@oracle.com> >>>> >>>> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance >>>> >>> [...] >>> >>> Sorry for the long wait in replying. >>> This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu >>> Raring kernel). I can probably try again on a newer version if you >>> think it will help. >>> This was my first kernel compile so I patched by hand and waited (10 >>> hours on my old 32 bit single core machine). >>> >>> I did move some of the files off and back on to the filesystem to >>> start fresh and compare but all seem to exhibit the same behavior >>> after a balance. >>> >> >> Thanks for testing the patch although it didn''t help you. >> Actually I tested it to be sure that it fixed the problems in my >> reproducer. >> >> So anyway can you please apply this debug patch in order to nail it down? > > Your patch can not fix the above problem is because we may > update ->last_snapshot > after we relocate the file data extent. > > For example, there are two block groups which will be relocated, One is > data block > group, the other is metadata block group. Then we relocate the data block > group firstly, > and set the new generation for the file data extent item/the relative > extent item and > set (new_generation - 1) for ->last_snapshot. After the relocation of this > block group, > we will end the transaction and drop the relocation tree. If we end the > space balance now, > we won''t break the nocow rule because ->last_snapshot is less than the > generation of the file > data extent item/the relative extent item. But there is still one block > group which will be > relocated, when relocating the second block group, we will also start a > new transaction, > and update ->last_snapshot if need. So, ->last_snapshot is greater than > the generation of the file > data extent item we set before. And the nocow rule is broken. > > Back to this above problem. I don''t think it is a serious problem, we only > do COW once after > the relocation, then we will still honour the nocow rule. The behaviour is > similar to snapshot. > So maybe it needn''t be fixed.I would argue that for large vm workloads, running a balance or adding disks is a common practice that will result in a drastic drop in performance as well as massive increases in metadata writes and fragmentation. In my case my disks were thrashing severely, performance was poor and ntp couldn''t even hold my clock stable. If the fix is nontrival please add this to the todo list. Thanks, Kyle> If we must fix this problem, I think the only way is that get the > generation at the beginning > of the space balance, and then set it to ->last_snapshot > if ->last_snapshot is less than it, > don''t use (current_generation - 1) to update the ->last_snapshot. Besides > that, don''t forget > to store the generation into btrfs_balance_item, or the problem will > happen after we resume the > balance. > > Thanks > Miao > >> >> thanks, >> liubo >> >> [...] >> >-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html