Hi!

On apt-get dist-upgrading my Amarok ThinkPad T23 with BTRFS as / and as /home, I get extremely slow operation. My ThinkPad T42 with Ext4 is running circles around it, and that's likely not only due to the faster CPU.

vmstat 1 shows:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  4 151452  75016     68 382084    0    0  1164    28  595 1959 31 19  0 50
 0  3 151452  74272     68 382560    0    0   488     0  538 1735 10  9  0 81
 4  2 151452  71644     68 385776    0    0  3804     0  663 1886 56 38  0  6
 3  2 151452  66916     68 387192    0    0  1264     0  633 1018 74 24  0  2
 1  3 151452  63296     68 389336    0    0  1580     0  656 4095 80 20  0  0
 2  3 151452  66272     68 390028    8    0   572     0  601 3449 40 17  0 43
 3  2 151452  65032     68 390828    0    0   760     0  673 2364 42 25  0 32
 3  2 151452  61816     68 393508    0    0  2672     0  748 2203 52 29  0 19
 2  2 151452  60824     68 394236    0    0   724     0  660 2338 51 22  0 27
 4  2 151452  59832     68 395024    0    0   808     0  662 2309 40 20  0 40
 0  2 151452  58708     68 395856    0    0   812    12  683 2217 46 23  0 30
 0  2 151452  57964     68 396416    0    0   512     0  619 2196 41 24  0 35

I know laptop hard disks aren't the fastest, but AFAIR the T23 felt way faster with Ext3/4. I get quite a few stalls when opening a new window in "screen": it can take 10-20 seconds to load the new Z shell into it. Also, Amarok sometimes stops playing music for a while, which it didn't do with Ext3/4. I suspect that the kernel does not service Amarok's I/O requests quickly enough.

Surprisingly, I do not see an excessive amount of CPU usage by btrfs kernel threads in atop. But the disk seems to be quite busy, with vmstat block rates of merely a few thousand at maximum. Thus I suspect fragmentation of btrfs trees or files.
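To make the columns above easier to read, here is a small illustrative Python sketch (not part of the original report) that parses this vmstat sample and summarizes it. It assumes the older 16-column procps layout shown above, without an `st` column. Over the 12 samples the CPU is never idle (`id` is always 0), I/O wait (`wa`) averages about 30%, and the traffic is almost entirely reads: `bi` peaks at 3804 blocks/s while `bo` never exceeds 28.

```python
# Column names of the older procps vmstat layout shown above (no "st" column).
COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
           "bi", "bo", "in", "cs", "us", "sy", "id", "wa"]

# The "vmstat 1" sample from the message above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  4 151452  75016     68 382084    0    0  1164    28  595 1959 31 19  0 50
 0  3 151452  74272     68 382560    0    0   488     0  538 1735 10  9  0 81
 4  2 151452  71644     68 385776    0    0  3804     0  663 1886 56 38  0  6
 3  2 151452  66916     68 387192    0    0  1264     0  633 1018 74 24  0  2
 1  3 151452  63296     68 389336    0    0  1580     0  656 4095 80 20  0  0
 2  3 151452  66272     68 390028    8    0   572     0  601 3449 40 17  0 43
 3  2 151452  65032     68 390828    0    0   760     0  673 2364 42 25  0 32
 3  2 151452  61816     68 393508    0    0  2672     0  748 2203 52 29  0 19
 2  2 151452  60824     68 394236    0    0   724     0  660 2338 51 22  0 27
 4  2 151452  59832     68 395024    0    0   808     0  662 2309 40 20  0 40
 0  2 151452  58708     68 395856    0    0   812    12  683 2217 46 23  0 30
 0  2 151452  57964     68 396416    0    0   512     0  619 2196 41 24  0 35
"""

def parse_vmstat(text):
    """Return one dict per data row, skipping the two header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows are exactly 16 integers; both header lines fail this test.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows

def summarize(rows):
    n = len(rows)
    return {
        "samples": n,
        "avg_wa": sum(r["wa"] for r in rows) / n,      # % of time waiting on I/O
        "max_bi": max(r["bi"] for r in rows),          # blocks read per second
        "max_bo": max(r["bo"] for r in rows),          # blocks written per second
        "avg_blocked": sum(r["b"] for r in rows) / n,  # processes in uninterruptible sleep
    }

if __name__ == "__main__":
    for key, value in summarize(parse_vmstat(SAMPLE)).items():
        print(f"{key}: {value:.1f}" if isinstance(value, float) else f"{key}: {value}")
```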
The filesystems have the following specifics; apt-get obviously only works on /:

deepdance:~> btrfs filesystem show
failed to read /dev/sr0
Label: 'debian'  uuid: 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
        Total devices 1 FS bytes used 7.34GB
        devid    1 size 15.00GB used 14.97GB path /dev/dm-0

Label: 'home'  uuid: a600de65-e1ab-4cbf-b150-bbaeaf9fa98d
        Total devices 1 FS bytes used 28.13GB
        devid    1 size 80.00GB used 40.54GB path /dev/dm-2

Btrfs Btrfs v0.19
deepdance:~> btrfs filesystem df /
Data: total=11.23GB, used=6.84GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.86GB, used=510.99MB

I already cleaned out a lot of packages because of the slow dist-upgrades, and also because I do not need them on that laptop anymore. Thus the data tree only uses a bit more than half of the allocated space. BTRFS doesn't seem to give space back to the pool for all trees. Maybe it will do that on btrfs filesystem balance?

/home is:

deepdance:~> btrfs filesystem df /home
Data: total=37.01GB, used=27.54GB
System, DUP: total=8.00MB, used=12.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=598.76MB
Metadata: total=8.00MB, used=0.00
deepdance:~>

BTW, why does it have two metadata and system trees while / only has one?

Currently I have:

deepdance:~> cat /proc/version
Linux version 3.0.0-2-686-pae (Debian 3.0.0-6) (ben@decadent.org.uk) (gcc version 4.5.3 (Debian 4.5.3-9) ) #1 SMP Wed Nov 2 05:29:50 UTC 2011

from Debian Wheezy. Free memory is quite okay:

deepdance:~> free -m
             total       used       free     shared    buffers     cached
Mem:           755        699         55          0          0        347
-/+ buffers/cache:        352        402
Swap:         2047        148       1899

I am wondering how to optimize performance on the / BTRFS filesystem. I am not sure whether to try btrfs filesystem balance or btrfs filesystem defragment /. I also wonder whether some file related to Debian package management might be fragmented.
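As a worked check of the `btrfs filesystem df /` figures above, the following Python sketch parses them and computes two things: the raw device space taken by all block groups (DUP profiles keep two copies on the same device, so they count twice), and the space that is allocated but unused in each block group type. It assumes btrfs-progs v0.19 prints 1024-based units labelled KB/MB/GB. Under that assumption the raw total comes to about 14.97 GiB, matching the "used 14.97GB" reported by `btrfs filesystem show`, and roughly 4.4 GiB of data and 1.4 GiB of metadata block groups are allocated but unused.

```python
import re

# "btrfs filesystem df /" output quoted in the message above.
DF_ROOT = """\
Data: total=11.23GB, used=6.84GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.86GB, used=510.99MB
"""

# Assumption: btrfs-progs v0.19 labels binary (1024-based) units as KB/MB/GB.
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
GiB = 1024**3

LINE_RE = re.compile(
    r"^(?P<kind>\w+)(?:, (?P<profile>\w+))?: "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMG]B)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMG]B)?$"
)

def parse_df(text):
    """Return one entry per line: kind, profile and sizes in bytes."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        entries.append({
            "kind": m["kind"],
            "profile": m["profile"] or "single",
            "total": float(m["total"]) * UNITS[m["tunit"] or ""],
            "used": float(m["used"]) * UNITS[m["uunit"] or ""],
        })
    return entries

def raw_allocated(entries):
    """Raw device space taken by all block groups; DUP stores two copies."""
    return sum(e["total"] * (2 if e["profile"] == "DUP" else 1) for e in entries)

def unused_allocated(entries):
    """Allocated but unused space: an upper bound on what a balance could free."""
    return {f'{e["kind"]} ({e["profile"]})': e["total"] - e["used"] for e in entries}

entries = parse_df(DF_ROOT)
print(f"raw allocated: {raw_allocated(entries) / GiB:.2f} GiB")
for name, unused in unused_allocated(entries).items():
    print(f"{name}: {unused / GiB:.2f} GiB allocated but unused")
```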
But the ones I tested do not seem to be:

deepdance:/var/lib/dpkg> filefrag available
available: 1 extent found
deepdance:/var/lib/dpkg> filefrag status
status: 1 extent found
deepdance:/var/lib/dpkg>

But then I also do not know whether "filefrag" from "e2fsprogs" 1.42~WIP-2011-10-16-1 works with BTRFS.

Any advice? It's not critical for me to fix these issues (soon), but I am curious whether it's possible to get the filesystem speedier by some maintenance.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, 16 December 2011, Martin Steigerwald wrote:
> It's not critical for me to fix these issues (soon), but I am curious
> whether it's possible to get the filesystem speedier by some
> maintenance.

Maybe after it is clear why it is so slow in the first place ;).
On Friday, 16 December, 2011 18:54:46 you wrote:
> On Friday, 16 December 2011, Martin Steigerwald wrote:
> > It's not critical for me to fix these issues (soon), but I am curious
> > whether it's possible to get the filesystem speedier by some
> > maintenance.
>
> Maybe after it is clear why it is so slow in the first place ;).

I had the same experience. apt-get upgrade was a frustrating experience!

IIRC, a copy-on-write filesystem, in order to have good performance, has to merge write requests as much as possible.

Instead, apt-get makes a lot of sync calls, which don't allow btrfs to merge the write requests. This explains why btrfs is slow in this case.

I found a solution, but it requires a bit of setup.

The idea is to avoid performing sync during the package installation. To avoid data loss in case of failure, I create a snapshot before upgrading. If something goes wrong (e.g. a power failure), I reboot the system from the snapshot. If the installation finishes without problems, I flush all the data to disk and remove the snapshot.

For details, see my old post titled "[RFC] aptitude & BTRFS slow" (2011-10-19).

BR
G.Baroncelli
-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
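The order of operations in the snapshot approach can be sketched as below. This is a hypothetical Python illustration, not Goffredo's actual setup (his post has the details): the snapshot path `/pre-upgrade-snapshot` is made up, and wrapping the upgrade in `eatmydata` is only one assumed way of suppressing the per-package syncs. Booting from the snapshot after a crash is not shown. By default the sketch only prints the steps.

```python
import os
import subprocess

def upgrade_steps(snapshot_path, upgrade_cmd):
    """The snapshot-protected upgrade as a list of steps.
    Each step is an argv list, except the string "sync", which stands for os.sync()."""
    return [
        ["btrfs", "subvolume", "snapshot", "/", snapshot_path],  # safety copy of /
        upgrade_cmd,                                            # the upgrade itself (caller decides how syncs are suppressed)
        "sync",                                                 # flush everything once at the end
        ["btrfs", "subvolume", "delete", snapshot_path],          # success: drop the safety copy
    ]

def run(steps, dry_run=True):
    """Execute (or, with dry_run=True, only print) the steps in order.
    A failing command raises CalledProcessError, so later steps are skipped
    and the snapshot stays around for recovery."""
    for step in steps:
        if step == "sync":
            print("+ sync")
            if not dry_run:
                os.sync()
        else:
            print("+ " + " ".join(step))
            if not dry_run:
                subprocess.run(step, check=True)

if __name__ == "__main__":
    # Hypothetical snapshot path; eatmydata is an assumed way to skip fsync during the upgrade.
    steps = upgrade_steps("/pre-upgrade-snapshot", ["eatmydata", "apt-get", "dist-upgrade"])
    run(steps, dry_run=True)
```

The single `sync` at the end replaces the many per-package syncs, and the data is only "committed" by deleting the snapshot once everything has reached the disk.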
On Friday, 16 December 2011, Goffredo Baroncelli wrote:
> On Friday, 16 December, 2011 18:54:46 you wrote:
> > On Friday, 16 December 2011, Martin Steigerwald wrote:
> > > It's not critical for me to fix these issues (soon), but I am
> > > curious whether it's possible to get the filesystem speedier by
> > > some maintenance.
> >
> > Maybe after it is clear why it is so slow in the first place ;).
>
> I had the same experience. apt-get upgrade was a frustrating
> experience!
>
> IIRC, a copy-on-write filesystem, in order to have good performance,
> has to merge write requests as much as possible.
>
> Instead, apt-get makes a lot of sync calls, which don't allow btrfs to
> merge the write requests. This explains why btrfs is slow in this
> case.

Ah, I see. AFAIR an option has been added to apt/aptitude to omit the fsync itself.

Hmmm, a co-worker had the issue of Iceweasel being slow with lots of tabs open, and I suspected that heavy fsync() usage by the SQLite3 databases for bookmarks and stuff might be the culprit. The issue went away for him after switching to Ext4.

> I found a solution, but it requires a bit of setup.
>
> The idea is to avoid performing sync during the package installation.
> To avoid data loss in case of failure, I create a snapshot before
> upgrading. If something goes wrong (e.g. a power failure), I reboot
> the system from the snapshot. If the installation finishes without
> problems, I flush all the data to disk and remove the snapshot.
>
> For details, see my old post titled "[RFC] aptitude & BTRFS slow"
> (2011-10-19).

Sounds more like a workaround to me than a solution. I feel reluctant about working around what seems to be a filesystem limitation. (I think a filesystem should not break existing userspace applications, i.e. slow them down beyond a certain limit.)

I wonder whether it might be a good idea to have nodatacow for /:

  nodatacow - Do not copy-on-write data. datacow is used to ensure the
  user either has access to the old version of a file, or to the newer
  version of the file. datacow makes sure we never have partially
  updated files written to disk. nodatacow gives slight performance
  boost by directly overwriting data (like ext[234]), at the expense of
  potentially getting partially updated files on system failures.
  Performance gain is usually < 5% unless the workload is random writes
  to large database files, where the difference can become very large

(see https://btrfs.wiki.kernel.org/articles/g/e/t/Getting_started.html)

Then writing of files would be back to the Ext3/4 way of doing it. What do you think?

PS: I am not sure whether it's just aptitude. I have occasional audio stalls even when not upgrading the system. But then that might be PulseAudio, although the sound playback threads are running with realtime priority.

Thanks,
On Friday, 16 December 2011, Martin Steigerwald wrote:
> I wonder whether it might be a good idea to have nodatacow for /:

Nope. Doesn't seem to help much.

How do I turn it off again after turning it on?

deepdance:~> LANG=C mount -o remount,datacow /
mount: / not mounted already, or bad option

Thanks,
On Fri, 16 Dec 2011 21:58:45 +0100
Martin Steigerwald <Martin@lichtvoll.de> wrote:
> Nope. Doesn't seem to help much.
>
> How do I turn it off again after turning it on?
>
> deepdance:~> LANG=C mount -o remount,datacow /
> mount: / not mounted already, or bad option

In Debian you can disable syncing on a per-process basis:
http://packages.debian.org/sid/eatmydata

$ eatmydata apt-get install foo
$ eatmydata firefox
$ eatmydata liferea

It makes things more bearable.

HTH
-- 
Sergei
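To illustrate what "disabling syncing per process" means, here is a rough Python analogy. It is not how eatmydata works: eatmydata replaces libc's fsync() and related calls for one program via LD_PRELOAD, while this sketch only swaps out `os.fsync` inside a single Python process for the duration of a `with` block. `write_files` is a hypothetical stand-in for a cautious application that syncs after every file. Anything written inside the block may be lost or incomplete after a crash.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def no_fsync():
    """Temporarily turn os.fsync into a no-op for this Python process only."""
    real_fsync = os.fsync
    os.fsync = lambda fd: None
    try:
        yield
    finally:
        os.fsync = real_fsync  # always restore the real call

def write_files(directory, count):
    """Hypothetical cautious writer: fsync after every small file."""
    for i in range(count):
        with open(os.path.join(directory, f"file{i}"), "wb") as f:
            f.write(b"x" * 100)
            f.flush()
            os.fsync(f.fileno())  # looked up at call time, so no_fsync() can intercept it

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with no_fsync():
            write_files(d, 1000)  # no fsync reaches the kernel inside this block
```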
On Saturday, 17 December 2011, Sergei Trofimovich wrote:
> On Fri, 16 Dec 2011 21:58:45 +0100
> Martin Steigerwald <Martin@lichtvoll.de> wrote:
> > Nope. Doesn't seem to help much.
> >
> > How do I turn it off again after turning it on?
> >
> > deepdance:~> LANG=C mount -o remount,datacow /
> > mount: / not mounted already, or bad option
>
> In Debian you can disable syncing on a per-process basis:
> http://packages.debian.org/sid/eatmydata
>
> $ eatmydata apt-get install foo
> $ eatmydata firefox
> $ eatmydata liferea
>
> It makes things more bearable.

I am not ready to accept that this is the proper answer to what I experience. Applications using fsync() are realistic real-world scenarios, and I think BTRFS has to cope with that.

Yesterday I upgraded the laptop to 3.2-rc4. After converting the inode cache, the filesystem appeared to be faster, but I have to wait for some Debian packages to pile up on the repository servers to get a real impression.

I think I will scrub / balance / defragment the filesystem after a backup. But I am not sure in what order.

I understand that defragment defragments files. But then what does balance do? With a RAID setup I have seen it distribute data evenly across drives after I had removed a drive with echo > /sys/block/sda/[…]/delete and BTRFS had to distribute data unevenly because of that. But what does it do on a filesystem on a single drive? I bet it would balance out trees? Will it resize trees with lots of unused space as well?

According to

deepdance:~> btrfs filesystem df /
Data: total=11.23GB, used=6.98GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.86GB, used=511.35MB
deepdance:~> btrfs filesystem show
[…]
Label: 'debian'  uuid: 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
        Total devices 1 FS bytes used 7.48GB
        devid    1 size 15.00GB used 14.97GB path /dev/dm-0

Btrfs Btrfs v0.19

the filesystem might have had some chances to fragment heavily, because the tree sizes add up to almost the 15 GB of space available.

I also remember that for some time the filesystem was nearly full, which would explain the tree sizes.

Ciao,
On Sat, 17 Dec 2011 04:51:51 AM Martin Steigerwald wrote:
> Currently I have:
>
> deepdance:~> cat /proc/version
> Linux version 3.0.0-2-686-pae (Debian 3.0.0-6)

You are using a fairly old kernel btrfs-wise. I believe there's been work done in the 3.2 rc's to improve performance, so I'd suggest it's well worth testing with 3.2-rc6 to see whether that helps.

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
On Sat, Dec 17, 2011 at 12:09:56PM +0100, Martin Steigerwald wrote:
> I think I will scrub / balance / defragment the filesystem after a backup.
> But I am not sure in what order.
>
> I understand that defragment defragments files. But then what does balance
> do? With a RAID setup I have seen it distribute data evenly across drives
> after I had removed a drive with echo > /sys/block/sda/[…]/delete and
> BTRFS had to distribute data unevenly because of that. But what does it
> do on a filesystem on a single drive? I bet it would balance out trees?
> Will it resize trees with lots of unused space as well?

The metadata trees are automatically balanced, simply by the nature of the B-tree algorithms used. Balance won't, in general, affect them. The only thing that a balance will achieve on a single-disk filesystem is to reclaim unused space from allocated block groups -- so the "total" value in your Data and Metadata entries below will go down.

> According to
>
> deepdance:~> btrfs filesystem df /
> Data: total=11.23GB, used=6.98GB
> System, DUP: total=8.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.86GB, used=511.35MB
> deepdance:~> btrfs filesystem show
> […]
> Label: 'debian'  uuid: 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
>         Total devices 1 FS bytes used 7.48GB
>         devid    1 size 15.00GB used 14.97GB path /dev/dm-0
>
> Btrfs Btrfs v0.19
>
> the filesystem might have had some chances to fragment heavily, because
> the tree sizes add up to almost the 15 GB of space available.
>
> I also remember that for some time the filesystem was nearly full, which
> would explain the tree sizes.

For metadata, the lower bound on size is 0.1% of the data size (because checksums are computed at 4 bytes for every 4096 bytes of data). However, metadata usage can be very much greater than this with inline extents, where small files can get embedded directly in the metadata section. This is probably more likely what explains the tree sizes.

I understand (although I've not done the analysis myself) that the maximum "wasted" space in btrfs's B-tree implementation is 50%. To the best of my knowledge, there's no compaction process for btrfs's trees available -- nor, in general, should you need one, as a fully-compacted tree would only have to be rearranged when more data is added to it, thus slowing the system down after compaction.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I'll take your bet, but make it ten thousand francs. I'm only ---
                         a _poor_ corrupt official.
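Hugo's lower bound can be checked against the df figures from this thread with a few lines of Python. The sketch assumes, as stated above, 4 bytes of checksum per 4096-byte data block, and that btrfs-progs reports 1024-based units. The ratio 4/4096 is 0.0977%, just under 0.1%. For 6.98 GB of data that is only about 7 MiB of checksums, roughly 1.4% of the 511.35 MB of metadata actually used, which is consistent with Hugo's point that something other than checksums (such as inline extents) dominates the metadata size.

```python
CSUM_SIZE = 4       # bytes of checksum stored per data block, as stated above
BLOCK_SIZE = 4096   # bytes per data block
MiB = 1024**2
GiB = 1024**3

def checksum_bytes(data_bytes):
    """Lower bound on metadata needed just to hold data checksums."""
    blocks = -(-data_bytes // BLOCK_SIZE)   # ceiling division: a partial block still gets a checksum
    return blocks * CSUM_SIZE

print(f"ratio: {CSUM_SIZE / BLOCK_SIZE:.4%}")   # 0.0977%, i.e. just under 0.1%

data_used = int(6.98 * GiB)        # "Data: ... used=6.98GB" from the df output above
metadata_used = 511.35 * MiB       # "Metadata, DUP: ... used=511.35MB"
csum = checksum_bytes(data_used)
print(f"checksums: {csum / MiB:.2f} MiB of {metadata_used / MiB:.2f} MiB metadata "
      f"({csum / metadata_used:.1%})")
```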
On Saturday, 17 December 2011, Hugo Mills wrote:
> On Sat, Dec 17, 2011 at 12:09:56PM +0100, Martin Steigerwald wrote:
> > I think I will scrub / balance / defragment the filesystem after a
> > backup. But I am not sure in what order.
> >
> > I understand that defragment defragments files. But then what does
> > balance do? With a RAID setup I have seen it distribute data evenly
> > across drives after I had removed a drive with
> > echo > /sys/block/sda/[…]/delete and BTRFS had to distribute data
> > unevenly because of that. But what does it do on a filesystem on a
> > single drive? I bet it would balance out trees? Will it resize trees
> > with lots of unused space as well?
>
> The metadata trees are automatically balanced, simply by the nature
> of the B-tree algorithms used. Balance won't, in general, affect them.
> The only thing that a balance will achieve on a single-disk filesystem
> is to reclaim unused space from allocated block groups -- so the
> "total" value in your Data and Metadata entries below will go down.

But that's only for optical viewing pleasure, as far as I understand you?

Only if there were not enough free space for one tree to extend would a balance make sense? I.e. when I had a lot of metadata, so that the metadata would need to extend (which seems unlikely given the figures below).

> > According to
> >
> > deepdance:~> btrfs filesystem df /
> > Data: total=11.23GB, used=6.98GB
> > System, DUP: total=8.00MB, used=4.00KB
> > System: total=4.00MB, used=0.00
> > Metadata, DUP: total=1.86GB, used=511.35MB
> > deepdance:~> btrfs filesystem show
> > […]
> > Label: 'debian'  uuid: 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
> >         Total devices 1 FS bytes used 7.48GB
> >         devid    1 size 15.00GB used 14.97GB path /dev/dm-0
> >
> > Btrfs Btrfs v0.19
> >
> > the filesystem might have had some chances to fragment heavily,
> > because the tree sizes add up to almost the 15 GB of space available.
> >
> > I also remember that for some time the filesystem was nearly full,
> > which would explain the tree sizes.
>
> For metadata, the lower bound on size is 0.1% of the data size
> (because checksums are computed at 4 bytes for every 4096 bytes of
> data). However, metadata usage can be very much greater than this with
> inline extents, where small files can get embedded directly in the
> metadata section. This is probably more likely what explains the tree
> sizes.
>
> I understand (although I've not done the analysis myself) that the
> maximum "wasted" space in btrfs's B-tree implementation is 50%. To the
> best of my knowledge, there's no compaction process for btrfs's trees
> available -- nor, in general, should you need one, as a fully-compacted
> tree would only have to be rearranged when more data is added to it,
> thus slowing the system down after compaction.

If I understand this correctly, this means I can skip the balance step completely. I might still do the balance for that optical viewing pleasure ;).

Thanks,
On Friday, 16 December, 2011 20:53:58 Martin Steigerwald wrote:
> > I found a solution, but it requires a bit of setup.
> >
> > The idea is to avoid performing sync during the package installation.
> > To avoid data loss in case of failure, I create a snapshot before
> > upgrading. If something goes wrong (e.g. a power failure), I reboot
> > the system from the snapshot. If the installation finishes without
> > problems, I flush all the data to disk and remove the snapshot.
> >
> > For details, see my old post titled "[RFC] aptitude & BTRFS slow"
> > (2011-10-19).
>
> Sounds more like a workaround to me than a solution.

Sorry, but I strongly disagree.

Aptitude was designed for an ordinary filesystem, where the only way to get filesystem consistency is to issue a lot of syncs for every package. But this doesn't prevent a half-installed package (think about an "openoffice" upgrade: in case of a power failure, you could end up with neither the old openoffice nor the new one). With the snapshot, instead, you always have either the old system or the new system. No half-installed packages.

With BTRFS, I would say that the workaround[*] is using sync, not the snapshot.

The truth is that BTRFS is different from ext4 (or ext3, xfs...). You can use BTRFS like ext4, and you will find a lot of regressions like this.

BTRFS is very different from an ordinary filesystem, and you have to change some behaviour to take advantage of its peculiarities.

Using a snapshot during an upgrade opens up a lot of possibilities which are not available with EXT4. With a snapshot you can always go back if something goes wrong during an upgrade (like strange package dependencies). Or you can keep the previous configuration to go back to in case of trouble.

[*] Of course this is due to the fact that most filesystems are like ext4. Supporting BTRFS may not be the highest priority.
On Sat, Dec 17, 2011 at 12:38:07PM +0100, Martin Steigerwald wrote:
> On Saturday, 17 December 2011, Hugo Mills wrote:
> > On Sat, Dec 17, 2011 at 12:09:56PM +0100, Martin Steigerwald wrote:
> > > I think I will scrub / balance / defragment the filesystem after a
> > > backup. But I am not sure in what order.
> > >
> > > I understand that defragment defragments files. But then what does
> > > balance do? With a RAID setup I have seen it distribute data evenly
> > > across drives after I had removed a drive with
> > > echo > /sys/block/sda/[…]/delete and BTRFS had to distribute data
> > > unevenly because of that. But what does it do on a filesystem on a
> > > single drive? I bet it would balance out trees? Will it resize trees
> > > with lots of unused space as well?
> >
> > The metadata trees are automatically balanced, simply by the nature
> > of the B-tree algorithms used. Balance won't, in general, affect them.
> > The only thing that a balance will achieve on a single-disk filesystem
> > is to reclaim unused space from allocated block groups -- so the
> > "total" value in your Data and Metadata entries below will go down.
>
> But that's only for optical viewing pleasure, as far as I understand you?
>
> Only if there were not enough free space for one tree to extend would a
> balance make sense? I.e. when I had a lot of metadata, so that the
> metadata would need to extend (which seems unlikely given the figures
> below).

From the context, I think you're misusing the term "tree" here to mean "block group type" (i.e. data or metadata).

That aside, though, yes, you're right, it's effectively only cosmetic -- although it can be useful if you have a fully-allocated filesystem where (for example) data is full and there's lots of metadata space free, and you want to write more data. In that case, the FS wants to allocate another Data block group, but can't because there's no raw storage left to allocate from, despite there being lots of free space in the allocated Metadata block groups. A balance in that case would free up some of the metadata block groups and allow that space to be reallocated as data. (I think it tries to do this anyway, but I'm not 100% sure about that.)

> > > According to
> > >
> > > deepdance:~> btrfs filesystem df /
> > > Data: total=11.23GB, used=6.98GB
> > > System, DUP: total=8.00MB, used=4.00KB
> > > System: total=4.00MB, used=0.00
> > > Metadata, DUP: total=1.86GB, used=511.35MB
> > > deepdance:~> btrfs filesystem show
> > > […]
> > > Label: 'debian'  uuid: 2bf5b1dc-1d89-4f0d-a561-1a5551a27275
> > >         Total devices 1 FS bytes used 7.48GB
> > >         devid    1 size 15.00GB used 14.97GB path /dev/dm-0
> > >
> > > Btrfs Btrfs v0.19
> > >
> > > the filesystem might have had some chances to fragment heavily,
> > > because the tree sizes add up to almost the 15 GB of space available.
> > >
> > > I also remember that for some time the filesystem was nearly full,
> > > which would explain the tree sizes.
> >
> > For metadata, the lower bound on size is 0.1% of the data size
> > (because checksums are computed at 4 bytes for every 4096 bytes of
> > data). However, metadata usage can be very much greater than this with
> > inline extents, where small files can get embedded directly in the
> > metadata section. This is probably more likely what explains the tree
> > sizes.
> >
> > I understand (although I've not done the analysis myself) that the
> > maximum "wasted" space in btrfs's B-tree implementation is 50%. To the
> > best of my knowledge, there's no compaction process for btrfs's trees
> > available -- nor, in general, should you need one, as a fully-compacted
> > tree would only have to be rearranged when more data is added to it,
> > thus slowing the system down after compaction.
>
> If I understand this correctly, this means I can skip the balance step
> completely.

Pretty much.

> I might still do the balance for that optical viewing pleasure ;).

:) It can't hurt, and with such a small FS it probably won't take long.

   Hugo.
On Saturday, 17 December 2011, you wrote:
> On Friday, 16 December, 2011 20:53:58 Martin Steigerwald wrote:
> > > I found a solution, but it requires a bit of setup.
> > >
> > > The idea is to avoid performing sync during the package
> > > installation. To avoid data loss in case of failure, I create a
> > > snapshot before upgrading. If something goes wrong (e.g. a power
> > > failure), I reboot the system from the snapshot. If the installation
> > > finishes without problems, I flush all the data to disk and remove
> > > the snapshot.
> > >
> > > For details, see my old post titled "[RFC] aptitude & BTRFS slow"
> > > (2011-10-19).
> >
> > Sounds more like a workaround to me than a solution.
>
> Sorry, but I strongly disagree.
>
> Aptitude was designed for an ordinary filesystem, where the only way to
> get filesystem consistency is to issue a lot of syncs for every
> package. But this doesn't prevent a half-installed package (think
> about an "openoffice" upgrade: in case of a power failure, you could
> end up with neither the old openoffice nor the new one). With the
> snapshot, instead, you always have either the old system or the new
> system. No half-installed packages.
>
> With BTRFS, I would say that the workaround[*] is using sync, not the
> snapshot.
>
> The truth is that BTRFS is different from ext4 (or ext3, xfs...). You
> can use BTRFS like ext4, and you will find a lot of regressions like
> this.
>
> BTRFS is very different from an ordinary filesystem, and you have to
> change some behaviour to take advantage of its peculiarities.

This reminds me of the delayed allocation discussion when Ext4 introduced that feature. Ext3/4 developer Theodore Ts'o said that if applications are not using fsync(), it's their fault. But OTOH, applications had earlier begun to avoid fsync(), since it had serious performance drawbacks on ext3 (not ext4) with data=ordered. Ext4 now has workarounds for the rename and truncate cases, after Linus boldly requested not to break existing userspace.

So now applications that use fsync() the way Theodore Ts'o and others consider correct should skip the fsync() on BTRFS?

I find it *highly* problematic when applications are required to adapt their behavior depending on the filesystem in use. This just doesn't make sense to me. If BTRFS has other, faster means to guarantee filesystem consistency, it might still make fsync() a no-op, or just create a snapshot temporarily and automatically.

> Using a snapshot during an upgrade opens up a lot of possibilities
> which are not available with EXT4. With a snapshot you can always go
> back if something goes wrong during an upgrade (like strange package
> dependencies). Or you can keep the previous configuration to go back
> to in case of trouble.

Adding new possibilities is one thing, and supporting snapshots properly would require some support on the application side. I think that using snapshots for upgrades is a good idea. But OTOH I think that BTRFS should not break or slow down existing userspace. I think existing approaches, such as using fsync() the way quite a few filesystem developers say it should be used, should continue to work nicely. The same goes for the hardlink limit.

> [*] Of course this is due to the fact that most filesystems are like
> ext4. Supporting BTRFS may not be the highest priority.

I do think that

  if fs=ext4 then do this
  if fs=btrfs then do this
  if fs=ext3 + data=ordered then do this
  if fs=ext3 + data=ordered + kernel=whatnot then do it a tad bit differently
  if fs=unknown then assume this

in an application is just kind of broken. I always thought that one main task of a filesystem is to lift the burden of the details of how data is saved off the application. OK, some guidelines might be needed, e.g. saving 10 bytes 1000 times might perform worse than saving 10000 bytes at once, but aside from that…

So I think BTRFS should have a fast fsync() that fulfills the consistency guarantee in whatever compatible way it sees fit, and for the system partition I would even trade in the COW functionality. I didn't have it with Ext4 anyway.

Thanks,
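For reference, the "rename case" mentioned above is the pattern of writing a new version to a temporary file and renaming it over the old one. Below is a minimal Python sketch of the fsync() usage commonly recommended for it. It assumes Linux/POSIX semantics; in particular, fsyncing a directory through `os.open` does not work on Windows. The goal is that after a crash either the old or the new contents are visible, never a truncated file.

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace `path` with `data`: after a crash either the old or the new
    contents should be visible, never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # 1. the new contents reach the disk
        os.replace(tmp, path)      # 2. atomically swap the name over
    except BaseException:
        os.unlink(tmp)             # clean up the temporary file on failure
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)           # 3. make the rename itself durable (POSIX only)
    finally:
        os.close(dir_fd)
```

Each call costs two fsync() calls, which is exactly the kind of per-file syncing that made dpkg slow on btrfs in this thread.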
On Saturday, 17 December 2011, Hugo Mills wrote:
> > > The metadata trees are automatically balanced, simply by the nature
> > > of the B-tree algorithms used. Balance won't, in general, affect
> > > them. The only thing that a balance will achieve on a single-disk
> > > filesystem is to reclaim unused space from allocated block groups
> > > -- so the "total" value in your Data and Metadata entries below
> > > will go down.
> >
> > But that's only for optical viewing pleasure, as far as I understand
> > you?
> >
> > Only if there were not enough free space for one tree to extend
> > would a balance make sense? I.e. when I had a lot of metadata, so
> > that the metadata would need to extend (which seems unlikely given
> > the figures below).
>
> From the context, I think you're misusing the term "tree" here to
> mean "block group type" (i.e. data or metadata).
>
> That aside, though, yes, you're right, it's effectively only
> cosmetic -- although it can be useful if you have a fully-allocated
> filesystem where (for example) data is full and there's lots of
> metadata space free, and you want to write more data. In that case,
> the FS wants to allocate another Data block group, but can't because
> there's no raw storage left to allocate from, despite there being lots
> of free space in the allocated Metadata block groups. A balance in
> that case would free up some of the metadata block groups and allow
> that space to be reallocated as data. (I think it tries to do this
> anyway, but I'm not 100% sure about that.)

Okay, that's the more likely case then ;).

Thanks for clearing that up,
On Saturday, 17 December 2011, Chris Samuel wrote:
> On Sat, 17 Dec 2011 04:51:51 AM Martin Steigerwald wrote:
> > Currently I have:
> >
> > deepdance:~> cat /proc/version
> > Linux version 3.0.0-2-686-pae (Debian 3.0.0-6)
>
> You are using a fairly old kernel btrfs-wise; I believe there's been
> work done in the 3.2 rc's to improve performance, so I'd suggest it's
> well worth testing with 3.2-rc6 to see whether that helps.

I am already using 3.2-rc4 from the Debian package. Currently I do not build my own kernels.

I have the subjective impression that it became faster after the initial rebuild of the inode cache.

I have the following mount options:

deepdance:~> grep btrfs /proc/mounts
/dev/mapper/deepdance-debian / btrfs rw,relatime,space_cache,inode_cache 0 0
/dev/mapper/deepdance-home /home btrfs rw,relatime,space_cache,inode_cache 0 0

It might be good to use noatime for hard disks as well.

BTW, on my ThinkPad T520 I do not perceive performance issues with BTRFS as /. But then that's located on an Intel SSD 320, where seeks should not matter much.

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
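If noatime is tried, the change could look like the following fstab sketch. It assumes the device paths shown in /proc/mounts above and simply swaps relatime for noatime: relatime already limits access-time updates, but noatime removes them entirely, so reads no longer cause metadata writes on the copy-on-write trees. It can be tested without a reboot via `mount -o remount,noatime /`.

```
# /etc/fstab (sketch): same btrfs filesystems as in /proc/mounts above,
# with noatime instead of relatime.
/dev/mapper/deepdance-debian  /      btrfs  noatime,space_cache,inode_cache  0  0
/dev/mapper/deepdance-home    /home  btrfs  noatime,space_cache,inode_cache  0  0
```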
On Saturday, 17 December 2011, Martin Steigerwald wrote:
> If BTRFS has other means of guaranteeing filesystem consistency that are
> faster, it might still make fsync() a no-op or just create a
> snapshot temporarily and automatically.

To clear this up: it should only make it a no-op if it guarantees consistency without it. Otherwise it should do whatever is necessary to guarantee it.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
On Sat, 2011-12-17 at 13:00 +0100, Martin Steigerwald wrote:
> BTW on my ThinkPad T520 I do not perceive performance issues for BTRFS as
> /. But then that's located on an Intel SSD 320 where seeks should not
> matter much.

Okay, that would be consistent with the slow behaviour observed by others on fsync()-heavy workloads. Presumably these produce much more seek-heavy IO patterns than current common filesystems; I wonder whether this is a limitation of the current implementation or an inherent property of the data structures being used?

Cheers,
David
--
David McBride <dwm@doc.ic.ac.uk>
Department of Computing, Imperial College, London
On Saturday, 17 December, 2011 12:54:47 you wrote:
[...]
> This reminds me of the delayed allocation discussion when Ext4 introduced
> that feature.
>
> Ext3/4 developer Theodore Ts'o said that if applications are not using
> fsync() it's their fault. But on the other hand, applications had begun to
> avoid fsync() before, since it had serious performance drawbacks on ext3
> (not ext4) with data=ordered.
>
> Ext4 now has workarounds for the rename and truncate cases, after Linus
> boldly requested not to break existing userspace.

IIRC the problem there was data loss, whereas you were (correctly) complaining about slowness. These are two very different problems.

> So applications that use fsync() the way Theodore Ts'o and others consider
> correct should now skip fsync() on BTRFS?

I never said not to use the fsync() call. I am only arguing that for a package manager the fsync() call is not the best API. Package managers were designed with the capabilities of the old filesystems in mind, when the sync(2) API was the only one available. With this API it is impossible to upgrade a package atomically (all or nothing).

With the new filesystems (BTRFS and ZFS), package managers have more options. They can create a snapshot (of the old filesystem) at the beginning and roll back if something goes wrong (I am simplifying a bit). But the package managers have to be updated. As a bonus you can avoid sync(2), which has performance drawbacks (especially with BTRFS).

[...]
> > Using snapshots during an upgrade opens up a lot of possibilities that
> > are not available with ext4. With a snapshot you can always go back if
> > something goes wrong during an upgrade (like strange package
> > dependencies). Or you can keep the previous configuration to go back to
> > in case of trouble.
>
> Adding new possibilities is one thing. And supporting snapshots properly
> would require some support from the applications. I think that
> using snapshots for upgrades is a good idea.
>
> But OTOH I think that BTRFS should not break or slow down existing
> userspace. I think that existing approaches, like using fsync() the way
> quite a few filesystem developers say it should be used, should
> continue to work nicely.

Nobody wants to slow down applications. But life is full of compromises. If you want the speed of ext4, you can use ext4. If you want the snapshot capability and the COW guarantee, you can use BTRFS, but you get some slowness.

Of course the best would be to have the speed of ext4 with the capabilities of btrfs... :-) Unfortunately that is not available today.

[...]
> Thanks,

Regards
--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
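Until package managers do this themselves, Goffredo's snapshot-before-upgrade idea can be approximated by hand. The sketch below is a minimal illustration, not an existing tool: it assumes / is a btrfs subvolume, that `/.snapshots` (a hypothetical path) exists on the same filesystem, and that `with_snapshot` is just an illustrative helper name. Rolling back is not automated; it would mean booting from the snapshot (for example via the `subvol=` mount option) or copying files back. /home is a separate filesystem in this setup, so it is not covered.

```shell
# Hypothetical sketch: snapshot the root subvolume, then run an upgrade
# command, so the pre-upgrade state survives if the upgrade breaks things.
# SNAPDIR is an assumed directory on the same btrfs filesystem as /.
SNAPDIR=${SNAPDIR:-/.snapshots}

with_snapshot() {
  name="pre-upgrade-$(date +%Y%m%d-%H%M%S)"
  # Cheap thanks to COW: only blocks changed afterwards take new space.
  btrfs subvolume snapshot / "$SNAPDIR/$name" || return 1
  if "$@"; then
    echo "upgrade ok, snapshot kept at $SNAPDIR/$name"
  else
    echo "upgrade failed, previous state preserved in $SNAPDIR/$name" >&2
    return 1
  fi
}
```

Run as root, e.g. `with_snapshot apt-get dist-upgrade`. Passing the upgrade command as arguments keeps the wrapper independent of apt-get, aptitude or plain dpkg.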
On Saturday, 17 December 2011, Goffredo Baroncelli wrote:
> > Adding new possibilities is one thing. And supporting snapshots
> > properly would require some support from the applications. I
> > think that using snapshots for upgrades is a good idea.
> >
> > But OTOH I think that BTRFS should not break or slow down existing
> > userspace. I think that existing approaches, like using fsync() the
> > way quite a few filesystem developers say it should be used,
> > should continue to work nicely.
>
> Nobody wants to slow down applications. But life is full of
> compromises. If you want the speed of ext4, you can use ext4. If you
> want the snapshot capability and the COW guarantee, you can use BTRFS,
> but you get some slowness.
>
> Of course the best would be to have the speed of ext4 with the
> capabilities of btrfs... :-) Unfortunately that is not available
> today.

It's perfectly acceptable to me that BTRFS does not deliver this yet. I understood your initial answer to mean that BTRFS is simply different and thus performs poorly in fsync()-based workloads, and that's about it, i.e. that it is a fundamental issue. That part I didn't agree with. Heck, given the design differences of a COW filesystem it might even be a fundamental issue of sorts. But then I like to see this as a challenge, not as a show stopper.

Actually, especially for that Amarok ThinkPad T23, there is no hurry. It's my BTRFS play machine. I just want to see what I can do with filesystem maintenance to bring it up to speed. Everything else is a matter of following development and upgrading kernels ;).

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
On Saturday, 17 December 2011, David McBride wrote:
> On Sat, 2011-12-17 at 13:00 +0100, Martin Steigerwald wrote:
> > BTW on my ThinkPad T520 I do not perceive performance issues for
> > BTRFS as /. But then that's located on an Intel SSD 320 where seeks
> > should not matter much.
>
> Okay, that would be consistent with the slow behaviour observed by
> others on fsync()-heavy workloads. Presumably these produce much more
> seek-heavy IO patterns than current common filesystems; I wonder whether
> this is a limitation of the current implementation or an inherent
> property of the data structures being used?

All I can say is that the ThinkPad T520 doesn't seem to be the best machine for testing software performance. I have not yet seen anything that's actually slow on that machine. It doesn't seem to be a machine that triggers bottlenecks easily.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
On Saturday, 17 December 2011, Hugo Mills wrote:
> > I might still be doing the balance for that optical viewing pleasure
> > ;).
>
> :)
>
>    It can't hurt, and with such a small FS it probably won't take
> long.

Now I first did a defrag and then a balance. The balance was heavier: I had music stalls of about 5 to 10 seconds at a time.

The defrag aborted quickly with a non-zero return code on the second run:

deepdance:~> btrfs filesystem defragment /
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C

I wanted to start it via time.

deepdance:~> /usr/bin/time btrfs filesystem defragment /
Command exited with non-zero status 20
0.00user 1.26system 0:03.86elapsed 32%CPU (0avgtext+0avgdata 2160maxresident)k
2656inputs+70712outputs (2major+184minor)pagefaults 0swaps

Nothing in dmesg. Does 20 as a return code mean "already defragmented"? ;)

I am looking forward to the new asynchronous defrag interface I read about somewhere.

The current state now is:

deepdance:~> btrfs filesystem df /
Data: total=7.75GB, used=6.91GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=896.00MB, used=506.47MB

Let's see how that fares.
Balance did log something:

[24065.740937] btrfs: found 4207 extents
[24075.581494] btrfs: found 4207 extents
[24077.982099] btrfs: relocating block group 24465375232 flags 1
[24090.418623] btrfs: found 1152 extents
[24099.195646] btrfs: found 1152 extents
[24100.994087] btrfs: relocating block group 24196939776 flags 1
[24124.823654] btrfs: found 3857 extents
[24140.208385] btrfs: found 3857 extents
[24142.334232] btrfs: relocating block group 23928504320 flags 1
[24164.219827] btrfs: found 534 extents
[24171.483027] btrfs: found 534 extents
[24176.021604] btrfs: relocating block group 23391633408 flags 1
[24230.123062] btrfs: found 8607 extents
[24255.193673] btrfs: found 8607 extents
[24258.142945] btrfs: relocating block group 22586327040 flags 1
[24271.875868] btrfs: relocating block group 22452109312 flags 36
[24322.334007] btrfs: found 19112 extents
[24324.253074] btrfs: relocating block group 22317891584 flags 36
[24361.999904] btrfs: found 6934 extents
[24362.927413] btrfs: relocating block group 22183673856 flags 36
[24393.151548] btrfs: found 9031 extents
[24395.447755] btrfs: relocating block group 22049456128 flags 36
[24432.611355] btrfs: found 13216 extents
[24435.508280] btrfs: relocating block group 20975714304 flags 1
[24574.903545] btrfs: found 14600 extents
[24642.613698] btrfs: found 14586 extents
[24647.144462] btrfs: relocating block group 20841496576 flags 36
[24730.473343] btrfs: found 19754 extents
[24735.912210] btrfs: relocating block group 20707278848 flags 36
[24852.827906] btrfs: found 26482 extents
[24853.838002] btrfs: relocating block group 20698890240 flags 34
[24854.825685] btrfs: found 1 extents
[24855.858015] btrfs: relocating block group 20564672512 flags 36
[25001.321705] btrfs: found 31648 extents
[25002.330616] btrfs: relocating block group 20430454784 flags 36
[25170.694953] btrfs: found 30709 extents
[25173.027484] btrfs: relocating block group 20296237056 flags 36
[25240.022780] btrfs: found 19729 extents
[25242.373217] btrfs: relocating block group 20162019328 flags 36
[25293.659547] btrfs: found 11857 extents
[25294.514415] btrfs: relocating block group 20027801600 flags 36
[25381.873449] btrfs: found 20892 extents
[25382.837313] btrfs: relocating block group 18954059776 flags 1
[25407.731124] btrfs: relocating block group 17880317952 flags 1
[25528.179185] btrfs: found 13850 extents
[25572.737920] btrfs: found 13831 extents
[25574.017807] btrfs: found 1 extents
[25577.603801] btrfs: relocating block group 16806576128 flags 1
[25667.266953] btrfs: found 2448 extents
[25689.503862] btrfs: found 2448 extents
[25691.924348] btrfs: relocating block group 15732834304 flags 1
[25796.270409] btrfs: found 11264 extents
[25838.860555] btrfs: found 11264 extents
[25843.971106] btrfs: relocating block group 14659092480 flags 1
[25959.486034] btrfs: found 18680 extents
[26037.370148] btrfs: found 18680 extents
[26040.637078] btrfs: relocating block group 13585350656 flags 1
[26131.997384] btrfs: found 26798 extents
[26211.759652] btrfs: found 26787 extents
[26215.846016] btrfs: relocating block group 12511608832 flags 1
[26331.196068] btrfs: found 33247 extents
[26470.846542] btrfs: found 33197 extents
[26479.487194] btrfs: relocating block group 12377391104 flags 36
[26503.391492] btrfs: found 4410 extents
[26507.133189] btrfs: relocating block group 11303649280 flags 1
[26607.401285] btrfs: found 32999 extents
[26770.759705] btrfs: found 32926 extents
[26778.218628] btrfs: relocating block group 11169431552 flags 36
[26921.757006] btrfs: found 23449 extents
[26922.956668] btrfs: relocating block group 11035213824 flags 36
[27047.652332] btrfs: found 21526 extents

That looks quite fragmented to me, but as I do not understand what exactly is behind these numbers, I'll leave it at that.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
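As Hugo Mills points out in the reply that follows, `btrfs filesystem defragment` in the btrfs-progs of that time does not recurse, so running it on / leaves the files below it untouched. A minimal workaround sketch, assuming a POSIX shell and `find`: walk the tree and defragment each regular file individually. `defrag_tree` is a hypothetical helper name; later btrfs-progs releases added a `-r` option to the defragment command that makes such a loop unnecessary.

```shell
# Hypothetical helper: defragment every regular file below a directory,
# one file per call, because "btrfs filesystem defragment" in this
# btrfs-progs version does not descend into directories on its own.
# Assumes file names without embedded newlines.
defrag_tree() {
  dir=${1:?usage: defrag_tree DIR}
  # -xdev: stay on this filesystem (skips /home, /proc, /sys, ...).
  find "$dir" -xdev -type f -print | while IFS= read -r file; do
    btrfs filesystem defragment "$file" || echo "defrag failed: $file" >&2
  done
}
```

Running `defrag_tree /` as root would touch every file on the root filesystem, which on a slow laptop disk can take a long time and cause the same kind of stalls as the balance did.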
On Sat, Dec 17, 2011 at 05:35:15PM +0100, Martin Steigerwald wrote:
> On Saturday, 17 December 2011, Hugo Mills wrote:
> > > I might still be doing the balance for that optical viewing pleasure
> > > ;).
> >
> > :)
> >
> >    It can't hurt, and with such a small FS it probably won't take
> > long.
>
> Now I first did a defrag and then a balance. The balance was heavier: I had
> music stalls of about 5 to 10 seconds at a time.
>
> The defrag aborted quickly with a non-zero return code on the second run:
>
> deepdance:~> btrfs filesystem defragment /
> ^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
>
> I wanted to start it via time.
>
> deepdance:~> /usr/bin/time btrfs filesystem defragment /
> Command exited with non-zero status 20
> 0.00user 1.26system 0:03.86elapsed 32%CPU (0avgtext+0avgdata 2160maxresident)k
> 2656inputs+70712outputs (2major+184minor)pagefaults 0swaps
>
> Nothing in dmesg. Does 20 as a return code mean "already defragmented"? ;)

   I'd have to check what return code 20 means, but... btrfs fi defrag
is *not* recursive, so what you did is effectively a no-op anyway.

> I am looking forward to the new asynchronous defrag interface I read about
> somewhere.
>
> Current state now is:
>
> deepdance:~> btrfs filesystem df /
> Data: total=7.75GB, used=6.91GB
> System, DUP: total=8.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=896.00MB, used=506.47MB
>
> Let's see how that fares.
>
> Balance did log something:
>
> [24065.740937] btrfs: found 4207 extents
> [24075.581494] btrfs: found 4207 extents
> [24077.982099] btrfs: relocating block group 24465375232 flags 1
[snip]
> [24730.473343] btrfs: found 19754 extents
> [24735.912210] btrfs: relocating block group 20707278848 flags 36
> [24852.827906] btrfs: found 26482 extents
> [24853.838002] btrfs: relocating block group 20698890240 flags 34
[snip]
> Appears quite fragmented to me, but as I do not understand what exactly
> is behind these numbers I leave it as it is.

   The long numbers are block group IDs. These correspond to a
position in the FS's internal address space (which doesn't, in the
general case, map directly to anything -- there's an internal tree
that holds the map).

   The flags indicate what type of block group is being moved. These
correspond to the line headings in "btrfs fi df", and are a
bitmap. "flags 1" is a non-RAIDed data block group. "flags 34" is a
DUP system block group, and "flags 36" is a DUP metadata block
group. You'll probably find a single reference to a block group with
flags 2, which is the vestigial non-RAID System group you can see in
your "btrfs fi df" output above.

   Extents are simply contiguous regions of storage, corresponding to
parts (or all) of a file, or to individual tree blocks (which are 4k
in size). The "found <N> extents" messages just indicate how many
extents there are to move in the block group it's currently looking
at.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- "I lost my leg in 1942. Some bastard stole it in a ---
                         pub in Pimlico."
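Hugo's reading of the flags can be checked mechanically, since the value is a bitmap of block group type and RAID profile. The sketch below decodes a flags value into names. The bit values are an assumption taken from the kernel's `BTRFS_BLOCK_GROUP_*` constants of that era (DATA=1, SYSTEM=2, METADATA=4, RAID0=8, RAID1=16, DUP=32, RAID10=64), and `bg_flags` is a hypothetical helper name.

```shell
# Hypothetical helper: turn the "flags N" value from the balance log into
# block group type names. Bit values are assumed to match the kernel's
# BTRFS_BLOCK_GROUP_* constants of that era.
bg_flags() {
  flags=$1
  names=""
  for pair in 1:DATA 2:SYSTEM 4:METADATA 8:RAID0 16:RAID1 32:DUP 64:RAID10; do
    bit=${pair%%:*}     # numeric bit value before the colon
    name=${pair#*:}     # type name after the colon
    if [ $((flags & bit)) -ne 0 ]; then
      if [ -n "$names" ]; then names="$names|$name"; else names=$name; fi
    fi
  done
  echo "${names:-none}"
}
```

For example, `bg_flags 36` prints `METADATA|DUP` (4 + 32) and `bg_flags 34` prints `SYSTEM|DUP` (2 + 32), matching Hugo's explanation of the log above.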
2011/12/16 Goffredo Baroncelli <kreijack@inwind.it>:
> I found a solution, but it requires a bit of setup.

Did you try:

echo force-unsafe-io >> /etc/dpkg/dpkg.cfg

You need dpkg 1.16.

Cheers,
Andrea
Hi Andrea,

On Sunday, 18 December, 2011 19:41:49 you wrote:
> 2011/12/16 Goffredo Baroncelli <kreijack@inwind.it>:
> > I found a solution, but it requires a bit of setup.
>
> Did you try:
> echo force-unsafe-io >> /etc/dpkg/dpkg.cfg

Stracing an apt-get update, it seems that --force-unsafe-io doesn't stop all the sync calls. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=613428

> You need dpkg 1.16.

# dpkg --version
Debian `dpkg' package management program version 1.16.1.2 (amd64).
This is free software; see the GNU General Public License version 2 or
later for copying conditions. There is NO warranty.

>
> Cheers,
> Andrea

--
Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
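Instead of appending to /etc/dpkg/dpkg.cfg, the same option can live in its own fragment, since dpkg also reads configuration files from /etc/dpkg/dpkg.cfg.d/, which makes it easy to remove again later. A sketch, assuming dpkg 1.16 as Andrea notes; the file name `unsafe-io` is hypothetical. As Goffredo's strace shows, this does not remove every sync, and it gives up some crash safety while packages are being unpacked.

```
# /etc/dpkg/dpkg.cfg.d/unsafe-io  (hypothetical file name)
# Same option as the echo above; lines use the long option name
# without the leading "--".
force-unsafe-io
```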