Martin Steigerwald
2012-Dec-09 11:12 UTC
How to refresh degraded BTRFS? free space fragmentation, file fragmentation...
Hi!

I have been running BTRFS on some systems for more than two years now. My experience so far: performance at the beginning is pretty good, but some of my more heavily used BTRFS filesystems degrade badly in different areas, on some workloads quite quickly.

There are also some filesystems, however, that did not degrade that badly. These are ones with far more free space left than the ones that degraded badly: about 900 GB of free space left on my eSATA backup disk with BTRFS, which is also quite new, and about 80 GB left on my BTRFS RAID 1 local home disk, where I can build Debian packages or kernels and such without the restrictions NFS brings (root squash). These still appear to be fine, but I redid the local home one with mkfs.btrfs -n 32768 -l 32768 not too long ago. I think it was quite fine before anyway, so I might have overdone it here. This already points at one way to prevent some degradation of BTRFS filesystems: leave more free space.

1) fsync speed on my ThinkPad T23 has gone down so much that I use eatmydata with apt-get dist-upgrade and co. For that I intend to try out the 3.7 kernel as soon as it is packaged for Debian experimental. (And hope that it resumes nicely again; all kernels since 3.3 didn't, and I do not really feel like bisecting this.) So I put this aside for now, because it may not be applicable with the most recent kernel anymore. And fsync performance hasn't been good from the beginning; I think it degraded, but I am not completely sure.

2) File fragmentation: an example with a SUSE Manager VirtualBox VM on a BTRFS filesystem. The SUSE Manager box received packages for the software channels and put the metadata into a PostgreSQL database. A SLES client had just been installed. filefrag showed the fragment count going up quickly to 20000, 30000, 40000 and more, and the performance in the VMs was abysmal. I tried mount -o remount,autodefrag (sketched further below), and BTRFS then got the fragment count down to a few thousand instead of tens of thousands, while raising disk activity quite a lot: the 2.5 inch external eSATA disk was busy almost all of the time. But the VM performance was better. Not nice, but better. I do not have more exact data right now.

3) Free space fragmentation on the / filesystem on this ThinkPad T520 with Intel SSD 320:

=== fstrim ===

merkaba:~> /usr/bin/time fstrim -v /
/: 6849871872 bytes were trimmed
0.00user 5.99system 0:44.69elapsed 13%CPU (0avgtext+0avgdata 752maxresident)k
0inputs+0outputs (0major+237minor)pagefaults 0swaps

It took a second or two in the beginning.

atop:

LVM | rkaba-debian | busy 91% | read 0 | write 10313 | MBw/s 67.48 | avio 0.20 ms |
[…]
DSK | sda          | busy 90% | read 0 | write 10319 | MBw/s 67.54 | avio 0.19 ms |
[…]

 PID  TID RUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/2
6085    - root   1  0.29s  0.00s    0K    0K    0K    0K --   - D     0 13% fstrim

That is 10000 write requests in 10 seconds.

vmstat 1:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b    swpd    free   buff   cache   si   so    bi    bo    in    cs us sy id wa
 3  0 1963688 3943380 156972 1827836    0    0     0     0  5421 15781  6  6 88  0
 0  0 1963688 3943132 156972 1827852    0    0     0     0  5733 16478  9  7 83  0
 1  0 1963688 3943008 156972 1827992    0    0     0     0  5050 14434  0  4 96  0
 1  0 1963688 3949768 156972 1826708    0    0     0     0  5246 14960  2  5 93  0
 0  0 1963688 3949644 156980 1826712    0    0     0    36  5104 14996  1  4 94  0
 0  0 1963688 3949768 156980 1826720    0    0     0     0  5102 15210  2  4 94  0
 3  0 1963688 3949644 156980 1826720    0    0     0     0  5321 15995  4  7 89  0
 0  0 1963688 3949396 156980 1827188    0    0     0     0  5316 15616  6  5 88  0
 1  0 1963688 3949148 156980 1827188    0    0     0     0  5102 14944  1  4 95  0
 1  0 1963688 3949272 156980 1827188    0    0     0     0  5510 15928  5  6 89  0
 1  0 1963688 3949272 156980 1827188    0    0     0    52  5107 15054  2  4 94  0
 0  0 1963688 3949396 156980 1826868    0    0     0     4  4930 14567  1  4 95  0
 1  0 1963688 3949396 156988 1826828    0    0     0    52  5132 15014  2  5 93  0
 3  0 1963688 3949396 156988 1826836    0    0     0     0  5015 14447  1  4 95  0
 0  0 1963688 3949520 156988 1826836    0    0     0     0  5233 15652  3  6 91  0
 1  0 1963684 3949612 156988 1827172    0    0     0  3032  2546  7555  6  4 84  6

After fstrim:

 0  0 1963684 3944244 157016 1827752    0    0     0     0   357  1018  2  1 97  0
 1  0 1963684 3943776 157024 1827776    0    0     0    64   634  1660  4  2 93  0
 0  0 1963684 3943872 157024 1827784    0    0     0     0   180   473  0  0 99  0

The I/O activity does not seem to be reflected in vmstat; I bet that is because the page cache is not involved.
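For reference, the autodefrag setup from point 2 as a sketch. Mountpoint, device and image name are placeholders, not my actual setup:

# enable autodefrag on an already mounted filesystem:
mount -o remount,autodefrag /mnt/vm

# or persistently via /etc/fstab:
# /dev/sdc1  /mnt/vm  btrfs  defaults,autodefrag  0  0

# a single VM image can also be defragmented one-off:
btrfs filesystem defragment /mnt/vm/suse-manager.vdi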
=== fallocate ===

merkaba:/var/tmp> /usr/bin/time fallocate -l 2G fallocate-test
0.00user 118.85system 2:00.50elapsed 98%CPU (0avgtext+0avgdata 720maxresident)k
14912inputs+49112outputs (0major+227minor)pagefaults 0swaps

Peaks of CPU usage:

cpu | sys  98% | user 0% | irq 0% | idle   0% | cpu002 w 2% | avgscal 100% |
CPU | sys 102% | user 3% | irq 0% | idle 295% | wait 1%     | avgscal  52% |
cpu | sys  46% | user 1% | irq 0% | idle  53% | cpu001 w 0% | avgscal  63% |
cpu | sys  29% | user 1% | irq 0% | idle  70% | cpu003 w 0% | avgscal  57% |
cpu | sys  26% | user 1% | irq 0% | idle  73% | cpu002 w 0% | avgscal  55% |
cpu | sys   1% | user 1% | irq 0% | idle  99% | cpu000 w 0% | avgscal  32% |

 PID  TID RUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/3
6458    - root   0  2m00s  0.00s    0K    0K     -     - NE   0 E     - 100% <fallocate>

martin@merkaba:~> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b    swpd    free   buff   cache   si   so    bi    bo    in    cs us sy id wa
 0  0 1963676 3949112 157168 1827504   14   30   137    47    93    28 11  5 83  0
 0  0 1963676 3943148 157168 1828228    0    0     0     0   508  1177  4  2 94  0
 1  0 1963676 3943088 157168 1828164    0    0     0     0   584  1381  4  2 94  0
 0  0 1963676 3942892 157168 1828164    0    0     0     0   712  1627  6  3 91  0
 1  0 1963676 3942508 157168 1828420    0    0   168     0  1252  1432  0 17 82  0
 1  0 1963676 3941800 157168 1829012    0    0   136     0  1661  1700  1 26 73  0
 1  0 1963676 3940980 157176 1829796    0    0   172    44  1800  1842  1 25 74  0
 1  0 1963676 3941088 157176 1829656    0    0    92     0  1701  1101  0 25 75  0
 1  0 1963676 3945848 157176 1830092    0    0   140     0  1715  1300  0 25 75  0
 1  0 1963676 3945848 157176 1829912    0    0    76     0  1506  1163  0 25 75  0
 1  0 1963676 3939168 157176 1831120    0    0    40     0  1840  1164  1 25 74  0
 1  0 1963676 3938528 157176 1831440    0    0   172     0  1652  1617  1 25 74  0
 1  0 1963676 3939056 157176 1831224    0    0    44    48  1698  1798  1 27 73  0
 1  0 1963676 3944452 157176 1831264    0    0   104     0  1383  1106  1 25 74  0
 2  0 1963676 3944064 157176 1831644    0    0    88     0  1597  1301  1 26 74  0
 1  0 1963676 3943816 157176 1831832    0    0    64     0  1572  1179  1 26 74  0
 1  0 1963676 3943304 157176 1832232    0    0   148     0  2009  2600  1 25 74  0
 1  0 1963676 3942932 157176 1832752    0    0     8     0  1917  2300  1 26 73  0
 2  0 1963668 3942932 157184 1832816    0    0    36   148  1885  2269  2 26 72  0
 1  0 1963668 3942428 157184 1833076    0    0   136     0  2063  2823  1 26 73  0
 2  0 1963668 3942172 157184 1833628    0    0    84     0  2037  3236  4 26 69  0
 1  0 1963668 3941924 157184 1833692    0    0    56     0  1982  2167  1 26 73  0
 2  0 1963668 3927648 157184 1835672    0    0   124     0  2214  2734  6 26 68  0
 1  0 1963668 3927648 157184 1835756    0    0    80    72  1638  1668  1 25 74  0

Filesystem type is: 9123683e
File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
 ext  logical  physical  expected  length  flags
   0        0   2626450             2048
   1     2048   3215128   2628498   2040
   2     4088   3408631   3217168   2032
   3     6120   3430045   3410663   2024
   4     8144   3439999   3432069   2016
   5    10160   3474610   3442015   1004
   6    11164   3743715   3475614   1002
   7    12166   2108412   3744717   1000
   8    13166   2943991   2109412    998
   9    14164   3107711   2944989    996
  10    15160   3217168   3108707    994
  11    16154   3324557   3218162    496
  12    16650   3349504   3325053    495
  13    17145   3350737   3349999    495
  14    17640   3352158   3351232    494
  15    18134   3355223   3352652    494
  16    18628   3359558   3355717    493
  17    19121   3367645   3360051    493
  18    19614   3369156   3368138    492
  19    20106   3382494   3369648    492
  20    20598   3383027   3382986    491
  21    21089   3385838   3383518    491
  22    21580   3442449   3386329    490
  23    22070   3470434   3442939    490
  24    22560   3500244   3470924    489
  25    23049   3532609   3500733    489
  26    23538   3559176   3533098    489
  27    24027   3561437   3559665    488
  28    24515   3565004   3561925    488
  29    25003   3569963   3565492    487
  30    25490   3573446   3570450    487
  31    25977   3735991   3573933    486
  32    26463   3745098   3736477    486
  33    26949   3901106   3745584    485
  34    27434   3901681   3901591    485
  35    27919    956052   3902166    484
  36    28403    984140    956536    484
  37    28887   1017986    984624    483
  38    29370   1032244   1018469    483
  39    29853   1478810   1032727    482
  40    30335   1479480   1479292    482
  41    30817   1480016   1479962    481
  42    31298   1512813   1480497    481
  43    31779   1515627   1513294    480
  44    32259   1759660   1516107    480
  45    32739   1866977   1760140    480
  46    33219   2025589   1867457    479
  47    33698   2044003   2026068    479
  48    34177   2233664   2044482    478
  49    34655   2246706   2234142    478
  50    35133   2336760   2247184    477
  51    35610   2348377   2337237    477
  52    36087   2396156   2348854    476
  53    36563   2453672   2396632    476
  54    37039   2505829   2454148    475
  55    37514   2559971   2506304    475
  56    37989   2568049   2560446    474
  57    38463   2569417   2568523    474
  58    38937   2575922   2569891    473
  59    39410   2578488   2576395    473
  60    39883   2989056   2578961    946
  61    40829   2995464   2990002    472
  62    41301   3197446   2995936    471
  63    41772   3206085   3197917    471
  64    42243   3467053   3206556    470
  65    42713   2579027   3467523    470
  66    43183   2727531   2579497    469
  67    43652   2729381   2728000    469
  68    44121   2730137   2729850    468
  69    44589   2875164   2730605    468
  70    45057   2902010   2875632    467
  71    45524   2917719   2902477    467
  72    45991   2920037   2918186    467
  73    46458   2930483   2920504    466
  74    46924   2931689   2930949    466
  75    47390   2941544   2932155    465
  76    47855   2943422   2942009    465
  77    48320   2955072   2943887    464
  78    48784   2962691   2955536    464
  79    49248   2964241   2963155    463
  80    49711   2965864   2964704    463
  81    50174   2979347   2966327    463
  82    50637   2985719   2979810    462
  83    51099   3033228   2986181    462
  84    51561   4096111   3033690    461
  85    52022   2913433   4096572    461
  86    52483   2914231   2913894    230
  87    52713   2915298   2914461    230
  88    52943   2917405   2915528    230
  89    53173   2918359   2917635    230
  90    53403   2087430   2918589    459
  91    53862   2109512   2087889    229
  92    54091   2110584   2109741    229
  93    54320   2111695   2110813    229
  94    54549   2157184   2111924    229
  95    54778   2158300   2157413    229
  96    55007   2165613   2158529    229
  97    55236   2167222   2165842    229
  98    55465   2196837   2167451    228
  99    55693   2199378   2197065    228
[…]
 306   106611   1243376   1168146    203
 307   106814   1245114   1243579    203
 308   107017   1294949   1245317    203
 309   107220   1408543   1295152    203
 310   107423   1408788   1408746    203
 311   107626   1448445   1408991    203
 312   107829   1451116   1448648    203
 313   108032   1453560   1451319    203
 314   108235   1459015   1453763    203
 315   108438   1460375   1459218    203
 316   108641   1461372   1460578    202
 317   108843   1471758   1461574    202
[…]
4526   522694   2939615   3455932     49
4527   522743   2517410   2939664     48
4528   522791   2460124   2517458     46
4529   522837   2458204   2460170     45
4530   522882   2479853   2458249     43
4531   522925   1687125   2479896     42
4532   522967    646064   1687167     41
4533   523008    497470    646105     40
4534   523048   4111482    497510     77
4535   523125   4097378   4111559     72
4536   523197   3949964   4097450     68
4537   523265   3499481   3950032     63
4538   523328   3499660   3499544     60
4539   523388   3495885   3499720     56
4540   523444   3498714   3495941     52
4541   523496   2960575   3498766     49
4542   523545   2482351   2960624     46
4543   523591   2481927   2482397     43
4544   523634    532779   2481970     40
4545   523674   4170769    532819     76
4546   523750   3935305   4170845     67
4547   523817   3498776   3935372     58
4548   523875   3502955   3498834     51
4549   523926   2489644   3503006     45
4550   523971    338996   2489689     39
4551   524010   4035101    339035     69
4552   524079   3506596   4035170     52
4553   524131    399363   3506648     39
4554   524170   3550735    399402     59
4555   524229   3553226   3550794     59  eof
fallocate-test: 4556 extents found

But:

merkaba:/var/tmp> /usr/bin/time rm fallocate-test
0.00user 0.24system 0:00.38elapsed 63%CPU (0avgtext+0avgdata 784maxresident)k
4464inputs+36184outputs (0major+243minor)pagefaults 0swaps

Some more information on the filesystem in question:

merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi sh
failed to read /dev/sr0
Label: 'debian'  uuid: […]
        Total devices 1 FS bytes used 13.56GB
        devid    1 size 18.62GB used 18.62GB path /dev/dm-0

Btrfs v0.19-239-g0155e84

merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi df /
Disk size:             18.62GB
Disk allocated:        18.62GB
Disk unallocated:         0.00
Used:                  13.56GB
Free (Estimated):       3.31GB  (Max: 3.31GB, min: 3.31GB)
Data to disk ratio:        91 %

merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi disk-usage /
Data,Single: Size:15.10GB, Used:12.94GB
   /dev/dm-0   15.10GB

Metadata,Single: Size:8.00MB, Used:0.00
   /dev/dm-0    8.00MB

Metadata,DUP: Size:1.75GB, Used:630.11MB
   /dev/dm-0    3.50GB

System,Single: Size:4.00MB, Used:0.00
   /dev/dm-0    4.00MB

System,DUP: Size:8.00MB, Used:4.00KB
   /dev/dm-0   16.00MB

Unallocated:
   /dev/dm-0      0.00

merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs dev disk-usage /
/dev/dm-0   18.62GB
   Data,Single:       15.10GB
   Metadata,Single:    8.00MB
   Metadata,DUP:       3.50GB
   System,Single:      4.00MB
   System,DUP:        16.00MB
   Unallocated:          0.00
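Worth noting in the output above: devid 1 shows size 18.62GB used 18.62GB, so every last chunk of raw space is allocated, although only 13.56GB are actually used. A quick check plus a possible remedy, as a sketch (balance filters need a recent kernel, 3.3 or newer as far as I know):

# size == used in 'fi show' means the raw space is fully allocated to chunks:
btrfs filesystem show
btrfs filesystem df /

# rewrite (and thereby free up) data chunks that are at most 5% used:
btrfs balance start -dusage=5 /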
Compared to that, an ext4 /home on the same SSD does not show many signs of aging/degradation yet, despite being quite full and Nepomuk thrashing it quite hard at times (Virtuoso database):

merkaba:/home> /usr/bin/time fallocate -l 2G fallocate-test
0.00user 0.01system 0:00.01elapsed 100%CPU (0avgtext+0avgdata 720maxresident)k
0inputs+0outputs (0major+229minor)pagefaults 0swaps

(without the FL_NO_HIDE_STALE stuff, of course :)

merkaba:/home> filefrag -v fallocate-test
Filesystem type is: ef53
File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
 ext  logical  physical  expected  length  flags
   0        0  22091776             2048  unwritten
   1     2048  22102016  22093824   2048  unwritten
   2     4096  22149120  22104064   2048  unwritten
   3     6144  22224896  22151168   2048  unwritten
   4     8192  22261760  22226944   2048  unwritten
   5    10240  22274048  22263808   2048  unwritten
   6    12288  22278144  22276096   4096  unwritten
   7    16384  22292480  22282240   2048  unwritten
   8    18432  22306816  22294528   8192  unwritten
   9    26624  22355968  22315008   4096  unwritten
  10    30720  22411264  22360064   2048  unwritten
  11    32768  22425600  22413312   4096  unwritten
  12    36864  22476800  22429696   2048  unwritten
  13    38912  22577152  22478848   2048  unwritten
  14    40960  22603776  22579200   2048  unwritten
  15    43008  22607872  22605824   2048  unwritten
  16    45056  22620160  22609920   2048  unwritten
  17    47104  22614016  22622208   4096  unwritten
  18    51200  22646784  22618112   2048  unwritten
  19    53248  22697984  22648832   2048  unwritten
  20    55296  22738944  22700032   2048  unwritten
  21    57344  22769664  22740992   4096  unwritten
  22    61440  22775808  22773760   6144  unwritten
  23    67584  22818816  22781952   4096  unwritten
  24    71680  22867968  22822912   4096  unwritten
  25    75776  22896640  22872064   8192  unwritten
[…]
 150   483328  29599744  29501440   2048  unwritten
 151   485376  29632512  29601792   2048  unwritten
 152   487424  29646848  29634560   8192  unwritten
 153   495616  29669376  29655040  10240  unwritten
 154   505856  29685760  29679616   2048  unwritten
 155   507904  29691904  29687808   2048  unwritten
 156   509952  29696000  29693952   2048  unwritten
 157   512000  29700096  29698048   2048  unwritten
 158   514048  29712384  29702144   2048  unwritten
 159   516096  29718528  29714432   2048  unwritten
 160   518144  29736960  29720576   2048  unwritten
 161   520192  29743104  29739008   2048  unwritten
 162   522240  29767680  29745152   2048  unwritten,eof
fallocate-test: 163 extents found

merkaba:/home> /usr/bin/time rm fallocate-test
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 784maxresident)k
0inputs+0outputs (0major+244minor)pagefaults 0swaps

merkaba:~> LANG=C df -hT /home
Filesystem               Type  Size  Used Avail Use% Mounted on
/dev/mapper/merkaba-home ext4  221G  209G  8.8G  96% /home

I know this is still twice as much free space as on the BTRFS volume, and a different workload, and the BTRFS filesystem has been a bit fuller at times. I just do not have two filesystems at hand that degrade under exactly the same workload.

merkaba:~> e2freefrag /dev/merkaba/home
Device: /dev/merkaba/home
Blocksize: 4096 bytes
Total blocks: 58593280
Free blocks: 2921471 (5.0%)

Min. free extent: 4 KB
Max. free extent: 57344 KB
Avg. free extent: 224 KB
Num. free extent: 51323

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :         12118         12118    0.41%
    8K...   16K-  :         13221         29823    1.02%
   16K...   32K-  :          8431         42289    1.45%
   32K...   64K-  :          5952         63186    2.16%
   64K...  128K-  :          3657         80646    2.76%
  128K...  256K-  :          2483        109538    3.75%
  256K...  512K-  :          1740        154664    5.29%
  512K... 1024K-  :          1404        255117    8.73%
    1M...    2M-  :          1302        468132   16.02%
    2M...    4M-  :           487        335516   11.48%
    4M...    8M-  :           255        357015   12.22%
    8M...   16M-  :           182        455025   15.58%
   16M...   32M-  :            76        385687   13.20%
   32M...   64M-  :            15        140176    4.80%

merkaba:/home> e4defrag -c /home
<Fragmented files>                             now/best       size/ext
1. /home/martin/Mail/[… some kmail index …]
                                                 7/1              4 KB
2. /home/martin/Mail/[… some kmail index …]
                                                 4/1              4 KB
3. /home/martin/[…]/.bzr/checkout/dirstate
                                                 4/1              4 KB
4. /home/martin/[… some small kexi database …].kexi
                                                 6/1              4 KB
5. /home/martin/.kde/share/apps/kraft/sqlite/kraft.db
                                                15/1              5 KB

 Total/best extents                             926792/904756
 Average size per extent                        238 KB
 Fragmentation score                            0
 [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
 This directory (/home) does not need defragmentation.
 Done.
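For BTRFS I lack an equivalent of these two tools. The closest thing I can think of is a filefrag loop, since filefrag/FIEMAP works on BTRFS as well. A rough sketch (the size cutoff is arbitrary, and paths containing ":" will confuse the awk):

# list the 20 most fragmented files larger than 1 MB under a path:
find /path -xdev -type f -size +1M -print0 \
        | xargs -0 filefrag 2>/dev/null \
        | awk -F: '{ n = $2; sub(/ extents? found/, "", n); print n + 0, $1 }' \
        | sort -rn | head -n 20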
My questions now are:

a) Are there other ways a BTRFS filesystem can degrade?

b) How to diagnose degradation of BTRFS? How to diagnose which kind of aging slows down a given BTRFS filesystem?

- some tool to measure free space fragmentation, like e2freefrag (see above)?

- some tool to measure file fragmentation, like e4defrag -c (see above, and the filefrag loop sketched there)? I think there might be some of this already.

- btrfs-calc-size for tree diagnosing?

merkaba:~> btrfs-calc-size /dev/merkaba/debian
Calculating size of root tree
        16.00KB total size, 0.00 inline data, 1 nodes, 3 leaves, 2 levels
Calculating size of extent tree
        53.66MB total size, 0.00 inline data, 222 nodes, 13515 leaves, 4 levels
Calculating size of csum tree
        19.58MB total size, 0.00 inline data, 76 nodes, 4936 leaves, 3 levels
Calculating size of fs tree
        554.04MB total size, 198.86MB inline data, 2142 nodes, 139693 leaves, 4 levels

The levels seem sane to me.

- btrfs-debug-tree? But that is too much output for a regular admin, I think.

c) What to do about it?

While I understand that the fragmentation issues are quite deeply related to the copy-on-write nature of BTRFS, they are still an issue, even on SSDs, as I showed above. Granted, boot speed and creating small files are still fast enough on the SSD; the SSD seems to compensate for that fragmentation quite well. But still: is there anything that can be done?

i) By enhancing BTRFS?

- insert your suggestion here

ii) By some admin tasks or filesystem maintenance? Are there safe ones that really improve the FS layout instead of making it worse? Once I tried btrfs filesystem balance on the root filesystem mentioned above, and the net result was that boot time doubled according to systemd-analyze. I then just reformatted the volume and have not tried a balance again so far. But I am ready to try suggestions on this FS, as I plan to redo it with mkfs.btrfs -l 16384 -n 16384 (big metadata) anyway.

- maybe (still) some btrfs filesystem balance run?

- maybe some btrfs filesystem defragment runs? By a script, recursively, as long as that is not implemented within the btrfs command itself (see the sketch below). I would prefer that, along the lines of e4defrag, also with an option -c to first diagnose whether defragmentation makes sense.

What kinds of degradation are important to performance, and which ones are not?

iii) And what can be done to prevent degradation?

- leave more free space, maybe lots more free space?

I think these are the questions admins will have when their first production BTRFS filesystems start to degrade, and I thought it might be a good idea to think about good answers to them. Feel free to split out the thread by changing subject lines into free space fragmentation, file fragmentation and so on, where it makes sense.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
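The recursive defragment run wished for under ii) could look like this. A completely untested sketch; the size cutoff is arbitrary and the path is a placeholder:

# defragment all larger files below a mountpoint, one by one
# (btrfs filesystem defragment on a directory does not recurse, hence the find):
find /mnt/subvol -xdev -type f -size +1M -print0 \
        | xargs -0 -n 1 btrfs filesystem defragment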
Martin Steigerwald
2012-Dec-09 11:20 UTC
Re: How to refresh degraded BTRFS? free space fragmentation, file fragmentation...
On Sunday, 9 December 2012, Martin Steigerwald wrote:
> Hi!
>
> I have been running BTRFS on some systems for more than two years now. My
> experience so far: performance at the beginning is pretty good, but some
> of my more heavily used BTRFS filesystems degrade badly in different
> areas, on some workloads quite quickly.
>
> There are also some filesystems, however, that did not degrade that
> badly. These are ones with far more free space left than the ones that
> degraded badly: about 900 GB of free space left on my eSATA backup disk
> with BTRFS, which is also quite new, and about 80 GB left on my BTRFS
> RAID 1 local home disk, where I can build Debian packages or kernels and
> such without the restrictions NFS brings (root squash). These still
> appear to be fine, but I redid the local home one with mkfs.btrfs
> -n 32768 -l 32768 not too long ago. I think it was quite fine before
> anyway, so I might have overdone it here. This already points at one way
> to prevent some degradation of BTRFS filesystems: leave more free space.

I also do not use them regularly, as in every day. The backup disk is used just every two weeks or so; the local home disk sometimes every day for a week, then not at all for weeks.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Martin Steigerwald
2013-Jan-16 20:39 UTC
Re: How to refresh degraded BTRFS? free space fragmentation, file fragmentation...
On Sunday, 9 December 2012, Martin Steigerwald wrote:
> Hi!
>
> I have been running BTRFS on some systems for more than two years now. My
> experience so far: performance at the beginning is pretty good, but some
> of my more heavily used BTRFS filesystems degrade badly in different
> areas, on some workloads quite quickly.
[…]
> 1) fsync speed on my ThinkPad T23 has gone down so much that I use
[…]

Interesting to try this again after the latest fsync improvements.

> 2) File fragmentation: an example with a SUSE Manager VirtualBox VM on a
[…]

> 3) Free space fragmentation on the / filesystem on this ThinkPad T520
> with Intel SSD 320:
>
> === fstrim ===
>
> merkaba:~> /usr/bin/time fstrim -v /
> /: 6849871872 bytes were trimmed
> 0.00user 5.99system 0:44.69elapsed 13%CPU (0avgtext+0avgdata 752maxresident)k
> 0inputs+0outputs (0major+237minor)pagefaults 0swaps
>
> It took a second or two in the beginning.
>
> atop:
>
> LVM | rkaba-debian | busy 91% | read 0 | write 10313 | MBw/s 67.48 | avio 0.20 ms |
> […]
> DSK | sda          | busy 90% | read 0 | write 10319 | MBw/s 67.54 | avio 0.19 ms |
> […]
>
>  PID  TID RUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/2
> 6085    - root   1  0.29s  0.00s    0K    0K    0K    0K --   - D     0 13% fstrim
>
> That is 10000 write requests in 10 seconds.

I was able to refresh my BTRFS regarding this issue on the 11th of January:

merkaba:~> btrfs filesystem df /
Data: total=15.10GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=654.12MB
Metadata: total=8.00MB, used=0.00

merkaba:~> btrfs balance start -dusage=5 /
Done, had to relocate 0 out of 25 chunks

merkaba:~> btrfs filesystem df /
Data: total=15.01GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=654.05MB
Metadata: total=8.00MB, used=0.00

merkaba:~> btrfs balance start -d /
Done, had to relocate 16 out of 25 chunks

merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=647.72MB
Metadata: total=8.00MB, used=0.00

merkaba:~> /usr/bin/time -v fstrim -v /
/: 2246623232 bytes were trimmed
        Command being timed: "fstrim -v /"
        User time (seconds): 0.00
        System time (seconds): 2.34
        Percent of CPU this job got: 10%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.84
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 748
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 239
        Voluntary context switches: 110690
        Involuntary context switches: 1426
        Swaps: 0
        File system inputs: 16
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
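Such balance runs can be watched and interrupted on a live system, by the way (restriper, kernel 3.3/3.4 or newer, if I remember correctly):

btrfs balance status /          # progress of a running balance
btrfs balance pause /           # pause it when the machine is needed
btrfs balance resume /          # and pick it up again later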
merkaba:~> btrfs balance start -fmconvert=single /
Done, had to relocate 8 out of 20 chunks

merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System: total=36.00MB, used=4.00KB
Metadata: total=1.75GB, used=642.92MB

[406005.831307] btrfs: balance will reduce metadata integrity, use force if you want this
[406129.187057] btrfs: force reducing metadata integrity
[406129.199133] btrfs: relocating block group 9290383360 flags 36
[406132.645299] btrfs: found 6989 extents
[406132.673390] btrfs: relocating block group 8082423808 flags 36
[406135.807065] btrfs: found 6906 extents
[406135.841572] btrfs: relocating block group 7948206080 flags 36
[406138.413270] btrfs: found 4514 extents
[406138.435382] btrfs: relocating block group 6740246528 flags 36
[406142.572004] btrfs: found 10667 extents
[406142.638079] btrfs: relocating block group 6606028800 flags 36
[406146.272095] btrfs: found 19844 extents
[406146.289729] btrfs: relocating block group 6471811072 flags 36
[406149.136422] btrfs: found 14850 extents
[406149.159510] btrfs: relocating block group 29360128 flags 36
[406183.637010] btrfs: found 116645 extents
[406183.653225] btrfs: relocating block group 20971520 flags 34
[406183.671958] btrfs: found 1 extents

The metadata tree was still at its old size, thus a regular rebalance:

merkaba:~> btrfs balance start -m /
Done, had to relocate 8 out of 20 chunks

merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System: total=36.00MB, used=4.00KB
Metadata: total=768.00MB, used=643.38MB

[406270.880962] btrfs: relocating block group 31801212928 flags 2
[406270.961955] btrfs: found 1 extents
[406270.976857] btrfs: relocating block group 31532777472 flags 4
[406270.990729] btrfs: relocating block group 31264342016 flags 4
[406271.006172] btrfs: relocating block group 30995906560 flags 4
[406271.020158] btrfs: relocating block group 30727471104 flags 4
[406271.480442] btrfs: found 5187 extents
[406271.515768] btrfs: relocating block group 30459035648 flags 4
[406277.158280] btrfs: found 54593 extents
[406277.173024] btrfs: relocating block group 30190600192 flags 4
[406284.680294] btrfs: found 63749 extents
[406284.756582] btrfs: relocating block group 29922164736 flags 4
[406290.907101] btrfs: found 59530 extents

merkaba:~> df -hT /
Dateisystem Typ   Größe Benutzt Verf. Verw% Eingehängt auf
/dev/dm-0   btrfs   19G     12G  6,8G   64% /

merkaba:~> /usr/bin/time -v fstrim -v /
/: 5472256 bytes were trimmed
        Command being timed: "fstrim -v /"
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 50%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 748
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 238
        Voluntary context switches: 12
        Involuntary context switches: 3
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Still fast today:

merkaba:~#1> /usr/bin/time -v fstrim /
        Command being timed: "fstrim /"
        User time (seconds): 0.00
        System time (seconds): 0.03
        Percent of CPU this job got: 17%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.19
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 708
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 227
        Voluntary context switches: 736
        Involuntary context switches: 35
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Boot time seems a tad bit slower though:

merkaba:~> systemd-analyze
Startup finished in 5495ms (kernel) + 6331ms (userspace) = 11827ms

merkaba:~> systemd-analyze blame
  3051ms cups.service
  2330ms dirmngr.service
  2267ms postfix.service
  1411ms schroot.service
  1385ms lvm2.service
  1230ms network-manager.service
  1128ms ssh.service
  1117ms acpi-fakekey.service
  1112ms avahi-daemon.service
  1061ms privoxy.service
  1010ms systemd-logind.service
   721ms loadcpufreq.service
   646ms colord.service
   552ms kdm.service
   533ms networking.service
   532ms keyboard-setup.service
   463ms remount-rootfs.service
   368ms bootlogs.service
   349ms udev.service
   327ms console-kit-log-system-start.service
   326ms postgresql.service
   322ms binfmt-support.service
   316ms acpi-support.service
   315ms qemu-kvm.service
   310ms sys-kernel-debug.mount
   309ms dev-mqueue.mount
   309ms anacron.service
   303ms atd.service
   297ms sys-kernel-security.mount
   282ms cron.service
   282ms dev-hugepages.mount
   272ms lightdm.service
   271ms console-kit-daemon.service
   271ms lirc.service
   268ms lxc.service
   259ms cpufrequtils.service
   259ms mdadm.service
   252ms openntpd.service
   240ms smartmontools.service
   240ms alsa-utils.service
   237ms run-user.mount
   237ms speech-dispatcher.service
   230ms udftools.service
   229ms run-lock.mount
   229ms systemd-remount-api-vfs.service
   224ms ebtables.service
   214ms openbsd-inetd.service
   208ms motd.service
   199ms hdparm.service
   198ms irqbalance.service
   190ms mountdebugfs.service
   181ms saned.service
   160ms systemd-user-sessions.service
   157ms polkitd.service
   147ms screen-cleanup.service
   146ms console-setup.service
   141ms networking-routes.service
   140ms pppd-dns.service
   130ms rc.local.service
   130ms jove.service
   128ms sysstat.service
   112ms rsyslog.service
   111ms udev-trigger.service
   103ms home.mount
    93ms systemd-sysctl.service
    89ms boot.mount
    85ms dns-clean.service
    84ms kbd.service
    66ms upower.service
    60ms systemd-tmpfiles-setup.service
    53ms openvpn.service
    37ms boot-efi.mount
    27ms udisks.service
    22ms sysfsutils.service
    22ms mdadm-raid.service
    20ms proc-sys-fs-binfmt_misc.mount
    18ms tmp.mount
     2ms sys-fs-fuse-connections.mount

> vmstat 1:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b    swpd    free   buff   cache   si   so    bi    bo    in    cs us sy id wa
>  3  0 1963688 3943380 156972 1827836    0    0     0     0  5421 15781  6  6 88  0
[…]
> After fstrim:
>
>  0  0 1963684 3944244 157016 1827752    0    0     0     0   357  1018  2  1 97  0
[…]
>
> The I/O activity does not seem to be reflected in vmstat; I bet that is
> because the page cache is not involved.

> === fallocate ===
>
> merkaba:/var/tmp> /usr/bin/time fallocate -l 2G fallocate-test
> 0.00user 118.85system 2:00.50elapsed 98%CPU (0avgtext+0avgdata 720maxresident)k
> 14912inputs+49112outputs (0major+227minor)pagefaults 0swaps

Now, let's try this:

merkaba:/var/tmp> /usr/bin/time -v fallocate -l 2G fallocate-test
        Command being timed: "fallocate -l 2G fallocate-test"
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 80%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 724
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 231
        Voluntary context switches: 5
        Involuntary context switches: 6
        Swaps: 0
        File system inputs: 80
        File system outputs: 72
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

There we go :)

> Filesystem type is: 9123683e
> File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
>  ext  logical  physical  expected  length  flags
>    0        0   2626450             2048
>    1     2048   3215128   2628498   2040
>    2     4088   3408631   3217168   2032
>    3     6120   3430045   3410663   2024
>    4     8144   3439999   3432069   2016
>    5    10160   3474610   3442015   1004
>    6    11164   3743715   3475614   1002
[…]
> fallocate-test: 4556 extents found

merkaba:/var/tmp> filefrag -v fallocate-test
Filesystem type is: 9123683e
File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
 ext  logical  physical  expected  length  flags
   0        0   8501248           524288  eof
fallocate-test: 1 extent found

Yes, that's the same filesystem :)

> But:
>
> merkaba:/var/tmp> /usr/bin/time rm fallocate-test
> 0.00user 0.24system 0:00.38elapsed 63%CPU (0avgtext+0avgdata 784maxresident)k
> 4464inputs+36184outputs (0major+243minor)pagefaults 0swaps

merkaba:/var/tmp> /usr/bin/time rm fallocate-test
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 784maxresident)k
0inputs+24outputs (0major+243minor)pagefaults 0swaps

> Some more information on the filesystem in question:
>
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi sh
> failed to read /dev/sr0
> Label: 'debian'  uuid: […]
>         Total devices 1 FS bytes used 13.56GB
>         devid    1 size 18.62GB used 18.62GB path /dev/dm-0
>
> Btrfs v0.19-239-g0155e84
[…]

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
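Condensed, the refresh sequence from this mail as one sketch, leaving out the mconvert step, since reducing metadata from DUP to single is a separate decision:

# refresh a BTRFS whose raw space is fully allocated (kernel 3.7, btrfs-progs v0.19+):
btrfs balance start -dusage=5 /     # cheap first try: drop nearly empty data chunks
btrfs balance start -d /            # full data balance (here: 16 of 25 chunks)
btrfs balance start -m /            # rebalance metadata, too
fstrim -v /                         # afterwards trims in well under a second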