Adam Ryczkowski
2013-Jan-30 14:57 UTC
Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
Hello,

I've been using btrfs for over three months to store my personal data on my NAS server. Almost all interactions with files on the server go through the unison synchronizer.

After another run of bedup (https://github.com/g2p/bedup) on my btrfs volume I experienced a huge performance loss during synchronization. What used to take only 15 minutes now takes over 3 hours! File browsing is not affected, but reading the contents of files takes forever.

When I run `iotop -o -d 30` (which measures I/O activity over 30-second intervals) I see:

    Total DISK READ: 98.66 K/s | Total DISK WRITE: 826.55 K/s
      TID  PRIO  USER    DISK READ   DISK WRITE  SWAPIN      IO>  COMMAND
     4296  be/4  root     3.99 K/s   408.59 K/s  0.00 %  98.64 %  [btrfs-transacti]
     6407  be/4  adam    94.14 K/s     0.00 B/s  0.00 %  85.24 %  unison -server
      311  be/4  root     0.00 B/s     0.00 B/s  0.00 %  58.20 %  [md1_raid6]
      354  be/3  root     0.00 B/s     2.26 K/s  0.00 %  24.29 %  [jbd2/md0-8]
      306  be/4  root     0.00 B/s     0.00 B/s  0.00 %   4.79 %  [md0_raid1]
     1229  be/4  syslog   0.00 B/s   136.15 B/s  0.00 %   0.00 %  rsyslogd -c5
     1744  be/4  root     0.00 B/s   136.15 B/s  0.00 %   0.00 %  console-kit-daemon --no-daemon

I expected no writes at all, since the statistics were taken during the "Looking for changes" phase. Normally, the `unison -server` process should read from disk at 5 M/s or more. (The block device btrfs is built on has a measured sequential throughput of 50 M/s.)

When I pause the `unison -server` process (with htop), the disk activity persists for another 5-30 seconds, so I infer that btrfs is doing some housekeeping work, which is why I decided to post to this list. I suspect that this housekeeping work has a time granularity of 5-30 seconds, and that access to the filesystem is delayed during that time.

The problem is not specific to unison. This background process is triggered just by reading file contents. Once the system is through and the file has been read, all subsequent attempts to read it are fine, even if I drop the cache (i.e. echo 3 > /proc/sys/vm/drop_caches). But after a while (after reboot) the performance hit recurs.

The questions are:

1. What sort of work is btrfs doing? What is it writing (and why is it writing 100 times more bytes than it reads)?
2. Why does it take so long?
3. What can I do to speed up the process?
4. What can I do to prevent it from happening again?

Here are details about my system that might help with the diagnosis. Let me know if they are not enough.

I suspect it has something to do with the snapshots I make for backup. I have 35 of them, and I ask bedup to find duplicates across all subvolumes. But on the other hand this is supposed to work since kernel 3.5, and the filesystem has never seen a kernel older than 3.6.

My filesystem /dev/vg-adama-docs/lv-adama-docs is 372GB in size and sits on a quite complex setup: it is based on a logical volume (LVM2), which has a single physical volume made of the dm-crypt device /dev/dm-1, which in turn sits on top of /dev/md1, a Linux RAID 6 built from 4 identical 186GB GPT partitions, one on each of my 3TB SATA hard drives.

There are 272k files on the system (excluding the 35 snapshots), 23k folders and 104 GB of data.

    $ df /mnt/adama-docs -h
    Filesystem                                   Size  Used Avail Use% Mounted on
    /dev/mapper/vg--adama--docs-lv--adama--docs  373G   85G  288G  23% /mnt/adama-docs

I have always been running the latest kernel (it's 3.7.1-030701-generic at the moment) on my Ubuntu Quantal server.

--
Adam Ryczkowski
Skype:sisteczko

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Murphy
2013-Jan-30 23:58 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
On Jan 30, 2013, at 7:57 AM, Adam Ryczkowski <adam.ryczkowski@statystyka.net> wrote:

> I suspect it has something to do with snapshots I make for backup. I have 35 of them, and I ask bedup to find duplicates across all subvolumes.

Assuming most files do have identical duplicates, the same file in all 35 subvolumes is actually in the same physical location; the instances differ only in subvol reference. But it's unison, not btrfs, that determines the "duplicate" vs "unique" state of those 35 file instances. The fs still must send all 35 instances for the state to be determined, as if they were unique files.

Another thing: I'd expect this to scale very poorly if the 35 subvolumes contain any appreciable uniqueness, because searches can't be done in parallel. So the more subvolumes you add, the more disk contention you get, but also enormous amounts of latency, as possibly 35 locations on the disk are being searched if they happen to be unique.

So in either case, "duplicate" vs "unique", you have a problem, just different kinds. And as the storage grows, it increasingly encounters both problems at the same time. Small problem. What size are the files?

And that's on a bare drive, before you went and did this:

> My filesystem /dev/vg-adama-docs/lv-adama-docs is 372GB in size, and is a quite complex setup:
> It is based on logical volume (LVM2), which has a single physical volume made by dm-crypt device /dev/dm-1, which subsequently sits on top of /dev/md1 linux raid 6, which is built with 4 identical 186GB GPT partitions on each of my SATA 3TB hard drives.

Why are you using raid6 for four disks, instead of raid10?

What's the chunk size for the raid 6? What's the btrfs leaf size? What's the dedup chunk size?

Why are you using LVM at all, when /dev/dm-1 is the same size as the LV? You say the btrfs volume on the LV is on dm-1, which means they're all the same size, obviating the need for LVM in this case entirely.

Chris Murphy
Adam Ryczkowski
2013-Jan-31 01:02 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
Thank you, Chris, for your time.

On 2013-01-31 00:58, Chris Murphy wrote:
> On Jan 30, 2013, at 7:57 AM, Adam Ryczkowski <adam.ryczkowski@statystyka.net> wrote:
>> I suspect it has something to do with snapshots I make for backup. I have 35 of them, and I ask bedup to find duplicates across all subvolumes.
> Assuming most files do have identical duplicates, implies the same file in all 35 subvolumes is actually in the same physical location; it differs only in subvol reference. But it's not btrfs that determines the "duplicate" vs "unique" state of those 35 file instances, but unison. The fs still must send all 35x instances for the state to be determined, as if they were unique files.

I'm sorry I didn't put my question more clearly. I tried to say that the problem is not specific to unison; I am able to reproduce it by other means of reading file contents. I tried to 'cat' many small files and to preview some large ones under Midnight Commander. I didn't take precise measurements, but I can tell that reading 500 50-byte files (ca. 25kB of data) took way longer than reading one 3MB file, so I suspect the problem is with metadata access times rather than with data.

I am aware that reading 1MB distributed across small files takes longer than 1MB of sequential reading. The problem is that _suddenly_ reads became at least 20 times slower than usual. And from what iotop and systat told me, the hard drives were busy _writing_ something, not _reading_!

The time I wait for unison to scan the whole hard drive is comparable with the time a full balance takes.

Anyway, I synchronize only the "working copy" part of my file system. All the backup subvolumes sit in a separate path, not seen by unison.

Moreover, once I wait long enough for the system to finish scanning the file system, file access speeds are back to normal, even after I drop the read cache or even reboot the system. It is only after making another snapshot that the problem recurs.

> Another thing, I'd expect this to scale very poorly if the 35 subvolumes contain any appreciable uniqueness, because searches can't be done in parallel. So the more subvolumes you add, the more disk contention you get, but also enormous amounts of latency as possibly 35 locations on the disk are being searched if they happen to be unique.

*The severity of my problem is proportional to time*. It happens immediately after making a snapshot, and persists for each file until I try to read its contents. Then, even after a reboot, timing is back to normal. With my limited knowledge of btrfs internals I suspect that bedup has somehow messed up my metadata. Maybe I should balance only the metadata part (if that is possible at all)?

> So in either case "duplicate" vs "unique" you have a problem, just different kinds. And as the storage grows, it increasingly encounters both problems at the same time. Small problem. What size are the files?
>
> And that's on a bare drive before you went and did this:
>
>> My filesystem /dev/vg-adama-docs/lv-adama-docs is 372GB in size, and is a quite complex setup:
>> It is based on logical volume (LVM2), which has a single physical volume made by dm-crypt device /dev/dm-1, which subsequently sits on top of /dev/md1 linux raid 6, which is built with 4 identical 186GB GPT partitions on each of my SATA 3TB hard drives.
> Why are you using raid6 for four disks, instead of raid10?

Because I plan to add another 4 in the future. It's way easier to add another disk to the array than to change the RAID layout.

> What's the chunk size for the raid 6? What's the btrfs leaf size? What's the dedup chunk size?

I'll tell you tomorrow, but I hardly think that misalignment could be the problem here. As I said, everything was fine before, and the problem didn't appear gradually.

> Why are you using LVM at all, while the /dev/dm-1 is the same size as the LV? You say the btrfs volume on LV is on dm-1 which means they're all the same size, obviating the need for LVM in this case entirely.

Yes, I agree that at the moment I don't need it. But when the partition sits on a logical volume I keep the option to extend the filesystem when the need comes.

My current needs are more complex: I don't keep all the data at the same redundancy and security level. It is also hard to tell in advance the relative sizes of each combination of redundancy and security levels. So I allocate only as much space on the GPT partitions as I immediately need, and in the future, when the need comes, I can relatively easily make more partitions, arrange them in the appropriate raid/mdcrypt combination, and expand the filesystem that ran out of space.

I am aware that this setup is very complex. I can say that my application is not life-critical, and that this complexity has served me well on another Linux server, which I have been using for over 5 years (without btrfs, of course).

> Chris Murphy

--
Adam Ryczkowski
Chris Murphy
2013-Jan-31 01:50 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
On Jan 30, 2013, at 6:02 PM, Adam Ryczkowski <adam.ryczkowski@statystyka.net> wrote:

> I didn't take precise measurements, but I can tell, that reading 500 50-byte files (ca. 25kB of data) took way longer that reading one 3MB file, so I suspect the problem is with metadata access times rather than with data.

For 50 byte files, btrfs writes the data with the metadata. Depending on their location relative to each other, this could mean 250MB of reads because of the large raid6 chunk size, yet only ~2MB is needed by btrfs.

> I am aware, that reading 1MB distributed in small files takes longer than 1MB of sequential reading. The problem is that _suddenly_ this speed got at least 20 times longer than usual.

How does dedup work on 50 byte files? How does it contribute to fragmentation? And then how does that fragmentation turn into gross read inefficiencies at the md chunk level?

> And from what iotop and systat told me, the harddrives were busy _writing_ something, not _reading_!

Seems like you need to find out what's being written, and how many and how big the requests are. Small writes mean a huge RMW (read-modify-write) penalty on raid6, especially a 4 disk raid 6, where you're practically guaranteed to have either a data or a metadata request halted for a parity rewrite.

> Anyway, I synchronize only the "working copy" part of my file system. All the backup subvolumes sit in a separate path, not seen by the unison.

You're syncing what to what, in physical terms? I know one of the whats is a btrfs volume on top of LVM, on top of LUKS, on top of md raid6, on top of partitions located on four 3TB drives. You said there are other partitions on these drives, so are there other reads/writes occurring on those drives at the same time? It doesn't look like that's the case from iotop, the md0

> Moreover, once I wait long enough for the system to finish scanning the file system, file access speeds are back to normal, even after I drop read cache or even reboot the system. It is only after making another snapshot, when the problems recurs.
>> Another thing, I'd expect this to scale very poorly if the 35 subvolumes contain any appreciable uniqueness, because searches can't be done in parallel. So the more subvolumes you add, the more disk contention you get, but also enormous amounts of latency as possibly 35 locations on the disk are being searched if they happen to be unique.
>
> *The severity of my problem is proportional to time*. It happens immediately after making snapshot, and persists for each file until I try to read its contents. Than, even after the reboot, timing is back to normal. With my limited knowledge about the internals of btrfs I suspect, that the bedup has messed my metadata somehow. Maybe I should balance only the metadata part (if that is possible at all)?

It's possible to balance just the metadata chunks. But I think this is a spaghetti-on-the-wall approach, rather than understanding how all of these layers are interacting with each other.

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

>> Why are you using raid6 for four disks, instead of raid10?
> Because I plan to add another 4 in the future. It's way easier to add another disk to the array, than to change the RAID layout.

Perhaps, if this is happening imminently. In the meantime you have a terribly inefficient raid setup.

>> What's the chunk size for the raid 6? What's the btrfs leaf size? What's the dedup chunk size?
> I'll tell you tomorrow, but I hardly think that the misalignment could be any problem here. As I said, everything was fine and the problem didn't appear in gradual fashion.

It also depends on what mysterious stuff is being written during what's ostensibly a read-only event.

>> Why are you using LVM at all, while the /dev/dm-1 is the same size as the LV? You say the btrfs volume on LV is on dm-1 which means they're all the same size, obviating the need for LVM in this case entirely.
> Yes, I agree, that at the moment I don't need it. But when partition sits on logical volume I keep the option to extend the filesystem, when I the need comes.

This is not an ideal way to extend a btrfs file system, however. You're adding unnecessary layers and complexity while also not taking advantage of what LVM can do that btrfs cannot when it comes to logical volume management.

> My current needs are more complex, I don't keep all the date in the same redundancy and security level. It is also hard to tell in advance the relative sizes of each combination of redundancy and security levels. So I allocate only as much space on the GPT partitions as I immediately need, and in the future, when need comes, I can relatively easily make more partitions, arrange them in the appropriate raid/mdcrypt combination, and expand the filesystem that ran out space.

It sounds unnecessarily complex, but what do I know. Hopefully you have everything backed up to something that is comparatively simple. There are more failure points here than I can count.

> I am aware, that this setup is very complex. I can say, that my application is not life-critical, and this complexity serves me well on another Linux server, which I am using over 5 years (without the btrfs, of course).

Well, btrfs plus dedup adds a lot. And if the problem is disk contention, you may find drive heads dying a lot sooner than you'd otherwise expect. When this problem is happening, with the low-bandwidth writing, can you hear disk chatter? On all of the drives at the same time, or just one or two at a time?

Chris Murphy
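The 250MB and ~2MB figures quoted in this message can be reproduced with simple arithmetic. The Python sketch below is only an illustration under stated assumptions: it uses the 512 KiB md chunk size and the 4 KiB default btrfs leaf size that are reported elsewhere in this thread, and it assumes the worst case in which each of the 500 small files sits in a different chunk and a whole chunk is read for each one. The function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def worst_case_read_bytes(n_files, chunk_bytes):
    """Bytes read if every file lives in its own md chunk and
    a whole chunk is read for each file (assumed worst case)."""
    return n_files * chunk_bytes


def needed_bytes(n_files, leaf_bytes):
    """Bytes btrfs needs if each small file costs one metadata leaf
    (small files are stored inline with the metadata)."""
    return n_files * leaf_bytes


if __name__ == "__main__":
    n_files = 500
    read = worst_case_read_bytes(n_files, 512 * KIB)  # assumed 512 KiB chunk
    need = needed_bytes(n_files, 4 * KIB)             # assumed 4 KiB leaf
    print(f"worst-case read: {read / MIB:.0f} MiB")
    print(f"actually needed: {need / MIB:.2f} MiB")
```

Under these assumptions the worst case is 500 x 512 KiB = 250 MiB of reads for about 2 MiB of useful metadata, which matches the orders of magnitude given above.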
Adam Ryczkowski
2013-Jan-31 09:45 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
On 2013-01-31 04:33, Andrew Wade wrote:
> Hi Adam,
>
> Is btrfs mounted relatime? I'm wondering if you're seeing metadata
> writes from atime updates. I've got my filesystem mounted noatime to
> avoid breaking metadata sharing between subvolumes.
>
> Apologies for the broken threading - I'm not subscribed to the list.
>
> regards,
> Andrew

Thank you, thank you!!! Hurray!! That was the problem!! I'm so happy you've helped me out!!!

After mounting the filesystem with noatime the problem disappeared, as if by magic.

All the writes must have come from the delayed metadata copy process. Once all the metadata copy-update was done, file system speed was back to normal, but once a new day began, all the copying needed to be done again... This explains 100% of the odd behavior.

In particular, the problem apparently had nothing to do with my complex block device setup, nor with bedup, nor with unison.

Thank you again, Andrew!

P.S. Maybe it is not for me to decide, but IMHO the small note about performance (not even labeled as a warning) in https://btrfs.wiki.kernel.org/index.php/Mount_options should be made more conspicuous, maybe placed where the snapshot mechanism is described or in the FAQ. I'll try to fix it.

--
Adam Ryczkowski
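The daily recurrence described above is consistent with how the relatime mount option decides when to rewrite a file's access time. The Python sketch below models that decision rule as documented for Linux: atime is updated on read if it is not newer than mtime or ctime, or if it is at least 24 hours old. It is a simplified model for illustration, not kernel code, and the function name is hypothetical. On btrfs each such update is a metadata write, and after a snapshot it presumably has to un-share metadata that the snapshot still references.

```python
DAY_SECONDS = 24 * 60 * 60


def relatime_needs_update(atime, mtime, ctime, now):
    """Return True if reading a file at time `now` would rewrite its
    atime under relatime. All arguments are seconds since the epoch."""
    if mtime >= atime:
        # File was modified since it was last read.
        return True
    if ctime >= atime:
        # Inode metadata changed since it was last read.
        return True
    # Otherwise atime is only refreshed once it is a day old.
    return now - atime >= DAY_SECONDS


if __name__ == "__main__":
    last_read = 1_000_000
    # Read again one hour later: no atime write.
    print(relatime_needs_update(last_read, 0, 0, last_read + 3600))
    # Read again the next day: atime is written again.
    print(relatime_needs_update(last_read, 0, 0, last_read + DAY_SECONDS))
```

With noatime none of these branches apply, because reads never update atime at all.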
Adam Ryczkowski
2013-Jan-31 10:56 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
My original problem got solved, but your answer contains a set of interesting performance hints, and I am very grateful for your input. Here are my answers, and further questions if you are willing to continue this topic.

On 2013-01-31 02:50, Chris Murphy wrote:
> On Jan 30, 2013, at 6:02 PM, Adam Ryczkowski <adam.ryczkowski@statystyka.net> wrote:
>
>> I didn't take precise measurements, but I can tell, that reading 500 50-byte files (ca. 25kB of data) took way longer that reading one 3MB file, so I suspect the problem is with metadata access times rather than with data.
> For 50 byte files, btrfs writes the data with metadata. Depending on their location relative to each other, this could mean 250MB of reads because of the large raid6 chunk size, yet only ~ 2MB is needed by btrfs.

Yes, good point. I never claimed that my setup gives me the best I can get from my hardware.

>> I am aware, that reading 1MB distributed in small files takes longer than 1MB of sequential reading. The problem is that _suddenly_ this speed got at least 20 times longer than usual.
> How does dedup work on 50 byte files? How does it contribute to fragmentation? And then how does that fragmentation turn into gross read inefficiencies at the md chunk level?

I really don't know. It would be interesting to find out, though. But whatever the results, in the current state of affairs a defrag would ruin all the benefits of bedup, so even if the filesystem gets fragmented, I can do nothing about it.

>> And from what iotop and systat told me, the harddrives were busy _writing_ something, not _reading_!
> Seems like you need to find out what's being written, how many and how big the requests are. Small writes mean huge RMW penalty on raid6, especially a 4 disk raid 6 where you're practically guaranteed to have either data or metadata request halted for a parity rewrite.

Yes, you are right. It is an important contributing factor in why the relatime mount option killed my performance so badly.

>> Anyway, I synchronize only the "working copy" part of my file system. All the backup subvolumes sit in a separate path, not seen by the unison.
> You're syncing what to what, in physical terms? I know one of the what's is a btrfs volume on top of LVM, on top of LUKS, on top of md raid6, on top of partitions located on four 3TB drives. You said there are other partitions on these drives so are there other read/writes occurring on those drives at the same time? It doesn't look like that's the case from iotop, the md0

No, I synchronize across the network with my desktop machines and a backup file server :-). But even if I didn't, unison is kind enough to detect local syncs and performs them in sequence (not asynchronously).

>>> What's the chunk size for the raid 6? What's the btrfs leaf size? What's the dedup chunk size?
>> I'll tell you tomorrow, but I hardly think that the misalignment could be any problem here. As I said, everything was fine and the problem didn't appear in gradual fashion.
> It also depends on what mysterious stuff is being written during what's ostensibly a read only event.

The dedup chunk size isn't clearly stated, but from the README I infer that it deduplicates files as a whole; here is an excerpt from the README (https://github.com/g2p/bedup/blob/master/README.rst):

> Deduplication is implemented using a Btrfs feature that allows for
> cloning data from one file to the other. The cloned ranges become
> shared on disk, saving space.

Here is a summary of the allocation granularity at each level of the storage hierarchy. On mdadm I have a chunk size of 512K, the dm-crypt volume uses 512-byte sectors, and all LVM physical volumes have a PE size of 4MiB, which shouldn't affect efficiency. I couldn't find any command that tells me the leaf size of an already created btrfs filesystem. Maybe you can tell me?

I will also check whether there is an alignment problem. When I was reading the manual for each of the layers I came to the conclusion that each layer is supposed to align to the underlying one automatically. But I can try to check it.

>>> Why are you using LVM at all, while the /dev/dm-1 is the same size as the LV? You say the btrfs volume on LV is on dm-1 which means they're all the same size, obviating the need for LVM in this case entirely.
>> Yes, I agree, that at the moment I don't need it. But when partition sits on logical volume I keep the option to extend the filesystem, when I the need comes.
> This is not an ideal way to extend a btrfs file system however. You're adding unnecessarily layers and complexity while also not taking advantage of what LVM can do that btrfs cannot when it comes to logical volume management.

Can you tell me more? All I have learned is that btrfs multi-device support cannot join two volumes without striping. And striping in this case is equivalent to fragmentation, which we want to avoid. LVM, in contrast, can concatenate the underlying storage without striping.

--
Adam Ryczkowski
Gabriel
2013-Jan-31 19:06 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
Hi,

> After mounting the system with noatime the problem disappeared, like in
> magic.

Incidentally, the current version of bedup uses a private mountpoint with noatime whenever you don't give it the path to a mounted volume. You can use it with no arguments or designate a filesystem by its uuid or /dev path.

> All the writes must have came from the delayed metadata copy process.
> Once all the metadata copy-update was done, file system speed was back
> to normal, but once the new day broke out, all the copying business
> needed to done again... This in 100% describes all the odd behavior.
>
> In particular apparently the problem had nothing to do with my complex
> block device setup, nor with bedup, nor with unison.
>
> Thank you again, Andrew!
>
> P.S. Maybe it is not be decided by me, but this small message about
> performance (not even labeled as warning) in
> https://btrfs.wiki.kernel.org/index.php/Mount_options IMHO should have
> been made more conspicuous, maybe put somewhere when the snapshot
> mechanism is described or in FAQ. I'll try to fix it.
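To see which atime mode a btrfs volume is actually mounted with, the options column of /proc/mounts can be inspected. The Python sketch below is a minimal, hedged example of doing that; the helper names are hypothetical, the sample line in the usage part is made up for illustration, and a mount that lists none of the known options is reported as "unspecified" rather than guessed.

```python
def atime_mode(mount_options):
    """Return the atime-related option found in a comma-separated
    mount option string, or "unspecified" if none is listed."""
    options = set(mount_options.split(","))
    for mode in ("noatime", "relatime", "strictatime"):
        if mode in options:
            return mode
    return "unspecified"


def btrfs_atime_modes(mounts_text):
    """Map mount point -> atime mode for every btrfs entry in text
    formatted like /proc/mounts (device, mount point, type, options)."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "btrfs":
            modes[fields[1]] = atime_mode(fields[3])
    return modes


if __name__ == "__main__":
    # Hypothetical sample line; on a real system read /proc/mounts instead.
    sample = "/dev/mapper/example /mnt/adama-docs btrfs rw,relatime 0 0\n"
    print(btrfs_atime_modes(sample))
```

On a live system the same function could be fed the contents of /proc/mounts, for example via `open("/proc/mounts").read()`.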
Chris Murphy
2013-Jan-31 19:08 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
On Jan 31, 2013, at 2:45 AM, Adam Ryczkowski <adam.ryczkowski@statystyka.net> wrote:

> Yes, you are right. It is important contributing factor, why relatime mount option killed my performance so badly.

So is this what was causing the problem?

> The dedup chunk size isn't clearly stated, but from the README I infer it deduplicates files as a whole; here is an excerpt from the README (https://github.com/g2p/bedup/blob/master/README.rst)

I wouldn't expect reading file metadata,

> This is a summary of the granularity of the allocation pieces in the storage hierarchy.
> On mdadm I have chunk size of 512K,

It's quite large for your use case. It's large for most any use case, actually.

> I couldn't find any command that tells me the leaf size of already created btrfs system. Maybe you can tell me?

I don't know that it's easily determined after mkfs time; someone else can maybe answer. The default is 4KB. Otherwise you use flags to set it.

> I will also check, if there is an alignment problem as well. When I was reading a manual for each of the layer I came to the conclusion that each layer is supposed to align to the underlying one automatically. But I try to can check it.

I'm not thinking of an alignment problem, but of a chunk size that is poorly chosen for the usage. Changing 50 bytes (could be metadata or data) means, in your case, at least 2MB of RMW with a 512KB chunk. And this gets worse with more disks, because you have more chunks to read. The whole stripe is read, modified, and written on md raid6 currently. You're planning to add four more disks, so that's now 8 disks, and a 4MB full-stripe RMW for 50 bytes of changed data.

Depending on which tool GPT-partitioned these 3TB disks, it's remotely possible that the partitions aren't aligned to 4K sectors, however. gdisk should do this correctly by starting the first partition at LBA 2048 and aligning to 16-sector boundaries. Recent versions of parted do something similar, but I forget the details. Older versions can misalign by starting at LBA 63, as can other older non-Linux tools. OS X's Disk Utility starts the first partition at LBA 40, which is OK.

> Can you tell me more? Because I have only learned, that btrfs multi-device support cannot join two volumes without striping. And striping in this case is equivalent to fragmentation, which we want to avoid. In contrast to what LVM can do. LVM can concatenate the underlying storage together, without striping.

When you create a btrfs file system, by default the data profile is single and the metadata profile is dup. When you add another device to the volume, it stays this way. The single data profile behaves similarly to LVM linear, except that btrfs will alternate chunk allocations between devices, so that one isn't just sitting there spinning for a month and not being used at all.

So it's not striping. But even if it were striping, that would help your write performance in particular, because now it's effectively RAID 60. I don't see why striping is considered fragmentation.

To change the profile for the volume, you use -dconvert and/or -mconvert with a rebalance operation.

Chris Murphy
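The stripe and alignment figures in this message follow from simple arithmetic, which the Python sketch below reproduces. It is an illustration only: the function names are hypothetical, the full stripe is counted the way the message counts it (every member disk's chunk, parity included), and a 512-byte logical sector size is assumed when converting a starting LBA to a byte offset.

```python
KIB = 1024
MIB = 1024 * KIB


def full_stripe_bytes(n_disks, chunk_bytes):
    """Size of one full stripe when every member disk's chunk is
    counted, data and parity alike (assumption taken from the text)."""
    return n_disks * chunk_bytes


def is_4k_aligned(start_lba, logical_sector_bytes=512):
    """True if a partition starting at `start_lba` begins on a
    4096-byte boundary (512-byte logical sectors assumed)."""
    return (start_lba * logical_sector_bytes) % 4096 == 0


if __name__ == "__main__":
    chunk = 512 * KIB
    for disks in (4, 8):
        size = full_stripe_bytes(disks, chunk) // MIB
        print(f"{disks} disks: {size} MiB read-modify-write per small change")
    for lba in (2048, 63, 40):
        print(f"first partition at LBA {lba}: aligned={is_4k_aligned(lba)}")
```

With a 512 KiB chunk this gives 2 MiB for the current 4-disk array and 4 MiB for the planned 8-disk array, and it confirms that starting LBAs 2048 and 40 fall on 4K boundaries while LBA 63 does not.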
Adam Ryczkowski
2013-Jan-31 19:17 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
On 2013-01-31 20:08, Chris Murphy wrote:
> On Jan 31, 2013, at 2:45 AM, Adam Ryczkowski <adam.ryczkowski@statystyka.net> wrote:
>> Yes, you are right. It is important contributing factor, why relatime mount option killed my performance so badly.
> So is this what was causing the problem?

Yes.

>> Can you tell me more? Because I have only learned, that btrfs multi-device support cannot join two volumes without striping. And striping in this case is equivalent to fragmentation, which we want to avoid. In contrast to what LVM can do. LVM can concatenate the underlying storage together, without striping.
> When you create a btrfs file system, by default the data profile is single, and metadata profile is dup. When you add another device to the volume, it stays this way. The single data profile behaves similar to LVM linear, except btrfs will alternate chunk allocations between devices, so that one isn't just sitting there spinning for a month and not being used at all.
>
> So it's not striping. But even if it were striping, that would help you on write performance in particular because now it's effectively RAID 60. I don't see why striping is considered fragmentation.

Well, if the devices are on the same physical hard drive, then sequential file reading would cause the drive heads to seek between the first and the other partition on every extent. This is equivalent to fragmentation; striping is only good if the partitions are on separate hard drives.

> To change the profile for the volume, you use -dconvert and/or -mconvert with a rebalance operation.

Once again, thank you very much, Chris.

--
Adam Ryczkowski
Chris Murphy
2013-Jan-31 20:35 UTC
Re: Poor performance of btrfs. Suspected unidentified btrfs housekeeping process which writes a lot
On Jan 31, 2013, at 12:17 PM, Adam Ryczkowski <adam.ryczkowski@statystyka.net> wrote:

>> When you create a btrfs file system, by default the data profile is single, and metadata profile is dup. When you add another device to the volume, it stays this way. The single data profile behaves similar to LVM linear, except btrfs will alternate chunk allocations between devices, so that one isn't just sitting there spinning for a month and not being used at all.
>>
>> So it's not striping. But even if it were striping, that would help you on write performance in particular because now it's effectively RAID 60. I don't see why striping is considered fragmentation.
> Well, if the devices are on the same physical hard-drive, than sequential file reading would cause hard drive heads to seek between the first and the other partition on every extent. This is something equivalent to fragmentation;

In this case you wouldn't make the volume larger by adding devices, regardless of the profile used. You'd first grow the underlying layers, and then resize the file system.

> it is only good if the partitions are on separate hard drives.

Yes, obviously. But even better is not to partition your devices at all if you're concerned about efficiency. Just use the whole drive as the device.

Chris Murphy