Hi,

I have followed the progress of the btrfs filesystem over time, and while I have experimented with it a little in a VM, I have not yet used it on a production machine.

While the lack of a complete fsck was a major issue (I read the update that the first working version is about to be released), I am still worried about an issue I see popping up.

How is it possible that a copy-on-write filesystem becomes corrupted when a power failure occurs? I assume this means that even (hard) resetting a computer can result in a corrupt filesystem.

I thought the idea of COW was that whatever happens, you can always mount in a semi-consistent state?

As far as I can see, you wind up with one of these cases:
- No outstanding writes at power-down.
- A file write completes and the tree structure is updated. Since everything is checksummed and duplicated, unless the update propagates to the highest level, the write simply disappears upon failure. While this might be rectified with a fsck, there should be no problem mounting the filesystem (read-only if need be).
- Writes are not completed on all disks/partitions at the same time. The checksums will detect these errors and, once again, the write disappears unless it is salvaged by a fsck.

Am I missing something? How come there seem to be plenty of people with a corrupt btrfs after a power failure? And why haven't I experienced similar issues, where a filesystem becomes unmountable, with say NTFS or Ext3/4?

Regards,
Berend Dekens
On 24.08.2011 15:11, Berend Dekens wrote:
> I thought the idea of COW was that whatever happens, you can always
> mount in a semi-consistent state?

Problems arise when, in your scenario, writes from higher levels in the tree hit the disk earlier than updates to lower levels. In this case the tree is broken and the fs is unmountable.

Of course btrfs takes care of the order in which it writes, but problems arise when the disk lies about whether a write is stable on disk, i.e. about cache flushes or barriers.
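To illustrate the ordering dependency, here is a minimal sketch; the device interface and names are hypothetical, not btrfs's actual code:

```python
# Illustrative sketch only: the write ordering a copy-on-write commit
# depends on. 'dev' is a hypothetical block device handle, not a real API.
def commit_transaction(dev, dirty_nodes, new_superblock):
    for node in dirty_nodes:
        dev.write(node)          # new copies of leaves and interior nodes
    dev.flush()                  # barrier: nodes must be durable first
    dev.write(new_superblock)    # only now flip the root pointer
    dev.flush()                  # make the new root durable as well
    # If the disk acknowledges flush() without draining its cache, the
    # superblock can reach the platter before the nodes it points to,
    # leaving an unmountable tree after power loss.
```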
On 24/08/11 15:31, Arne Jansen wrote:
> Of course btrfs takes care of the order in which it writes, but
> problems arise when the disk lies about whether a write is stable
> on disk, i.e. about cache flushes or barriers.

Ah, I see. So the issue is not with the software implementation at all, but only arises when hardware acknowledges flushes and barriers before they actually complete?

Is this a common problem with hard disks?

Regards,
Berend Dekens
On 24.08.2011 17:01, Berend Dekens wrote:
> Ah, I see. So the issue is not with the software implementation at
> all, but only arises when hardware acknowledges flushes and barriers
> before they actually complete?

It doesn't mean there aren't any bugs left in the software stack ;)

> Is this a common problem with hard disks?

Only with very cheap ones. USB enclosures might add to the problem, too, and some SSDs are rumored to be bad in this regard. Another problem is the layers between btrfs and the hardware, like encryption.
On 24/08/11 17:04, Arne Jansen wrote:
> It doesn't mean there aren't any bugs left in the software stack ;)

Naturally, but the fact that the corruption stories I've been reading about are very likely caused by misbehaving hardware sets my mind at ease about experimenting further with btrfs (although I will await the fsck before attempting things in production).

>> Is this a common problem with hard disks?
> Only with very cheap ones. USB enclosures might add to the problem,
> too, and some SSDs are rumored to be bad in this regard. Another
> problem is the layers between btrfs and the hardware, like encryption.

I am - and will be - using btrfs straight on hard disks: no lvm, (soft)raid, encryption or other layers.

My hard drives are not that fancy (no 15k Raptors here); I usually buy hardware from the major suppliers (WD, Maxtor, Seagate, Hitachi, etc.). Also, until the fast-cache mode for SSDs in combination with rotating hardware becomes stable, I'll stick to ordinary hard drives.

Thank you for clarifying things.

Regards,
Berend Dekens
On Wed, Aug 24, 2011 at 10:13 AM, Berend Dekens <btrfs@cyberwizzard.nl> wrote:
> Naturally, but the fact that the corruption stories I've been reading
> about are very likely caused by misbehaving hardware sets my mind at
> ease about experimenting further with btrfs (although I will await
> the fsck before attempting things in production).

I have to admit I've been beginning to wonder if we picked up a regression somewhere along the way with respect to corruptions after power outages.

I'm lucky enough to have very unreliable power. Btrfs was always robust for me on power outages until recently. Now I've had two corrupted volumes on unclean shutdowns and power outages.
AFAIK, ZFS combats lying disks by rolling back to the latest mountable uberblock (i.e. the latest tree that was completely and successfully written to disk). Does btrfs do something similar today? (A rough sketch of the rollback I mean follows below the quote.)

On Wed, Aug 24, 2011 at 7:06 PM, Mitch Harder <mitch.harder@sabayonlinux.org> wrote:
> I have to admit I've been beginning to wonder if we picked up a
> regression somewhere along the way with respect to corruptions after
> power outages.
>
> I'm lucky enough to have very unreliable power. Btrfs was always
> robust for me on power outages until recently. Now I've had two
> corrupted volumes on unclean shutdowns and power outages.
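A minimal sketch of that rollback idea, with hypothetical names throughout; this is neither ZFS's nor btrfs's actual code:

```python
# Illustrative sketch only: mount the newest root whose tree validates.
# 'walk' and 'checksum_ok' are hypothetical stand-ins for reading the
# on-disk superblock copies and doing a full consistency walk.
def tree_validates(root):
    return all(node.checksum_ok() for node in root.walk())

def newest_mountable_root(root_candidates):
    # Superblock copies carry a transaction id; try them newest-first
    # and fall back until one passes the consistency check.
    for root in sorted(root_candidates, key=lambda r: r.transaction_id,
                       reverse=True):
        if tree_validates(root):
            return root
    raise IOError("no mountable root found")
```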
We have a bit of documentation on disk power failure and corruption here:

https://btrfs.wiki.kernel.org/index.php/FAQ

Refer to the 2nd FAQ in the list.

Things would have been a lot easier for the filesystem (in terms of maintaining its consistency) if disks had some kind of atomic write (between disk cache and disk) for a given block size.

Anyway, solutions combining a disabled disk write cache with SSDs are quite popular nowadays, and in terms of random synchronous write performance they are awesome.

HTH

Cheers, Anand

On 08/25/2011 01:06 AM, Mitch Harder wrote:
> I'm lucky enough to have very unreliable power. Btrfs was always
> robust for me on power outages until recently. Now I've had two
> corrupted volumes on unclean shutdowns and power outages.
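Regarding the disabled disk write cache mentioned above: on Linux this is typically toggled with hdparm. A minimal sketch; the device path is a placeholder and the command needs root:

```python
import subprocess

# hdparm -W 0 disables an IDE/SATA drive's volatile write cache,
# -W 1 re-enables it. /dev/sdX is a placeholder; run as root.
def set_write_cache(dev, enabled):
    subprocess.run(["hdparm", "-W", "1" if enabled else "0", dev],
                   check=True)

set_write_cache("/dev/sdX", enabled=False)  # safety over write speed
```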
On Thursday, 25 August 2011, Anand Jain wrote:
> Anyway, solutions combining a disabled disk write cache with SSDs are
> quite popular nowadays, and in terms of random synchronous write
> performance they are awesome.

There are SSDs with capacitors, such as the Intel SSD 320. According to the vendor, these should write out all remaining writes that made it to the disk cache should a power loss occur.

I have not had any issues with BTRFS on / with a ThinkPad T520 and an Intel SSD 320. /home is still Ext4, as I want a fsck first. That's with the Linux 3.0.0-2 amd64 Debian package.

That said, I also have no issues with BTRFS for / and /home on a ThinkPad T23. But then the machine has a hibernate-to-disk-and-resume uptime of almost 120 days, so it hasn't seen a power loss for a long time. That's still with 2.6.38.4.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
On Thu, 2011-08-25 at 19:55 +0200, Martin Steigerwald wrote:
> That said, I also have no issues with BTRFS for / and /home on a
> ThinkPad T23. But then the machine has a hibernate-to-disk-and-resume
> uptime of almost 120 days, so it hasn't seen a power loss for a long
> time. That's still with 2.6.38.4.

Which method of hibernation are you using?

I've had enormous problems with btrfs + TOI (TuxOnIce), including:

- Freezes resulting in an unmountable partition (twice so far; two results on Google, including one message I sent to this list)
- Sometimes a random program (Skype, Firefox) cannot be frozen, and the stack trace includes btrfs AIO.

Regards
On Wed, Aug 24, 2011 at 9:11 AM, Berend Dekens <btrfs@cyberwizzard.nl> wrote:
> I thought the idea of COW was that whatever happens, you can always
> mount in a semi-consistent state?

It seems to me that if someone created a block device which recorded all write operations, a rather excellent test could be constructed where a btrfs filesystem is recorded under load and then every partial replay is mounted and checked for corruption/data loss.

This would result in high confidence that no power-loss event could destroy data under the offered load, assuming well-behaved (non-reordering) hardware. If it also recorded barrier operations, a tool could try many (but probably not all) permissible reorderings at every truncation offset.

It seems to me that the existence of this kind of testing is something that should be expected of a modern filesystem before it sees widescale production use.
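For concreteness, a minimal sketch of the replay loop I have in mind, assuming a recorded log of (offset, data) writes captured by such a device; everything here is hypothetical scaffolding, not an existing tool:

```python
import subprocess

def replay_prefix(log, image_path, n):
    """Rebuild the disk image from the first n recorded writes."""
    with open(image_path, "r+b") as img:
        for offset, data in log[:n]:
            img.seek(offset)
            img.write(data)

def check_all_prefixes(log, pristine_image, image_path, mountpoint):
    # Each prefix of the write log is one possible post-power-loss state;
    # every one of them should at least mount (read-only if need be).
    failures = []
    for n in range(len(log) + 1):
        subprocess.run(["cp", pristine_image, image_path], check=True)
        replay_prefix(log, image_path, n)
        mount = subprocess.run(["mount", "-o", "loop,ro",
                                image_path, mountpoint])
        if mount.returncode == 0:
            subprocess.run(["umount", mountpoint], check=True)
        else:
            failures.append(n)   # a state that failed to mount
    return failures
```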
On 26.08.2011 01:01, Gregory Maxwell wrote:
> It seems to me that if someone created a block device which recorded
> all write operations, a rather excellent test could be constructed
> where a btrfs filesystem is recorded under load and then every partial
> replay is mounted and checked for corruption/data loss.

I like the idea. Some more thoughts:

- instead of trying all reorderings, it might be enough to always deliver the oldest possible copy
- the order in which btrfs writes the data probably depends on the order in which the device acknowledges the requests, so you might need to add some reordering there, too
- you need to produce a wide variety of workloads, as problems might only occur with a specific kind (direct I/O, fsync, snapshots...)
- if there really is a regression somewhere, it would be good to also include the full block layer in the test, as the regression might not be in btrfs at all
- as a first small step, one could just use blktrace to record the write order and analyze the order on mount as well
On 26 August 2011 07:37, Arne Jansen <sensille@gmx.net> wrote:
> I like the idea. Some more thoughts:
> - as a first small step, one could just use blktrace to record the
>   write order and analyze the order on mount as well

This article describes evaluating ext3, reiserfs and jfs with fault injection using a custom Linux block device driver:

"Model-Based Failure Analysis of Journaling File Systems"
http://www.cs.wisc.edu/adsl/Publications/sfa-dsn05.pdf
On 26.08.2011 09:48, Mike Fleetwood wrote:
> This article describes evaluating ext3, reiserfs and jfs with fault
> injection using a custom Linux block device driver:
> "Model-Based Failure Analysis of Journaling File Systems"
> http://www.cs.wisc.edu/adsl/Publications/sfa-dsn05.pdf

While the article is interesting, it describes a quite different failure mode. I/O error handling is not very sophisticated in btrfs yet. The tests Gregory was talking about run on completely well-behaved hardware; failure injection would be the second step.
On 8/26/2011 1:01 AM, Gregory Maxwell wrote:
> It seems to me that if someone created a block device which recorded
> all write operations, a rather excellent test could be constructed
> where a btrfs filesystem is recorded under load and then every partial
> replay is mounted and checked for corruption/data loss.

Gregory,

Thank you for the idea to implement a tool that verifies the file system consistency. Following your idea, I have just written a runtime tool for this purpose; refer to the message-id <cover.1320849129.git.sbehrens@giantdisaster.de> on the btrfs mailing list.

The tool examines all btrfs disk write operations at runtime. It verifies that the on-disk data is always in a consistent state, in order to create confidence that power loss (or kernel panics) cannot cause corrupted file systems.
Hi Maciej,

On Friday, 26 August 2011, Maciej Marcin Piechotka wrote:
> Which method of hibernation are you using?
>
> I've had enormous problems with btrfs + TOI (TuxOnIce), including:
>
> - Freezes resulting in an unmountable partition (twice so far; two
>   results on Google, including one message I sent to this list)
> - Sometimes a random program (Skype, Firefox) cannot be frozen, and
>   the stack trace includes btrfs AIO.

I have not used TOI for quite some time, since I had some issues with it that I no longer remember; I think it didn't work reliably back then. I would like to try it again, but have not come around to it.

I use in-kernel suspend, now with 3.0 as packaged by Debian, thus not even self-compiled anymore, and it is just rock solid on the T23. It is not that rock solid on my new ThinkPad T520, but that might also be driver issues, and I do not use BTRFS on the new one, only on the T23.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7