Michael Raskin
2009-Jun-11 12:44 UTC
cleanup after a small data loss on incorrect shutdown.
Hello.

I am continuing my tests of BtrFS under a practical workload. Recently an incorrect poweroff (or maybe a small bug in BtrFS) caused a small data loss. The actual damage was non-existent. I used an old branch, so maybe the relevant code has already been improved.

1. Why does btrfsck say "bad block" on that partition? What does it mean? My first reaction was to use badblocks. It found no bad blocks in its own sense, so I assume btrfsck means something else. It would be nice to explain that to the user. Maybe "damaged FS data block"?

2. I found a file which is listed in the directory, but stat on it returns "No such file or directory". Naturally, rm and unlink cannot remove it. The partition has 14G in use. What can I do to provide a useful piece of FS structure information? How can I remove the file afterwards?

3. On a 30G partition with 14G used, btrfsck was left running overnight. It has neither finished nor printed any meaningful request for interaction. Is that normal?

Michael Raskin

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason
2009-Jun-12 10:53 UTC
Re: cleanup after a small data loss on incorrect shutdown.
On Thu, Jun 11, 2009 at 04:44:30PM +0400, Michael Raskin wrote:
> Hello.
>
> I am continuing my tests of BtrFS under a practical workload. Recently
> an incorrect poweroff (or maybe a small bug in BtrFS) caused a small
> data loss. The actual damage was non-existent.
> I used old branch, so maybe the relevant code is already improved.
>
> 1. Why btrfsck says "bad block" on that partition? What does it mean?
> My first reaction was to use badblocks. It found no badblocks in its own
> sense, so I assume btrfsck means something else. It would be nice to
> explain that to user. Maybe "damaged FS data block" ?

Yes, it would make sense to make these more informative.

> 2. I found a file which is listed in the directory, but stat on it
> returns "No such file or directory". Certainly, rm and unlink cannot
> remove it. The partition has 14G in use. What can I do to provide a
> useful piece of FS structure information? How can I remove the file
> afterwards.

I'd say to send us the btrfsck output, it will help answer these questions.

> 3. On a 30G partition with 14G used btrfsck was left overnight. It has
> neither finished nor printed any meaningful request for interaction. Is
> it normal?

Definitely not ;) You can check with vmstat to see if btrfsck is actually doing anything, but it sounds like you hit a bug. Which version of the kernel and tools are you using?

-chris
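One way to make the "is it actually doing anything" check concrete, offered only as a rough sketch: vmstat reports system-wide activity, while the Linux file /proc/<pid>/io (available when per-task I/O accounting is enabled in the kernel) reports counters for a single process. The helper names below (parse_proc_io, io_delta, sample_io) are made up for illustration and are not part of any btrfs tool. If read_bytes does not grow between two samples while the process stays at high CPU, that would suggest btrfsck is spinning rather than reading the disk.

```python
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def io_delta(before, after):
    """Return how much each counter present in both samples grew."""
    return {key: after[key] - before[key] for key in before if key in after}


def sample_io(pid, interval=5.0):
    """Read /proc/<pid>/io twice, `interval` seconds apart (Linux only).

    Assumption: the caller may read that file, which normally means
    running as the same user as the process, or as root.
    """
    path = f"/proc/{pid}/io"
    with open(path) as f:
        before = parse_proc_io(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_io(f.read())
    return io_delta(before, after)
```

Calling sample_io with the PID of the running btrfsck and looking at the "read_bytes" entry of the result gives the number of bytes fetched from storage during the interval.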
Michael Raskin
2009-Jun-12 11:08 UTC
Re: cleanup after a small data loss on incorrect shutdown.
Chris Mason wrote:
>> 2. I found a file which is listed in the directory, but stat on it
>> returns "No such file or directory". Certainly, rm and unlink cannot
>> remove it. The partition has 14G in use. What can I do to provide a
>> useful piece of FS structure information? How can I remove the file
>> afterwards.
>
> I'd say to send us the btrfsck output, it will help answer these
> questions.

Oh, easily. "Bad block <number way beyond partition block count>". That's all. Reading one of the damaged files actually returned "Input/output error"; probably it tried to read beyond end-of-device. I had to kill this file (practical testing means that, to continue using my notebook normally, I had to nuke the damaged file and get intact copies). The "no such file except in readdir" is still there right now.

>> 3. On a 30G partition with 14G used btrfsck was left overnight. It has
>> neither finished nor printed any meaningful request for interaction. Is
>> it normal?
>
> Definitely not ;) You can check with vmstat to see if btrfsck is
> actually doing anything, but it sounds like you hit a bug. Which
> version of the kernel and tools are you using?

v0.18 release tools. 2.6.30-rc8 kernel. (It seems to include everything before the newformat patches.) "top" said 99% of a single-core Celeron was used by btrfsck. I didn't run vmstat, but I remember that e2fsck random reads would make much more noise (and for linear reading it took way too much time).
Chris Mason
2009-Jun-12 11:42 UTC
Re: cleanup after a small data loss on incorrect shutdown.
On Fri, Jun 12, 2009 at 03:08:58PM +0400, Michael Raskin wrote:
> Chris Mason wrote:
> >> 2. I found a file which is listed in the directory, but stat on it
> >> returns "No such file or directory". Certainly, rm and unlink cannot
> >> remove it. The partition has 14G in use. What can I do to provide a
> >> useful piece of FS structure information? How can I remove the file
> >> afterwards.
> >
> > I'd say to send us the btrfsck output, it will help answer these
> > questions.
>
> Oh, easily. "Bad block <number way beyond partition block count>".

Btrfs deals in byte numbers not block numbers ;)

> That's all. Reading one of the damaged file actually returned
> "Input/output error" - probably it tried to read beyond end-of-device. I
> had to kill this file (practical testing means that to continue to use
> my notebook normally I had to nuke the damaged file and get intact
> copies). The "no such file except in readdir" is still there right now.

Ok, btrfsck will give us more output when it finishes, but it hasn't finished. It would help to use btrfs-image to send us a copy of the metadata so we can fix the btrfsck bug.

-chris
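The byte-versus-block point can be illustrated with a little arithmetic. This is only a sketch: the 4096-byte block size and the sample number are assumptions chosen for illustration (the thread does not give the actual number btrfsck printed), and Btrfs byte numbers are logical addresses that may not correspond directly to an offset on the device, so this is not a validity check.

```python
GIB = 2 ** 30


def block_count(device_bytes, block_size=4096):
    """Number of whole blocks on a device of the given size.

    The 4096-byte default is an assumption for illustration only.
    """
    return device_bytes // block_size


def exceeds_block_count(number, device_bytes, block_size=4096):
    """True if `number`, read as a block index, lies past the last block."""
    return number >= block_count(device_bytes, block_size)


def within_byte_range(number, device_bytes):
    """True if `number`, read as a byte offset, lies inside the device."""
    return 0 <= number < device_bytes


partition = 30 * GIB          # the 30G partition from the thread
reported = 20_000_000_000     # hypothetical value, not taken from the logs

print(block_count(partition))                    # 7864320 blocks of 4 KiB
print(exceeds_block_count(reported, partition))  # True: far beyond the block count
print(within_byte_range(reported, partition))    # True: plausible as a byte number
```

A number that looks impossible as a block index can therefore be perfectly ordinary once it is read as a byte number.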
Michael Raskin
2009-Jun-12 11:48 UTC
Re: cleanup after a small data loss on incorrect shutdown.
Chris Mason wrote:
>> 2. I found a file which is listed in the directory, but stat on it
>> returns "No such file or directory". Certainly, rm and unlink cannot
>> remove it. The partition has 14G in use. What can I do to provide a
>> useful piece of FS structure information? How can I remove the file
>> afterwards.
>
> I'd say to send us the btrfsck output, it will help answer these
> questions.

OK, after cleaning the bad files off the FS I managed to run btrfsck. It reports an unresolved reference. The files I cleaned were related to libattr, so I do not believe a kernel header would be a hardlink to them.
Michael Raskin
2009-Jun-12 11:56 UTC
Re: cleanup after a small data loss on incorrect shutdown.
Chris Mason wrote:
>>> I'd say to send us the btrfsck output, it will help answer these
>>> questions.
>> Oh, easily. "Bad block <number way beyond partition block count>".
>
> Btrfs deals in byte numbers not block numbers ;)

Interesting to know. Maybe just adding "at" to the message would reduce confusion. It doesn't look like a canonical bad block anyway.

>> That's all. Reading one of the damaged file actually returned
>> "Input/output error" - probably it tried to read beyond end-of-device. I
>> had to kill this file (practical testing means that to continue to use
>> my notebook normally I had to nuke the damaged file and get intact
>> copies). The "no such file except in readdir" is still there right now.
>
> Ok, btrfsck will give us more output when it finishes, but it hasn't
> finished. It would help to use btrfs-image to send us a copy of the
> metadata so we can fix the btrfsck bug.

Well, as the partition fills up at ~25G of 30G used, I guess that the average metadata size is >=1G for that partition. And now I have destroyed the evidence to make the notebook boot. The disappearing file, though, is a minor annoyance, so I can keep it and do whatever is needed with btrfs-image.
Michael Raskin
2009-Jun-12 18:52 UTC
Re: cleanup after a small data loss on incorrect shutdown.
Chris Mason wrote:
>> That's all. Reading one of the damaged file actually returned
>> "Input/output error" - probably it tried to read beyond end-of-device. I
>> had to kill this file (practical testing means that to continue to use
>> my notebook normally I had to nuke the damaged file and get intact
>> copies). The "no such file except in readdir" is still there right now.
>
> Ok, btrfsck will give us more output when it finishes, but it hasn't
> finished. It would help to use btrfs-image to send us a copy of the
> metadata so we can fix the btrfsck bug.

I have a 74M compressed btrfs-image of a partition with a ghost file (I sent btrfsck logs earlier). Would they be of any use in debugging the handling of such situations? If so, how should I transmit the image file? How can I kill the ghost file?
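For reporting purposes, "ghost" entries of the kind described in this thread (names that readdir returns but stat rejects with "No such file or directory") can be located with a short script. This is a minimal sketch, not a repair tool: it only lists such names and does not remove anything, and the function name find_ghost_entries and its stat_fn parameter are hypothetical, introduced here so the check can be exercised without a damaged filesystem.

```python
import os


def find_ghost_entries(directory, stat_fn=os.lstat):
    """Return names listed in `directory` for which stat fails with ENOENT.

    os.lstat is used by default so that dangling symlinks, which are
    legitimate directory entries, are not reported as ghosts.
    """
    ghosts = []
    for name in sorted(os.listdir(directory)):
        try:
            stat_fn(os.path.join(directory, name))
        except FileNotFoundError:
            # The entry is visible in the directory listing, but the
            # filesystem cannot find the inode it refers to.
            ghosts.append(name)
    return ghosts


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_ghost_entries(target):
        print(name)
```

On a healthy filesystem the script should print nothing, apart from the rare case where a file is deleted between the listing and the stat call.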