Hi there,

I've started "btrfs scrub start /" on one of my machines (kernel 3.8.0-15, Ubuntu AMD64), which typically "behaves well", so I wasn't suspecting any disk issue.

After running for only 165 seconds, "scrub status" claims to have found and corrected 22926 CSUM errors?!

This is a rather new HDD, in perfect shape (SMART all OK, never reallocated a single sector, less than 200 hours total runtime...)

WTF?!

I've cancelled the scrub for now, until I get a better understanding of what may be happening...

--
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E
Don't bother looking: I am not on Facebook.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 29, 2013 at 03:50:15AM -0600, Swâmi Petaramesh wrote:
> Hi there,
>
> I've started "btrfs scrub start /" on one of my machines (Kernel
> 3.8.0-15 Ubuntu AMD64), which typically "behaves well" so I wasn't
> suspecting any disk issue.
>
> After running for only 165 seconds, "scrub status" claims to have
> found and corrected 22926 CSUM errors?!
>
> This is a rather new HDD, in perfect shape (SMART all OK, never
> reallocated a single sector, less than 200 hours total runtime...)
>
> WTF?!
>
> I've cancelled the scrub for now, until I get a better understanding
> of what may be happening...

So this is probably because of the extent tree corruption you had; it's just cleaning things up and you should be fine once it finishes. Thanks,

Josef
Hi Josef,

On 29/03/2013 13:58, Josef Bacik wrote:
> So this is probably because of the extent tree corruption you had, it's just
> cleaning things up and you should be fine once it finishes. Thanks,

Er... It's on a different machine!

Current status (at the time of writing) is:

# btrfs scrub status /
scrub status for 346b81b2-0735-4c4d-a137-1995bc78ad70
scrub resumed at Fri Mar 29 11:52:43 2013 and finished after 7470 seconds
total bytes scrubbed: 231.96GB with 149691 errors
error details: csum=149691
corrected errors: 149691, uncorrectable errors: 0, unverified errors: 0

I have to say that scrub completely froze the machine at least 4 times (disk activity had ceased and any command that needed disk access would hang forever), but at least after a (quite brutal) reboot it could be resumed...

The only unusual thing about this FS is that it had been imaged, then restored, using partclone.btrfs (which itself is supposed to use the BTRFS libraries).

I have a screenshot of the last thing I saw when it hung; I can upload it somewhere, should it be relevant...

Kind regards.

--
Swâmi Petaramesh
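For anyone who wants to check a result like the one above from a script, here is a minimal sketch that pulls the "uncorrectable errors" count out of `btrfs scrub status` text. It assumes the output layout shown in this message; other btrfs-progs versions may print the summary differently. The function name `uncorrectable_count` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs scrub status` output on stdin and print
# the number that follows "uncorrectable errors:".
# Assumes the summary line format shown in this thread.
uncorrectable_count() {
    sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (as root):
#   btrfs scrub status / | uncorrectable_count
```

A count greater than zero would mean scrub found damage it could not repair, which is the case worth alerting on.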
On that note, does btrfs run automatic background scrubs on its own, or do I have to use crontab to schedule scrubs?

Thanks!

On Fri, Mar 29, 2013 at 1:58 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> So this is probably because of the extent tree corruption you had, it's just
> cleaning things up and you should be fine once it finishes. Thanks,
>
> Josef
On Fri, Mar 29, 2013 at 02:06:39PM +0100, Harald Glatt wrote:
> On that note, is btrfs doing automatic background scrubs of its own or
> do I have to use crontab to schedule scrubs?

If you want a full-disk scrub, you'll need to schedule it yourself with cron (I run mine once a month). However, if a problem is detected during normal operation -- e.g. you read a piece of data and it's got bad checksums -- then the FS will fix it if it can, in the same way that it would with a scrub.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Eighth Army Push Bottles Up Germans -- WWII newspaper headline (possibly apocryphal) ---
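As a rough illustration of the monthly cron schedule mentioned above, a line like the following could go into root's crontab (`crontab -e` as root). The time of day, the `/sbin/btrfs` path and the `/` mount point are assumptions for this sketch; check the real path with `command -v btrfs`.

```
# Hypothetical schedule: 03:00 on the 1st of every month.
# -B keeps scrub in the foreground so cron can mail the final statistics.
0 3 1 * * /sbin/btrfs scrub start -B /
```

Running in the foreground with `-B` is a choice made here so that the summary ends up in cron's mail; a backgrounded scrub would need a separate `btrfs scrub status` check afterwards.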
On Fri, Mar 29, 2013 at 07:06:33AM -0600, Swâmi Petaramesh wrote:
> The only thing about this FS is that it had been imaged, then restored,
> using partclone.btrfs (which itself is supposed to use the BTRFS libraries).

This is where I go "AHA!" and just assume that it wasn't our fault ;).

> I have a screenshot of "last thing I saw when it hanged", I can upload
> it somewhere, should it be relevant...

Screenshots are welcome. I have no doubt scrub is fixing actual problems, but it definitely shouldn't be hanging the box, so I'd like to get those hangs fixed if possible. Sysrq+w output during hangs is very useful but may be too much for screenshots; netconsole works very nicely for this:

http://fedoraproject.org/wiki/Netconsole

Thanks,

Josef
On 29/03/2013 14:12, Josef Bacik wrote:
> Screenshots are welcome,

I posted one to http://dl.free.fr/jsRQ8JXZh (use your email address or the list's one to fetch it).

It may or may not be very interesting, but that's all I've got.

> I have no doubt scrub is fixing actual problems

Looks like it actually is. The first few times it hung, I restarted it from the beginning and it no longer found errors in the first GBs, so I assume it had fixed them in the previous pass (even though it eventually crashed the disk subsystem).

> , but it
> definitely shouldn't be hanging the box so I'd like to get those fixed if
> possible. Sysrq+w during hangs are very useful but may be too much output for
> screenshots, netconsole works very nicely for this

I'll restart the complete scrub (now that everything is supposedly fixed...?) and let you know if it hangs again and I can get my hands on something.

Kind regards.

--
Swâmi Petaramesh
On Fri, Mar 29, 2013 at 07:06:33AM -0600, Swâmi Petaramesh wrote:
> I have a screenshot of "last thing I saw when it hanged", I can upload
> it somewhere, should it be relevant...

Actually, instead of netconsole we have an awesome service provided by Carey; you can just do

    nc cwillu.com 10101 < /dev/kmsg

after you've run sysrq+w, and then reply with the URL it spits out. Thanks,

Josef
> Actually instead of netconsole we have an awesome service provided by Carey, you
> can just do
>
> nc cwillu.com 10101 < /dev/kmsg

... at a root prompt.

> after you've run sysrq+w and then reply with the URL it spits out. Thanks,
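For machines where the keyboard combination is awkward to use, the sysrq+w step can also be triggered from a shell by writing `w` to `/proc/sysrq-trigger`. The following is a minimal sketch of that step; the function name `dump_blocked_tasks` and its optional path argument are made up for this illustration, and the `nc` line is simply the command quoted in the thread.

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel to log all blocked (uninterruptible)
# tasks, the same request as Alt+SysRq+W on the keyboard.
# The optional argument only exists so the function can be tried against a
# scratch file; by default it writes to the real trigger and needs root.
dump_blocked_tasks() {
    trigger="${1:-/proc/sysrq-trigger}"
    echo w > "$trigger"
}

# At a root prompt, while the scrub is hung:
#   dump_blocked_tasks
#   nc cwillu.com 10101 < /dev/kmsg    # command quoted from the thread
```

This only helps if a shell is still responsive during the hang; if nothing responds, the keyboard combination or netconsole remains the way to go.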
On 29/03/2013 14:12, Josef Bacik wrote:
> Screenshots are welcome

This time I got a real nice kernel Oops during scrub...

http://dl.free.fr/hjAdOH3mG (use your email address or the list's one to fetch it)

--
Swâmi Petaramesh
On 29/03/2013 14:26, Josef Bacik wrote:
> after you've run sysrq+w and then reply with the URL it spits out. Thanks,

I'm afraid I won't be able to do this this afternoon: I also need to work on my machine ;-) so for now I will avoid restarting a scrub that might crash it once more...

I'll hopefully be back on this soon.

Kind regards.

--
Swâmi Petaramesh
On Fri, Mar 29, 2013 at 07:39:06AM -0600, Swâmi Petaramesh wrote:
> On 29/03/2013 14:26, Josef Bacik wrote:
> > after you've run sysrq+w and then reply with the URL it spits out. Thanks,
>
> I'm afraid I won't be able to do this this afternoon: I also need to
> work on my machine ;-) so for now I will avoid restarting a scrub that
> might crash it once more...
>
> I'll hopefully be back on this soon.

Yeah, that picture was enough. I see what's going on; I'll send a patch. Thanks,

Josef