Hello,

I'm currently copying about 1.5 TByte from a directory on a 3-disk btrfs filesystem (data: raid0) to another disk, and there seem to be 2 damaged files. They stop the copying process:

    Oct 7 18:16:55 Arktur kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Oct 7 18:16:55 Arktur kernel: ata5.00: BMDMA2 stat 0x80d2009
    Oct 7 18:16:55 Arktur kernel: ata5.00: failed command: READ DMA
    Oct 7 18:16:55 Arktur kernel: ata5.00: cmd c8/00:40:57:d0:34/00:00:00:00:00/ee tag 0 dma 32768 in
    Oct 7 18:16:55 Arktur kernel: res 51/40:40:57:d0:34/00:03:0e:00:00/fe Emask 0x9 (media error)
    Oct 7 18:16:55 Arktur kernel: ata5.00: status: { DRDY ERR }
    Oct 7 18:16:55 Arktur kernel: ata5.00: error: { UNC }
    Oct 7 18:16:55 Arktur kernel: ata5.00: configured for UDMA/100
    Oct 7 18:16:55 Arktur kernel: ata5: EH complete

(repeating every 3 seconds)

The files contain no valuable data (*.mpeg files, reproducible). But how can I tell the disk not to use the damaged sector(s)?

On an ext2/3 filesystem I used "badblocks". Is there a comparable tool for btrfs?

Best regards!
Helmut

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
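The failing address can be read off the "res" line of the log above. A minimal sketch, assuming libata's register dump layout status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and assuming that READ DMA (0xc8) is a 28-bit command whose top four LBA bits sit in the low nibble of the device register. The "cmd" line uses the same layout, so it can be decoded the same way. `res_to_lba28` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: recover the failing LBA from a libata "res" (or "cmd") register dump.
# Assumed layout: status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# Assumed 28-bit command (READ DMA, 0xc8): LBA bits 24..27 are the low nibble of "device".
res_to_lba28() {
    local status low hob device err nsect lbal lbam lbah
    IFS=/ read -r status low hob device <<< "$1"       # split the four "/" groups
    IFS=: read -r err nsect lbal lbam lbah <<< "$low"  # split the low-order registers
    echo $(( ((0x$device & 0x0f) << 24) | (0x$lbah << 16) | (0x$lbam << 8) | 0x$lbal ))
}

res_to_lba28 '51/40:40:57:d0:34/00:03:0e:00:00/fe'   # prints 238342231
```

238342231 is 0x0e34d057. It falls inside the 4 KiB block that starts at sector 238342224 (238342231 - 7), which is the sector reported in Helmut's later log on sdg, so both logs plausibly point at the same spot on the disk.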
On Fri, Oct 07, 2011 at 06:51:00PM +0200, Helmut Hullen wrote:
> Oct 7 18:16:55 Arktur kernel: ata5.00: exception Emask 0x0 SAct 0x0
> SErr 0x0 action 0x0
[...]
> On an ext2/3 system I used "badblocks" - is there some comparable tool
> for btrfs?

No, there isn't, but it's a good topic for a btrfs project :)

(I see lots of interesting problems, like relocating superblocks,
damaged allocator structures, ...)

david
Hello, David,

You wrote on 10.10.11:

>> On an ext2/3 system I used "badblocks" - is there some comparable
>> tool for btrfs?

> No there isn't, but it's a good topic for a btrfs project :)

> (I see lots of interesting problems like relocating superblocks,
> damaged allocator structures, ...)

I've just worked with the 2 unreadable files again.

Copying them to another partition stalled partway through: one file at about 98%, the other at about 2%.

I had to kill the "cp" command with "killall cp".

Deleting them had the same problem: I had to use "killall rm". "I'm not amused" ...

And I'm curious what the system will do with the 2 unreadable sectors. In about a year I will have to add the next 2-TByte disk, with "add" and "balance". Maybe I'll have to copy the 3-disk cluster to a 4-disk cluster instead ...

Best regards!
Helmut
On 10/10/2011 09:28 AM, Helmut Hullen wrote:
> I've just worked again with the 2 unreadable files.
[...]
> And I'm curious what the system will do with the 2 unreadable
> sectors. In about 1 year I have to add the next 2 TByte disk, with
> "add" and "balance". Maybe I have to copy the 3-disks cluster to a
> 4-disks-cluster ...

I'd try replacing the SATA cable, and if that doesn't fix it, you may be out of luck.

The thing is that marking sectors bad is a (pretty poor) band-aid for a much bigger problem: if you're hitting persistent read errors and rewriting the blocks doesn't fix them, your disk is already close to being completely kaput, and no amount of software is going to help with that.

-Jeff

--
Jeff Mahoney
SUSE Labs
Hello, Jeff,

You wrote on 10.10.11:

[...]
> I'd try replacing the SATA cable and if that doesn't fix it up, you
> may be out of luck.

There are 2 unreadable sectors (reproducible). Replacing or reconnecting the cables doesn't help.

> The thing is that marking sectors bad is a
> (pretty poor) band-aid for a much bigger problem: If you're hitting
> persistent read errors and re-writing the blocks doesn't fix it, your
> disk is already close to being completely kaput and no amount of
> software is going to help with that.

The next steps could be:

- adding a new 2-TByte disk (currently there are 3 2-TByte disks)
- balancing
- removing the bad 2-TByte disk

But I'm afraid that when I run a balance, the bad sectors will damage large parts of the contents. I had such bad luck about a year ago, losing about 2 TByte of data (OK, I had a kind of backup in a neighbouring town). I don't want to repeat that experience.

I'm afraid I'll have to buy 3 (or 4) 2-TByte disks, set them up as a new cluster with raid0 data, and copy the complete contents from the old cluster to the new one. Doesn't sound good.

-----------------------

2 bad sectors out of a total of 4*10^9 sectors is (from another point of view) not a bad error rate ...

Best regards!
Helmut
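A quick check of that figure, assuming a nominal 2-TByte disk of 2×10^12 bytes and 512-byte sectors:

$$
N_{\text{disk}} = \frac{2\times10^{12}\ \text{bytes}}{512\ \text{bytes/sector}} \approx 3.9\times10^{9}\ \text{sectors},
\qquad
\frac{2}{3.9\times10^{9}} \approx 5\times10^{-10}.
$$

So "4*10^9" corresponds to a single disk; the three-disk cluster has roughly $1.2\times10^{10}$ sectors.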
On 10/10/2011 11:58 AM, Helmut Hullen wrote:
[...]
> 2 bad sectors from a total of 4*10^9 sectors is (in another point
> of view) no bad error rate ...

Well, it's worse than that. The disk will try to correct bad sectors itself and remap them internally. By the time you start to see bad sectors on disk (where writing and then reading fails), the disk's internal remap table has been filled. That hides the true defect rate, but it also means that it's only a matter of time before you get more bad sectors.

-Jeff

--
Jeff Mahoney
SUSE Labs
Hi Helmut,

On Monday, 10 October 2011, Helmut Hullen wrote:
> The next steps could be:
>
> - adding a new 2-TByte disk (now there are 3 2-TByte disks)
> - balancing
> - removing the bad 2-TByte disk
[...]
> I'm afraid I have to buy 3 (or 4) 2-TByte disks, building them as a
> new raid0-data cluster and copy the complete contents from the old
> cluster to the new one. Doesn't sound good.

RAID 0 and valuable (?) data don't go together. So if you go to 4 disks, consider RAID 10 ;). Then you could mark the disk as faulty, put in a new one, and let BTRFS resync/rebalance the RAID. But if everything is stored on only one disk, that's not possible.

RAID 5 might also be an alternative, but I am not sure whether RAID 5 already works with BTRFS. I heard about plans to borrow some SoftRAID code for that.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Hello, Martin,

You wrote on 15.10.11:

> RAID-0 and valuable (?) data does not match together.

I know. The data isn't valuable. It's *.mpeg2 recorded from DVB-T, rebroadcast at least every two years. It's like an old LP or an old VHS cassette.

But that doesn't solve the problem with errors on one of the disks. I don't like throwing away a disk if it has (perhaps) repairable read errors. I'd like to use a tool like "badblocks".

Best regards!
Helmut
On Sat, 2011-10-15 at 21:59 +0200, Helmut Hullen wrote:
[...]
> But that doesn't solve the problem with errors on one of the disks. I
> don't like to throw away a disk if it has (perhaps) repairable read
> errors. I'd like to use a tool like "badblocks".

Well, let's take a look at the state of your drive. Install smartmontools and run 'smartctl -A /dev/sdX'. On a properly operating drive, you'll see these:

    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

First things first. If the VALUE of Reallocated_Sector_Ct is less than or equal to THRESH, then your drive is garbage: all of the reallocation space has been used. This means many errors have occurred, and more will keep happening. Get it replaced ASAP.

If the RAW_VALUE of Reallocated_Sector_Ct is above 0, then the drive has dynamically reallocated some sectors in the past, i.e. it has had errors, but they have been repaired.

The Current_Pending_Sector value is interesting. It counts the sectors which have had read errors but have not been remapped internally by the drive, because it couldn't recover the data using error correction. These result in read errors in the OS; this is probably what you are seeing.

If you have pending sectors, getting the drive to reallocate them is very simple. Write data (any data) over the sector in question; the drive will then remap it onto the spare area to do the write. (The easiest way is something like dd if=/dev/zero of=/dev/sdX, which overwrites the whole disk; but if you know the exact sector number, "hdparm --write-sector" can remap it quickly.)
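The rules of thumb above can be checked mechanically. A minimal sketch, assuming smartctl's usual attribute-table column order (so field 4 is VALUE, field 6 is THRESH, and field 10 is the first token of RAW_VALUE); `smart_verdict` is a hypothetical helper name, and the messages it prints are illustrative wording, not smartctl output:

```shell
#!/bin/bash
# Sketch: apply the rules of thumb above to "smartctl -A" output on stdin.
# Assumption: smartctl's usual column order, so $4 = VALUE, $6 = THRESH,
# $10 = first token of RAW_VALUE.
smart_verdict() {
    awk '
        $2 == "Reallocated_Sector_Ct" {
            if ($4 + 0 <= $6 + 0) {
                # normalized VALUE at or below THRESH: spare area used up
                print "replace: reallocation space used up"
                failed = 1
            } else if ($10 + 0 > 0) {
                # the drive has already remapped some sectors in the past
                print "warning: " $10 " sectors already reallocated"
            }
        }
        $2 == "Current_Pending_Sector" && $10 + 0 > 0 {
            # unreadable sectors waiting for a write that lets the drive remap them
            print "pending: " $10 " sectors awaiting a rewrite"
        }
        END { if (!failed) print "ok: reallocation space not exhausted" }
    '
}

# Usage (replace /dev/sdX with the real device):
#   smartctl -A /dev/sdX | smart_verdict
```

Fed the WD20EARS values Helmut posts in his reply (VALUE 200 against THRESH 140, 26 pending sectors), this reports 26 pending sectors and a spare area that is not yet exhausted.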
Keep in mind, though: if you have even a single reallocated sector on a drive, it means that the drive medium is deteriorating. It's very likely that you will have additional failures in the future, resulting in more I/O errors and lost data. For your sanity, I recommend replacing a drive as soon as you see any error on it.

--
Calvin Walton <calvin.walton@kepstin.ca>
Hello, Calvin,

You wrote on 16.10.11:

>> I don't like to throw away a disk if it has (perhaps) repairable
>> read errors. I'd like to use a tool like "badblocks".

> Well, lets take a look at the state of your drive. Install
> smartmontools, and run 'smartctl -A /dev/sdX'.
[...]

Here (WDC WD20EARS):

      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       26
    198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       25

-------------------

> First things first. If the VALUE of Reallocated_Sector_Ct is less
> than or equal to THRES, then your drive is garbage; all of the
> reallocation space has been used.
[...]

There may be hope ...

> If you have pending sectors, causing the drive to reallocate them is
> very simple. Write data (any data) over the sector in question - the
> drive will then remap it onto the spare area to do the write. (The
> easiest way is to do something like dd if=/dev/zero of=/dev/sdX; but
> if you know the exact sector number, "hdparm --write-sector" can
> remap it quickly.)

OK, I'll give it a try.

> Keep in mind, though - if you have a single reallocated sector on a
> drive, it means that the drive medium is deteriorating.
> It's very likely that you will have additional failures in the future,
> resulting in more IO errors and lost data. For your sanity, I
> recommend replacing a drive as soon as you see any one error on it.

In the past, most (nearly all) such problems came from a bad power supply and/or bad cables; "dd if=/dev/zero" or "badblocks" fixed them ...

Best regards!
Helmut
Hello, Calvin,

You wrote on 16.10.11:

[...]
> If you have pending sectors, causing the drive to reallocate them is
> very simple. Write data (any data) over the sector in question - the
> drive will then remap it onto the spare area to do the write. (The
> easiest way is to do something like dd if=/dev/zero of=/dev/sdX; but
> if you know the exact sector number, "hdparm --write-sector" can
> remap it quickly.)

I'll have to try that in the next few days ...

> Keep in mind, though - if you have a single reallocated sector on a
> drive, it means that the drive medium is deteriorating. It's very
> likely that you will have additional failures in the future,
> resulting in more IO errors and lost data. For your sanity, I
> recommend replacing a drive as soon as you see any one error on it.

Right now, "dd if=/dev/sdg of=/dev/zero" reports (in "/var/log/warn") strange things like:

    Oct 18 14:42:48 Arktur kernel: Buffer I/O error on device sdg, logical block 29792786
    Oct 18 14:42:48 Arktur kernel: Buffer I/O error on device sdg, logical block 29792787
    Oct 18 14:43:04 Arktur kernel: end_request: I/O error, dev sdg, sector 238342224
    Oct 18 14:43:04 Arktur kernel: Buffer I/O error on device sdg, logical block 29792778
    Oct 18 14:43:20 Arktur kernel: end_request: I/O error, dev sdg, sector 238342224
    Oct 18 14:43:20 Arktur kernel: Buffer I/O error on device sdg, logical block 29792778

-------------------------

From yesterday to this morning, the number of offline uncorrectable sectors has grown from 25 to 26: not a good omen.

Maybe some files are damaged. The disk holds about 1.4 TByte; it's part of a btrfs cluster with more than 4 TByte of data.

What about "btrfsck"? Can it help, or might it cause yet another crash?

When I try to copy the whole cluster to another place (I had this problem a few days ago), the system crashes when it tries to access the particular file that uses such a defective sector. If I can first find the name of this file and then exclude it from copying, "cp" works.
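The two kinds of line in this log can be related: the kernel pairs logical block 29792778 with sector 238342224, and 29792778 × 8 = 238342224, which fits 4096-byte logical blocks made of eight 512-byte sectors. A minimal sketch that turns such log lines into one sorted list of 512-byte sector numbers for a device, assuming exactly the two message formats shown above and a 4096-byte logical block unless another size is given; `log_to_sectors` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: turn kernel I/O-error log lines (stdin) for one device into a
# sorted, de-duplicated list of 512-byte sector numbers.
# Assumptions: the two message formats shown above, and a 4096-byte
# "logical block" (8 sectors) unless a different size is passed as $2.
log_to_sectors() {
    local dev=$1 block_size=${2:-4096}
    awk -v dev="$dev" -v per="$(( block_size / 512 ))" '
        # "end_request: I/O error, dev sdg, sector 238342224"
        index($0, "dev " dev ", sector ") {
            printf "%.0f\n", $NF
        }
        # "Buffer I/O error on device sdg, logical block 29792778":
        # expand the block into the sectors it covers
        index($0, "device " dev ", logical block ") {
            for (i = 0; i < per; i++) printf "%.0f\n", $NF * per + i
        }
    ' | sort -nu
}

# Usage: log_to_sectors sdg < /var/log/warn
```

`%.0f` is used instead of `%d` so that sector numbers beyond 2^31 print exactly even in older awk implementations. Each number in the resulting list could then be probed with "hdparm --read-sector" before anything is rewritten.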
Nice problems ...

Best regards!
Helmut
Hello, Calvin,

You wrote on 16.10.11:

[...]
> If you have pending sectors, causing the drive to reallocate them is
> very simple. Write data (any data) over the sector in question - the
> drive will then remap it onto the spare area to do the write. (The
> easiest way is to do something like dd if=/dev/zero of=/dev/sdX; but
> if you know the exact sector number, "hdparm --write-sector" can
> remap it quickly.)

(Instead of a blog ... and please excuse my gerlish.)

I bought another 2-TByte disk (a Samsung; it seems to be bulletproof).

    dd if=/dev/baddisk of=/dev/gooddisk bs=8M conv=noerror

worked (about 30 hours for 2 TByte), but it produced many error messages.

I unplugged /dev/baddisk, plugged in /dev/gooddisk, and mounted the 3-disk cluster: it worked. Looking into the directories showed no problem (with the bad disk, even that produced error messages). Trying to play an *.mpg: nothing. Shit. Some error messages.

Next adventure: I removed the good disk and plugged the bad disk back in. I extracted the bad sectors (for baddisk = sdd) with

    grep 'I/O error' /var/log/warn | grep 'dev sdd' | \
        cut -d' ' -f11- | sort -u > /home/tmp/WDC-20111021.txt

and "repaired" them with

    #!/bin/bash
    # Skeleton: Joerg Sommer, de.comp.os.unix.linux.hardware 18.10.2011

    Platte=/dev/sdd   # the WD20EARS ("Platte" = disk)

    # bad sector 778550400: probe the surrounding sectors as well
    for blk in $(seq 778550000 778551000)
    do
        # skip sectors that read fine; the test relies on hdparm
        # exiting with status 5 (EIO) when the read fails
        hdparm --read-sector $blk "$Platte" > /dev/null
        test $? -eq 5 || continue
        # overwrite the unreadable sector so the drive remaps it (its contents are lost)
        hdparm --write-sector $blk --yes-i-know-what-i-am-doing "$Platte"
    done

It seems to work as desired. It also seems to be a good idea to "repair" not only the sectors shown in "/var/log/warn" but the sectors around them as well. OK, the file that uses this sector may be badly damaged. But people who have worked with LPs or MCs know that kind of behaviour ... no real problem.

I ran btrfsck. Many error messages.

When "btrfs filesystem show" shows 3 disks in the cluster: do I have to run btrfsck on each disk, or does btrfsck check all disks of the cluster?
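One detail of the dd clone above is worth checking. With conv=noerror alone, GNU dd skips an input block it cannot read and writes nothing in its place, so everything after the first error lands at the wrong offset on the target disk. Adding sync pads each failed (or short) input block with zeros and keeps offsets intact. With bs=8M, each failed read can cost up to a whole 8 MiB block, so a smaller bs such as 4096 loses less good data around each bad sector, at the price of more system calls. This could be one reason the copied *.mpg files would not play, though the thread does not confirm it. A sketch using the same placeholder device names as in the mail, plus a harmless demonstration of the padding:

```shell
#!/bin/bash
# Clone with offsets preserved: unreadable input blocks become zeros
# instead of being dropped. Device names are placeholders; do not run blindly.
#   dd if=/dev/baddisk of=/dev/gooddisk bs=4096 conv=noerror,sync

# Demonstration of the padding, using a short read in place of a read
# error: with sync the 3-byte input block is padded to the 8-byte block
# size; without it the 3 bytes pass through unpadded.
padded=$(printf 'abc' | dd bs=8 conv=noerror,sync 2>/dev/null | wc -c)
unpadded=$(printf 'abc' | dd bs=8 conv=noerror 2>/dev/null | wc -c)
echo "with sync: $padded bytes, without: $unpadded bytes"
```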
But now I can not only see the directories but also "open" the contents: much better than nothing!

Possible next adventure:

    # read every file completely; cat reports any file it cannot read
    for Datei in /Path/to/*.mpg
    do
        cat "$Datei" > /dev/null
    done

This produces (where necessary) not only the error messages in "/var/log/warn" but also the names of the damaged files.

Best regards!
Helmut
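The loop above reports damaged files only through cat's error messages. A variant that prints just the names of files that cannot be read to the end, as a sketch: `find_unreadable` is a hypothetical helper name, and "unreadable" is taken to mean that cat exits with a non-zero status (typically EIO on a bad sector):

```shell
#!/bin/bash
# Sketch: print the names of *.mpg files in a directory that cannot be read
# to the end. Assumption: cat exits non-zero on a read error.
# The function body runs in a subshell, so "shopt -s nullglob" does not leak.
find_unreadable() (
    shopt -s nullglob                     # no matches: the loop body never runs
    for f in "$1"/*.mpg; do
        if ! cat -- "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"            # only files that failed to read
        fi
    done
)

# Usage (placeholder path): find_unreadable /Path/to > damaged-files.txt
```

As with cp and rm earlier in the thread, a read that hits a bad sector may still stall for a long time while the kernel retries.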