Hi list,

from the ZFS documentation it is unclear to me whether a "zpool scrub" will blacklist any bad blocks it finds so they won't be used anymore. I know NetApp's WAFL scrub reallocates bad blocks and marks them as unusable. Does ZFS have this kind of strategy?

Thanks.

--
Didier
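For reference, the kind of scrub I'm running (the pool name "tank" is just an example):

    # zpool scrub tank
    # zpool status tank     # reports scrub progress and any errors found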
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Didier Rebeix
>
> from the ZFS documentation it is unclear to me whether a "zpool
> scrub" will blacklist any bad blocks it finds so they won't be used
> anymore.

If there are any physically bad blocks, such that the hardware (hard disk) will return an error every time that block is used, then the disk should be replaced. All disks have a certain amount of error detection/correction built in, and they remap bad blocks internally, quietly behind the scenes, transparently to the OS. So if any blocks are regularly reported bad to the OS, it means there is a growing problem inside the disk. Offline the disk and replace it.

It is OK to get an occasional cksum error, say, once a year. The occasional cksum error will be re-read, and as long as the data is correct the second time, no problem.
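For what it's worth, a quick way to keep an eye on this (assuming a pool named "tank"; substitute your own):

    # zpool status -x          # report only pools with known problems
    # zpool status -v tank     # per-device READ/WRITE/CKSUM counters
    # zpool clear tank         # reset the counters once you've investigated

If the CKSUM count on one device keeps climbing from scrub to scrub, that's the disk to replace.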
ZFS detects far more errors than traditional filesystems, which would simply miss them. This means that many of the possible causes for those errors will be something other than a real bad block on the disk. As Edward said, the disk firmware should automatically remap real bad blocks, so if ZFS did that too, we'd not be using the remapped block, which is probably fine. For other errors, there's nothing wrong with the real block on the disk: the cause is firmware, driver, cache corruption, or something else, so blacklisting the block will not solve the issue. Also, with some types of disk (SSDs), block numbers are moved around to achieve wear leveling, so blacklisting a block number won't stop you reusing that real block. (A sketch of telling these cases apart is at the end of this message.)

--
Andrew Gabriel (from mobile)

------- Original message -------
From: Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com>
To: Didier.Rebeix at u-bourgogne.fr, zfs-discuss at opensolaris.org
Sent: 8.11.'11, 12:50

> [snip]
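P.S. To separate a genuine media error from a transport or driver problem, the FMA error log is often more telling than the pool counters (a rough sketch; output varies by platform and release):

    # fmdump -e      # one-line-per-event summary of error reports
    # fmdump -eV     # full detail for each report

Checksum errors show up as ereport.fs.zfs.checksum events; if they arrive alongside SCSI transport errors, suspect the path rather than the platters.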
Very interesting... I didn't know disk firmware was responsible for automagically relocating bad blocks. Knowing this, it makes no sense for a filesystem to try to deal with this kind of error. For now, any disk with detected read/write errors will be discarded from my filers and replaced...

Thanks!

On Tue, 08 Nov 2011 13:03:57 +0000, "Andrew Gabriel" <Andrew.Gabriel at oracle.com> wrote:

> ZFS detects far more errors than traditional filesystems, which would
> simply miss them. [snip]

--
Didier REBEIX
Université de Bourgogne
Direction des Systèmes d'Information
BP 27877
21078 Dijon Cedex
Tel: +33 380395205
On Tue, Nov 8, 2011 at 9:14 AM, Didier Rebeix <Didier.Rebeix at u-bourgogne.fr> wrote:

> Very interesting... I didn't know disk firmware was responsible for
> automagically relocating bad blocks. Knowing this, it makes no sense
> for a filesystem to try to deal with this kind of error.

In the dark ages, hard drives came with "bad block" lists taped to them so you could load them into the device driver for that drive. New bad blocks would be mapped out by the device driver. All that functionality was moved into the drive a long time ago (at least 10-15 years).

Under Solaris, you can see the size of the bad block lists through format: FORMAT -> DEFECT -> PRIMARY gives you the size of the list from the factory, and FORMAT -> DEFECT -> GROWN gives you those added since the drive left the factory (a sketch of the session is at the end of this message). I tend to open a support case to have a drive replaced if the GROWN list is much above 0 or is growing. Keep in mind that any type of hardware RAID should report back 0 for both to the OS.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
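P.S. Roughly, the interactive session looks like this (treat it as a sketch; menus and output vary by release and drive):

    # format
    (select the disk from the menu)
    format> defect
    defect> primary     # read the factory (primary) defect list
    defect> grown       # read the grown defect list
    defect> quit
    format> quit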