Morning,

For those of you who remember last time, this is a different Solaris,
different disk box and different host, but the epic nature of the fail
is similar.

The RAID box that is the 63T LUN has a hardware fault and has been
crashing; up to now the box and host got restarted and both came up
fine.  However, just now, as I have got replacement hardware in position
and was ready to start copying, it went bang and my data has all gone.

Ideas?


root@cs4:~# zpool list
NAME      SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
content  62.5T  59.9T  2.63T  95%  ONLINE  -

root@cs4:~# zpool status -v
  pool: content
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        content   ONLINE       0     0    32
          c2t8d0  ONLINE       0     0    32

errors: Permanent errors have been detected in the following files:

        content:<0x0>
        content:<0x2c898>

root@cs4:~# find /content
/content
root@cs4:~#     (yes, that really is it)

root@cs4:~# uname -a
SunOS cs4.kw 5.11 snv_99 sun4v sparc SUNW,Sun-Fire-T200

from format:
       2. c2t8d0 <IFT-S12S-G1033-363H-62.76TB>
          /pci@7c0/pci@0/pci@8/LSILogic,sas@0/sd@8,0

Also, "content" does not show in df output.

thanks
--
Tom

// www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
On Sun, Jan 18, 2009 at 8:02 AM, Tom Bird <tom at marmot.org.uk> wrote:

> errors: Permanent errors have been detected in the following files:
>
>         content:<0x0>
>         content:<0x2c898>
>
> root@cs4:~# find /content
> /content
> root@cs4:~#     (yes, that really is it)

Those are supposedly the two inodes that are corrupt.  The 0x0 is a bit
scary... you should be able to find out what file(s) they're tied to (if
any) with:

find /content -inum 0
find /content -inum 182424

If you can live without those files, delete them, export the pool,
re-import, and resilver, and you should be good to go.

--Tim
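(For what it's worth, the decimal inode numbers above are just the hex
object IDs from the zpool status output converted to decimal; assuming a
shell whose printf accepts C-style hex constants (bash or ksh93 do), the
conversion is easy to double-check:

  $ printf '%d\n' 0x2c898
  182424

and object <0x0> is simply inode 0, hence the first find command.)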
Tim <tim at tcsac.net> wrote:

> Those are supposedly the two inodes that are corrupt.  The 0x0 is a bit
> scary... you should be able to find out what file(s) they're tied to (if
> any) with:
>
> find /content -inum 0
> find /content -inum 182424

Using find to search for inodes with st_ino == 0 is not something you may
rely on to work as expected.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling  D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       joerg.schilling at fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/private/  ftp://ftp.berlios.de/pub/schily
Tim wrote:

> Those are supposedly the two inodes that are corrupt.  The 0x0 is a bit
> scary... you should be able to find out what file(s) they're tied to (if
> any) with:
>
> find /content -inum 0
> find /content -inum 182424
>
> If you can live without those files, delete them, export the pool,
> re-import, and resilver, and you should be good to go.

Hi,

Well, one of the problems is that find doesn't find anything, as the
filesystem is not presenting any files, so I can't delete anything.

I've exported the pool, but on re-import I get the same error as I was
getting last time something popped:

root@cs4:~# zpool import content
cannot open 'content': I/O error

Last time, Victor Latushkin fixed it by modifying the file system to
point to an older copy of the data.

I've not really been following the list of late; any more sign of a
fsck.zfs...?

thanks
--
Tom

// www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
Hey, Tom -

Correct me if I'm wrong here, but it seems you are not allowing ZFS any
sort of redundancy to manage.

I'm not sure how you can class it a ZFS fail when the Disk subsystem has
failed...

Or - did I miss something? :)

Nathan.

Tom Bird wrote:

> The RAID box that is the 63T LUN has a hardware fault and has been
> crashing; up to now the box and host got restarted and both came up
> fine.  However, just now, as I have got replacement hardware in position
> and was ready to start copying, it went bang and my data has all gone.

--
// Nathan Kroenert             nathan.kroenert at sun.com     //
// Senior Systems Engineer     Phone: +61 3 9869 6255       //
// Global Systems Engineering  Fax:   +61 3 9869 6288       //
// Level 7, 476 St. Kilda Road                               //
// Melbourne 3004 Victoria Australia                         //
On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:

> Correct me if I'm wrong here, but it seems you are not allowing ZFS any
> sort of redundancy to manage.

Which is particularly catastrophic when one's 'content' is organized as
a monolithic file, as it is here - unless, of course, you have some way
of scavenging that file based on internal structure.

--Toby
Toby Thain wrote:

> On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:
>
>> Correct me if I'm wrong here, but it seems you are not allowing ZFS any
>> sort of redundancy to manage.

Every other file system out there runs fine on a single LUN; when things
go wrong you have a fsck utility that patches it up and the world keeps
on turning.

I can't find anywhere that will sell me a 48 drive SATA JBOD with all
the drives presented on a single SAS channel, so running on a single
giant LUN is a real world scenario that ZFS should be able to cope with,
as this is how the hardware I am stuck with is arranged.

> Which is particularly catastrophic when one's 'content' is organized as
> a monolithic file, as it is here - unless, of course, you have some way
> of scavenging that file based on internal structure.

No, it's not a monolithic file; the point I was making there is that no
files are showing up:

> root@cs4:~# find /content
> /content
> root@cs4:~#     (yes, that really is it)

thanks
--
Tom

// www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
You can get a sort of redundancy by creating multiple filesystems with
'copies' enabled on the ones that need some sort of self-healing in case
of bad blocks.

Is it possible to at least present your disks as several LUNs?  If you
must have an abstraction layer between ZFS and the block device,
presenting ZFS with a plurality of abstracted devices would let you get
some sort of parity... or is this device live and in production?

I do think that, though ZFS doesn't need fsck in the traditional sense,
some sort of recovery tool would make storage admins even happier about
using ZFS.

cheers,
Blake

On Mon, Jan 19, 2009 at 4:09 AM, Tom Bird <tom at marmot.org.uk> wrote:

> I can't find anywhere that will sell me a 48 drive SATA JBOD with all
> the drives presented on a single SAS channel, so running on a single
> giant LUN is a real world scenario that ZFS should be able to cope with,
> as this is how the hardware I am stuck with is arranged.
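To make the 'copies' suggestion concrete, it is just a per-dataset
property; a minimal sketch with hypothetical dataset names (and note it
only protects blocks written after the property is set, so it would not
have rescued data that was already on the pool):

  # zfs create content/critical
  # zfs set copies=2 content/critical
  # zfs get copies content/critical

ZFS then stores two copies of every block written to content/critical,
spread across the single LUN, which helps against isolated bad sectors
but not against losing the whole device.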
>>>>> "nk" == Nathan Kroenert <Nathan.Kroenert at Sun.COM> writes: >>>>> "b" == Blake <blake.irvin at gmail.com> writes:nk> I''m not sure how you can class it a ZFS fail when the Disk nk> subsystem has failed... The disk subsystem did not fail and lose all its contents. It just rebooted a few times. b> You can get a sort of redundancy by creating multiple b> filesystems with ''copies'' enabled on the ones that need some b> sort of self-healing in case of bad blocks. Won''t work here. The pool won''t import at all. The type of bad block fixing you''re talking about applies to cases where the pool imports, but ''zpool status'' reports files with bad blocks in them. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090119/784bd79f/attachment.bin>
Miles, that's correct - I got muddled in the details of the thread.

I'm not necessarily suggesting this, but is this an occasion when removing
the ZFS cache file located at /etc/zfs/zpool.cache might be an emergency
workaround?  Tom, please don't try this until someone more expert replies
to my question.

cheers,
Blake

On Mon, Jan 19, 2009 at 1:43 PM, Miles Nordin <carton at ivy.net> wrote:

> Won't work here.  The pool won't import at all.  The type of bad-block
> fixing you're talking about applies to cases where the pool imports, but
> 'zpool status' reports files with bad blocks in them.
>>>>> "b" == Blake <blake.irvin at gmail.com> writes:b> removing the zfs cache file located at /etc/zfs/zpool.cache b> might be an emergency workaround? just the opposite. There seem to be fewer checks blocking the autoimport of pools listed in zpool.cache than on ''zpool import'' manual imports. I''d expect the reverse, for some forceable ''zpool import'' to accept pools that don''t autoimport, but at least Ross found zpool.cache could auto-import a pool with a missing slog, while ''zpool import'' tells you, recreate from backup. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090119/f271f2ce/attachment.bin>
On 19.01.09 12:09, Tom Bird wrote:

> No, it's not a monolithic file; the point I was making there is that no
> files are showing up.

This issue (and the previous one reported by Tom) has got some publicity
recently, see here:

http://www.uknof.org.uk/uknof13/Bird-Redux.pdf

So I feel like I need to provide a little bit more information about the
outcome (sorry that it is delayed and not as full as the previous one).

First, it looked like this:

> root@cs4:~# zpool status -v
>   pool: content
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>
>         NAME      STATE     READ WRITE CKSUM
>         content   ONLINE       0     0    32
>           c2t8d0  ONLINE       0     0    32
>
> errors: Permanent errors have been detected in the following files:
>
>         content:<0x0>
>         content:<0x2c898>

The first permanent error means that the root block of the filesystem
named 'content' was corrupted (all copies), so it was not possible to
open the filesystem and access any of its contents.

Fortunately enough, there was not too much activity on the pool, so we
decided to try previous states of the pool.  I do not remember the exact
txg number we tried, but it was something like a hundred txgs back or so.
We checked that state with zdb and discovered that it was more or less
good: at least the filesystem 'content' was openable and it was possible
to access its contents, so we decided to reactivate that previous state.
The pool imported fine and the contents of 'content' were there.  A
subsequent scrub did find some errors, but I do not remember exactly how
many.  Tom may have the exact number.

Victor
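For anyone wondering what "checked it with zdb" looks like in practice,
the general shape of it was something like the following.  Treat this as
a sketch from memory rather than a recipe; the pool name is real but the
exact flags are only illustrative, and everything here is read-only:

  # zdb -e -d content               # enumerate datasets of the exported pool
  # zdb -e -bc content              # traverse the pool and verify checksums

Later zdb builds also accept a -t <txg> option to examine the pool as of
an older transaction group, which is essentially what was done here,
partly by hand, before deciding which earlier state to reactivate.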
On Jul 1, 2009, at 12:37, Victor Latushkin wrote:

> This issue (and the previous one reported by Tom) has got some publicity
> recently, see here:
>
> http://www.uknof.org.uk/uknof13/Bird-Redux.pdf

Joyent also had issues a while back as well:

http://tinyurl.com/ytyzs6
http://www.joyeur.com/2008/01/22/bingodisk-and-strongspace-what-happened

A lot of people billed it as a ZFS issue, but it should be noted that
because of all the checksumming going on, when you do get data back you
can be fairly sure that it hasn't been corrupted.
It is a ZFS issue.  My understanding is that ZFS has multiple copies of
the uberblock, but only tries to use the most recent one on import,
meaning that on rare occasions it's possible to lose access to the pool
even though the vast majority of your data is fine.

I believe there is work going on to create automatic recovery tools that
will warn you of uberblock corruption and attempt to automatically use an
older copy, but I have no idea of the bug number nor its status, I'm
afraid.
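For the curious, those uberblock copies live in the four vdev labels (two
at the front of each device, two at the end), each of which carries an
array of recent uberblocks, and the labels can be inspected without
importing anything.  A read-only sketch, assuming the whole-disk device
from earlier in the thread shows up as c2t8d0s0:

  # zdb -l /dev/rdsk/c2t8d0s0       # dump the vdev labels from the device
  # zdb -u content                  # active uberblock of an imported pool

Neither command recovers anything by itself, but they give some visibility
into what state the on-disk labels and uberblock are in.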
Mhh, I think I'm afraid too, as I also need to use ZFS on a single, large
LUN.
Victor Latushkin wrote:

> This issue (and the previous one reported by Tom) has got some publicity
> recently, see here:
>
> http://www.uknof.org.uk/uknof13/Bird-Redux.pdf

Morning,

Right, the PDF on there doesn't really give the full story of the
presentation, which is unfortunate as I see it seems to have got around a
bit.  In the actual presentation I wasn't perhaps as harsh as it seems on
the slides!

> Fortunately enough, there was not too much activity on the pool, so we
> decided to try previous states of the pool.  [...]  A subsequent scrub
> did find some errors, but I do not remember exactly how many.  Tom may
> have the exact number.

I can't remember how many errors the check found; however, all the data
copied off successfully, as far as we know.

--
Tom

// www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
On Fri, August 14, 2009 09:02, Tom Bird wrote:

> I can't remember how many errors the check found; however, all the data
> copied off successfully, as far as we know.

I would think that you'd be fairly confident of the integrity of the data,
since everything would be checksummed.

Joyent also had a fairly public issue with ZFS a while ago:

http://tinyurl.com/ptt5zp
http://www.joyent.com/joyeurblog/2008/01/22/bingodisk-and-strongspace-what-happened/

http://tinyurl.com/qlzsw6
http://www.joyent.com/joyeurblog/2008/01/24/new-podcast-quad-core-episode-2/
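(The usual way to turn "fairly confident" into something concrete is a
final scrub of the recovered pool before the old hardware is retired; a
small sketch, using the pool name from this thread:

  # zpool scrub content
  # zpool status -v content         # check again once the scrub completes

If the scrub finishes with no new errors listed, every block ZFS can still
reach has been re-read and verified against its checksum.)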
Yup, that one was down to a known (and fixed) bug though, so it isn't the
normal story of ZFS problems.
Ross wrote:

> Yup, that one was down to a known (and fixed) bug though, so it isn't
> the normal story of ZFS problems.

Got a bug ID or anything for that, just out of interest?

As an update on my storage situation, I've got some JBODs now; see how
that goes.

--
Tom

// www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain