The setup is:

# zpool status
  pool: home
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        home        ONLINE       0     0     0
          c0t0d0s3  ONLINE       0     0     0

  pool: mypool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          c1d0      ONLINE       0     0     0
          c2d0      ONLINE       0     0     0
          c2d1      ONLINE       0     0     0

# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
home          546K  7.19G  99.0K  /export/home
home/derek    106K  7.19G   106K  /export/home/derek
mypool       2.10G  98.5G  2.10G  /mypool

# uname -a
SunOS unknown 5.11 snv_30 i86pc i386 i86pc

I'm not sure what caused this but the machine was just idling and all of a
sudden it rebooted. The messages file shows:

unknown savecore: [ID 570001 auth.error] reboot after panic: ZFS: I/O
failure (write on /dev/dsk/c2d1 off 10434000: zio eb532200 [L0
unallocated] vdev=2 offset=10034000 size=4000L/4000P/4000A fletcher4
uncompressed LE contiguous birth=53025 fill=0 cksum=127d0e

Where do I send the core dumps?

Derek

This message posted from opensolaris.org
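To locate the dumps before sending them anywhere: on a stock Solaris/Nevada
install, savecore extracts the crash dump into /var/crash/<hostname> as
unix.N and vmcore.N, and dumpadm(1M) prints the configured dump device and
savecore directory. A quick check, assuming the default dump configuration
(the path below is the default, not confirmed from this machine):

    # dumpadm
    # ls /var/crash/`hostname`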
On Thu, Feb 02, 2006 at 06:47:57AM -0800, Derek Crudgington wrote:
>
> unknown savecore: [ID 570001 auth.error] reboot after panic: ZFS: I/O
> failure (write on /dev/dsk/c2d1 off 10434000: zio eb532200 [L0
           ^^^^^^^^^^^^^^^^^^^^^^
> unallocated] vdev=2 offset=10034000 size=4000L/4000P/4000A fletcher4
> uncompressed LE contiguous birth=53025 fill=0 cksum=127d0e
>

This is a failed write to the drive. Something is probably wrong with
the hardware; write failures are a pretty clear indication. I would
recommend running your pool with RAID-Z and seeing if there is any
consistent level of errors from the drives - at least ZFS will be able
to survive them.

While theoretically possible, it's highly unlikely that this is a
manifestation of some software problem.

Does syslog show any messages from the underlying driver?

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
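For reference, moving the second pool to raidz means recreating it - there
is no in-place conversion - so the sketch below assumes the same three disks
and that anything on mypool has been copied off first (device names taken
from the zpool status output above):

    # zpool destroy mypool
    # zpool create mypool raidz c1d0 c2d0 c2d1
    # zpool status mypool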
Nothing showed up in syslog. I will try the raidz, thanks.

Even if the hard drive is bad, the machine should panic and reboot?

This message posted from opensolaris.org
On Thu, Feb 02, 2006 at 06:04:44PM -0800, Derek Crudgington wrote:
> Nothing showed up in syslog. I will try the raidz, thanks.
>
> Even if the hard drive is bad, the machine should panic and reboot?
>

In a replicated config, ZFS will currently survive both read and write
failures.

In a non-replicated config, we should be able to survive all read
failures. Currently ZFS panics. We should have this fixed in the next
month. Note that losing metadata can destroy arbitrarily large portions
of your filesystem - we will also be integrating a fix to implicitly
mirror metadata to decrease the possibility of this happening.

For writes, things get much trickier. By the time the error occurs, we
are mid-transaction and have long since lost any association with a
particular file or system call. Determining how to fail in this case is
difficult at best. If we're not careful, we could end up with corrupted
application state. We have some ideas in this area, but in the mean
time it is safer to just panic.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
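To make "replicated config" concrete: a two-way mirror is the simplest
layout where ZFS can currently ride out a failed read or write by using the
other side of the mirror, and the per-device READ/WRITE/CKSUM counters in
zpool status show any errors it has absorbed. A sketch only - the pool name
and devices below are placeholders, not taken from this thread:

    # zpool create tank mirror c3d0 c3d1
    # zpool status tank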
Hello Eric,

Friday, February 3, 2006, 4:10:31 AM, you wrote:

ES> On Thu, Feb 02, 2006 at 06:04:44PM -0800, Derek Crudgington wrote:
>> Nothing showed up in syslog. I will try the raidz, thanks.
>>
>> Even if the hard drive is bad, the machine should panic and reboot?
>>

ES> In a replicated config, ZFS will currently survive both read and write
ES> failures.

ES> In a non-replicated config, we should be able to survive all read
ES> failures. Currently ZFS panics. We should have this fixed in the next
ES> month. Note that losing metadata can destroy arbitrarily large portions
ES> of your filesystem - we will also be integrating a fix to implicitly
ES> mirror metadata to decrease the possibility of this happening.

ES> For writes, things get much trickier. By the time the error occurs, we
ES> are mid-transaction and have long since lost any association with a
ES> particular file or system call. Determining how to fail in this case is
ES> difficult at best. If we're not careful, we could end up with corrupted
ES> application state. We have some ideas in this area, but in the mean
ES> time it is safer to just panic.

Similar to UFS, there should be an option to panic, lock, umount, ...
if an unrecoverable error occurs on a given dataset/pool.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
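UFS already has a knob along these lines: the onerror mount option, which
takes panic, lock, or umount. Something similar per pool or dataset is what
is being asked for here. A sketch of the existing UFS option, with a purely
hypothetical ZFS analog for comparison - the device path is a placeholder
and the zpool property shown does not exist in snv_30:

    # mount -F ufs -o onerror=lock /dev/dsk/c0t1d0s6 /export/data
    # zpool set onerror=lock mypool   # hypothetical property, illustrative only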
> On Thu, Feb 02, 2006 at 06:47:57AM -0800, Derek Crudgington wrote:
> >
> > unknown savecore: [ID 570001 auth.error] reboot after panic: ZFS: I/O
> > failure (write on /dev/dsk/c2d1 off 10434000: zio eb532200 [L0
>            ^^^^^^^^^^^^^^^^^^^^^^
>
> This is a failed write to the drive. Something is probably wrong with
> the hardware; write failures are a pretty clear indication.

Btw. why exactly does zfs have to panic the machine on write errors? Does it
panic on every failed write, or only when the zfs filesystem structure would
be / is corrupted due to the failed write?

What about users that are going to use zfs on hot-removable storage devices
(USB / Firewire)? With these devices it can happen quite easily that a user
doesn't pay attention and hot-removes a device that is still in use by zfs,
and the result is a panic reboot because of zfs write i/o errors.

This message posted from opensolaris.org
On Fri, 2006-02-03 at 03:10, Eric Schrock wrote:

> For writes, things get much trickier. By the time the error occurs, we
> are mid-transaction and have long since lost any association with a
> particular file or system call. Determining how to fail in this case is
> difficult at best. If we're not careful, we could end up with corrupted
> application state. We have some ideas in this area, but in the mean
> time it is safer to just panic.

Why not just retry the write somewhere else on the disk? After all,
unlike (say) ufs you don't care where on the disk the data ends up,
only that you know where it is, and if you write it somewhere else
then all you have to do is modify the subsequent writes to point
to the now revised location. Or am I missing something?

--
-Peter Tribble
L.I.S., University of Hertfordshire - http://www.herts.ac.uk/
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>On Fri, 2006-02-03 at 03:10, Eric Schrock wrote:
>
>> For writes, things get much trickier. By the time the error occurs, we
>> are mid-transaction and have long since lost any association with a
>> particular file or system call. Determining how to fail in this case is
>> difficult at best. If we're not careful, we could end up with corrupted
>> application state. We have some ideas in this area, but in the mean
>> time it is safer to just panic.
>
>Why not just retry the write somewhere else on the disk? After all,
>unlike (say) ufs you don't care where on the disk the data ends up,
>only that you know where it is, and if you write it somewhere else
>then all you have to do is modify the subsequent writes to point
>to the now revised location. Or am I missing something?

Perhaps that information is no longer known? If the association with the
file is lost then perhaps also the association with the metadata; only
the partial ordering of writes is still known?

It seems, though, that the error must be able to permeate upwards and
either it should be retried elsewhere or EIO must be returned.

I think the "panics" on read and write failures are just properties of
a beta implementation and they should be removed from the final
product. Because whatever the error, panic is never the right answer.
(Except when the error wasn't anticipated; but read and write failures
should be anticipated)

Casper
On Fri, Feb 03, 2006 at 12:53:38PM +0000, Peter Tribble wrote:
> Why not just retry the write somewhere else on the disk? After all,
> unlike (say) ufs you don't care where on the disk the data ends up,
> only that you know where it is, and if you write it somewhere else
> then all you have to do is modify the subsequent writes to point
> to the now revised location. Or am I missing something?

As Casper has said, this is not an intrinsic property of ZFS, but an
artifact of the current state of the implementation. What you've said
is pretty much what we plan on doing. It's made trickier by the fact
that choosing another location on disk changes that block's bp. This
change has to propagate up the chain to the parent blocks, which may
have already been written to disk. Since this all happens within the
context of syncing a transaction group, we can re-write those parent
blocks in place. It's just a bit of work to make sure things don't
cascade out of control.

In the case of a user pulling out a USB drive, we could just drop
everything on the floor and lock the filesystem. Again, we know what
we would need to do, and just need to implement it. Infinite work.

--Bill
On Fri, Feb 03, 2006 at 12:53:38PM +0000, Peter Tribble wrote:
>
> Why not just retry the write somewhere else on the disk? After all,
> unlike (say) ufs you don't care where on the disk the data ends up,
> only that you know where it is, and if you write it somewhere else
> then all you have to do is modify the subsequent writes to point
> to the now revised location. Or am I missing something?
>

Yes, this is one of the ideas that I mentioned - the ability to
reallocate failed writes (preferably to another vdev entirely). Of
course, it's never as simple as it seems. We will take a look at this
after we get the read path working to our liking.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Fri, Feb 03, 2006 at 03:41:49AM -0800, Jürgen Keil wrote:
>
> Btw. why exactly does zfs have to panic the machine on write errors? Does it
> panic on every failed write, or only when the zfs filesystem structure would
> be / is corrupted due to the failed write?

See the previous mail.

> What about users that are going to use zfs on hot-removable storage devices
> (USB / Firewire)? With these devices it can happen quite easily that a user
> doesn't pay attention and hot-removes a device that is still in use by zfs,
> and the result is a panic reboot because of zfs write i/o errors.

Yes, we are aware of this issue with removable storage. These devices
present interesting problems (for example, reallocating writes won't
save your data on a single-disk pool) that we need to address. Once
again, we have some ideas about what to do in this space, but we're not
going to be able to tackle them until we get done with the higher
priority items (metadata replication and tolerance of read failures).

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Fri, 2006-02-03 at 17:11, Eric Schrock wrote:
> On Fri, Feb 03, 2006 at 12:53:38PM +0000, Peter Tribble wrote:
> >
> > Why not just retry the write somewhere else on the disk? After all,
> > unlike (say) ufs you don't care where on the disk the data ends up,
> > only that you know where it is, and if you write it somewhere else
> > then all you have to do is modify the subsequent writes to point
> > to the now revised location. Or am I missing something?
> >
>
> Yes, this is one of the ideas that I mentioned - the ability to
> reallocate failed writes (preferably to another vdev entirely).

Thanks!

One of my concerns is that the current panic-at-first-sign-of-trouble
approach might make life exciting on workstations and
horizontally-scaled farms, where you only have the one disk available
anyway. While I would want to know that the disk was going bad, I would
rather the machine stayed up for me to fix the disk at a time of my
choosing.

--
-Peter Tribble
L.I.S., University of Hertfordshire - http://www.herts.ac.uk/
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/