On Wed, May 24, 2006 at 09:41:53AM +0100, Darren J Moffat wrote:
> Bill Moore wrote:
> >On Tue, May 23, 2006 at 06:10:11PM +0100, Darren J Moffat wrote:
> >>Should I be worried that ztest has errors showing up in the cksum
> >>column ?
> >>
> >>This is on snv_40 bits as well as zfs-crypto bits. There are no read or
> >>write errors though.
> >
> >You should not be worried. As part of its work, ztest will scribble
> >random garbage over disks (assuming there's a redundant configuration)
> >in order to test self-healing data. It should never destroy enough data
> >to induce a read failure, though.
>
> I thought that is what was going on but I just wanted to be sure.
>
> Is there a global "PASS/FAIL" from ztest or do I have to examine the
> output and check there were no read/write errors ? In other words other
> than running to completion without dumping core how do I know that I
> have passed the tests that ztest runs ?

ztest will fail if we:

- blow an assertion or fail an I/O we shouldn't
- corrupt on-disk data
- leak blocks

In all cases, either ztest will die, or zdb (which is run after each
cycle) will die. If ztest completes, then you've passed. There are
some subtle bugs (such as incomplete resilver) that may slip by in most
cases, but eventually we should die somewhere. If you can run ztest for
24 hours (Jeff and Bill have a tool which will run it in succession with
randomized options) then you're safe from a DMU/SPA perspective. There
may (and likely will) be additional bugs at the ZPL layer, but the ZFS
test team can help once you get to that point with kernel-level stress
tests.
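
Since the pass/fail signal is simply "did the process run to completion
or did it die", a soak run can be scripted by checking the exit status of
each invocation. What follows is only a rough sketch of that idea, not
the tool Jeff and Bill use: the names passed() and run_until() are made
up for illustration, the command to run is supplied by the caller, and no
ztest options are assumed or randomized here.

```python
import subprocess
import time


def passed(returncode):
    """Exit status 0 counts as a pass. Anything else counts as a failure:
    a non-zero exit, or death by signal (e.g. an abort from a blown
    assertion), which subprocess reports as a negative value."""
    return returncode == 0


def run_until(cmd, total_seconds, run=subprocess.call, clock=time.monotonic):
    """Run cmd over and over until total_seconds have elapsed or one run
    fails. The command always runs at least once.

    Returns (ok, completed_runs), where completed_runs counts only the
    runs that finished successfully.
    """
    deadline = clock() + total_seconds
    completed = 0
    while True:
        if not passed(run(cmd)):
            # Stop at the first failure so the evidence is left in place.
            return False, completed
        completed += 1
        if clock() >= deadline:
            return True, completed
```

For a 24 hour soak this might be called as run_until(["ztest"], 24 * 3600),
with whatever arguments you normally give ztest added to the list. The run
and clock parameters exist only so the loop can be exercised without
actually spawning a process.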
- Eric
--
Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock