Hi.

I have almost all file system functions working.

I started to run some heavy file system regression tests. They work. fsx
wasn't able to break my port, but the test you can find here:

	http://people.freebsd.org/~kan/fsstress.tar.gz

broke it. My kernel panics on this assertion (zfs_dir.c):

749:	mutex_exit(&dzp->z_lock);
750:
751:	error = zap_remove(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name, tx);
752->	ASSERT(error == 0);
753:
754:	if (reaped_ptr != NULL)

zap_remove() returns ENOENT because mze_find() returns NULL. I changed
this assertion to a printf and I don't see any other problems with this
test suite - ZFS is stable.

What I'm looking for is confirmation that this problem doesn't exist on
Solaris. To verify this, someone needs to compile ZFS with debug and run
this test:

	# zpool create tank ...
	# fsstress -d /tank/ -n 10000 -p 16

This will tell me whether this is my bug or insufficient synchronization
somewhere in ZFS.

Thanks in advance!

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                        http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
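Turning the assertion into a printf, as described above, could look roughly like the following. This is a standalone userland sketch, not the actual change made to zfs_dir.c: SOFT_ASSERT, soft_assert_report and soft_assert_failures are hypothetical names, and it uses fprintf() where kernel code would use its own printf.

```c
#include <stdio.h>

/* Hypothetical counter of failed checks, so a test run can be summarized. */
static unsigned long soft_assert_failures;

/* Hypothetical helper: report a failed check and keep running. */
static int
soft_assert_report(const char *expr, const char *file, int line)
{
	soft_assert_failures++;
	fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
	return (0);
}

/* Evaluates to 1 if cond holds; otherwise logs the failure and evaluates to 0. */
#define	SOFT_ASSERT(cond) \
	((cond) ? 1 : soft_assert_report(#cond, __FILE__, __LINE__))
```

In the excerpt above, this would correspond to writing something like SOFT_ASSERT(error == 0) in place of ASSERT(error == 0) while debugging, so that the stress test keeps running and every failure is logged.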
Pawel Jakub Dawidek wrote:
> Hi.
>
> I have almost all file system functions working.
>
> I started to run some heavy file system regression tests. They work. fsx
> wasn't able to break my port, but the test you can find here:
>
> 	http://people.freebsd.org/~kan/fsstress.tar.gz
>
> broke it. My kernel panics on this assertion (zfs_dir.c):
>
> 749:	mutex_exit(&dzp->z_lock);
> 750:
> 751:	error = zap_remove(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name, tx);
> 752->	ASSERT(error == 0);
> 753:
> 754:	if (reaped_ptr != NULL)
>
> zap_remove() returns ENOENT because mze_find() returns NULL. I changed
> this assertion to a printf and I don't see any other problems with this
> test suite - ZFS is stable.
>
> What I'm looking for is confirmation that this problem doesn't exist on
> Solaris. To verify this, someone needs to compile ZFS with debug and run
> this test:
>
> 	# zpool create tank ...
> 	# fsstress -d /tank/ -n 10000 -p 16
>
> This will tell me whether this is my bug or insufficient synchronization
> somewhere in ZFS.
>
> Thanks in advance!

I tried this on the following systems without tripping the ASSERT.

	2-processor Opteron
	2-processor SPARC
	4-processor Intel

-Mark
On Fri, Aug 25, 2006 at 08:33:32AM -0600, Mark Shellenbaum wrote:
> Pawel Jakub Dawidek wrote:
> >Hi.
> >
> >I have almost all file system functions working.
> >
> >I started to run some heavy file system regression tests. They work. fsx
> >wasn't able to break my port, but the test you can find here:
> >
> >	http://people.freebsd.org/~kan/fsstress.tar.gz
> >
> >broke it. My kernel panics on this assertion (zfs_dir.c):
> >
> >749:	mutex_exit(&dzp->z_lock);
> >750:
> >751:	error = zap_remove(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name, tx);
> >752->	ASSERT(error == 0);
> >753:
> >754:	if (reaped_ptr != NULL)
> >
> >zap_remove() returns ENOENT because mze_find() returns NULL. I changed
> >this assertion to a printf and I don't see any other problems with this
> >test suite - ZFS is stable.
> >
> >What I'm looking for is confirmation that this problem doesn't exist on
> >Solaris. To verify this, someone needs to compile ZFS with debug and run
> >this test:
> >
> >	# zpool create tank ...
> >	# fsstress -d /tank/ -n 10000 -p 16
> >
> >This will tell me whether this is my bug or insufficient synchronization
> >somewhere in ZFS.
> >
> >Thanks in advance!
>
> I tried this on the following systems without tripping the ASSERT.
>
> 	2-processor Opteron
> 	2-processor SPARC
> 	4-processor Intel

Thank you.

-- 
Pawel Jakub Dawidek
Pawel Jakub Dawidek wrote:
> Hi.
>
> I have almost all file system functions working.
>
> I started to run some heavy file system regression tests. They work. fsx
> wasn't able to break my port, but the test you can find here:
>
> 	http://people.freebsd.org/~kan/fsstress.tar.gz
>
> broke it. My kernel panics on this assertion (zfs_dir.c):
>
> 749:	mutex_exit(&dzp->z_lock);
> 750:
> 751:	error = zap_remove(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name, tx);
> 752->	ASSERT(error == 0);
> 753:
> 754:	if (reaped_ptr != NULL)
>
> zap_remove() returns ENOENT because mze_find() returns NULL. I changed
> this assertion to a printf and I don't see any other problems with this
> test suite - ZFS is stable.

Did you figure out what was causing this? One thing you could do to try
to narrow down the bug is to make sure that the microzap's in-core AVL
tree (zap_avl) agrees with the on-disk structure (mz_chunk[]). If the
entry is missing in both, then the zap is probably working right and the
problem is likely in the ZPL. Otherwise it's definitely in the zap.

--matt
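The cross-check suggested above can be sketched with a small standalone model. This is only an illustration under stated assumptions, not ZFS code: every model_* name is hypothetical, the "on-disk" chunks are a plain array, and the in-core index is a flat array standing in for the real AVL tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	MODEL_NCHUNKS	8	/* hypothetical: tiny fixed-size chunk array */
#define	MODEL_NAME_LEN	50	/* hypothetical name buffer size */

/* Stand-in for one on-disk chunk; an empty name means the chunk is unused. */
typedef struct model_chunk {
	char		mc_name[MODEL_NAME_LEN];
	uint64_t	mc_value;
} model_chunk_t;

/* Stand-in for one in-core index entry pointing at a chunk. */
typedef struct model_ent {
	int	me_in_use;
	int	me_chunkid;
	char	me_name[MODEL_NAME_LEN];
} model_ent_t;

typedef struct model_zap {
	model_chunk_t	mz_chunk[MODEL_NCHUNKS];	/* "on-disk" view */
	model_ent_t	mz_index[MODEL_NCHUNKS];	/* "in-core" view */
} model_zap_t;

typedef enum {
	MODEL_IN_BOTH,		/* consistent: entry exists */
	MODEL_IN_NEITHER,	/* consistent: caller removed a missing name */
	MODEL_ONLY_ON_DISK,	/* inconsistent: index lost the entry */
	MODEL_ONLY_IN_CORE	/* inconsistent: index has a stale entry */
} model_where_t;

/* Return the chunk index holding name, or -1 if no chunk holds it. */
int
model_find_on_disk(const model_zap_t *z, const char *name)
{
	assert(z != NULL && name != NULL);
	for (int i = 0; i < MODEL_NCHUNKS; i++) {
		if (z->mz_chunk[i].mc_name[0] != '\0' &&
		    strcmp(z->mz_chunk[i].mc_name, name) == 0)
			return (i);
	}
	return (-1);
}

/* Return the index slot holding name, or -1 if the index lacks it. */
int
model_find_in_core(const model_zap_t *z, const char *name)
{
	assert(z != NULL && name != NULL);
	for (int i = 0; i < MODEL_NCHUNKS; i++) {
		if (z->mz_index[i].me_in_use &&
		    strcmp(z->mz_index[i].me_name, name) == 0)
			return (i);
	}
	return (-1);
}

/* Say where a single name is present, e.g. the name whose removal failed. */
model_where_t
model_classify(const model_zap_t *z, const char *name)
{
	int on_disk = model_find_on_disk(z, name) >= 0;
	int in_core = model_find_in_core(z, name) >= 0;

	if (on_disk && in_core)
		return (MODEL_IN_BOTH);
	if (!on_disk && !in_core)
		return (MODEL_IN_NEITHER);
	return (on_disk ? MODEL_ONLY_ON_DISK : MODEL_ONLY_IN_CORE);
}

/* Compare both views in full; return the number of mismatches found. */
int
model_verify(const model_zap_t *z)
{
	int mismatches = 0;

	/* Every in-core entry must point at a chunk with the same name. */
	for (int i = 0; i < MODEL_NCHUNKS; i++) {
		const model_ent_t *e = &z->mz_index[i];
		if (!e->me_in_use)
			continue;
		if (e->me_chunkid < 0 || e->me_chunkid >= MODEL_NCHUNKS ||
		    strcmp(z->mz_chunk[e->me_chunkid].mc_name, e->me_name) != 0)
			mismatches++;
	}
	/* Every used chunk must be referenced by some in-core entry. */
	for (int c = 0; c < MODEL_NCHUNKS; c++) {
		int found = 0;
		if (z->mz_chunk[c].mc_name[0] == '\0')
			continue;
		for (int i = 0; i < MODEL_NCHUNKS; i++) {
			if (z->mz_index[i].me_in_use &&
			    z->mz_index[i].me_chunkid == c)
				found = 1;
		}
		if (!found)
			mismatches++;
	}
	return (mismatches);
}
```

Read against the advice above, MODEL_IN_NEITHER for the failing name would point away from the zap and toward the caller, while either of the "only" results would point at the zap itself.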
On Tue, Aug 29, 2006 at 10:38:31AM -0700, Matthew Ahrens wrote:
> Pawel Jakub Dawidek wrote:
> >Hi.
> >
> >I have almost all file system functions working.
> >
> >I started to run some heavy file system regression tests. They work. fsx
> >wasn't able to break my port, but the test you can find here:
> >
> >	http://people.freebsd.org/~kan/fsstress.tar.gz
> >
> >broke it. My kernel panics on this assertion (zfs_dir.c):
> >
> >749:	mutex_exit(&dzp->z_lock);
> >750:
> >751:	error = zap_remove(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name, tx);
> >752->	ASSERT(error == 0);
> >753:
> >754:	if (reaped_ptr != NULL)
> >
> >zap_remove() returns ENOENT because mze_find() returns NULL. I changed
> >this assertion to a printf and I don't see any other problems with this
> >test suite - ZFS is stable.
>
> Did you figure out what was causing this? One thing you could do to try
> to narrow down the bug is to make sure that the microzap's in-core AVL
> tree (zap_avl) agrees with the on-disk structure (mz_chunk[]). If the
> entry is missing in both, then the zap is probably working right and the
> problem is likely in the ZPL. Otherwise it's definitely in the zap.

I forgot to answer. Yes, it was my bug and I already fixed it.

Thanks for the help.

-- 
Pawel Jakub Dawidek