Hi all!
I have a uniprocessor AMD64 box with 512 MB of RAM and 2 ZFS pools,
used as a home NAS.
I have had some I/O issues since I moved from 7.2 to 8.0.
With a GENERIC kernel (or a stripped-down one), during high I/O activity 
(such as a make buildworld can cause), I encounter random hangs or deadlocks.
top shows system CPU usage at 99%, the most CPU-hungry process being 
[zfskern] ({txg_thread_enter} if I switch to thread view).
The box still responds to ping. Processes that are already running keep 
running, but I can't start new ones.
Sometimes I can return to normal by Ctrl-C-ing the buildworld (or other 
operation); sometimes I can't, and I have to reboot the box.
The issue seems to be less frequent with 8.0-STABLE than with 
8.0-RELEASE, but it is still present (I get approximately a 75% chance of 
a hang with a buildworld).
I get the hang with prefetch enabled or disabled. Same for the ZIL.
I tried to enable kernel dumps, but the box hangs saving the dump (root 
is on ZFS) or when starting kdbg on it.
I recompiled the kernel with SCHED_4BSD, and with it I can't seem to 
reproduce the hang.
What do you think?
Did I misconfigure something?
cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:unsafe/root"
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="100M"
vfs.zfs.vdev.cache.size="10M"
vfs.zfs.prefetch_disable="0"
vfs.zfs.zil_disable="1"
[carenath] ~> zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad6p1   ONLINE       0     0     0
            ad10p1  ONLINE       0     0     0
            ad8p1   ONLINE       0     0     0
            ad4p1   ONLINE       0     0     0
errors: No known data errors
  pool: unsafe
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        unsafe      ONLINE       0     0     0
          ad0p3     ONLINE       0     0     0
errors: No known data errors
On Dec 16, 2009, at 11:04 AM, Arnaud Houdelette wrote:
> Hi all !
> I got a UniProcessor AMD64 box, with 512 MB ram with 2 ZFS pools as a
> home-NAS.
>
> I got some IO issues since I moved from 7.2 to 8.0.
> With a GENERIC kernel (or a stripped down one), during high IO
> activity (as a make buildworld can cause), I encounter random hangs or
> deadlocks.
> top shows system CPU usage at 99%, the most CPU using process being
> [zfskern] ( {txg_thread_enter} if I switch to thread view).
> The box still responds to ping. Current processes can still run, but I
> can't run new ones.
> Sometimes, I can return to normal by Ctrl-C-ing the buildworld (or
> other operation), sometimes I can't, I got to reboot the box.
>
> The Issue seemed to become less frequent with 8.0-stable instead of
> 8.0-RELEASE, but still present (I get approximately 75% chance of hang
> with a buildworld).
> I got the hang with Prefetch enabled or disabled. Idem for ZIL.
>
> I tried to enable kernel dumps, but the box hangs saving the dump
> (root is on ZFS) or when starting kdbg on it.
> I recompiled kernel with SCHED_4BSD, and it seems I can't reproduce
> the hang.
>
> What do you think ?
> Did I misconfigure something ?

This sounds similar to something I ran into on CURRENT last year:

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832196+0+archive/2009/freebsd-current/20090322.freebsd-current

The immediate problem was a priority inversion between the
txg_thread_enter threads and the spa_zio threads. This should be solved
(or at least mitigated) on 8.0 now that these threads have explicit
priorities set. Can you check what priorities these threads are running
at on your machine? They should have priorities something like -8 for
txg_thread_enter and -16 for spa_zio.

- Ben
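
One possible way to do the check Ben asks for, as a rough sketch only. The helper name zfs_threads is made up for illustration; it simply filters a thread listing on stdin down to the two thread names mentioned above. The commands in the comments assume the stock FreeBSD top and procstat, and the priority numbers they print may not be on the same scale as the -8 / -16 quoted above, so compare the two threads relative to each other rather than against those exact values.

```shell
# Hypothetical helper (not part of FreeBSD): keep only the ZFS kernel
# threads discussed in this thread. Reads a thread listing on stdin and
# prints the lines that mention txg_thread_enter or spa_zio.
zfs_threads() {
    grep -E 'txg_thread_enter|spa_zio'
}

# Possible usage on the affected box (assumptions, not verified there):
#   top -SH                      # interactive; look at the PRI column
#   procstat -at | zfs_threads   # one line per matching kernel thread
```

If txg_thread_enter does not show up at a different priority than spa_zio, that would be worth reporting back to the list along with the full output.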