:Hi everyone,
:
:I'm wondering if the problems described in the following link have been
:resolved:
:
:http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-02/msg00211.html
:
:I've got four 500GB SATA disks in a ZFS raidz pool, and all four of them
:are experiencing the behavior.
:
:The problem only happens under extreme disk activity. The box becomes
:unresponsive (I cannot SSH in, etc.). Keyboard input is echoed on the
:console, but the commands are not executed.
:
:Is there anything I can do to either figure this out, or work around it?
:
:Steve
If you are getting DMA timeouts, go to this URL:
http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
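To check whether you are actually hitting DMA timeouts, you can search the
system log for the driver's timeout messages. The sketch below assumes the
ata(4) driver logs them in the usual "TIMEOUT - <command> retrying" form and
that they end up in /var/log/messages; adjust the pattern or path if your
system differs. The function name count_dma_timeouts is hypothetical.

```shell
#!/bin/sh
# Sketch: count ATA DMA timeout messages in a log file.
# Assumption: the ata(4) driver logs lines such as
#   ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=...
# The pattern also matches READ_DMA, WRITE_DMA and their *48 variants.
count_dma_timeouts() {
    log="${1:-/var/log/messages}"   # default path is an assumption
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c -E 'TIMEOUT - (READ|WRITE)_DMA' "$log"
}
```

For example, count_dma_timeouts /var/log/messages; running dmesg | grep TIMEOUT
checks the kernel message buffer since the last boot instead.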
Then I would suggest going into /usr/src/sys/dev/ata (I think, on
FreeBSD), locating every place where request->timeout is set to 5,
changing them all to 10, and then rebuilding and installing the kernel.
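cd /usr/src/sys/dev/ata
fgrep 'request->timeout' *.c
# assumes the assignments are written "request->timeout = 5;"; -i.bak keeps backups
sed -i.bak 's/request->timeout = 5;/request->timeout = 10;/' *.c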
Try that first. If it helps, then it is a known issue. Basically,
a combination of the on-disk write cache and possible ECC corrections,
sector remappings, or an excessive number of remapped sectors can cause
the drive to take much longer than normal to complete a request, and the
default 5-second timeout is insufficient.
If it does help, post a confirmation to prod the FreeBSD developers
into changing the timeouts.
--
If you are NOT getting DMA timeouts, then the ZFS lockups may be due
to buffer/memory deadlocks. ZFS has knobs for adjusting its memory
footprint, and lowering the footprint ought to solve (most of) those
issues. It's actually a fairly hard problem to solve properly:
filesystems like UFS aren't complex enough to require the sort of
dynamic memory allocation deep inside the filesystem that ZFS and
HAMMER need to do.
-Matt
Matthew Dillon
<dillon@backplane.com>