Hello,

The following is not a request for help or bug report as such, I just want to put the information out there in case it helps other people by encouraging active checking for silent data corruption (also happens to be a good "saved yet again by ZFS" story).

I was moving some disks to a machine that didn't have SATA ports for them, so I took one of the TX4:s I had left over since some previous desperate attempts to get working SATA in another machine.

At first, things *seemed* good. No DMA timeouts or anything like that. Streamed through 4 250 gig drives no problem; ran a bunch of rsyncs of ports trees during the night.

However, once I started dd:ing large files and reading them back in I started getting I/O errors from ZFS, because of checksum mismatches. Turns out all the drives connected to the TX4 in the raidz2 were generating checksum errors (the one that was not connected to the TX4 was fine). Write a 2-3 gig file of zeroes -> handful of checksum mismatches on subsequent scrub.

Since then I have now tried two distinct TX4 cards (but only in one PCI slot). Both suffer from the same problem. Amazingly a SiI 3114 *does* seem to work in the same PCI slot - no corruption, and no DMA timeouts and whatnot that I was expecting from a SiI card.

This was on amd64, RELENG_7. Same SATA cables used on all drives. Drives on TX4 were also in a Supermicro hotswap enclosure, which may or may not be related (but again, no problem with SiI).

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

On 21/10/2007, Peter Schuller <peter.schuller@infidyne.com> wrote:
> However, once I started dd:ing large files and reading them back in I
> started getting I/O errors from ZFS, because of checksum
> mismatches. Turns out all the drives connected to the TX4 in the
> raidz2 were generating checksum errors (the one that was not connected
> to the TX4 was fine). Write a 2-3 gig file of zeroes -> handful of
> checksum mismatches on subsequent scrub.

Is there some nice utility floating about to do this in userspace on uninitialised/raw devices?

Thanks,

Adrian

--
Adrian Chadd - adrian@freebsd.org
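
For illustration only, a userspace write-then-verify pass along these lines could be sketched roughly as below. This is not an existing utility. The function names (make_block, write_pattern, verify_pattern), the block size and the seed are all hypothetical choices for this sketch. It writes deterministic pseudo-random blocks (rather than zeroes, so that a misplaced block is also detectable), forces them out with fsync, then reads them back and reports which block indices differ. Pointing it at a raw device would destroy whatever is on that device.

```python
import hashlib
import os


def make_block(seed, index, size):
    """Return a deterministic pseudo-random block derived from (seed, index)."""
    key = seed.to_bytes(8, "little") + index.to_bytes(8, "little")
    # SHAKE-256 is an extendable-output hash, so it can emit `size` bytes.
    return hashlib.shake_256(key).digest(size)


def write_pattern(path, blocks, block_size=1 << 20, seed=1):
    """Write `blocks` pattern blocks to `path` and ask the OS to flush them."""
    # O_CREAT without O_TRUNC: works for a new file and for an existing device node.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    with os.fdopen(fd, "wb") as f:
        for i in range(blocks):
            f.write(make_block(seed, i, block_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, blocks, block_size=1 << 20, seed=1):
    """Read the blocks back and return the indices that do not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(block_size) != make_block(seed, i, block_size):
                bad.append(i)
    return bad
```

A call such as write_pattern("/path/to/testfile", blocks=3000) followed by verify_pattern("/path/to/testfile", blocks=3000) would cover roughly 3 GB. One caveat: on a regular file the read-back may be served from the OS cache rather than the disk, so the test data probably needs to be larger than RAM (or the cache otherwise bypassed) before a clean result says much about the controller.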

On Sun, Oct 21, 2007 at 09:19:35AM +0200, Peter Schuller wrote:
> Hello,
>
> Since then I have now tried two distinct TX4 cards (but only in one
> PCI slot). Both suffer from the same problem. Amazingly a SiI 3114
> *does* seem to work in the same PCI slot - no corruption, and no DMA
> timeouts and whatnot that I was expecting from a SiI card.
>
> This was on amd64, RELENG_7. Same SATA cables used on all
> drives. Drives on TX4 were also in a Supermicro hotswap enclosure,
> which may or may not be related (but again, no problem with SiI).

I have experienced the same kind of data corruption as you on both 6.2-RELEASE and Ubuntu 7.04 on two different machines, one P3 and one P4.

The card seems to be doing naughty things to the PCI bus under load; your dmesg ought to be full of PCI timeouts.

If you grep the old mailing lists for the PCI timeout errors that are produced, you'll find some messages indicating that the hardware is flawed. This seems to be worked around in the driver for Windows, since the same card, disks and machines work flawlessly on Windows 2003.

--
Lars Viklund
-------------------
To make it is hell. To fail is divine.

Hello.

I was experiencing the same problem with the TX4 on both Linux and FreeBSD. It was determined that the root cause is a hardware bug in the controller.

Here is a patch that implements a workaround inspired by the vendor-supplied driver:
http://www.spinics.net/lists/linux-ide/msg15858.html

I have not yet had enough time to make sense of the FreeBSD ata subsystem and patch it the same way. At first glance it seems like we need to implement something like ata_marvell_dmasetprd() (file dev/ata/ata-chipset.c).

--
./lxnt