On Sat, Feb 18, 2012 at 10:10 PM, Ask Bjørn Hansen <ask@develooper.com> wrote:
> Hi everyone,
>
> We're recycling an old database server with room for 16 disks as a
> backup server (our old database servers had 12-20 15k disks; the new ones one or
> two SSDs and they're faster).
>
> We have a box running FreeBSD 8.2 with 7 disks in a ZFS raidz2 (and a
> spare). It's using an older 3ware card with all the disks (2TB WD green
> "ears" ones) set up as a "single" unit on the 3ware
> controller and though slow is basically working great. We have a small program
> to smartly purge old snapshots that I wrote after a year and tens of thousands
> of snapshots: https://github.com/abh/zfs-snapshot-cleaner
>
> The new box is running 9.0 with a 3ware 9690SA-4I4E card with the latest
> firmware (4.10.00.024). We're using Seagate 3TB barracuda disks (big and
> cheap; good for backups).
>
> Now for the problem: When running bonnie++ we get a few ZFS checksum errors
> and (weirder) we get this error from bonnie:
>
> "Can't read a full block, only got 8193 bytes."
That's probably just a side effect of the ZFS checksum errors. ZFS will
happily read the file until it hits a record with a bad checksum. If
redundant data is available (raidz or mirror), ZFS will attempt to
recover your data. If there's no redundancy, you will get a read error.
If you run "zpool status -v", you should see a list of the files affected
by corruption.
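If you want to watch the counters while re-running the tests, something like the rough Python sketch below could pull out the devices whose CKSUM column is non-zero. It assumes the usual "NAME STATE READ WRITE CKSUM" table layout of "zpool status"; the function names are made up for illustration, and it has not been tried against your box.

```python
import subprocess


def checksum_error_devices(status_text):
    """Map device/vdev name -> CKSUM column for rows where it is not "0".

    Assumes the usual `zpool status` table whose header row is
    NAME STATE READ WRITE CKSUM. Counts are kept as strings because
    large values may be printed in abbreviated form.
    """
    bad = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # rows of the current pool follow
            continue
        if not in_table:
            continue
        if fields and fields[0] == "errors:":
            in_table = False  # end of this pool's device table
            continue
        # Spare and section-label rows have fewer than five columns.
        if len(fields) >= 5 and fields[4] != "0":
            bad[fields[0]] = fields[4]
    return bad


def read_zpool_status():
    """Run `zpool status -v` and return its text (needs the zpool binary)."""
    result = subprocess.run(
        ["zpool", "status", "-v"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling checksum_error_devices(read_zpool_status()) after each bonnie++ run would show whether the errors stick to one disk (suggesting a bad drive or cable) or spread across all of them (pointing more towards the controller, driver, or RAM).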
>
> This seems to only be when testing a single ZFS disk or a UFS partition.
> Testing a raidz1 we just get checksum errors noted in zpool status, but no
> errors reading (though read speeds are ~10MB/second across four disks -- writing
> sequentially was ~230MB/second).
>
> Any ideas where to start looking?
You need to figure out why you're getting checksum errors. Alas,
there's probably no easy way to troubleshoot it. The issue could be
hardware related; possible culprits include bad RAM, bad SATA
cables, or quirks of a particular firmware revision on the disk
controller and/or the hard drives.
> Our best guess is that the 3ware controller can't play nicely with the
> disks; we're planning to try some older/smaller disks on Monday and then
> trying the same system and disks with Linux to see if the 3ware driver there
> works differently.
--Artem