I'm posting this to the freebsd-stable and freebsd-fs mailing lists.
Followups should probably happen on freebsd-fs.
I have a ZFS pool configured as:
    zpool create data raidz da1 da2 da3 da4 da5 raidz da6 da7 da8 da9 da10 \
        raidz da11 da12 da13 da14 da15 spare da16 log da0
where da1-16 are WD2003FYYS drives (2TB RE4) and da0 is a 256GB PCI-Express
SSD (name omitted to protect the guilty).
The SSD has been dropping offline randomly - it seems that one or more flash
modules pop out of their sockets and need to be re-seated frequently for some
reason.
The most recent time it did that, I replaced the SSD with another one (for some
reason, the manufacturer ties the flash modules to a particular controller, so
just moving the modules results in an offline SSD and inability to manage it
due to "license limits exceeded" or some such nonsense).
ZFS wasn't happy with the log device being changed, and reported it as
corrupted, with the suggested corrective action being to "zpool clear" it. I
did that, and then did a "zpool replace data da0 da0" and it claimed to
successfully resilver it. I then did a "zpool scrub" and the scrub completed
with no errors. So far, so good.
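For reference, the sequence above can be sketched as the script below. It
only prints the commands rather than running them. The pool name "data" and
the "zpool replace" line come from the text above; the device argument to
"zpool clear" and the pool argument to "zpool scrub" are assumptions, since
only the bare command names are quoted above.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of executing them.
# POOL is taken from the "zpool create" line; LOGDEV and the exact
# arguments to "clear" and "scrub" are assumptions.
POOL=data
LOGDEV=da0

recovery_steps() {
    # Clear the "corrupted" state reported for the log device.
    echo "zpool clear $POOL $LOGDEV"
    # Replace the log device with the new SSD at the same device name.
    echo "zpool replace $POOL $LOGDEV $LOGDEV"
    # Scrub the pool; this completed with no errors.
    echo "zpool scrub $POOL"
}

recovery_steps
```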
However, any attempt to write to the array results in a near-immediate panic:
panic: solaris assert: sm->sm_space + size <= sm->sm_size, file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
line: 93 cpuid=2
(Screenshot at http://www.tmk.com/transient/zfs-panic.png in case I mis-typed
something).
This is repeatable across reboot / scrub / test cycles. System is 8-STABLE as
of Fri Nov 5 19:08:35 EDT 2010, on-disk pool is version 4/15, same as the
kernel.
I know that certain operations on log devices aren't supported until pool
version 19 or thereabouts, but the error messages and zpool command results
gave the impression that what I was doing was supported and worked (when it
didn't). If this is truly a "you can't do that in pool version 15", perhaps a
warning could be added so users don't get fooled into thinking it worked?
I can give a developer remote console / root access to the box if that would
help. I have a couple days before I will need to nuke the pool and restore it
from backups.
Terry Kennedy http://www.tmk.com
terry@tmk.com New York, NY USA