ZFS in FreeBSD lacks at least one major feature from the Solaris version: hot
spares.  There is a PR open at http://www.freebsd.org/cgi/query-pr.cgi?pr=134491,
but there hasn't been any motion or discussion posted on it since its creation
almost one year ago.

I'm aware that on Solaris, hot spare replacement is handled by a couple of
Solaris-specific daemons, zfs-retire and zfs-diagnose, which both plug into
the Solaris FMA (Fault Management Architecture).  Have there been any thoughts
on porting these over, or on getting something similar running within FreeBSD?
With all of the recent SATA/SAS CAM hotplug work now committed, it would be
nice to have failed drives automatically replaced by hot spares, with
hot-replacement of the failed drive itself to follow later.

On the other hand, I'd be interested in hearing whether anyone has had success
rolling their own scripted solution: i.e., something which polls 'zpool status'
looking for failed drives and performs hot-spare replacements automatically.

Thanks,
Steve Polyack
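P.S. For concreteness, here is a rough, untested sketch of the sort of polling
script I have in mind.  The pool name, spare device, and interval are
placeholders, and a real version would also need to remember which devices it
has already tried to replace:

    #!/bin/sh
    # Poll 'zpool status' for failed vdevs and attach a spare in their place.
    POOL=tank       # placeholder pool name
    SPARE=da8       # placeholder spare device

    while :; do
        # pick out config lines whose STATE column shows a failed device,
        # skipping the pool summary line and "state:"/"status:" headers
        failed=$(zpool status "$POOL" |
            awk -v pool="$POOL" '$1 != pool && $1 !~ /:/ &&
                $2 ~ /^(FAULTED|UNAVAIL|REMOVED)$/ { print $1 }')
        for dev in $failed; do
            logger -p daemon.err "ZFS: $dev failed in $POOL, replacing with $SPARE"
            zpool replace "$POOL" "$dev" "$SPARE"
        done
        sleep 60
    done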
On Mon, Mar 08, 2010 at 01:06:10PM -0500, Steve Polyack wrote:
> ZFS in FreeBSD lacks at least one major feature from the Solaris version:
> hot spares.  There is a PR open at
> http://www.freebsd.org/cgi/query-pr.cgi?pr=134491, but there hasn't been
> any motion or discussion posted on it since its creation almost one year
> ago.
>
> I'm aware that on Solaris, hot spare replacement is handled by a couple of
> Solaris-specific daemons, zfs-retire and zfs-diagnose, which both plug into
> the Solaris FMA (Fault Management Architecture).  Have there been any
> thoughts on porting these over, or on getting something similar running
> within FreeBSD?  With all of the recent SATA/SAS CAM hotplug work now
> committed, it would be nice to have failed drives automatically replaced by
> hot spares, with hot-replacement of the failed drive itself to follow later.
>
> On the other hand, I'd be interested in hearing whether anyone has had
> success rolling their own scripted solution: i.e., something which polls
> 'zpool status' looking for failed drives and performs hot-spare
> replacements automatically.

Currently FreeBSD's ZFS sends various events to devd.  It should be possible
to implement some scripts (or maybe reuse zfs-retire/zfs-diagnose?) to perform
'zpool replace' when a disk disappears, etc.  This shouldn't be very hard,
modulo bugs in FreeBSD/ZFS: because this functionality has gone unused so far,
it hasn't really been tested.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
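A minimal sketch of the devd hook described above (untested; the helper script
name is hypothetical, and $pool is the variable the stock /etc/devd.conf ZFS
sample rules already use for this event type):

    # /etc/devd.conf fragment: hand ZFS vdev failures to a helper script
    notify 10 {
        match "system" "ZFS";
        match "type" "vdev";
        action "/usr/local/sbin/zfs-autoreplace '$pool'";
    };

The hypothetical zfs-autoreplace helper would then look at 'zpool status' for
the named pool, pick out a device marked FAULTED/UNAVAIL/REMOVED, and run
'zpool replace' against an available spare, much like a polling script, only
triggered by the event instead of a timer.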
On 03/09/10 05:11, Ivan Voras wrote:
> On 03/08/10 19:06, Steve Polyack wrote:
>> ZFS in FreeBSD lacks at least one major feature from the Solaris version:
>> hot spares. [...]
>
> You don't have to exactly poll it.  See /etc/devd.conf:
>
> # Sample ZFS problem reports handling.
> notify 10 {
>     match "system" "ZFS";
>     match "type" "zpool";
>     action "logger -p kern.err 'ZFS: failed to load zpool $pool'";
> };
>
> notify 10 {
>     match "system" "ZFS";
>     match "type" "vdev";
>     action "logger -p kern.err 'ZFS: vdev failure, zpool=$pool type=$type'";
> };
>
> notify 10 {
>     match "system" "ZFS";
>     match "type" "data";
>     action "logger -p kern.warn 'ZFS: zpool I/O failure, zpool=$pool error=$zio_err'";
> };
>
> notify 10 {
>     match "system" "ZFS";
>     match "type" "io";
>     action "logger -p kern.warn 'ZFS: vdev I/O failure, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size error=$zio_err'";
> };
>
> notify 10 {
>     match "system" "ZFS";
>     match "type" "checksum";
>     action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
> };
>
> I don't really know if these notifications actually work since I don't
> have hot-plug test machines, but if they do, this looks like a decent
> starting point.

Thanks for the suggestions.  I received a similar one from someone else.  If I
get time to build a ZFS lab machine, I will certainly try these out and
provide feedback on how they work.
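The lab setup I have in mind is deliberately simple: a throwaway pool built on
file-backed md(4) devices, with one disk faulted by hand to see which of the
devd notifications above actually fire (with the ZFS notify rules in
/etc/devd.conf enabled and devd restarted).  Roughly, and untested; file
names, sizes, and md unit numbers are arbitrary:

    # three 256 MB backing files and matching md devices
    truncate -s 256m /tmp/zd0 /tmp/zd1 /tmp/zd2
    mdconfig -a -t vnode -f /tmp/zd0    # -> md0
    mdconfig -a -t vnode -f /tmp/zd1    # -> md1
    mdconfig -a -t vnode -f /tmp/zd2    # -> md2

    # mirrored test pool with one hot spare
    zpool create testpool mirror md0 md1 spare md2

    # simulate a failure and watch for the ZFS events from devd
    zpool offline testpool md1    # an administrative offline; forcibly
                                  # detaching md1 would be closer to a
                                  # real drive failure
    tail -f /var/log/messages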