hi all,

i would like to ask some questions regarding best practices for zfs
recovery if disk errors occur.

currently i have zfs boot (nv62) and the following setup:

  2 si3224 controllers (each 4 sata disks)
  8 sata disks, same size, same type

i have two pools:
  a) rootpool
  b) datapool

the rootpool is a mirrored pool, where every disk has a slice (s0, which
is 5 % of the whole disk) devoted to the rootpool, just for mirroring.

the rest of each disk (s1) is added to the datapool, which is raidz.

my idea is that if any disk is corrupt i am still able to boot.

now i have some questions:

a) if i want to be able to boot from every disk in case of error, i have
to set up grub on every disk, so that if the controller picks that disk
as the boot disk, the rootpool can still be loaded from it.

b) what is the fastest way to replace a disk? adding a disk as a hotspare
for the raidz is a good idea, but i would also like to replace a disk
during runtime as simply as possible. the problem is that for the root
pool the disks are labeled (the slices thingy), so i cannot simply detach
the volumes, replace the disk and attach them again; i have to format the
disk first so that the slicing exists. is there some clever way to
automatically re-label a replacement disk?

c) si3224 related question: is it possible to simply hot swap a disk?
(i have the disks in special hot-swappable units, but have no experience
with hotswapping under solaris, so i would like some feedback.)

d) do you have best practices for systems like the one above? what are
the best resources on the web for learning about monitoring the health
of a zfs system (like email notifications in case of disk failures...)?

thanks in advance
-- Jakob
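[for question (a), on solaris x86 this is usually done by running installgrub
against the root slice of each mirror disk -- a minimal sketch, assuming the
rootpool mirrors live on s0 of each disk; the c-numbers below are hypothetical
device names, not taken from this setup:]

  # put grub stage1/stage2 on the root slice of every disk in the mirror,
  # so the box can boot no matter which disk the controller picks
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
  # ...repeat for each remaining disk carrying a rootpool mirror slice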
On 5/31/07, Jakob Praher <jp at hapra.at> wrote:
> c) si3224 related question: is it possible to simply hot swap a disk?
> (i have the disks in special hot-swappable units, but have no experience
> with hotswapping under solaris, so i would like some feedback.)

As it happens, I just tried this - albeit on a different card - and it
went well. I have a Marvell 88SX6081 controller, and removing a disk
caused no undue panic (as far as I can tell). When I added a new disk,
the kernel detected it immediately and then I had to run
"cfgadm -c configure scsi0/1" or something like that. Then it Just
Worked. I don't know if this is recommended or not... but it worked
for me.

Will
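[the general cfgadm flow for a sata hot-swap looks roughly like the
following sketch; the attachment point name sata1/3 is only an example --
"cfgadm -al" shows the real attachment points on a given box:]

  # list all attachment points and their current state
  cfgadm -al

  # before pulling a disk, unconfigure its attachment point
  cfgadm -c unconfigure sata1/3

  # after inserting the replacement, configure it so solaris
  # creates the device nodes for the new disk
  cfgadm -c configure sata1/3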
Jakob Praher schrieb:
> hi all,
>
> i would like to ask some questions regarding best practices for zfs
> recovery if disk errors occur.
>
> currently i have zfs boot (nv62) and the following setup:
>
>   2 si3224 controllers (each 4 sata disks)
>   8 sata disks, same size, same type
>
> i have two pools:
>   a) rootpool
>   b) datapool
>
> the rootpool is a mirrored pool, where every disk has a slice (s0,
> which is 5 % of the whole disk) devoted to the rootpool, just for
> mirroring.
>
> the rest of each disk (s1) is added to the datapool, which is raidz.
>
> my idea is that if any disk is corrupt i am still able to boot.
>
> now i have some questions:
>
> a) if i want to be able to boot from every disk in case of error, i
> have to set up grub on every disk, so that if the controller picks
> that disk as the boot disk, the rootpool can still be loaded from it.
>
> b) what is the fastest way to replace a disk? adding a disk as a
> hotspare for the raidz is a good idea, but i would also like to
> replace a disk during runtime as simply as possible. the problem is
> that for the root pool the disks are labeled (the slices thingy), so
> i cannot simply detach the volumes, replace the disk and attach them
> again; i have to format the disk first so that the slicing exists.
> is there some clever way to automatically re-label a replacement disk?
>

i found out that copying the label information from another disk should work:

  prtvtoc /dev/rdsk/<good_disk>s2 | fmthard -s - /dev/rdsk/<new_disk>s2

for instance i could simply store the labels of all disks on the root pool,
which should be available as long as any of the 8 disks is still available.
so in case of a repair i simply have to fmthard the replacement disk with the
stored label before attaching it.

> c) si3224 related question: is it possible to simply hot swap a disk?
> (i have the disks in special hot-swappable units, but have no experience
> with hotswapping under solaris, so i would like some feedback.)
>
> d) do you have best practices for systems like the one above? what are
> the best resources on the web for learning about monitoring the health
> of a zfs system (like email notifications in case of disk failures...)?
>
> thanks in advance
> -- Jakob
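[a sketch of what the full replacement could look like, using the rootpool
and datapool names from the original post; the disk name c1t2d0 and the
/rootpool/labels path are placeholders, not part of the described setup:]

  # keep a copy of every disk's vtoc on the (redundant) rootpool
  prtvtoc /dev/rdsk/c1t2d0s2 > /rootpool/labels/c1t2d0.vtoc

  # after physically swapping the failed disk, write the saved label
  # onto the new disk so s0 and s1 exist again
  fmthard -s /rootpool/labels/c1t2d0.vtoc /dev/rdsk/c1t2d0s2

  # replace the slices in place and let zfs resilver both pools
  zpool replace rootpool c1t2d0s0
  zpool replace datapool c1t2d0s1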
hi Will,

thanks for your answer.

Will Murnane schrieb:
> On 5/31/07, Jakob Praher <jp at hapra.at> wrote:
>> c) si3224 related question: is it possible to simply hot swap a disk?
>> (i have the disks in special hot-swappable units, but have no experience
>> with hotswapping under solaris, so i would like some feedback.)
> As it happens, I just tried this - albeit on a different card - and it
> went well. I have a Marvell 88SX6081 controller, and removing a disk
> caused no undue panic (as far as I can tell). When I added a new disk,
> the kernel detected it immediately and then I had to run
> "cfgadm -c configure scsi0/1" or something like that. Then it Just
> Worked. I don't know if this is recommended or not... but it worked
> for me.
>
> Will

what is the best way to simulate a disk error under zfs? before i add
real data to the system, i want to make sure it works.

my naive approach:

1) remove the disk from any pool membership (is this needed?)
     zpool detach xxx <disk>
     zpool detach yyy <disk>
2) the disk should now be free to be removed
3) pull the plug
4) see what happens
5) plug the disk back in
6) restore the zpool membership again

(1) and (6) should not really be needed, or do i see that incorrectly?

-- Jakob
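[a sketch of how (1) and (6) can be skipped: zfs notices a missing device
on its own, and zpool offline/online can simulate the failure without
touching pool membership at all -- pool and device names below are
examples only:]

  # simulate the failure without changing pool membership
  zpool offline datapool c1t2d0s1

  # check how the pool reacts (should report DEGRADED while
  # the raidz keeps serving data)
  zpool status -x

  # bring the "repaired" disk back and watch the resilver
  zpool online datapool c1t2d0s1
  zpool status datapool

note that offline/online only exercises the zfs side; physically pulling
the disk is still the only way to test the controller's hot-swap path.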