hi all,

i would like to ask some questions regarding best practices for zfs
recovery if disk errors occur.

currently i have zfs boot (nv62) and the following setup:

  2 si3224 controllers (each 4 sata disks)
  8 sata disks, same size, same type

i have two pools:
  a) rootpool
  b) datapool

the rootpool is a mirrored pool, where every disk has a slice (s0, which
is 5 % of the whole disk) devoted to the rootpool, just for mirroring.

the rest of each disk (s1) is added to the datapool, which is raidz.

my idea is that if any disk is corrupt i am still able to boot.

now i have some questions:

a) if i want to be able to boot from every disk in case of error, i have
to set up grub on every disk, so that if the controller picks that disk
as the boot disk, the rootpool can still be loaded from it.

b) what is the fastest way to replace a disk? adding a disk as a hotspare
for the raidz is a good idea, but i would also like to replace a disk
during runtime as simply as possible. the problem is that for the root
pool the disks are labeled (the slices thingy), so i cannot simply detach
the volumes, replace the disk and attach them again; i have to format the
disk first so that the slicing exists. is there some clever way to
automatically re-label a replacement disk?

c) si3224 related question: is it possible to simply hot swap a disk?
(i have the disks in special hot-swappable units, but have no experience
with hotswapping under solaris, so i would like some feedback.)

d) do you have best practices for systems like the one above? what are
the best resources on the web for learning about monitoring the health
of a zfs system (like email notifications in case of disk failures...)?

thanks in advance
-- Jakob
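[for question (a), on solaris x86 this is usually done by running installgrub
against the root slice of each mirror disk -- a minimal sketch, assuming the
rootpool mirrors live on s0 of each disk; the c-numbers below are hypothetical
device names, not taken from this setup:]

  # put grub stage1/stage2 on the root slice of every disk in the mirror,
  # so the box can boot no matter which disk the controller picks
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
  # ...repeat for each remaining disk carrying a rootpool mirror slice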
On 5/31/07, Jakob Praher <jp at hapra.at> wrote:
> c) si3224 related question: is it possible to simply hot swap a disk?
> (i have the disks in special hot-swappable units, but have no experience
> with hotswapping under solaris, so i would like some feedback.)

As it happens, I just tried this - albeit on a different card - and it
went well. I have a Marvell 88SX6081 controller, and removing a disk
caused no undue panic (as far as I can tell). When I added a new disk,
the kernel detected it immediately and then I had to run
"cfgadm -c configure scsi0/1" or something like that. Then it Just
Worked. I don't know if this is recommended or not... but it worked
for me.

Will
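[the general cfgadm flow for a sata hot-swap looks roughly like the
following sketch; the attachment point name sata1/3 is only an example --
"cfgadm -al" shows the real attachment points on a given box:]

  # list all attachment points and their current state
  cfgadm -al

  # before pulling a disk, unconfigure its attachment point
  cfgadm -c unconfigure sata1/3

  # after inserting the replacement, configure it so solaris
  # creates the device nodes for the new disk
  cfgadm -c configure sata1/3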
Jakob Praher schrieb:
> hi all,
>
> i would like to ask some questions regarding best practices for zfs
> recovery if disk errors occur.
>
> currently i have zfs boot (nv62) and the following setup:
>
>   2 si3224 controllers (each 4 sata disks)
>   8 sata disks, same size, same type
>
> i have two pools:
>   a) rootpool
>   b) datapool
>
> the rootpool is a mirrored pool, where every disk has a slice (s0,
> which is 5 % of the whole disk) devoted to the rootpool, just for
> mirroring.
>
> the rest of each disk (s1) is added to the datapool, which is raidz.
>
> my idea is that if any disk is corrupt i am still able to boot.
>
> now i have some questions:
>
> a) if i want to be able to boot from every disk in case of error, i
> have to set up grub on every disk, so that if the controller picks
> that disk as the boot disk, the rootpool can still be loaded from it.
>
> b) what is the fastest way to replace a disk? adding a disk as a
> hotspare for the raidz is a good idea, but i would also like to
> replace a disk during runtime as simply as possible. the problem is
> that for the root pool the disks are labeled (the slices thingy), so
> i cannot simply detach the volumes, replace the disk and attach them
> again; i have to format the disk first so that the slicing exists.
> is there some clever way to automatically re-label a replacement disk?
>

i found out that copying the label information from another disk should work:

  prtvtoc /dev/rdsk/<good_disk>s2 | fmthard -s - /dev/rdsk/<new_disk>s2

for instance i could simply store the labels of all disks on the root pool,
which should be available as long as any of the 8 disks is still available.
so in case of a repair i simply have to fmthard the replacement disk with the
stored label before attaching it.

> c) si3224 related question: is it possible to simply hot swap a disk?
> (i have the disks in special hot-swappable units, but have no experience
> with hotswapping under solaris, so i would like some feedback.)
>
> d) do you have best practices for systems like the one above? what are
> the best resources on the web for learning about monitoring the health
> of a zfs system (like email notifications in case of disk failures...)?
>
> thanks in advance
> -- Jakob
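[a sketch of what the full replacement could look like, using the rootpool
and datapool names from the original post; the disk name c1t2d0 and the
/rootpool/labels path are placeholders, not part of the described setup:]

  # keep a copy of every disk's vtoc on the (redundant) rootpool
  prtvtoc /dev/rdsk/c1t2d0s2 > /rootpool/labels/c1t2d0.vtoc

  # after physically swapping the failed disk, write the saved label
  # onto the new disk so s0 and s1 exist again
  fmthard -s /rootpool/labels/c1t2d0.vtoc /dev/rdsk/c1t2d0s2

  # replace the slices in place and let zfs resilver both pools
  zpool replace rootpool c1t2d0s0
  zpool replace datapool c1t2d0s1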
hi Will,

thanks for your answer.

Will Murnane schrieb:
> On 5/31/07, Jakob Praher <jp at hapra.at> wrote:
>> c) si3224 related question: is it possible to simply hot swap a disk?
>> (i have the disks in special hot-swappable units, but have no experience
>> with hotswapping under solaris, so i would like some feedback.)
> As it happens, I just tried this - albeit on a different card - and it
> went well. I have a Marvell 88SX6081 controller, and removing a disk
> caused no undue panic (as far as I can tell). When I added a new disk,
> the kernel detected it immediately and then I had to run
> "cfgadm -c configure scsi0/1" or something like that. Then it Just
> Worked. I don't know if this is recommended or not... but it worked
> for me.
>
> Will

what is the best way to simulate a disk error under zfs? before i add
real data to the system, i want to make sure it works.

my naive approach:

1) remove the disk from any pool membership (is this needed?)
     zpool detach xxx <disk>
     zpool detach yyy <disk>
2) the disk should now be free to be removed
3) pull the plug
4) see what happens
5) plug the disk back in
6) restore the zpool membership again

(1) and (6) should not really be needed, or do i see that incorrectly?

-- Jakob
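[a sketch of how (1) and (6) can be skipped: zfs notices a missing device
on its own, and zpool offline/online can simulate the failure without
touching pool membership at all -- pool and device names below are
examples only:]

  # simulate the failure without changing pool membership
  zpool offline datapool c1t2d0s1

  # check how the pool reacts (should report DEGRADED while
  # the raidz keeps serving data)
  zpool status -x

  # bring the "repaired" disk back and watch the resilver
  zpool online datapool c1t2d0s1
  zpool status datapool

note that offline/online only exercises the zfs side; physically pulling
the disk is still the only way to test the controller's hot-swap path.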