Matt B
2007-Dec-03 20:36 UTC
[zfs-discuss] Help replacing dual identity disk in ZFS raidz and SVM mirror
Hi,
We have a number of 4200s set up using a combination of an SVM 4-way
mirror and a ZFS raidz stripe.
Each disk (of 4) is divided up like this:
/        6GB    UFS   s0
Swap     8GB          s1
/var     6GB    UFS   s3
Metadb   50MB   UFS   s4
/data    48GB   ZFS   s5
For SVM we do a 4-way mirror on /, swap, and /var.
So we have 3 SVM mirrors:
d0=root (submirrors d10, d20, d30, d40)
d1=swap (submirrors d11, d21, d31, d41)
d3=/var (submirrors d13, d23, d33, d43)
For ZFS we have a single raidz set across slice s5 of all four disks.
Everything has worked flawlessly for some time. This week we discovered that one
of our 4200s is reporting some level of failure on one of
the disks.
We see these recurring errors in the syslog:
Dec 3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: 0616S02DD5
Dec 3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice] Sense Key: Media Error
Dec 3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice] ASC: 0x15 (mechanical positioning error), ASCQ: 0x1, FRU: 0x0
When we run metastat, we see that 2 of the 3 SVM mirrors report that the
submirror on the failing disk needs maintenance. Oddly enough, the third SVM mirror
reports no issues, which makes me think there is a media error on the disk that
happens to affect only 2 of the 3 SVM slices.
Also, "zpool status" reports read errors on the failing disk:
config:
        NAME          STATE     READ WRITE CKSUM
        zpool         ONLINE       0     0     0
          raidz       ONLINE       0     0     0
            c0t0d0s5  ONLINE       0     0     0
            c0t1d0s5  ONLINE      50     0     0
            c0t2d0s5  ONLINE       0     0     0
            c0t3d0s5  ONLINE       0     0     0
So my question is: what series of steps do we need to perform, given that
one disk out of four hosts a ZFS raidz member on one slice and SVM submirrors
on 3 other slices, but only 2 of the 3 SVM mirrors report requiring maintenance?
We want to keep the data integrity in place (obviously).
The server is still operational, but we want to take this opportunity to hammer
out these steps.
We found plenty of information specific to SVM disk replacement and to ZFS disk
replacement, but no single document that describes replacing a sliced-up disk
that has a dual identity in both RAID systems (SVM and ZFS).
Any help or pointers to good documentation would be much appreciated.
Thanks
Matt B
Below I included a metastat dump
d3: Mirror
    Submirror 0: d13
      State: Okay
    Submirror 1: d23
      State: Needs maintenance
    Submirror 2: d33
      State: Okay
    Submirror 3: d43
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12289725 blocks (5.9 GB)

d13: Submirror of d3
    State: Okay
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t0d0s3    0            No     Okay         Yes

d23: Submirror of d3
    State: Needs maintenance
    Invoke: metareplace d3 c0t1d0s3 <new device>
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t1d0s3    0            No     Maintenance  Yes

d33: Submirror of d3
    State: Okay
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t2d0s3    0            No     Okay         Yes

d43: Submirror of d3
    State: Okay
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t3d0s3    0            No     Okay         Yes

d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Needs maintenance
    Submirror 2: d30
      State: Okay
    Submirror 3: d40
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12289725 blocks (5.9 GB)

d10: Submirror of d0
    State: Okay
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t0d0s0    0            No     Okay         Yes

d20: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c0t1d0s0 <new device>
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t1d0s0    0            No     Maintenance  Yes

d30: Submirror of d0
    State: Okay
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t2d0s0    0            No     Okay         Yes

d40: Submirror of d0
    State: Okay
    Size: 12289725 blocks (5.9 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t3d0s0    0            No     Okay         Yes

d1: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d21
      State: Okay
    Submirror 2: d31
      State: Okay
    Submirror 3: d41
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 16386300 blocks (7.8 GB)

d11: Submirror of d1
    State: Okay
    Size: 16386300 blocks (7.8 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t0d0s1    0            No     Okay         Yes

d21: Submirror of d1
    State: Okay
    Size: 16386300 blocks (7.8 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t1d0s1    0            No     Okay         Yes

d31: Submirror of d1
    State: Okay
    Size: 16386300 blocks (7.8 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t2d0s1    0            No     Okay         Yes

d41: Submirror of d1
    State: Okay
    Size: 16386300 blocks (7.8 GB)
    Stripe 0:
        Device      Start Block  Dbase  State        Reloc  Hot Spare
        c0t3d0s1    0            No     Okay         Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t3d0   Yes    id1,sd@n500000e011f4a930
c0t2d0   Yes    id1,sd@n500000e011edd9f0
c0t1d0   Yes    id1,sd@n500000e011eddf70
c0t0d0   Yes    id1,sd@n500000e011e308a0
This message posted from opensolaris.org
Matt B
2007-Dec-06 17:33 UTC
[zfs-discuss] Help replacing dual identity disk in ZFS raidz and SVM mirror
Anyone? Really need some help here.
Robert Milkowski
2007-Dec-07 18:00 UTC
[zfs-discuss] Help replacing dual identity disk in ZFS raidz and SVM mirror
Hello Matt,
Monday, December 3, 2007, 8:36:28 PM, you wrote:
MB> Hi,
MB> We have a number of 4200s set up using a combination of an SVM
MB> 4-way mirror and a ZFS raidz stripe.
MB> Each disk (of 4) is divided up like this:
MB> /        6GB    UFS   s0
MB> Swap     8GB          s1
MB> /var     6GB    UFS   s3
MB> Metadb   50MB   UFS   s4
MB> /data    48GB   ZFS   s5
MB> For SVM we do a 4-way mirror on /, swap, and /var.
MB> So we have 3 SVM mirrors:
MB> d0=root (submirrors d10, d20, d30, d40)
MB> d1=swap (submirrors d11, d21, d31, d41)
MB> d3=/var (submirrors d13, d23, d33, d43)
MB> For ZFS we have a single raidz set across slice s5 of all four disks.
MB> Everything has worked flawlessly for some time. This week we
MB> discovered that one of our 4200s is reporting some level of
MB> failure on one of the disks.
MB> We see these recurring errors in the syslog:
MB> Dec 3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: 0616S02DD5
MB> Dec 3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice] Sense Key: Media Error
MB> Dec 3 12:00:47 vfcustgfs02b scsi: [ID 107833 kern.notice] ASC: 0x15 (mechanical positioning error), ASCQ: 0x1, FRU: 0x0
MB> When we run metastat, we see that 2 of the 3 SVM mirrors report
MB> that the submirror on the failing disk needs maintenance.
MB> Oddly enough, the third SVM mirror reports no issues, which makes me
MB> think there is a media error on the disk that happens to
MB> affect only 2 of the 3 SVM slices.
MB> Also, "zpool status" reports read errors on the failing disk:
MB> config:
MB>         NAME          STATE     READ WRITE CKSUM
MB>         zpool         ONLINE       0     0     0
MB>           raidz       ONLINE       0     0     0
MB>             c0t0d0s5  ONLINE       0     0     0
MB>             c0t1d0s5  ONLINE      50     0     0
MB>             c0t2d0s5  ONLINE       0     0     0
MB>             c0t3d0s5  ONLINE       0     0     0
MB> So my question is: what series of steps do we need to perform,
MB> given that one disk out of four hosts a ZFS raidz member on one
MB> slice and SVM submirrors on 3 other slices, but only 2
MB> of the 3 SVM mirrors report requiring maintenance?
MB> We want to keep the data integrity in place (obviously).
MB> The server is still operational, but we want to take this
MB> opportunity to hammer out these steps.
If you can add another disk, then do so, and replace the failing disk with
the new one in both SVM and ZFS (one by one; that should be faster).
I guess you can't add another disk, though. In that case:
1. Detach the disk from SVM (no need for the metadevices that are already
   in maintenance mode).
2. Detach (offline) it from the ZFS pool.
3. Destroy the metadb replica on that disk.
4. Write down the VTOC (prtvtoc).
5. Use cfgadm -c unconfigure (or -c disconnect), remove the disk, and put
   in the new one.
6. Label the new disk the same way (fmthard).
7. Re-create the metadb replica on it.
8. Attach it to ZFS first (online it, actually), since in your config you
   are risking your data on ZFS while you still have a 3-way mirror on SVM.
9. Once that is done, attach it (replace) in SVM.
You should probably install a boot block too.
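The sequence above could be sketched as the following dry-run script. It only prints the commands in order and executes nothing. The disk (c0t1d0), pool name (zpool), mirrors (d0, d1, d3) and submirror d21 are taken from the output earlier in the thread. Everything else is an assumption to verify against the real system and the man pages: the cfgadm attachment point, the VTOC backup path, using "zpool replace" instead of "zpool online" for a blank disk, "metareplace -e" for the slices already in maintenance, and a GRUB-based boot block installed with installgrub. On x86 the new disk may also need an fdisk partition before fmthard will work.

```shell
#!/bin/sh
# Dry run only: every command is printed, nothing is executed.
# From the thread: disk c0t1d0, pool "zpool", mirrors d0/d1/d3,
# d21 still Okay, s0 and s3 in maintenance, s4 = metadb, s5 = ZFS.
DISK=${DISK:-c0t1d0}
POOL=${POOL:-zpool}
# Hypothetical values, not from the thread; check "cfgadm -al" for the real one.
AP_ID=${AP_ID:-"c0::dsk/$DISK"}
VTOC=${VTOC:-"/var/tmp/$DISK.vtoc"}

replacement_plan() {
    echo "# --- before pulling the disk ---"
    echo "metadetach d1 d21"                      # only d21 is still attached and Okay
    echo "zpool offline $POOL ${DISK}s5"
    echo "metadb -d ${DISK}s4"
    echo "prtvtoc /dev/rdsk/${DISK}s2 > $VTOC"    # save the slice layout
    echo "cfgadm -c unconfigure $AP_ID"
    echo "# --- swap the physical disk here ---"
    echo "cfgadm -c configure $AP_ID"
    echo "fmthard -s $VTOC /dev/rdsk/${DISK}s2"   # same label as the old disk
    echo "metadb -a ${DISK}s4"                    # add -c N to match the old replica count
    echo "# --- ZFS first: the raidz has no redundancy left ---"
    echo "zpool replace $POOL ${DISK}s5"          # assumption: blank disk, so replace, not online
    echo "# --- then SVM, once the resilver has finished ---"
    echo "metattach d1 d21"
    echo "metareplace -e d0 ${DISK}s0"
    echo "metareplace -e d3 ${DISK}s3"
    echo "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/${DISK}s0"
}

replacement_plan
```

Running it with DISK, POOL, AP_ID or VTOC set in the environment prints the same plan for another disk, so the output can be reviewed line by line before any command is typed on the server.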
--
Best regards,
Robert Milkowski mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com