Thomas Bleek
2008-Aug-28 13:23 UTC
[zfs-discuss] trouble with resilver after removing drive from 3510
Hello all,
I tried to test how a zpool recovers after removing one drive, with
strange results.
Setup: SunFire V240, 4 GB RAM, Solaris 10 U5, fully patched (last week);
one 3510 with 12x 146 GB FC drives, configured as 12 LUNs (every drive is
one LUN). I don't want to use the RAID hardware; I am letting ZFS do
everything. One pool of 5x2 disks plus 2 spares (details below).
After pulling drive 2 it took about two minutes for the situation to be
recognised. Output from zpool status and also zpool iostat 1 is very
slow: some lines print quickly, then it stalls for 30-60 seconds, but
the commands do complete eventually.
The resilver has started but is VERY slow and shows strange data: the
% done value goes up and down all the time. I don't think it is working
correctly.
zpool iostat 1 (when it works) shows many reads but very few writes. I
would have expected roughly equal read and write rates: reading from the
intact mirror side, writing to the spare disk.
During the resilver the machine is mostly 99% idle, with at most 10%
kernel load for short periods.
I have now waited for more than one day and nothing is getting better.
I did not put a new drive in; I wanted to see one of the spares being used.
A snippet of zpool iostat 1 output:
tank 337G 343G 313 2 37.4M 19.3K
tank 337G 343G 240 5 29.0M 38.6K
tank 337G 343G 355 6 44.4M 45.0K
tank 337G 343G 336 8 41.6M 57.9K
tank 337G 343G 422 0 46.0M 0
tank 337G 343G 415 10 49.4M 70.8K
tank 337G 343G 358 0 43.3M 0
tank 337G 343G 340 10 42.6M 70.8K
tank 337G 343G 323 5 38.1M 38.6K
tank 337G 343G 315 0 35.0M 0
tank 337G 343G 336 0 40.0M 6.43K
tank 337G 343G 388 10 46.8M 70.8K
tank 337G 343G 351 4 43.9M 32.2K
tank 337G 343G 5 5 620K 285K
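To put a number on the imbalance described above, a small throwaway script
like this (hypothetical; it assumes the column layout shown in the snippet:
pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth)
sums the read and write bandwidth over a few samples:

```python
# Hypothetical helper: quantify the read/write imbalance seen in the
# "zpool iostat 1" snippet above. Column layout is an assumption:
# pool, alloc, free, read ops, write ops, read bw, write bw.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def to_bytes(s):
    """Convert an iostat figure like '37.4M' or '0' to bytes."""
    if s[-1] in UNITS:
        return float(s[:-1]) * UNITS[s[-1]]
    return float(s)

def rw_totals(lines):
    """Sum read and write bandwidth over all sample lines."""
    rd = wr = 0.0
    for line in lines:
        cols = line.split()
        rd += to_bytes(cols[5])   # read bandwidth column
        wr += to_bytes(cols[6])   # write bandwidth column
    return rd, wr

samples = [
    "tank  337G  343G  313  2   37.4M  19.3K",
    "tank  337G  343G  422  0   46.0M  0",
    "tank  337G  343G  415  10  49.4M  70.8K",
]
rd, wr = rw_totals(samples)
print("read/write bandwidth ratio: %.0f : 1" % (rd / wr))
```

For the three samples above this is on the order of a thousand reads for
every write, which is far from the roughly 1:1 ratio a mirror resilver to
a spare should show.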
There is nothing useful (at least for me) in /var/log/messages. After
filtering out (grep -v) these two constantly repeated lines:
date+time nftp scsi: [ID 107833 kern.warning] WARNING:
/pci@1d,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff804ba8,1 (ssd48):
date+time nftp drive offline
only these entries remain:
Aug 27 13:04:22 nftp i/o to invalid geometry
Aug 27 13:04:32 nftp i/o to invalid geometry
Aug 27 13:04:37 nftp i/o to invalid geometry
Aug 27 13:04:37 nftp i/o to invalid geometry
Aug 27 13:04:47 nftp i/o to invalid geometry
Aug 27 13:04:52 nftp i/o to invalid geometry
Aug 27 13:05:23 nftp fmd: [ID 441519 daemon.error] SUNW-MSG-ID:
ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 27 13:05:23 nftp EVENT-TIME: Wed Aug 27 13:05:22 CEST 2008
Aug 27 13:05:23 nftp PLATFORM: SUNW,Sun-Fire-V240, CSN: -, HOSTNAME: nftp
Aug 27 13:05:23 nftp SOURCE: zfs-diagnosis, REV: 1.0
Aug 27 13:05:23 nftp EVENT-ID: ea01afff-c58e-6b32-e345-81da8bf43146
Aug 27 13:05:23 nftp DESC: A ZFS device failed. Refer to
http://sun.com/msg/ZFS-8000-D3 for more information.
Aug 27 13:05:23 nftp AUTO-RESPONSE: No automated response will occur.
Aug 27 13:05:23 nftp IMPACT: Fault tolerance of the pool may be compromised.
Aug 27 13:05:23 nftp REC-ACTION: Run 'zpool status -x' and replace the
bad device.
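For reference, something like the following reproduces that filter (a
sketch only; the sample lines and the egrep patterns are assumptions,
adjust them to the exact log text):

```shell
# Build a small sample resembling /var/log/messages, then drop the two
# noise lines so only the interesting entries remain.
printf '%s\n' \
  'Aug 27 13:04:20 nftp scsi: [ID 107833 kern.warning] WARNING:' \
  'Aug 27 13:04:20 nftp drive offline' \
  'Aug 27 13:04:22 nftp i/o to invalid geometry' \
  > /tmp/messages.sample

# grep -v with an alternation removes both repeated lines at once.
egrep -v 'WARNING:|drive offline' /tmp/messages.sample
```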
uname -a
SunOS nftp 5.10 Generic_137111-04 sun4u sparc SUNW,Sun-Fire-V240
########################################################################################
before pulling drive:
sccli> show disk
Ch   Id  Size      Speed  LD    Status  IDs                          Rev
----------------------------------------------------------------------------
2(3) 0 136.73GB 200MB ld0 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY602V300007412
WWNN 2000000C505EB811
2(3) 1 136.73GB 200MB ld1 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61JX400007412
WWNN 2000000C505EB885
2(3) 2 136.73GB 200MB ld2 ONLINE SEAGATE ST3146807FC
0006
S/N 3HY62EGZ00007443
WWNN 2000000C50D76130
2(3) 3 136.73GB 200MB ld3 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61JKG00007411
WWNN 2000000C505EB815
2(3) 4 136.73GB 200MB ld4 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY60YHX00007410
WWNN 2000000C505EBCBB
2(3) 5 136.73GB 200MB ld5 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61FQ000007412
WWNN 2000000C505E98B9
2(3) 6 136.73GB 200MB ld6 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61F2E00007411
WWNN 2000000C505E8DB7
2(3) 7 136.73GB 200MB ld7 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY60Y1100007412
WWNN 2000000C505E98BB
2(3) 8 136.73GB 200MB ld8 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61D0A00007411
WWNN 2000000C505E6A56
2(3) 9 136.73GB 200MB ld9 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61AQ200007411
WWNN 2000000C505EC2B4
2(3) 10 136.73GB 200MB ld10 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61JP900007412
WWNN 2000000C505EB712
2(3) 11 136.73GB 200MB ld11 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61JZC00007412
WWNN 2000000C505EB9B2
sccli>
root@nftp:/> zpool status
pool: tank
state: ONLINE
scrub: scrub completed with 0 errors on Thu Aug 21 17:22:16 2008
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d0 ONLINE 0 0 0
c2t40d1 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d2 ONLINE 0 0 0
c2t40d3 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d4 ONLINE 0 0 0
c2t40d5 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d6 ONLINE 0 0 0
c2t40d7 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d8 ONLINE 0 0 0
c2t40d9 ONLINE 0 0 0
spares
c2t40d10 AVAIL
c2t40d11 AVAIL
errors: No known data errors
root@nftp:/>
########################################################################################
after pulling drive:
sccli> show disk
Ch   Id  Size      Speed  LD    Status  IDs                          Rev
----------------------------------------------------------------------------
2(3) 0 136.73GB 200MB ld0 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY602V300007412
WWNN 2000000C505EB811
2 1 0MB 0MB NONE MISSING SEAGATE ST314680FSUN146G
0407
S/N 3HY61JX400007412
2(3) 2 136.73GB 200MB ld2 ONLINE SEAGATE ST3146807FC
0006
S/N 3HY62EGZ00007443
WWNN 2000000C50D76130
2(3) 3 136.73GB 200MB ld3 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61JKG00007411
WWNN 2000000C505EB815
2(3) 4 136.73GB 200MB ld4 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY60YHX00007410
WWNN 2000000C505EBCBB
2(3) 5 136.73GB 200MB ld5 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61FQ000007412
WWNN 2000000C505E98B9
2(3) 6 136.73GB 200MB ld6 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61F2E00007411
WWNN 2000000C505E8DB7
2(3) 7 136.73GB 200MB ld7 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY60Y1100007412
WWNN 2000000C505E98BB
2(3) 8 136.73GB 200MB ld8 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61D0A00007411
WWNN 2000000C505E6A56
2(3) 9 136.73GB 200MB ld9 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61AQ200007411
WWNN 2000000C505EC2B4
2(3) 10 136.73GB 200MB ld10 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61JP900007412
WWNN 2000000C505EB712
2(3) 11 136.73GB 200MB ld11 ONLINE SEAGATE ST314680FSUN146G
0407
S/N 3HY61JZC00007412
WWNN 2000000C505EB9B2
sccli>
root@nftp:/> zpool status
pool: tank
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-D3
scrub: resilver in progress, 11.56% done, 0h37m to go
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
mirror DEGRADED 0 0 0
c2t40d0 ONLINE 0 0 0
spare DEGRADED 0 0 0
c2t40d1 UNAVAIL 0 0 0 cannot open
c2t40d10 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d2 ONLINE 0 0 0
c2t40d3 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d4 ONLINE 0 0 0
c2t40d5 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d6 ONLINE 0 0 0
c2t40d7 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t40d8 ONLINE 0 0 0
c2t40d9 ONLINE 0 0 0
spares
c2t40d10 INUSE currently in use
c2t40d11 AVAIL
errors: No known data errors
root@nftp:/>
root@nftp:/> /usr/sbin/fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Aug 27 13:05:22 ea01afff-c58e-6b32-e345-81da8bf43146 ZFS-8000-D3    Major
Fault class : fault.fs.zfs.device
Description : A ZFS device failed. Refer to
http://sun.com/msg/ZFS-8000-D3 for
more information.
Response : No automated response will occur.
Impact : Fault tolerance of the pool may be compromised.
Action      : Run 'zpool status -x' and replace the bad device.
root@nftp:/>
########################################################################################
I have no idea what is going wrong here. Please give me some advice on how
to proceed, or should I rather open a service call?
Thanks in advance,
thomas
--
Dr. Thomas Bleek, Netzwerkadministrator
Helmholtz-Zentrum Potsdam
Deutsches GeoForschungsZentrum
Telegrafenberg G261
D-14473 Potsdam
Tel.: +49 331 288- 1818/1681 Fax.: 1730 Mobil: +49 172 1543233
E-Mail: bl@gfz-potsdam.de