Thomas Bleek
2008-Aug-28 13:23 UTC
[zfs-discuss] trouble with resilver after removing drive from 3510
Hello all,

I tried to test the behavior of zpool recovery after removing one drive, with strange results.

Setup: SunFire V240, 4 GB RAM, Solaris 10 U5, fully patched (last week); one 3510 with 12x 146 GB FC drives configured as 12 LUNs (every drive is one LUN). I don't want to use the RAID hardware; ZFS should do it all. One pool with 5x2 disks and 2 spares (details below).

After pulling drive 2 it took about two minutes for the situation to be recognised. The output of "zpool status" and also of "zpool iostat 1" is very slow: some lines come fast, then it stalls for about 30-60 seconds, but the commands do complete eventually.

The resilver has started but is VERY slow and shows strange data. The "% done" value goes up and down all the time; I don't think it is working correctly. "zpool iostat 1" (when it works) shows many reads but very few writes. I would have expected roughly equal read and write rates: reading from the intact mirror side, writing to the spare disk. During most of the resilver the machine is 99% idle, with at most 10% kernel load for short periods.

I have now waited for more than one day and nothing is getting better. I did not put a new drive in; I wanted to see one of the spares being taken into use.

Snippet of "zpool iostat 1":

tank        337G  343G    313      2  37.4M  19.3K
tank        337G  343G    240      5  29.0M  38.6K
tank        337G  343G    355      6  44.4M  45.0K
tank        337G  343G    336      8  41.6M  57.9K
tank        337G  343G    422      0  46.0M      0
tank        337G  343G    415     10  49.4M  70.8K
tank        337G  343G    358      0  43.3M      0
tank        337G  343G    340     10  42.6M  70.8K
tank        337G  343G    323      5  38.1M  38.6K
tank        337G  343G    315      0  35.0M      0
tank        337G  343G    336      0  40.0M  6.43K
tank        337G  343G    388     10  46.8M  70.8K
tank        337G  343G    351      4  43.9M  32.2K
tank        337G  343G      5      5   620K   285K

Nothing useful (at least for me) in messages.
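To convince myself that the "% done" value really goes backwards, I log the progress line from "zpool status" once a minute and pull out the percentage. A minimal sketch (POSIX sed assumed; the sample line here stands in for `zpool status tank | grep 'resilver in progress'`, which is what the loop would feed it):

```shell
# Extract the percent-done figure from a captured zpool status
# progress line. In the real monitoring loop the line comes from:
#   zpool status tank | grep 'resilver in progress'
# (sample line copied from my output below)
line='scrub: resilver in progress, 11.56% done, 0h37m to go'
pct=$(printf '%s\n' "$line" | sed -n 's/.*, \([0-9.]*\)% done.*/\1/p')
echo "$pct"
```

Logging `date` plus this figure every 60 seconds makes the regression easy to see in one column.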
After grep -v of these two (repeating) lines:

date+time nftp scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/SUNW,qlc@1/fp@0,0/ssd@w216000c0ff804ba8,1 (ssd48):
date+time nftp   drive offline

only these entries remain:

Aug 27 13:04:22 nftp i/o to invalid geometry
Aug 27 13:04:32 nftp i/o to invalid geometry
Aug 27 13:04:37 nftp i/o to invalid geometry
Aug 27 13:04:37 nftp i/o to invalid geometry
Aug 27 13:04:47 nftp i/o to invalid geometry
Aug 27 13:04:52 nftp i/o to invalid geometry
Aug 27 13:05:23 nftp fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 27 13:05:23 nftp EVENT-TIME: Wed Aug 27 13:05:22 CEST 2008
Aug 27 13:05:23 nftp PLATFORM: SUNW,Sun-Fire-V240, CSN: -, HOSTNAME: nftp
Aug 27 13:05:23 nftp SOURCE: zfs-diagnosis, REV: 1.0
Aug 27 13:05:23 nftp EVENT-ID: ea01afff-c58e-6b32-e345-81da8bf43146
Aug 27 13:05:23 nftp DESC: A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
Aug 27 13:05:23 nftp AUTO-RESPONSE: No automated response will occur.
Aug 27 13:05:23 nftp IMPACT: Fault tolerance of the pool may be compromised.
Aug 27 13:05:23 nftp REC-ACTION: Run 'zpool status -x' and replace the bad device.
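The filter was essentially the following; sketched here on three sample lines so it is self-contained (the real input is the messages file, and the exact patterns are shortened):

```shell
# Sketch of the log filter: drop the two repeating lines (the
# qlc/ssd kern.warning and the "drive offline" continuation) so
# only the remaining entries are shown. The printf sample stands
# in for the real messages file.
printf '%s\n' \
  'Aug 27 13:04:20 nftp scsi: [ID 107833 kern.warning] WARNING: ...' \
  'Aug 27 13:04:21 nftp   drive offline' \
  'Aug 27 13:04:22 nftp i/o to invalid geometry' \
  | grep -v 'kern.warning' | grep -v 'drive offline'
# prints only the "i/o to invalid geometry" line
```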
uname -a: SunOS nftp 5.10 Generic_137111-04 sun4u sparc SUNW,Sun-Fire-V240

########################################################################################
Before pulling the drive:

sccli> show disk
 Ch   Id     Size    Speed  LD     Status   IDs                          Rev
----------------------------------------------------------------------------
 2(3)  0   136.73GB  200MB  ld0    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY602V300007412   WWNN 2000000C505EB811
 2(3)  1   136.73GB  200MB  ld1    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JX400007412   WWNN 2000000C505EB885
 2(3)  2   136.73GB  200MB  ld2    ONLINE   SEAGATE ST3146807FC          0006
           S/N 3HY62EGZ00007443   WWNN 2000000C50D76130
 2(3)  3   136.73GB  200MB  ld3    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JKG00007411   WWNN 2000000C505EB815
 2(3)  4   136.73GB  200MB  ld4    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY60YHX00007410   WWNN 2000000C505EBCBB
 2(3)  5   136.73GB  200MB  ld5    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61FQ000007412   WWNN 2000000C505E98B9
 2(3)  6   136.73GB  200MB  ld6    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61F2E00007411   WWNN 2000000C505E8DB7
 2(3)  7   136.73GB  200MB  ld7    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY60Y1100007412   WWNN 2000000C505E98BB
 2(3)  8   136.73GB  200MB  ld8    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61D0A00007411   WWNN 2000000C505E6A56
 2(3)  9   136.73GB  200MB  ld9    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61AQ200007411   WWNN 2000000C505EC2B4
 2(3) 10   136.73GB  200MB  ld10   ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JP900007412   WWNN 2000000C505EB712
 2(3) 11   136.73GB  200MB  ld11   ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JZC00007412   WWNN 2000000C505EB9B2
sccli>

root@nftp:/> zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Thu Aug 21 17:22:16 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d0   ONLINE       0     0     0
            c2t40d1   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d2   ONLINE       0     0     0
            c2t40d3   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d4   ONLINE       0     0     0
            c2t40d5   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d6   ONLINE       0     0     0
            c2t40d7   ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t40d8   ONLINE       0     0     0
            c2t40d9   ONLINE       0     0     0
        spares
          c2t40d10    AVAIL
          c2t40d11    AVAIL

errors: No known data errors
root@nftp:/>

########################################################################################
After pulling the drive:

sccli> show disk
 Ch   Id     Size    Speed  LD     Status   IDs                          Rev
----------------------------------------------------------------------------
 2(3)  0   136.73GB  200MB  ld0    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY602V300007412   WWNN 2000000C505EB811
 2     1        0MB    0MB  NONE   MISSING  SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JX400007412
 2(3)  2   136.73GB  200MB  ld2    ONLINE   SEAGATE ST3146807FC          0006
           S/N 3HY62EGZ00007443   WWNN 2000000C50D76130
 2(3)  3   136.73GB  200MB  ld3    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JKG00007411   WWNN 2000000C505EB815
 2(3)  4   136.73GB  200MB  ld4    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY60YHX00007410   WWNN 2000000C505EBCBB
 2(3)  5   136.73GB  200MB  ld5    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61FQ000007412   WWNN 2000000C505E98B9
 2(3)  6   136.73GB  200MB  ld6    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61F2E00007411   WWNN 2000000C505E8DB7
 2(3)  7   136.73GB  200MB  ld7    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY60Y1100007412   WWNN 2000000C505E98BB
 2(3)  8   136.73GB  200MB  ld8    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61D0A00007411   WWNN 2000000C505E6A56
 2(3)  9   136.73GB  200MB  ld9    ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61AQ200007411   WWNN 2000000C505EC2B4
 2(3) 10   136.73GB  200MB  ld10   ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JP900007412   WWNN 2000000C505EB712
 2(3) 11   136.73GB  200MB  ld11   ONLINE   SEAGATE ST314680FSUN146G     0407
           S/N 3HY61JZC00007412   WWNN 2000000C505EB9B2
sccli>

root@nftp:/> zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver in progress, 11.56% done, 0h37m to go
config:

        NAME            STATE     READ WRITE CKSUM
        tank            DEGRADED     0     0     0
          mirror        DEGRADED     0     0     0
            c2t40d0     ONLINE       0     0     0
            spare       DEGRADED     0     0     0
              c2t40d1   UNAVAIL      0     0     0  cannot open
              c2t40d10  ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d2     ONLINE       0     0     0
            c2t40d3     ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d4     ONLINE       0     0     0
            c2t40d5     ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d6     ONLINE       0     0     0
            c2t40d7     ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            c2t40d8     ONLINE       0     0     0
            c2t40d9     ONLINE       0     0     0
        spares
          c2t40d10      INUSE     currently in use
          c2t40d11      AVAIL

errors: No known data errors
root@nftp:/>

root@nftp:/> /usr/sbin/fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Aug 27 13:05:22 ea01afff-c58e-6b32-e345-81da8bf43146 ZFS-8000-D3    Major

Fault class : fault.fs.zfs.device

Description : A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3
              for more information.

Response    : No automated response will occur.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

root@nftp:/>
########################################################################################

I have no idea what is going wrong here. Please give me some advice on how to proceed, or should I rather open a service call?

Thanks in advance,
thomas

-- 
Dr. Thomas Bleek, Netzwerkadministrator
Helmholtz-Zentrum Potsdam
Deutsches GeoForschungsZentrum
Telegrafenberg G261, D-14473 Potsdam
Tel.: +49 331 288-1818/1681, Fax: 1730, Mobil: +49 172 1543233
E-Mail: bl@gfz-potsdam.de