I had a drive fail and replaced it with a new drive. During the resilvering
process the new drive had write faults and was taken offline. These faults
were caused by a broken SATA cable (the drive checked out fine with the
manufacturer's software). A new cable fixed the failure. However, now the
drive shows as FAULTED.

I know the drive is healthy, so I want to force a rescrub. However, this
won't happen while it is showing FAULTED. I tried to force a replace, but
this gives the error "cannot replace a replacing device". So I seem to be in
a stuck state where the replace won't complete. Please help - screen output
below.

C3P0# zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            ad4                    ONLINE       0     0     0
            ad6                    ONLINE       0     0     0
            replacing              UNAVAIL      0 1.06K     0  insufficient replicas
              1796873336336467178  UNAVAIL      0 1.23K     0  was /dev/ad7/old
              4407623704004485413  FAULTED      0 1.22K     0  was /dev/ad7

errors: No known data errors
C3P0# zpool replace -f tank 4407623704004485413 ad7
cannot replace 4407623704004485413 with ad7: cannot replace a replacing device
C3P0#
-- 
This message posted from opensolaris.org
On 03/29/10 10:31 AM, Jim wrote:
> I had a drive fail and replaced it with a new drive. During the
> resilvering process the new drive had write faults and was taken offline.
> These faults were caused by a broken SATA cable (drive checked with the
> manufacturer's software and all ok). A new cable fixed the failure.
> However, now the drive shows as faulted.
>
> I know the drive is healthy so want to force a rescrub. However, this
> won't happen while it is showing FAULTED. I tried to force a replace but
> this gives the error "Cannot replace a replacing device". So I seem to be
> in a stuck state, where the replace won't complete. Please help - screen
> output below.

Can you zpool clear the device?

-- 
Ian.
Yes - but it does nothing. The drive remains FAULTED.
-- 
This message posted from opensolaris.org
On Mar 29, 2010, at 1:57 AM, Jim wrote:
> Yes - but it does nothing. The drive remains FAULTED.

Try to detach one of the failed devices:

  zpool detach tank 4407623704004485413
On Mon, 29 Mar 2010, Victor Latushkin wrote:
> On Mar 29, 2010, at 1:57 AM, Jim wrote:
>> Yes - but it does nothing. The drive remains FAULTED.
>
> Try to detach one of the failed devices:
>
>   zpool detach tank 4407623704004485413

As Victor says, the detach should work. This is a known issue and I'm
currently in the process of fixing it. Here's a bit more info:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6782540
Thanks for the suggestion, but I have tried detaching and it refuses,
reporting no valid replicas. Capture below.

C3P0# zpool status
  pool: tank
 state: DEGRADED
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            ad4                    ONLINE       0     0     0
            ad6                    ONLINE       0     0     0
            replacing              UNAVAIL      0 9.77K     0  insufficient replicas
              1796873336336467178  UNAVAIL      0 11.6K     0  was /dev/ad7/old
              4407623704004485413  FAULTED      0 10.4K     0  was /dev/ad7

errors: No known data errors
C3P0# zpool detach tank 1796873336336467178
cannot detach 1796873336336467178: no valid replicas
C3P0# zpool detach tank 4407623704004485413
cannot detach 4407623704004485413: no valid replicas
-- 
This message posted from opensolaris.org
Miles Nordin
2010-Mar-29 21:51 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
>>>>> "cm" == Courtney Malone <courtney at courtneymalone.com> writes:
>>>>> "j" == Jim <biainmcnally at hotmail.com> writes:

 j> Thanks for the suggestion, but have tried detaching but it
 j> refuses reporting no valid replicas.

yeah this happened to someone else also, see list archives around
2008-12-03:

cm> I have a 10 drive raidz, recently one of the disks appeared to
cm> be generating errors (this later turned out to be a cable),

cm> # zpool replace data 17096229131581286394 c0t2d0
cm> cannot replace 17096229131581286394 with c0t2d0: cannot
cm> replace a replacing device

cm> if i try to detach it i get:

cm> # zpool detach data 17096229131581286394
cm> cannot detach 17096229131581286394: no valid replicas
On Mon, 29 Mar 2010, Jim wrote:
> Thanks for the suggestion, but have tried detaching but it refuses
> reporting no valid replicas. Capture below.

Could you run

  zdb -ddd tank | awk '/^Dirty/ {output=1} /^Dataset/ {output=0} {if (output) {print}}'

This will print the dirty time log of the pool. It may take some time to
run, because there's no current way to print the dtl without printing all
the metaslab info.

> C3P0# zpool status
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         tank                       DEGRADED     0     0     0
>           raidz1                   DEGRADED     0     0     0
>             ad4                    ONLINE       0     0     0
>             ad6                    ONLINE       0     0     0
>             replacing              UNAVAIL      0 9.77K     0  insufficient replicas
>               1796873336336467178  UNAVAIL      0 11.6K     0  was /dev/ad7/old
>               4407623704004485413  FAULTED      0 10.4K     0  was /dev/ad7
>
> errors: No known data errors
> C3P0# zpool detach tank 1796873336336467178
> cannot detach 1796873336336467178: no valid replicas
> C3P0# zpool detach tank 4407623704004485413
> cannot detach 4407623704004485413: no valid replicas
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Regards,
markm
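In case the pipeline is opaque: the awk program is just a range filter that
turns printing on at a line starting with "Dirty" and off again at the next
line starting with "Dataset". A quick sketch with made-up input (not real
zdb output) shows the behaviour:

```shell
# Feed fake zdb-like text through the same awk range filter.
# The input lines here are invented; in practice you pipe `zdb -ddd tank` in.
printf 'Metaslabs: ...\nDirty time logs:\n  outage [1,2] length 2\nDataset tank [ZPL]\n  more stuff\n' |
  awk '/^Dirty/ {output=1} /^Dataset/ {output=0} {if (output) {print}}'
```

Only the two lines from "Dirty time logs:" up to (but not including) the
"Dataset" line are printed.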
Thanks - have run it and it returns pretty quickly. Given the output
(attached), what action can I take?

Thanks
James
-- 
This message posted from opensolaris.org
-------------- next part --------------
Dirty time logs:

tank
    outage [300718,301073] length 356
    outage [301138,301139] length 2
    outage [301149,301149] length 1
    outage [301151,301153] length 3
    outage [301155,301155] length 1
    outage [301157,301158] length 2
    outage [301182,301182] length 1
    outage [301262,301262] length 1
    outage [301911,301916] length 6
    outage [304063,304063] length 1
    outage [304791,304796] length 6
raidz
    outage [300718,301073] length 356
    outage [301138,301139] length 2
    outage [301149,301149] length 1
    outage [301151,301153] length 3
    outage [301155,301155] length 1
    outage [301157,301158] length 2
    outage [301182,301182] length 1
    outage [301262,301262] length 1
    outage [301911,301916] length 6
    outage [304063,304063] length 1
    outage [304791,304796] length 6
/dev/ad4
/dev/ad6
replacing
    outage [300718,301073] length 356
    outage [301138,301139] length 2
    outage [301149,301149] length 1
    outage [301151,301153] length 3
    outage [301155,301155] length 1
    outage [301157,301158] length 2
    outage [301182,301182] length 1
    outage [301262,301262] length 1
    outage [301911,301916] length 6
    outage [304063,304063] length 1
    outage [304791,304796] length 6
/dev/ad7/old
    outage [300718,301073] length 356
    outage [301138,301139] length 2
    outage [301149,301149] length 1
    outage [301151,301153] length 3
    outage [301155,301155] length 1
    outage [301157,301158] length 2
    outage [301182,301182] length 1
    outage [301262,301262] length 1
    outage [301911,301916] length 6
    outage [304063,304063] length 1
    outage [304791,304796] length 6
/dev/ad7
    outage [300718,301073] length 356
    outage [301138,301139] length 2
    outage [301149,301149] length 1
    outage [301151,301153] length 3
    outage [301155,301155] length 1
    outage [301157,301158] length 2
    outage [301182,301182] length 1
    outage [301262,301262] length 1
    outage [301911,301916] length 6
    outage [304063,304063] length 1
    outage [304791,304796] length 6

Metaslabs:

vdev 0

         offset  spacemap    free
         ------  --------    ----
              0        26   20.0M
      400000000        52    166M
      800000000        56   2.66G
      c00000000        65   12.4M
     1000000000        66   20.7M
     1400000000        69   29.1M
     1800000000        73   29.7M
     1c00000000        77   29.6M
     2000000000        81   79.2M
     2400000000        91   87.9M
     2800000000        92   63.2M
     2c00000000        94   94.2M
     3000000000        99    123M
     3400000000       103    523M
     3800000000       107   50.9M
     3c00000000       111    117M
     4000000000       116   54.3M
     4400000000       119   60.2M
     4800000000       123   97.4M
     4c00000000       126   1.20G
     5000000000       129   48.5M
     5400000000       132    106M
     5800000000       137   27.4M
     5c00000000       140   39.6M
     6000000000       146   45.3M
     6400000000       149   34.9M
     6800000000       151    544M
     6c00000000       154   36.6M
     7000000000       156   19.4M
     7400000000       160   35.7M
     7800000000       162   41.2M
     7c00000000       166   23.1M
     9c00000000        74   14.1M
     a000000000        78   15.2M
     a400000000        88   28.1M
     a800000000       174   23.3M
     ac00000000       178   24.2M
     b000000000       181   26.3M
     b400000000       100   43.4M
     b800000000       104   33.6M
     bc00000000       108   30.6M
     c000000000       113   59.8M
     c400000000       115   53.9M
     c800000000       120   30.8M
     cc00000000       124   82.2M
     d000000000       127   36.9M
     d400000000       130   76.2M
     d800000000       133   39.7M
     dc00000000       136   25.4M
     e000000000       142   44.5M
     e400000000       143   35.3M
     e800000000       144   43.6M
     ec00000000       150   43.7M
     f000000000       155   57.5M
     f400000000       158   37.2M
     f800000000       161   53.4M
     fc00000000       164   68.5M
    10000000000       167   13.8M
    10400000000       171   10.4M
    10800000000        24    145M
    10c00000000       183   14.6M
    11000000000        63   22.4M
    11400000000        93   74.9M
    11800000000        95   53.9M
    11c00000000        71   20.3M
    12000000000        76   24.0M
    12400000000        79   36.1M
    12800000000        90   54.5M
    13800000000       101   56.8M
    13c00000000       106   96.6M
    14000000000       110   71.5M
    14400000000       112   37.4M
    14800000000       117    111M
    14c00000000       122   14.9M
    15000000000       125    359M
    15400000000       128   29.6M
    15800000000       131   54.1M
    15c00000000       134   35.6M
    16000000000       138   39.3M
    16400000000       141   80.6M
    16800000000       147   19.3M
    16c00000000       145    418M
    17000000000       152   1.11G
    17400000000       187   35.1M
    17800000000       157   85.6M
    17c00000000       190   32.3M
    18000000000       163   33.9M
    18400000000       169   1.05G
    18800000000       173   23.3M
    18c00000000        51   3.39G
    19000000000       185    125M
    19400000000        64   19.6M
    19800000000       192   31.1M
    19c00000000        97   75.1M
    1a000000000        72   21.2M
    1a400000000       195   27.4M
    1a800000000        80    147M
    1ac00000000        89   1.02G
    1b000000000       198   32.5M
    1b400000000       179   23.2M
    1b800000000       200    400M
    1bc00000000       102    474M
    1c000000000       105   29.2M
    1c400000000       109   54.9M
    1c800000000       114   24.3M
    1cc00000000       118   43.8M
    1d000000000       202   1.99G
    1d400000000         0     16G
    1d800000000         0     16G
    1dc00000000         0     16G
    1e000000000       135   14.7M
    1e400000000       139   25.3M
    1e800000000         0     16G
    1ec00000000       148   34.9M
    1f000000000         0     16G
    1f400000000       153   82.4M
    1f800000000       188   16.9M
    1fc00000000       159   21.1M
    20000000000         0     16G
    20400000000       165   43.1M
    20800000000         0     16G
    20c00000000       172   20.7M
    21000000000        54    251M
    21400000000       184   26.1M
    21800000000        68   21.0M
    21c00000000       193   16.5M
    22000000000        98   78.7M
    22400000000        75   20.4M
    22800000000       197   27.5M
    22c00000000         0     16G
    23000000000         0     16G
    23400000000         0     16G
    23800000000       180   36.9M
    23c00000000       199   13.7G
    24000000000         0     16G
    24400000000         0     16G
    24800000000         0     16G
    24c00000000         0     16G
    25000000000       121   75.1M
    25400000000       201   1016M
    25800000000         0     16G
    25c00000000         0     16G
    26000000000         0     16G
    26400000000         0     16G
    26800000000         0     16G
    26c00000000         0     16G
    12c00000000       175   27.9M
    13000000000       177   23.5M
    13400000000       182   35.1M
     8000000000       170   74.2M
     8400000000        25   90.2M
     8800000000        53   64.9M
     8c00000000        55   2.26G
     9000000000        96   39.3M
     9400000000        67    119M
     9800000000        70   11.5M
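For what it's worth, one way to read that dirty time log (my interpretation,
not an official diagnostic): the very same outage ranges appear on both
/dev/ad7/old and /dev/ad7, so neither half of the "replacing" vdev holds a
complete copy of those txgs, and detaching either one would leave a hole in
the data - which matches the "no valid replicas" error. A toy sketch of that
overlap check, with a few ranges hard-coded from the output above (this is
illustration only, not a zpool or zdb invocation):

```shell
# Toy illustration: ranges copied from the DTL output above.
old_dtl="300718-301073 301911-301916 304791-304796"  # missing txgs on /dev/ad7/old (subset)
new_dtl="300718-301073 301911-301916 304791-304796"  # missing txgs on /dev/ad7 (subset)

# Report every range that is absent from BOTH devices at once.
for range in $old_dtl; do
  case " $new_dtl " in
    *" $range "*) echo "txgs $range are missing on BOTH devices" ;;
  esac
done
```

Every line printed marks a txg range that no member of the replacing vdev
can supply, which is exactly the condition under which detach must refuse.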