Courtney Malone
2008-Dec-04 02:19 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
I have a 10 drive raidz. Recently one of the disks appeared to be generating errors (this later turned out to be a cable). I removed the disk from the array and ran vendor diagnostics (which zeroed it). Upon reinstalling the disk, however, zfs will not resilver it; it gets referred to numerically instead of by device name, and when I try to replace it, I get:

# zpool replace data 17096229131581286394 c0t2d0
cannot replace 17096229131581286394 with c0t2d0: cannot replace a replacing device

If I try to detach it I get:

# zpool detach data 17096229131581286394
cannot detach 17096229131581286394: no valid replicas

Current zpool output looks like:

# zpool status -v
  pool: data
 state: DEGRADED
 scrub: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        data                        DEGRADED     0     0     0
          raidz1                    DEGRADED     0     0     0
            c0t0d0                  ONLINE       0     0     0
            c0t1d0                  ONLINE       0     0     0
            replacing               UNAVAIL      0   543     0  insufficient replicas
              17096229131581286394  FAULTED      0   581     0  was /dev/dsk/c0t2d0s0/old
              11342560969745958696  FAULTED      0   582     0  was /dev/dsk/c0t2d0s0
            c0t3d0                  ONLINE       0     0     0
            c0t4d0                  ONLINE       0     0     0
            c0t5d0                  ONLINE       0     0     0
            c0t6d0                  ONLINE       0     0     0
            c0t7d0                  ONLINE       0     0     0
            c2t2d0                  ONLINE       0     0     0
            c2t3d0                  ONLINE       0     0     0

errors: No known data errors

I have also tried exporting and reimporting the pool. Any help would be greatly appreciated.
--
This message posted from opensolaris.org
Courtney Malone
2008-Dec-05 05:55 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
Any suggestions? I would like to restore redundancy ASAP.
Brian Couper
2008-Dec-07 15:01 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
hi,

    replacing               UNAVAIL      0   543     0  insufficient replicas
      17096229131581286394  FAULTED      0   581     0  was /dev/dsk/c0t2d0s0/old
      11342560969745958696  FAULTED      0   582     0  was /dev/dsk/c0t2d0s0

Looking at that, I don't think you have fixed the original fault. It's still getting write errors, and that's why the resilvering has stopped, I reckon. Are there any spare drive connections on the system? Could you free one up, so you can plug the drive into a different connection? You will need to resolve the hardware error, whether it is the drive, the cable, or the hard drive controller. Remember, a hard drive's best trick is to "act alive and well" when it is really at death's door... One of ZFS's best features is its ability to sniff out hardware faults.

To restart the resilver, do a zpool clear and zpool online. This will force the zpool and the hard drive online. It will start to resilver; do a zpool status -v to monitor the progress, and watch out for the error count on the drive. Don't do this till you really think you have got the error fixed.

How is your backup situation? Get your critical data off the zpool before attempting to repair the zpool or change anything with it.

What I would do is: get a new drive, connect it to a different hard drive connection and use a new cable. Remove the old drive, unplug it. I would not try to replace the faulty drive while it is still connected; things are just going to get confusing. Your zpool status will then show the drive as missing. Zpool replace with the new drive, and your zpool will be fixed in a few hours. Your zpool may give errors across other drives; as long as it's <50, just use zpool clear. Your hardware fault may have been causing trouble for ages without you knowing.

I'm an amateur ZFS'er, so use my advice with caution.

Brian
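As a sketch, that clear/online sequence (pool and device names taken from this thread; only worth running once you believe the hardware fault really is fixed) would look like:

```shell
# Reset the pool's error counters, then force the device back online.
zpool clear data
zpool online data c0t2d0

# The resilver should restart; watch its progress and the per-device
# error counts. If the write errors climb again, the fault is not fixed.
zpool status -v data
```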
Courtney Malone
2008-Dec-07 18:19 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
Well, you would think that would be the case, but the behavior is the same whether the disk is physically present or not. I can even use cfgadm to unconfigure the device and the pool will stay in the same state and not let me offline/detach/replace the vdev. Also, I don't have any spare ports unfortunately.
Brian Couper
2008-Dec-07 21:23 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
# zpool replace data 11342560969745958696 c0t2d0

That might replace the drive, BUT you will have to sort out the hardware error first. For now forget about zfs and what it says about the zpool status; concentrate on fixing the hardware error. Use the manufacturer's drive check boot CD to check the drive again. I know you checked it once before, but my money is on the hard drive being faulty; I reckon you will get errors on the drive if you check it again. If it passes without any errors, and without wiping the drive, try zpool clear and zpool online. It may not get any more write errors. Is the drive showing up in the format command? Remember, this small error has all the signs of going pear-shaped on you, so back up your data now while you can still read it!
Courtney Malone
2008-Dec-07 23:39 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
The disk passes sector-by-sector write tests both with the vendor diag and SeaTools; the cable failed as soon as I tried it in another machine. The disk is good, the cable was not. It also shows up in format just fine, and it has the same partition layout as all the other disks in the pool. The zpool state is the problem here. Like I said, it doesn't care if the disk is there or not, or even if c0::dsk/c0t2d0 is unconfigured with cfgadm; the pool stays in a faulted state even after "zpool clear data", and those 2 vdevs under replacing remain faulted whether the disk is present or not.
Courtney Malone
2008-Dec-07 23:45 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
Is there any way to use zdb to simply remove those vdevs, since they aren't active members of the pool?
Brian Couper
2008-Dec-08 07:51 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
I'm at the limit of my knowledge now. Google "man zpool".

UNAVAIL is coming up because the zpool was imported with the drive missing. Try exporting the pool, rebooting, then importing it with the drive connected. From the man page:

    UNAVAIL   The device could not be opened. If a pool is imported when a
              device was unavailable, then the device will be identified by
              a unique identifier instead of its path since the path was
              never correct in the first place.

    zpool attach [-f] pool device new_device

Have a read of zpool attach; it might work. You could also try adding the drive as a hot spare. That's all the help I can give, sorry; I don't know how to change/edit parts of ZFS.
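Sketched out with the names from this thread (the GUID is the replacement vdev shown in zpool status; the attach may well fail the same way the replace did):

```shell
zpool export data
# reboot with the drive connected, then:
zpool import data

# attach the disk alongside the stuck replacement vdev:
zpool attach -f data 11342560969745958696 c0t2d0

# or, add the disk to the pool as a hot spare instead:
zpool add data spare c0t2d0
```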
Courtney Malone
2008-Dec-08 16:44 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
Unfortunately I've tried zpool attach -f, and exporting and reimporting the pool, both with and without the disk present.
Miles Nordin
2008-Dec-08 19:44 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
>>>>> "cm" == Courtney Malone <courtney at courtneymalone.com> writes:

    cm> # zpool detach data 17096229131581286394
    cm> cannot detach 17096229131581286394: no valid replicas

I think detach is only for mirrors. That slot in the raidz stripe has to be filled with some kind of marker, even if the drive is not present, because the raidz slots aren't interchangeable like they are in a mirror. With raidz{,2} you're supposed to be able to 'zpool offline' up to the redundancy limit, but not detach.

    cm> # zpool replace data 17096229131581286394 c0t2d0
    cm> cannot replace 17096229131581286394 with c0t2d0: cannot replace a replacing device

That's frustrating. This hasn't happened to me yet. How about:

    # zpool replace data 11342560969745958696 c0t2d0

Maybe one of the two UUIDs is the original and one's the copy, and you have to restart the replacement by referring to the original, not the copy?

If that doesn't work, maybe you've hit a corner case that's not well handled. 'zpool replace' should be interruptible without corruption, and maybe not need to be abortable, but at least needs to be restartable. Let us know what happens.
Courtney Malone
2008-Dec-08 21:12 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
Unfortunately I get the same thing whether I use 11342560969745958696 or 17096229131581286394:

# zpool replace data 11342560969745958696 c0t2d0
cannot replace 11342560969745958696 with c0t2d0: cannot replace a replacing device
This is only a guess, but have you tried

# zpool replace data c0t2d0
And I'm also wondering if it might be worth trying a different disk. I wonder if it's struggling now because it's seeing the same disk as it's already tried to use, or if the zeroing of the disk confused it. Do you have another drive of the same size you could try?
Courtney Malone
2008-Dec-09 07:07 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
# zpool replace data c0t2d0
cannot replace c0t2d0 with c0t2d0: cannot replace a replacing device

I don't have another drive of that size unfortunately, though since the device was zeroed there shouldn't be any pool config data on it.
No, there won't be anything on the drive; I was just wondering if ZFS might get confused seeing a disk it knows about, but with no data on there. To be honest, on a single-parity raid array with that many drives, I'd be buying another drive straight away. You've got no protection for your data right now. I'd also advise adding a hot spare to that system. If the new drive works ok, you can probably use the one you're having problems with now as a hot spare.
Courtney Malone
2008-Dec-09 16:37 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
I have another drive on the way, which will be handy in the future, but it doesn't solve the problem that zfs won't let me manipulate that pool in a manner that will return it to a non-degraded state (even with a replacement drive or hot spare; I have already tried adding a spare), and I don't have somewhere to dump ~6TB of data and do a restore.
Chris Ekkelenkamp
2008-Dec-10 21:13 UTC
[zfs-discuss] zpool "cannot replace a replacing device"
I've never encountered that error myself, so I'm not at all sure this suggestion will work, but I did run into something similar and the answer was to install Windows on the drive and then pop the drive back in my server. Prior to that, OpenSolaris/ZFS "remembered" the disk and wouldn't let me use it until it appeared sufficiently different. I had also tried different tools to zero the disk. BTW, it's always a good idea to have a spare disk of the same model on hand.
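One reason a "zeroed" disk can still look familiar to ZFS: it keeps four copies of its vdev label, two in the first 512 KB of the device and two in the last 512 KB, so a wipe that only clears the front of the disk leaves the back labels intact. A sketch of clearing both ends by hand (the device path and size here are examples, not from this thread, and this destroys whatever is on the disk):

```shell
# DANGER: destroys data on the named device. Example path and size only.
disk=/dev/rdsk/c0t2d0s0
size_mb=476940                      # whole-device size in MiB -- substitute yours

# the front two labels live in the first 512 KB:
dd if=/dev/zero of=$disk bs=512k count=1

# the back two labels live in the last 512 KB:
dd if=/dev/zero of=$disk bs=512k oseek=$((size_mb * 2 - 1)) count=1
```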
On Tue, Dec 9, 2008 at 8:37 AM, Courtney Malone <courtney at courtneymalone.com> wrote:
> I have another drive on the way, which will be handy in the future, but it doesn't solve the problem that zfs won't let me manipulate that pool in a manner that will return it to a non-degraded state (even with a replacement drive or hot spare; I have already tried adding a spare), and I don't have somewhere to dump ~6TB of data and do a restore.

Did you file a bug report? If so, can you link it so we can see the resolution (if one comes, even)?

--
Brent Jones
brent at servuhome.net