I have a zpool with one dataset and a handful of snapshots. I cannot delete two of the snapshots. The message I get is "dataset is busy". Neither fuser nor lsof show anything holding open the .zfs/snapshot/<snapshot name> directory. What can cause this?

xxx> uname -a
SunOS nyc-sed3 5.10 Generic_142909-17 sun4u sparc SUNW,SPARC-Enterprise
xxx> zpool upgrade
This system is currently running ZFS pool version 22.
All pools are formatted using this version.
xxx> zpool get all zpool-01
NAME      PROPERTY       VALUE                 SOURCE
zpool-01  size           74.9T                 -
zpool-01  capacity       10%                   -
zpool-01  altroot        -                     default
zpool-01  health         ONLINE                -
zpool-01  guid           6976165213827467407   default
zpool-01  version        22                    default
zpool-01  bootfs         -                     default
zpool-01  delegation     on                    default
zpool-01  autoreplace    off                   default
zpool-01  cachefile      -                     default
zpool-01  failmode       wait                  default
zpool-01  listsnapshots  on                    default
zpool-01  autoexpand     off                   default
zpool-01  free           67.2T                 -
zpool-01  allocated      7.75T                 -
xxx> zfs upgrade
This system is currently running ZFS filesystem version 4.
All filesystems are formatted with the current version.
xxx> zfs get all zpool-01/dataset-01
NAME                 PROPERTY              VALUE                  SOURCE
zpool-01/dataset-01  type                  filesystem             -
zpool-01/dataset-01  creation              Tue Jan 25 10:02 2011  -
zpool-01/dataset-01  used                  4.60T                  -
zpool-01/dataset-01  available             39.3T                  -
zpool-01/dataset-01  referenced            1.09M                  -
zpool-01/dataset-01  compressratio         1.54x                  -
zpool-01/dataset-01  mounted               yes                    -
zpool-01/dataset-01  quota                 none                   default
zpool-01/dataset-01  reservation           none                   default
zpool-01/dataset-01  recordsize            32K                    inherited from zpool-01
zpool-01/dataset-01  mountpoint            /zpool-01/dataset-01   default
zpool-01/dataset-01  sharenfs              off                    default
zpool-01/dataset-01  checksum              on                     default
zpool-01/dataset-01  compression           on                     inherited from zpool-01
zpool-01/dataset-01  atime                 on                     default
zpool-01/dataset-01  devices               on                     default
zpool-01/dataset-01  exec                  on                     default
zpool-01/dataset-01  setuid                on                     default
zpool-01/dataset-01  readonly              off                    default
zpool-01/dataset-01  zoned                 off                    default
zpool-01/dataset-01  snapdir               hidden                 default
zpool-01/dataset-01  aclmode               passthrough            inherited from zpool-01
zpool-01/dataset-01  aclinherit            passthrough            inherited from zpool-01
zpool-01/dataset-01  canmount              on                     default
zpool-01/dataset-01  shareiscsi            off                    default
zpool-01/dataset-01  xattr                 on                     default
zpool-01/dataset-01  copies                1                      default
zpool-01/dataset-01  version               4                      -
zpool-01/dataset-01  utf8only              off                    -
zpool-01/dataset-01  normalization         none                   -
zpool-01/dataset-01  casesensitivity       sensitive              -
zpool-01/dataset-01  vscan                 off                    default
zpool-01/dataset-01  nbmand                off                    default
zpool-01/dataset-01  sharesmb              off                    default
zpool-01/dataset-01  refquota              none                   default
zpool-01/dataset-01  refreservation        none                   default
zpool-01/dataset-01  primarycache          all                    default
zpool-01/dataset-01  secondarycache        all                    default
zpool-01/dataset-01  usedbysnapshots       4.60T                  -
zpool-01/dataset-01  usedbydataset         1.09M                  -
zpool-01/dataset-01  usedbychildren        0                      -
zpool-01/dataset-01  usedbyrefreservation  0                      -
zpool-01/dataset-01  logbias               latency                default
xxx> zfs list | grep zpool-01/dataset-01
zpool-01/dataset-01             4.60T  39.3T  1.09M  /zpool-01/dataset-01
zpool-01/dataset-01@1299636001   117G      -  1.12T  -
zpool-01/dataset-01@1300233615  3.48T      -  4.48T  -
zpool-01/dataset-01@1301950939      0      -  1.02M  -
zpool-01/dataset-01@1301951162      0      -  1.02M  -
zpool-01/dataset-01@1302004805      0      -  1.09M  -
zpool-01/dataset-01@1302005162      0      -  1.09M  -
zpool-01/dataset-01@1302005414      0      -  1.09M  -
xxx> sudo zfs destroy zpool-01/dataset-01@1299636001
Password:
cannot destroy 'zpool-01/dataset-01@1299636001': dataset is busy
xxx> sudo zfs destroy zpool-01/dataset-01@1300233615
cannot destroy 'zpool-01/dataset-01@1300233615': dataset is busy
xxx>

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On 04/ 6/11 12:28 AM, Paul Kraus wrote:
> I have a zpool with one dataset and a handful of snapshots. I
> cannot delete two of the snapshots. The message I get is "dataset is
> busy". Neither fuser nor lsof show anything holding open the
> .zfs/snapshot/<snapshot name> directory. What can cause this?
>
Do you have any clones?

--
Ian.
On Tue, Apr 5, 2011 at 5:29 PM, Ian Collins <ian@ianshome.com> wrote:
> On 04/ 6/11 12:28 AM, Paul Kraus wrote:
>> I have a zpool with one dataset and a handful of snapshots. I
>> cannot delete two of the snapshots. The message I get is "dataset is
>> busy". Neither fuser nor lsof show anything holding open the
>> .zfs/snapshot/<snapshot name> directory. What can cause this?
>>
> Do you have any clones?

Nope, just basic snapshots. I did a `zfs destroy -d` and that did not complain, so I'll see if they magically disappear at some point in the future. I just can't figure out what could be holding those snapshots open and preventing the destroy. It reminds me of the first time I could not umount a UFS filesystem while fuser/lsof showed nothing... it was NFS shared, and the kernel does not show up in fuser/lsof.

--
Paul Kraus
On 04/05/11 17:29, Ian Collins wrote:
> On 04/ 6/11 12:28 AM, Paul Kraus wrote:
>> I have a zpool with one dataset and a handful of snapshots. I
>> cannot delete two of the snapshots. The message I get is "dataset is
>> busy". Neither fuser nor lsof show anything holding open the
>> .zfs/snapshot/<snapshot name> directory. What can cause this?
>>
> Do you have any clones?

If there are clones then zfs destroy should report that. The error being reported is "dataset is busy", which would be reported if there are user holds on the snapshots that can't be deleted.

Try running "zfs holds zpool-01/dataset-01@1299636001".

--
Rich
> From: zfs-discuss-bounces@opensolaris.org [mailto:zfs-discuss-bounces@opensolaris.org] On Behalf Of Paul Kraus
>
> I have a zpool with one dataset and a handful of snapshots. I
> cannot delete two of the snapshots. The message I get is "dataset is
> busy". Neither fuser nor lsof show anything holding open the
> .zfs/snapshot/<snapshot name> directory. What can cause this?

This may not apply to you, but in another, unrelated situation it was useful...

Try zdb -d poolname

In an older version of zpool, under certain conditions, there would sometimes be "hidden" clones listed with a % in the name. Maybe the % won't be there in your case, but perhaps you have some other manifestation of the hidden-clone problem?
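A quick way to check for that condition is to filter the dataset listing for the '%' marker. A minimal sketch, assuming the pool name from this thread (zpool-01):

    # lists only datasets whose names contain '%', i.e. the hidden clones described above
    zdb -d zpool-01 | grep '%'
    # no output means zdb lists no hidden (%-named) clone datasets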
On Tue, Apr 5, 2011 at 6:56 PM, Rich Morris <rich.morris@oracle.com> wrote:
> On 04/05/11 17:29, Ian Collins wrote:
> If there are clones then zfs destroy should report that. The error being
> reported is "dataset is busy", which would be reported if there are user
> holds on the snapshots that can't be deleted.
>
> Try running "zfs holds zpool-01/dataset-01@1299636001".

xxx> zfs holds zpool-01/dataset-01@1299636001
NAME                            TAG            TIMESTAMP
zpool-01/dataset-01@1299636001  .send-18440-0  Tue Mar 15 20:00:39 2011
xxx> zfs holds zpool-01/dataset-01@1300233615
NAME                            TAG            TIMESTAMP
zpool-01/dataset-01@1300233615  .send-18440-0  Tue Mar 15 20:00:47 2011
xxx>

That is what I was looking for. Looks like when a zfs send got killed it left a hanging lock (hold) around. I assume these will clear on the next export/import (not likely, as this is a production zpool) or on a reboot (which will happen eventually, and I can wait), unless there is a way to force-clear the hold.

Thanks Rich.

--
Paul Kraus
On Tue, Apr 5, 2011 at 9:26 PM, Edward Ned Harvey
<opensolarisisdeadlongliveopensolaris@nedharvey.com> wrote:
> This may not apply to you, but in another, unrelated situation it was
> useful...
>
> Try zdb -d poolname
>
> In an older version of zpool, under certain conditions, there would
> sometimes be "hidden" clones listed with a % in the name. Maybe the % won't
> be there in your case, but perhaps you have some other manifestation of the
> hidden-clone problem?

I have seen a dataset with a '%' in the name, but that was during a zfs recv (and if the zfs recv dies, it sometimes hangs around and has to be destroyed, and the zfs destroy claims to fail even though it succeeds ;-), but not in this case. The snapshots are all valid (I just can't destroy two of them); we are snapshotting on a fairly frequent basis as we load data. Thanks for the suggestion.

xxx> zdb -d zpool-01
Dataset mos [META], ID 0, cr_txg 4, 18.7G, 745 objects
Dataset zpool-01/dataset-01@1302019202 [ZPL], ID 140, cr_txg 654658, 38.9G, 990842 objects
Dataset zpool-01/dataset-01@1302051600 [ZPL], ID 158, cr_txg 655776, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302062401 [ZPL], ID 189, cr_txg 656162, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1301951162 [ZPL], ID 108, cr_txg 652292, 1.02M, 478 objects
Dataset zpool-01/dataset-01@1302087601 [ZPL], ID 254, cr_txg 657065, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302105601 [ZPL], ID 291, cr_txg 657710, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302058800 [ZPL], ID 164, cr_txg 656033, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1299636001 [ZPL], ID 48, cr_txg 560375, 1.12T, 28468324 objects
Dataset zpool-01/dataset-01@1302007173 [ZPL], ID 125, cr_txg 654202, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302055201 [ZPL], ID 161, cr_txg 655905, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302080401 [ZPL], ID 248, cr_txg 656807, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302044400 [ZPL], ID 152, cr_txg 655518, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1301950939 [ZPL], ID 106, cr_txg 652280, 1.02M, 478 objects
Dataset zpool-01/dataset-01@1302015602 [ZPL], ID 137, cr_txg 654530, 10.3G, 175879 objects
Dataset zpool-01/dataset-01@1302030001 [ZPL], ID 143, cr_txg 655029, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1300233615 [ZPL], ID 79, cr_txg 594951, 4.48T, 99259515 objects
Dataset zpool-01/dataset-01@1302094801 [ZPL], ID 282, cr_txg 657323, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302066001 [ZPL], ID 214, cr_txg 656291, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302006933 [ZPL], ID 120, cr_txg 654181, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302098401 [ZPL], ID 285, cr_txg 657452, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302007755 [ZPL], ID 131, cr_txg 654240, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302048001 [ZPL], ID 155, cr_txg 655647, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302005414 [ZPL], ID 116, cr_txg 654119, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302007469 [ZPL], ID 128, cr_txg 654221, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302084001 [ZPL], ID 251, cr_txg 656936, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302076801 [ZPL], ID 245, cr_txg 656678, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302069601 [ZPL], ID 217, cr_txg 656420, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302073201 [ZPL], ID 242, cr_txg 656549, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302102001 [ZPL], ID 288, cr_txg 657581, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01@1302005162 [ZPL], ID 112, cr_txg 654101, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302012001 [ZPL], ID 134, cr_txg 654391, 1.18G, 63312 objects
Dataset zpool-01/dataset-01@1302004805 [ZPL], ID 110, cr_txg 654085, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302006769 [ZPL], ID 118, cr_txg 654171, 1.09M, 506 objects
Dataset zpool-01/dataset-01@1302091201 [ZPL], ID 257, cr_txg 657194, 71.1G, 1845553 objects
Dataset zpool-01/dataset-01 [ZPL], ID 84, cr_txg 439406, 71.1G, 1845553 objects
Dataset zpool-01 [ZPL], ID 16, cr_txg 1, 39.3K, 5 objects
xxx>

--
Paul Kraus
On 04/06/11 12:43, Paul Kraus wrote:
> xxx> zfs holds zpool-01/dataset-01@1299636001
> NAME                            TAG            TIMESTAMP
> zpool-01/dataset-01@1299636001  .send-18440-0  Tue Mar 15 20:00:39 2011
> xxx> zfs holds zpool-01/dataset-01@1300233615
> NAME                            TAG            TIMESTAMP
> zpool-01/dataset-01@1300233615  .send-18440-0  Tue Mar 15 20:00:47 2011
> xxx>
>
> That is what I was looking for. Looks like when a zfs send got
> killed it left a hanging lock (hold) around. I assume the next
> export/import (not likely as this is a production zpool) or a reboot
> (will happen eventually, and I can wait) these will clear. Unless
> there is a way to force clear the hold.

The user holds won't be released by an export/import or a reboot.

"zfs get defer_destroy snapname" will show whether this snapshot is marked for deferred destroy, and "zfs release .send-18440-0 snapname" will clear that hold. If the snapshot is marked for deferred destroy then the release of the last tag will also destroy it.

--
Rich
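Putting this advice together with the earlier "zfs holds" output, the cleanup sequence would look roughly like this; a sketch only, using the hold tag and the first snapshot name reported above (repeat for the second snapshot):

    # confirm the hold and check whether the snapshot is already marked for deferred destroy
    zfs holds zpool-01/dataset-01@1299636001
    zfs get defer_destroy zpool-01/dataset-01@1299636001

    # release the stale hold left behind by the killed zfs send
    zfs release .send-18440-0 zpool-01/dataset-01@1299636001

    # if defer_destroy was on, the release above also destroys the snapshot;
    # otherwise destroy it explicitly
    zfs destroy zpool-01/dataset-01@1299636001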
On Wed, Apr 6, 2011 at 1:58 PM, Rich Morris <rich.morris@oracle.com> wrote:
> On 04/06/11 12:43, Paul Kraus wrote:
>> [...]
>> That is what I was looking for. Looks like when a zfs send got
>> killed it left a hanging lock (hold) around.
>
> The user holds won't be released by an export/import or a reboot.
>
> "zfs get defer_destroy snapname" will show whether this snapshot is marked
> for deferred destroy, and "zfs release .send-18440-0 snapname" will clear
> that hold. If the snapshot is marked for deferred destroy then the release
> of the last tag will also destroy it.

Sorry I did not get back on this last week; it got busy late in the week.

I tried the `zfs release` and it appeared to hang, so I just let it be. A few hours later the server experienced a resource crunch of some type (fork errors about being unable to allocate resources). The load also varied between about 16 and 50 (it is a 16-CPU M4000).

Users who had an open Samba connection seemed OK, but eventually we needed to reboot the box (I did let it sit in that state as long as I could). Since I could not even get on the XSCF console, I had to `break` it to the OK prompt and sync it. The first boot hung. I then did a boot -rv, hoping to see a device probe that caused the hang, but it looked like it was getting past all the device discovery before it also hung. Finally a boot -srv got me to a login prompt. I logged in as root, then logged out, and it came up to multiuser-server without a hitch.

I do not know what the root cause of the initial resource problem was, as I did not get a good core dump. I *hope* it was not the `zfs release`, but it may have been.

After the boot cycle(s) the zfs snapshots are no longer held and I could destroy them.

Thanks to all those who helped. This discussion is one of the best sources, if not THE best source, of zfs support and knowledge.

--
Paul Kraus
On Apr 11, 2011, at 3:22 PM, Paul Kraus wrote:
> On Wed, Apr 6, 2011 at 1:58 PM, Rich Morris <rich.morris@oracle.com> wrote:
>> "zfs get defer_destroy snapname" will show whether this snapshot is marked
>> for deferred destroy, and "zfs release .send-18440-0 snapname" will clear
>> that hold. If the snapshot is marked for deferred destroy then the release
>> of the last tag will also destroy it.
>
> [...]
>
> After the boot cycle(s) the zfs snapshots are no longer held and I
> could destroy them.
>
> Thanks to all those who helped. This discussion is one of the best
> sources, if not THE best source, of zfs support and knowledge.

I hate to dredge up this "old" email thread, but I just wanted to:

a) say thanks ("thanks!"), as I had exactly this same issue crop up on Sol10u9 (zpool rev 22) and, sure enough, the snapshot had a hold from a previous send.

b) mention (for those who may find this thread in the future) that once I found the hold, the "zfs release [hold] [snapname]" method mentioned above worked swimmingly for me. I was nervous doing this during production hours, but the release command returned in about 5-7 seconds with no apparent adverse effects. I was then able to destroy the snap.

I was initially afraid that it was somehow the "memory bug" mentioned in the current thread (when things are fresh in your mind, they seem more likely), so I'm glad this thread was out there.

matt
Hi,

Through my carelessness, I added two disks to a raidz2 zpool as normal data disks, when in fact I wanted to make them ZIL (log) devices. Is there any remedy?

Many thanks.

Fred
From: Edward Ned Harvey
Date: 2011-Sep-19 11:16 UTC
Subject: [zfs-discuss] remove wrongly added device from zpool
> From: zfs-discuss-bounces@opensolaris.org [mailto:zfs-discuss-bounces@opensolaris.org] On Behalf Of Fred Liu
>
> Through my carelessness, I added two disks to a raidz2 zpool as normal data
> disks, when in fact I wanted to make them ZIL (log) devices.

That's a huge bummer, and it's the main reason why device removal has been a priority request for such a long time... There is no solution. You can only destroy & recreate your pool, or learn to live with it that way.

Sorry...
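For the "destroy & recreate" route, the usual approach is to replicate the datasets elsewhere first with zfs send/receive and then rebuild the pool with the intended layout. A rough sketch, not a tested recipe: the pool names and disk lists below are placeholders, and the receive flags assume a reasonably recent zfs version:

    # snapshot everything recursively and copy it to a scratch pool
    zfs snapshot -r mypool@migrate
    zfs send -R mypool@migrate | zfs receive -Fdu scratch

    # after verifying the copy: destroy the pool, recreate it with the layout
    # actually wanted (raidz2 data disks plus mirrored log devices),
    # then replicate the data back
    zpool destroy mypool
    zpool create mypool raidz2 disk1 disk2 disk3 disk4 disk5 disk6 \
        log mirror ssd1 ssd2
    zfs send -R scratch@migrate | zfs receive -Fdu mypool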
> That's a huge bummer, and it's the main reason why device removal has been a
> priority request for such a long time... There is no solution. You can
> only destroy & recreate your pool, or learn to live with it that way.
>
> Sorry...

Yeah, I also realized this as soon as I sent the message. In NetApp it is so easy to change the RAID group size; there is still a long way for ZFS to go. I hope I can see that in the future.

I also made another huge mistake which has really brought me deep pain: I physically removed the two added devices, because I thought the raidz2 could afford it. Now the whole pool is corrupt and I don't know where to go from here...

Any help will be tremendously appreciated.

Thanks.

Fred
On 19 September, 2011 - Fred Liu sent me these 0,9K bytes:

> > That's a huge bummer, and it's the main reason why device removal has been a
> > priority request for such a long time... There is no solution. You can
> > only destroy & recreate your pool, or learn to live with it that way.
> >
> > Sorry...
>
> Yeah, I also realized this as soon as I sent the message. In NetApp it is so
> easy to change the RAID group size; there is still a long way for ZFS to go.
> I hope I can see that in the future.
>
> I also made another huge mistake which has really brought me deep pain:
> I physically removed the two added devices, because I thought the raidz2
> could afford it. Now the whole pool is corrupt and I don't know where to
> go from here...
>
> Any help will be tremendously appreciated.

You can add mirrors to those lonely disks.

/Tomas
--
Tomas Forsman, stric@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
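In other words, once the pool can be imported again, each accidentally added single-disk vdev can be turned into a two-way mirror so it is no longer a single point of failure. A sketch with placeholder device names:

    # attach a second disk to each lonely top-level device; zpool will resilver onto it
    zpool attach mypool lonely-disk-1 new-disk-1
    zpool attach mypool lonely-disk-2 new-disk-2
    zpool status mypool    # watch the resilver complete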
From: Edward Ned Harvey
Date: 2011-Sep-19 12:07 UTC
Subject: [zfs-discuss] remove wrongly added device from zpool
> From: Fred Liu [mailto:Fred_Liu@issi.com]
>
> Yeah, I also realized this as soon as I sent the message. In NetApp it is so
> easy to change the RAID group size; there is still a long way for ZFS to go.
> I hope I can see that in the future.

This one missing feature of ZFS, IMHO, does not amount to "a long way for zfs to go" in relation to NetApp. I shut off my NetApp 2 years ago in favor of ZFS, because ZFS performs so darn much better, and has such immensely greater robustness. Try doing NDMP, CIFS, NFS, iSCSI on NetApp (all extra licenses). Try experimenting with the new version of NetApp to see how good it is (you can't, unless you buy a whole new box). Try mirroring a production box onto a lower-cost secondary backup box (there is no such thing). Try storing your backup on disk and rotating your disks offsite. Try running any "normal" utilities - iostat, top, wireshark - you can't. Try backing up with commercial or otherwise modular (agent-based) backup software. You can't. You have to use CIFS/NFS/NDMP.

Just try finding a public mailing list like this one where you can even so much as begin such a conversation about NetApp... Been there, done that; it's not even in the same ballpark.

etc etc. (end rant.) I hate NetApp.

> I also made another huge mistake which has really brought me deep pain:
> I physically removed the two added devices, because I thought the raidz2
> could afford it.
> Now the whole pool is corrupt and I don't know where to go from here...
> Any help will be tremendously appreciated.

Um... Wanna post your "zpool status" and "cat /etc/release" and "zpool upgrade"?
> This one missing feature of ZFS, IMHO, does not amount to "a long way for
> zfs to go" in relation to NetApp. I shut off my NetApp 2 years ago in favor
> of ZFS, because ZFS performs so darn much better, and has such immensely
> greater robustness.
> [...]
> etc etc. (end rant.) I hate NetApp.

Yeah, it is kind of a touchy topic; we can discuss it more in the future. I want to focus on how to repair my pool first. ;-(

> Um... Wanna post your "zpool status" and "cat /etc/release" and "zpool
> upgrade"?

I exported the pool because I wanted to use zpool import -F to fix it. But now I get:

    one or more devices is currently unavailable
    Destroy and re-create the pool from a backup source.

I use OpenSolaris b134 and zpool version 22.

Thanks.

Fred
> You can add mirrors to those lonely disks.

Can that repair the pool?

Thanks.

Fred
On Mon, September 19, 2011 08:07, Edward Ned Harvey wrote:
> This one missing feature of ZFS, IMHO, does not amount to "a long way for
> zfs to go" in relation to NetApp. I shut off my NetApp 2 years ago in
> favor of ZFS, because ZFS performs so darn much better, and has such
> immensely greater robustness. Try doing NDMP, CIFS, NFS, iSCSI on NetApp
> (all extra licenses). Try experimenting with the new version of NetApp to
> see how good it is (you can't, unless you buy a whole new box).

As another datum, at $WORK we're going to Isilon. Our NetApp is being retired by the end of the year as it just can't handle the load of HPC. We also have the regular assortment of web, mail, code repositories, etc., VMs that also live on Isilon. We're quite happy, especially with the more recent Isilon hardware that uses SSDs to store/cache metadata. NFS and CIFS are quite good, but we haven't really tried their iSCSI stuff yet; they don't have FC at all. We also have a bunch of BlueArc, but find it much more finicky than Isilon. Perhaps Hitachi will help them stabilize things a bit.

As for experimenting with NetApp, they do have a "simulator" that you can run in a VM if you wish (or on actual hardware, AFAICT).

A bit more on topic: bp* rewrite has been a long time coming, and AFAICT it won't be in Solaris 11. As it stands, I don't care much about changing RAID levels, but not being able to remove a mistakenly added device is becoming more and more conspicuous. For better or worse I'm not doing as much Solaris stuff these days (especially with the new Ellison pricing model), but I still pay attention to what's going on, and this missing feature is one of those "WTF?" things, the fly in the otherwise very tasty soup that is ZFS.
From: Edward Ned Harvey
Date: 2011-Sep-19 13:25 UTC
Subject: [zfs-discuss] remove wrongly added device from zpool
> From: zfs-discuss-bounces@opensolaris.org [mailto:zfs-discuss-bounces@opensolaris.org] On Behalf Of Fred Liu
>
> Through my carelessness, I added two disks to a raidz2 zpool as normal data
> disks,

> -----Original Message-----
> From: Fred Liu [mailto:Fred_Liu@issi.com]
>
> I also made another huge mistake which has really brought me deep pain:
> I physically removed the two added devices, because I thought the raidz2
> could afford it.

So... You accidentally added non-redundant disks to a pool. They were not part of the raidz2, so the redundancy in the raidz2 did not help you. You removed the non-redundant disks, and now the pool is faulted.

The only thing you can do is: add the disks back to the pool (re-insert them into the system). Then you should be able to import the pool.

Now, you don't want these devices in the pool. You must either destroy & recreate your pool, or add redundancy to your non-redundant devices.
> So... You accidentally added non-redundant disks to a pool. They were not
> part of the raidz2, so the redundancy in the raidz2 did not help you. You
> removed the non-redundant disks, and now the pool is faulted.
>
> The only thing you can do is: add the disks back to the pool (re-insert
> them into the system). Then you should be able to import the pool.
>
> Now, you don't want these devices in the pool. You must either destroy &
> recreate your pool, or add redundancy to your non-redundant devices.

Yes. I have connected them back to the server, but it does not help. I am really sad now...
On Mon, Sep 19, 2011 at 9:29 AM, Fred Liu <Fred_Liu@issi.com> wrote:
> Yes. I have connected them back to the server, but it does not help.
> I am really sad now...

I cringed a little when I read the thread title. I did this by accident once as well, but "luckily" for me I had enough scratch storage around, in various sizes, to cobble together a JBOD (risky) and use it as a holding area for my data while I remade the pool.

I'm a home user with only around 21TB or so, so it was feasible for me. Probably not so feasible for you enterprise guys with 1000s of users and 100s of filesystems!

--khd
From: Edward Ned Harvey
Date: 2011-Sep-19 13:48 UTC
Subject: [zfs-discuss] remove wrongly added device from zpool
> From: Krunal Desai [mailto:movszx@gmail.com]
>
> On Mon, Sep 19, 2011 at 9:29 AM, Fred Liu <Fred_Liu@issi.com> wrote:
> > Yes. I have connected them back to the server, but it does not help.
> > I am really sad now...

I'll tell you what does not help: this email. Now that you know what you're trying to do, why don't you post the results of your "zpool import" command? How about an error message, and how you're trying to go about fixing your pool? Nobody here can help you without information.

> I cringed a little when I read the thread title. I did this by
> accident once as well, but "luckily" for me I had enough scratch
> storage around, in various sizes, to cobble together a JBOD (risky) and
> use it as a holding area for my data while I remade the pool.
>
> I'm a home user with only around 21TB or so, so it was feasible
> for me. Probably not so feasible for you enterprise guys with 1000s of
> users and 100s of filesystems!

No enterprise guys with 1000s of users and 100s of filesystems are making this mistake. Even if it does happen, on a pool that significant the obvious response is to add redundancy instead of recreating the pool.
> I'll tell you what does not help: this email. Now that you know what
> you're trying to do, why don't you post the results of your "zpool
> import" command? How about an error message, and how you're trying to
> go about fixing your pool? Nobody here can help you without
> information.

User     tty        login@  idle  JCPU  PCPU  what
root     console    9:25pm                    w
root@cn03:~# df
Filesystem           1K-blocks      Used Available Use% Mounted on
rpool/ROOT/opensolaris
                      94109412   6880699  87228713   8% /
swap                 108497952       344 108497608   1% /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1
                      94109412   6880699  87228713   8% /lib/libc.so.1
swap                 108497616         8 108497608   1% /tmp
swap                 108497688        80 108497608   1% /var/run
rpool/export             46864        23     46841   1% /export
rpool/export/home        46864        23     46841   1% /export/home
rpool/export/home/fred
                         48710      5300     43410  11% /export/home/fred
rpool                102155158        80 102155078   1% /rpool
root@cn03:~# !z
zpool import cn03
cannot import 'cn03': one or more devices is currently unavailable
        Destroy and re-create the pool from
        a backup source.

Thanks.

Fred
I also tried zpool import -fFX cn03 on both b134 and b151a (via an SX11 live CD). It results in a core dump and reboot after about 15 minutes, and I can see all the LEDs blinking on the HDDs during those 15 minutes.

Can replacing the empty ZIL devices help?

Thanks.

Fred

> -----Original Message-----
> From: Fred Liu
> Sent: Monday, September 19, 2011 21:54
> To: 'Edward Ned Harvey'; 'Krunal Desai'
> Cc: zfs-discuss@opensolaris.org
> Subject: RE: [zfs-discuss] remove wrongly added device from zpool
>
> [...]
> zpool import cn03
> cannot import 'cn03': one or more devices is currently unavailable
>         Destroy and re-create the pool from
>         a backup source.
The core dump:

        r10: ffffff19a5592000  r11: 0                 r12: 0
        r13: 0                 r14: 0                 r15: ffffff00ba4a5c60
        fsb: fffffd7fff172a00  gsb: ffffff19a5592000   ds: 0
         es: 0                  fs: 0                  gs: 0
        trp: e                 err: 0                 rip: fffffffff782f81a
         cs: 30                rfl: 10246             rsp: ffffff00b9bf0a40
         ss: 38

ffffff00b9bf0830 unix:die+10f ()
ffffff00b9bf0940 unix:trap+177b ()
ffffff00b9bf0950 unix:cmntrap+e6 ()
ffffff00b9bf0ab0 procfs:prchoose+72 ()
ffffff00b9bf0b00 procfs:prgetpsinfo+2b ()
ffffff00b9bf0ce0 procfs:pr_read_psinfo+4e ()
ffffff00b9bf0d30 procfs:prread+72 ()
ffffff00b9bf0da0 genunix:fop_read+6b ()
ffffff00b9bf0f00 genunix:pread+22c ()
ffffff00b9bf0f10 unix:brand_sys_syscall+20d ()

syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:17 100% done
100% done: 1041082 pages dumped, dump succeeded
rebooting...

> -----Original Message-----
> From: Fred Liu
> Sent: Monday, September 19, 2011 22:00
> Subject: RE: [zfs-discuss] remove wrongly added device from zpool
>
> I also tried zpool import -fFX cn03 on both b134 and b151a (via an SX11
> live CD). It results in a core dump and reboot after about 15 minutes,
> and I can see all the LEDs blinking on the HDDs during those 15 minutes.
> Can replacing the empty ZIL devices help?
On Sep 19, 2011, at 12:10 AM, Fred Liu <Fred_Liu@issi.com> wrote:
> Hi,
>
> Through my carelessness, I added two disks to a raidz2 zpool as normal data
> disks, when in fact I wanted to make them ZIL (log) devices.

You don't mention which OS you are using, but for the past 5 years of [Open]Solaris releases, the system prints a warning message and will not allow this to occur without using the force option (-f).
 -- richard
I use OpenSolaris b134.

Thanks.

Fred

> -----Original Message-----
> From: Richard Elling [mailto:richard.elling@gmail.com]
> Sent: Monday, September 19, 2011 22:21
> Subject: Re: [zfs-discuss] remove wrongly added device from zpool
>
> You don't mention which OS you are using, but for the past 5 years of
> [Open]Solaris releases, the system prints a warning message and will not
> allow this to occur without using the force option (-f).
> -- richard
> You don't mention which OS you are using, but for the past 5 years of
> [Open]Solaris releases, the system prints a warning message and will not
> allow this to occur without using the force option (-f).
> -- richard

Yes, there is a warning message; I used zpool add -f.

Thanks.

Fred
I have made some progress; the import now reports the following:

zpool import
  pool: cn03
    id: 1907858070511204110
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

        cn03                       UNAVAIL  missing device
          raidz2-0                 ONLINE
            c4t5000C5000970B70Bd0  ONLINE
            c4t5000C5000972C693d0  ONLINE
            c4t5000C500097009DBd0  ONLINE
            c4t5000C500097040BFd0  ONLINE
            c4t5000C5000970727Fd0  ONLINE
            c4t5000C50009707487d0  ONLINE
            c4t5000C50009724377d0  ONLINE
            c4t5000C50039F0B447d0  ONLINE
          c22t3d0                  ONLINE
          c4t50015179591C238Fd0    ONLINE
        logs
          c22t4d0                  ONLINE
          c22t5d0                  ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.

Any suggestions?

Thanks.

Fred

> -----Original Message-----
> From: Fred Liu
> Sent: Monday, September 19, 2011 22:28
> To: 'Richard Elling'
> Cc: zfs-discuss@opensolaris.org
> Subject: RE: [zfs-discuss] remove wrongly added device from zpool
>
> Yes, there is a warning message; I used zpool add -f.
And:

format
Searching for disks...done

c22t2d0: configured with capacity of 1.77GB

AVAILABLE DISK SELECTIONS:
       0. c4t5000C5003AC39D5Fd0 <SEAGATE-ST3600057SS-ES64-558.91GB>
          /scsi_vhci/disk@g5000c5003ac39d5f
       1. c4t5000C50039F0B447d0 <SEAGATE-ST3600057SS-ES64-558.91GB>
          /scsi_vhci/disk@g5000c50039f0b447
       2. c4t5000C5000970B70Bd0 <SEAGATE-ST3600057SS-ES62-558.91GB>
          /scsi_vhci/disk@g5000c5000970b70b
       3. c4t5000C5000972C693d0 <SEAGATE-ST3600057SS-ES62-558.91GB>
          /scsi_vhci/disk@g5000c5000972c693
       4. c4t5000C500097009DBd0 <SEAGATE-ST3600057SS-ES62-558.91GB>
          /scsi_vhci/disk@g5000c500097009db
       5. c4t5000C500097040BFd0 <SEAGATE-ST3600057SS-ES62-558.91GB>
          /scsi_vhci/disk@g5000c500097040bf
       6. c4t5000C5000970727Fd0 <SEAGATE-ST3600057SS-ES62-558.91GB>
          /scsi_vhci/disk@g5000c5000970727f
       7. c4t5000C50009724377d0 <SEAGATE-ST3600057SS-ES62-558.91GB>
          /scsi_vhci/disk@g5000c50009724377
       8. c4t5000C50009707487d0 <SEAGATE-ST3600057SS-ES62-558.91GB>
          /scsi_vhci/disk@g5000c50009707487
       9. c4t50015179591C238Fd0 <ATA-INTEL SSDSA2M160-02HA-149.05GB>
          /scsi_vhci/disk@g50015179591c238f
      10. c4t500151795910D221d0 <DEFAULT cyl 24915 alt 2 hd 224 sec 56>
          /scsi_vhci/disk@g500151795910d221
      11. c22t2d0 <ATA-ANS9010_2NNN2NNN-_200 cyl 908 alt 2 hd 128 sec 32>
          /pci@0,0/pci15d9,400@1f,2/disk@2,0
      12. c22t3d0 <ATA-ANS9010_2NNN2NNN-_200-1.78GB>
          /pci@0,0/pci15d9,400@1f,2/disk@3,0
      13. c22t4d0 <ATA-ANS9010_2NNN2NNN-_200-1.78GB>
          /pci@0,0/pci15d9,400@1f,2/disk@4,0
      14. c22t5d0 <ATA-ANS9010_2NNN2NNN-_200-1.78GB>
          /pci@0,0/pci15d9,400@1f,2/disk@5,0

> -----Original Message-----
> From: Fred Liu
> Sent: Monday, September 19, 2011 23:35
> To: Fred Liu; Richard Elling
> Cc: zfs-discuss@opensolaris.org
> Subject: RE: [zfs-discuss] remove wrongly added device from zpool
>
> I have made some progress; the import now reports the following:
> [...]
> Any suggestions?
On Sep 19, 2011, at 8:34 AM, Fred Liu wrote:
> I have made some progress; the import now reports the following:
>
> zpool import
>  pool: cn03
>    id: 1907858070511204110
> state: UNAVAIL
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing
>        devices and try again.
>   see: http://www.sun.com/msg/ZFS-8000-6X
> [...]
>        Additional devices are known to be part of this pool, though their
>        exact configuration cannot be determined.
>
> Any suggestions?

For each disk, look at the output of "zdb -l /dev/rdsk/DISKNAMEs0".
1. Confirm that each disk provides 4 labels.
2. Build the vdev tree by hand and look to see which disk is missing.

This can be tedious and time consuming.
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
VMworld Copenhagen, October 17-20
OpenStorage Summit, San Jose, CA, October 24-27
LISA '11, Boston, MA, December 4-9
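The first check can be scripted rather than run disk by disk. A rough sketch, assuming a POSIX shell and that the c*t*d0s0 glob matches the right devices; each label that unpacks prints one "version:" line, so a count of 4 means all four labels are readable:

    for d in /dev/rdsk/c*t*d0s0; do
      echo "$d: `zdb -l $d 2>/dev/null | grep -c 'version:'` of 4 labels readable"
    done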
> For each disk, look at the output of "zdb -l /dev/rdsk/DISKNAMEs0".
> 1. Confirm that each disk provides 4 labels.
> 2. Build the vdev tree by hand and look to see which disk is missing.
>
> This can be tedious and time consuming.

Do I need to export the pool first?

Can you give more details about #2 -- "Build the vdev tree by hand and look to see which disk is missing"?

Thanks.

Fred
On Sep 19, 2011, at 9:16 AM, Fred Liu wrote:
>> For each disk, look at the output of "zdb -l /dev/rdsk/DISKNAMEs0".
>> 1. Confirm that each disk provides 4 labels.
>> 2. Build the vdev tree by hand and look to see which disk is missing.
>>
>> This can be tedious and time consuming.
>
> Do I need to export the pool first?

No, but your pool is not imported.

> Can you give more details about #2 -- "Build the vdev tree by hand and
> look to see which disk is missing"?

The label, as displayed by "zdb -l", contains the hierarchy of the expected pool config. The contents are used to build the output you see in the "zpool import" or "zpool status" commands. zpool is complaining that it cannot find one of these disks, so look at the labels on the disks to determine what is or is not missing. The next steps depend on this knowledge.
 -- richard
> No, but your pool is not imported.

Yes, I see.

> The label, as displayed by "zdb -l", contains the hierarchy of the
> expected pool config. The contents are used to build the output you see
> in the "zpool import" or "zpool status" commands. zpool is complaining
> that it cannot find one of these disks, so look at the labels on the
> disks to determine what is or is not missing. The next steps depend on
> this knowledge.

zdb -l /dev/rdsk/c22t2d0s0
cannot open '/dev/rdsk/c22t2d0s0': I/O error

root@cn03:~# zdb -l /dev/rdsk/c22t3d0s0
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 22
    name: 'cn03'
    state: 0
    txg: 18269872
    pool_guid: 1907858070511204110
    hostid: 13564652
    hostname: 'cn03'
    top_guid: 11074483144412112931
    guid: 11074483144412112931
    vdev_children: 6
    vdev_tree:
        type: 'disk'
        id: 1
        guid: 11074483144412112931
        path: '/dev/dsk/c22t3d0s0'
        devid: 'id1,sd@s4154412020202020414e53393031305f324e4e4e324e4e4e2020202020202020353632383637390000005f31/a'
        phys_path: '/pci@0,0/pci15d9,400@1f,2/disk@3,0:a'
        whole_disk: 1
        metaslab_array: 37414
        metaslab_shift: 24
        ashift: 9
        asize: 1895563264
        is_log: 0
        create_txg: 18269863
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 22
    name: 'cn03'
    state: 0
    txg: 18269872
    pool_guid: 1907858070511204110
    hostid: 13564652
    hostname: 'cn03'
    top_guid: 11074483144412112931
    guid: 11074483144412112931
    vdev_children: 6
    vdev_tree:
        type: 'disk'
        id: 1
        guid: 11074483144412112931
        path: '/dev/dsk/c22t3d0s0'
        devid: 'id1,sd@s4154412020202020414e53393031305f324e4e4e324e4e4e2020202020202020353632383637390000005f31/a'
        phys_path: '/pci@0,0/pci15d9,400@1f,2/disk@3,0:a'
        whole_disk: 1
        metaslab_array: 37414
        metaslab_shift: 24
        ashift: 9
        asize: 1895563264
        is_log: 0
        create_txg: 18269863
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

c22t2d0 and c22t3d0 are the devices I physically removed and connected back to the server.
How can I fix them?

Thanks.

Fred
More below...

On Sep 19, 2011, at 9:51 AM, Fred Liu wrote:
> zdb -l /dev/rdsk/c22t2d0s0
> cannot open '/dev/rdsk/c22t2d0s0': I/O error

Is this disk supposed to be available? You might need to check the partition table, if one exists, to determine whether s0 has a non-zero size.

> root@cn03:~# zdb -l /dev/rdsk/c22t3d0s0
> [...]
> --------------------------------------------
> LABEL 2
> --------------------------------------------
> failed to unpack label 2
> --------------------------------------------
> LABEL 3
> --------------------------------------------
> failed to unpack label 3

This is a bad sign, but it can be recoverable, depending on how you got here. zdb is saying that it could not find the labels at the end of the disk. Label 2 and label 3 are 256KB each, located at the end of the disk, aligned to a 256KB boundary. zpool import is smarter than zdb in these cases, and can often recover from it -- up to the loss of all 4 labels, but you need to make sure that the partition tables look reasonable and haven't changed.

> c22t2d0 and c22t3d0 are the devices I physically removed and connected back to the server.
> How can I fix them?

Unless I'm mistaken, these are ACARD SSDs that have an optional CF backup. Let's hope that the CF backup worked.
 -- richard
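Checking that the partition table still looks sane can be done with prtvtoc; the point is simply to confirm that slice 0 still covers essentially the whole device, since the labels sit at fixed offsets from the start and end of that slice. For example:

    prtvtoc /dev/rdsk/c22t2d0s0
    prtvtoc /dev/rdsk/c22t3d0s0
    # if slice 0 has shrunk or moved, labels 2 and 3 will no longer be
    # where ZFS expects to find them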
> -----Original Message-----
> From: Richard Elling [mailto:richard.elling@gmail.com]
> Sent: Tuesday, September 20, 2011 3:57
> Subject: Re: [zfs-discuss] remove wrongly added device from zpool
>
> Is this disk supposed to be available? You might need to check the
> partition table, if one exists, to determine whether s0 has a non-zero
> size.

Yes. I used format to write an EFI label to it, and that error is now gone. But all four labels now fail to unpack under "zdb -l".

> This is a bad sign, but it can be recoverable, depending on how you got
> here. zdb is saying that it could not find the labels at the end of the
> disk. Label 2 and label 3 are 256KB each, located at the end of the disk,
> aligned to a 256KB boundary. zpool import is smarter than zdb in these
> cases, and can often recover from it -- up to the loss of all 4 labels,
> but you need to make sure that the partition tables look reasonable and
> haven't changed.

I have tried zpool import -fFX cn03, but it core-dumps and reboots about an hour later.

> Unless I'm mistaken, these are ACARD SSDs that have an optional CF backup.
> Let's hope that the CF backup worked.

Yes, it is ACARD. Do you mean I should push the "restore from CF" button to see what happens?

Thanks for your kind help!

Fred
zdb -l /dev/rdsk/c22t2d0s0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

> -----Original Message-----
> From: Fred Liu
> Sent: Tuesday, September 20, 2011 4:06
> To: 'Richard Elling'
> Cc: zfs-discuss@opensolaris.org
> Subject: RE: [zfs-discuss] remove wrongly added device from zpool
>
> Yes. I used format to write an EFI label to it, and that error is now gone.
> But all four labels now fail to unpack under "zdb -l".
> [...]
> I have tried zpool import -fFX cn03, but it core-dumps and reboots about
> an hour later.
> [...]
> Yes, it is ACARD. Do you mean I should push the "restore from CF" button
> to see what happens?
>
> Thanks for your kind help!
>
> Fred