Folks,

I have a zpool with a raidz2 configuration which I've been switching between two machines - an old one with a hardware problem and a new one, which doesn't have hardware issues, but has a different configuration. I've been trying to import the pool on the new machine, so I can back up the data, because the old (broken) machine resets (I don't think it's panicking, because there are no logged messages) every time I try to tar off the data from the ZFS.

Unfortunately, the first time I tried to import the pool on the new machine, I didn't have the right five drives in it, so it didn't work. After I figured out that I was confused about which was the boot drive, I did get the five drives into the new machine and asked it to import the pool. It said that the pool could not be imported due to damaged devices or data. Which is slightly odd, since it had been mounting the pool fine on the broken machine before.

I then moved the drives back into the old machine, figuring I'd at least copy some small stuff onto a USB stick (it only dies reading large files, apparently), but now the old machine can't mount the pool either, and asking it to import gives the same message. It shows all five drives online, but says the pool is UNAVAIL due to insufficient replicas, and the raidz2 is UNAVAIL due to corrupted data.

Must I resign myself to having lost this pool due to the hardware problems I've had, and restore such backups as I have on the new machine, or is there something that can be done to get the pool back online at least in degraded mode?

Thanks in advance,

--Terry.
Terry Heatlie wrote:
> Folks,
>
> I have a zpool with a raidz2 configuration which I've been switching
> between two machines - an old one with a hardware problem and a new
> one, which doesn't have hardware issues, but has a different
> configuration. I've been trying to import the pool on the new
> machine, so I can back up the data, because the old (broken) machine
> resets (I don't think it's panicking, because there are no logged
> messages) every time I try to tar off the data from the ZFS.
>
> Unfortunately, the first time I tried to import the pool on the new
> machine, I didn't have the right five drives in it, so it didn't work.
> After I figured out that I was confused about which was the boot
> drive, I did get the five drives into the new machine and asked it to
> import the pool. It said that the pool could not be imported due to
> damaged devices or data. Which is slightly odd, since it had been
> mounting the pool fine on the broken machine before.
>
> I then moved the drives back into the old machine, figuring I'd at
> least copy some small stuff onto a USB stick (it only dies reading
> large files, apparently), but now the old machine can't mount the pool
> either, and asking it to import gives the same message. It shows all
> five drives online, but says the pool is UNAVAIL due to insufficient
> replicas, and the raidz2 is UNAVAIL due to corrupted data.
>
> Must I resign myself to having lost this pool due to the hardware
> problems I've had, and restore such backups as I have on the new
> machine, or is there something that can be done to get the pool back
> online at least in degraded mode?

Note: we're also working on a troubleshooting wiki... need more days in
the hour...

You should try to read the labels from each device:

  zdb -l /dev/rdsk/...

You should see 4 labels for each proper device.

Here is my hypothesis:

If you see a device which has only labels 0 and 1, then it may be the
case that the label has overlapping partitions. Why does this matter?
Because under normal circumstances, the actual devices used for creating
or importing the pool are stored in the /etc/zfs/zpool.cache file. When
the system boots, it looks there first and will import the pools listed
therein.

When you export the pool, the zpool.cache entries for the pool are
removed.

If the pool is not in zpool.cache, then zpool import scans all of the
devices found in /dev/dsk for valid pools. If you have overlapping
partitions or slices, then a partially exposed vdev may be found. But
since it won't be complete, due to perhaps not being able to see the end
of the device, which is where labels 2 & 3 are located, it will be
marked as bad. The solution would be to reconcile the partitions/slices
using format.
 -- richard
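[Editorial aside: a quick way to check this across all of the suspect disks at once is a one-shot loop over zdb -l. This is only a sketch - the device names are placeholders, and the "LABEL n" headers it greps for are an assumption about how zdb -l formats its output on this build.]

#!/bin/sh
# Count how many of the four ZFS labels zdb can read back from each disk.
# A healthy device should report LABEL 0 through LABEL 3; a device that
# only shows labels 0 and 1 has likely lost the end of the disk, which is
# where labels 2 and 3 live.
for d in c2d0p0 c3d0p0 c4d0p0 c5d0p0 c6d0p0; do
    n=`zdb -l /dev/rdsk/$d | grep -c '^LABEL'`
    echo "$d: $n label(s) readable"
done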
Hi Terry

Please could you post back to this forum the output from

  # zdb -l /dev/rdsk/...

... for each of the 5 drives in your raidz2. (Maybe best as an attachment.)

Are you seeing labels with the error 'failed to unpack'?

What is the reported 'status' of your zpool?
(You have not provided a 'zpool status'.)

Thanks
Nigel Smith
> You should try to read the labels from each device.
>
>   zdb -l /dev/rdsk/...
>
> You should see 4 labels for each proper device.

This is attached, and it does seem to bear out the hypothesis. However,
the disk with only 2 labels looks the same to fdisk and prtvtoc as one
which has all four labels. So I'm not sure what to do about that.

Also, for Nigel, who asked what the zpool status is being reported as,
here is some information:

# zpool status
no pools available
# zpool import
  pool: mypool
    id: 15475782335153879655
 state: UNAVAIL
action: The pool cannot be imported due to damaged devices or data.
config:

        mypool                     UNAVAIL  insufficient replicas
          raidz2                   UNAVAIL  corrupted data
            c2d0p0                 ONLINE
            replacing              ONLINE
              8531660876727254001  ONLINE
              c3d0p0               ONLINE
            c4d0p0                 ONLINE
            c5d0p0                 ONLINE
            c6d0p0                 ONLINE
#

> Here is my hypothesis:
>
> If you see a device which has only labels 0 and 1, then it may be the
> case that the label has overlapping partitions. [...] The solution
> would be to reconcile the partitions/slices using format.
>
>  -- richard

[attachment: label_info, application/octet-stream, 43777 bytes]
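[Editorial aside: since the partition maps diff clean, one more hedged idea is to compare what ZFS itself recorded on each disk. The loop below is a sketch; the txg= and asize= field names are an assumption about how this build's zdb -l prints the label nvlist.]

# Print the transaction group and the recorded top-level vdev size from
# every label that is still readable on each device, so a stale label or
# a disagreement about pool geometry stands out.
for d in c2d0p0 c3d0p0 c4d0p0 c5d0p0 c6d0p0; do
    echo "== $d =="
    zdb -l /dev/rdsk/$d | egrep 'txg=|asize='
done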
These are the symptoms of a shrinking device in a RAID-Z pool. You can
try to run the attached script during the import to see if this is the
case. There's a bug filed on this, but I don't have it handy.

- Eric

On Sun, Oct 26, 2008 at 05:18:25PM -0700, Terry Heatlie wrote:
> Folks,
> I have a zpool with a raidz2 configuration which I've been switching
> between two machines - an old one with a hardware problem and a new
> one, which doesn't have hardware issues, but has a different
> configuration. I've been trying to import the pool on the new machine,
> so I can back up the data, because the old (broken) machine resets
> (I don't think it's panicking, because there are no logged messages)
> every time I try to tar off the data from the ZFS.
>
> [...]
>
> Must I resign myself to having lost this pool due to the hardware
> problems I've had, and restore such backups as I have on the new
> machine, or is there something that can be done to get the pool back
> online at least in degraded mode?
>
> Thanks in advance,
>
> --Terry.

--
Eric Schrock, Fishworks            http://blogs.sun.com/eschrock

-------------- next part --------------
#!/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	printf("run 'zpool import' to generate trace\n\n");
}

vdev_raidz_open:entry
{
	printf("%d BEGIN RAIDZ OPEN\n", timestamp);
	printf("%d config asize = %d\n", timestamp, args[0]->vdev_asize);
	printf("%d config ashift = %d\n", timestamp,
	    args[0]->vdev_top->vdev_ashift);
	self->child = 1;
	self->asize = args[1];
	self->ashift = args[2];
}

vdev_disk_open:entry
/self->child/
{
	self->disk_asize = args[1];
	self->disk_ashift = args[2];
}

vdev_disk_open:return
/self->child/
{
	printf("%d child[%d]: asize = %d, ashift = %d\n", timestamp,
	    self->child - 1, *self->disk_asize, *self->disk_ashift);
	self->disk_asize = 0;
	self->disk_ashift = 0;
	self->child++;
}

vdev_raidz_open:return
{
	printf("%d asize = %d\n", timestamp, *self->asize);
	printf("%d ashift = %d\n", timestamp, *self->ashift);
	printf("%d END RAIDZ OPEN\n", timestamp);
	self->child = 0;
	self->asize = 0;
	self->ashift = 0;
}
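[Editorial aside: for anyone trying this later, a minimal way to drive a script like this is to start it first and then attempt the import in another shell so the vdev open probes actually fire. The file name and the two-terminal sequence below are illustrative assumptions, not something Eric specified.]

# Terminal 1: start the trace (assuming the attachment was saved as raidz_open.d)
chmod +x raidz_open.d
./raidz_open.d

# Terminal 2: trigger the vdev open path that the script instruments
zpool import mypool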
Eric Schrock wrote:
> These are the symptoms of a shrinking device in a RAID-Z pool. You can
> try to run the attached script during the import to see if this is the
> case. There's a bug filed on this, but I don't have it handy.

It's 6753869 "labeling/shrinking a disk in raid-z vdev makes pool
un-importable".

victor

> - Eric
>
> On Sun, Oct 26, 2008 at 05:18:25PM -0700, Terry Heatlie wrote:
>> Folks,
>> I have a zpool with a raidz2 configuration which I've been switching
>> between two machines - an old one with a hardware problem and a new
>> one, which doesn't have hardware issues, but has a different
>> configuration. [...]
I recently tried to import a b97 pool into a b98 upgraded version of that
OS, and it failed because of some bug. So maybe try eliminating that kind
of problem by making sure to use the version that you know worked in the
past. Maybe you already did this.

> Folks,
>
> I have a zpool with a raidz2 configuration which I've been switching
> between two machines - an old one with a hardware problem and a new
> one, which doesn't have hardware issues, but has a different
> configuration. [...]
>
> Must I resign myself to having lost this pool due to the hardware
> problems I've had, and restore such backups as I have on the new
> machine, or is there something that can be done to get the pool back
> online at least in degraded mode?
>
> Thanks in advance,
>
> --Terry.
oops, meant to reply-all...

---------- Forwarded message ----------
From: Terry Heatlie <terry.heatlie at gmail.com>
Date: Wed, Oct 29, 2008 at 8:14 PM
Subject: Re: [zfs-discuss] zpool import problem
To: Eric Schrock <eric.schrock at sun.com>

Well, this does seem to be the case:

bash-3.2# dtrace -s raidz_open2.d
run 'zpool import' to generate trace

1145357764648 BEGIN RAIDZ OPEN
1145357764648 config asize = 1600340623360
1145357764648 config ashift = 9
1145358131986 child[0]: asize = 320071851520, ashift = 9
1145358861331 child[1]: asize = 400088457216, ashift = 9
1145396437606 child[2]: asize = 400088457216, ashift = 9
1145396891657 child[3]: asize = 320072933376, ashift = 9
1145397584944 child[4]: asize = 400087375360, ashift = 9
1145397920504 child[5]: asize = 400087375360, ashift = 9
1145398947963 asize = 1600335380480
1145398947963 ashift = 9
1145398947963 END RAIDZ OPEN

But I still don't see a difference between the partition maps of the
drive with only 2 labels and a good one... c2 is bad, c4 is good...

# prtvtoc /dev/dsk/c2d0p0 > /tmp/vtoc_c2
# prtvtoc /dev/dsk/c4d0p0 > /tmp/vtoc_c4
# diff /tmp/vtoc_c2 /tmp/vtoc_c4
1c1
< * /dev/dsk/c2d0p0 partition map
---
> * /dev/dsk/c4d0p0 partition map
#

On Tue, Oct 28, 2008 at 3:53 AM, Eric Schrock <eric.schrock at sun.com> wrote:

> These are the symptoms of a shrinking device in a RAID-Z pool. You can
> try to run the attached script during the import to see if this is the
> case. There's a bug filed on this, but I don't have it handy.
>
> [...]
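[Editorial aside: a back-of-the-envelope reading of that trace. The asize recorded in the pool's config is exactly 5 MiB larger than the asize recomputed at import time, which works out to 1 MiB per column if the raidz2 asize simply scales with the per-column minimum across its five columns (counting the "replacing" pair as one column). That is consistent with one device now presenting about 1 MiB less usable space than when the pool was created.]

# Values copied from the dtrace output above; the divisor of 5 assumes the
# raidz2 asize is the per-column minimum times its five columns, with the
# "replacing" pair counting as a single column.
echo $(( 1600340623360 - 1600335380480 ))   # config asize - computed asize = 5242880 bytes (5 MiB)
echo $(( 5242880 / 5 ))                     # per column = 1048576 bytes (1 MiB)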