After reinstalling an snv62 machine and upgrading an snv64a machine to snv70, I'm running into "no pools available to import" when I try to import two existing pools (that were previously mounted on these machines).

On one host (a new install; I wiped out the snv62 install) the hint I'm getting is:

Aug 20 22:59:35 imsfs fcp_online_child: Create devinfo for LU=fffffffec157e000

in the messages log. On the other host, I'm getting similar warnings. I've put full copies of /var/adm/messages for these two hosts up at

http://www.cepheid.org/~jeff/files/imsfs_messages.txt (fresh install)
http://www.cepheid.org/~jeff/files/imsfs_mirror_messages.txt (upgrade from snv64a)

I suspect that the cause might be related to http://bugs.opensolaris.org/view_bug.do?bug_id=5074082, but possibly the warning is a red herring. I'm happy to repost this elsewhere if it is definitely not a ZFS issue (I'm unsure of non-destructive ways to test), but if anyone has any debug clue they could throw me, I'd very much appreciate it.

These machines are using qla2460 cards connected to a QLogic FC switch. The ZFS pool member discs are pass-throughs from an ACNC JetStor array shelf, although there is one 3TB volume being presented as well.

Thanks,

Jeff

--
Jeff Bachtel (root at VPR,TAMU)              http://www.cepheid.org/~jeff
"The sciences, each straining in      [finger jeff at cepheid.org for PGP key]
 its own direction, have hitherto harmed us little;" - HPL, TCoC
On Tue, Aug 21, 2007 at 01:39:26PM -0700, Richard Elling wrote:
> Jeff Bachtel wrote:
> >On Tue, Aug 21, 2007 at 09:38:17AM -0700, Richard Elling wrote:
> >>can you see the LUN in format?
> >
> >Yes. Additionally, I can dd from the rdsk device to a file, and get
> >enough data to confirm that the disk devices are members of the old
> >zpools, and still have data.
>
> OK, I suspect the complaint is because ZFS sees the exact same ID
> on both drives.

Which ID would that be, and which drives?

I've updated one of the systems to snv 72, and I'm still getting the "Create devinfo" warning, and the imports are still not going.

cfgadm seems sane, à la:

imsfs-mirror:~> sudo cfgadm -la c2
Ap_Id                 Type       Receptacle  Occupant    Condition
c2                    fc-fabric  connected   configured  unknown
c2::21000004d9600099  disk       connected   configured  unknown
c2::21000004d960cded  disk       connected   configured  unknown

devinfo, once I figured out how to call it, also seems sane:

imsfs-mirror:~> sudo devinfo -i /dev/rdsk/c2t21000004D960CDEDd7s2
/dev/rdsk/c2t21000004D960CDEDd7s2   0   0   32130   512   2
imsfs-mirror:~> sudo devinfo -p /dev/rdsk/c2t21000004D960CDEDd7s2
/dev/rdsk/c2t21000004D960CDEDd7s2   36   682   0   1465031610   0   5

As I mentioned, I can dd the raw device and get valid data (although I've no way to verify that ZFS metadata is at the beginning of the disks).

Is there a documented example of using DTrace to examine what zpool is seeing/discarding while it scans for pools? One of these ZFS pools had its members totally renamed (the port used on the disk array got switched, and so the WWN changed), but the other should still be the same.

I'm close to downgrading to snv_69 on one of the machines to see if it can import the pools, but after doing some research it seems that other people have had mysterious import problems that were never resolved.

(Richard, I'm sending this back to zfs-discuss, but only because I added more debug info that might mean something to someone else.)
Jeff
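Jeff's aside about having "no way to verify that ZFS metadata is at the beginning of the disks" can be made concrete. ZFS keeps four 256 KiB copies of its vdev label, two at the front of the device or slice it was given and two at the end, and the config nvlist that `zdb -l` decodes starts 16 KiB into each label. The following is a minimal Python sketch of where those labels should sit, not code from ZFS itself. The function names `zfs_label_offsets` and `nvlist_ranges` are hypothetical, and the sketch assumes offsets are measured from the start of the slice ZFS was handed (s0 on a whole-disk pool), not from the start of the raw LUN.

```python
LABEL_SIZE = 256 * 1024      # each ZFS vdev label occupies 256 KiB
NUM_LABELS = 4               # L0, L1 at the front; L2, L3 at the end
NVLIST_OFFSET = 16 * 1024    # 8 KiB blank + 8 KiB boot header precede the nvlist
NVLIST_SIZE = 112 * 1024     # size of the packed config nvlist area


def zfs_label_offsets(vdev_bytes: int) -> list[int]:
    """Byte offsets of labels L0..L3, relative to the start of the vdev."""
    # The usable size is rounded down to a whole number of labels.
    psize = vdev_bytes - (vdev_bytes % LABEL_SIZE)
    if psize < NUM_LABELS * LABEL_SIZE:
        raise ValueError("vdev too small to hold four labels")
    offsets = []
    for n in range(NUM_LABELS):
        offset = n * LABEL_SIZE
        if n >= NUM_LABELS // 2:
            # Back labels are packed against the end of the rounded size.
            offset += psize - NUM_LABELS * LABEL_SIZE
        offsets.append(offset)
    return offsets


def nvlist_ranges(vdev_bytes: int) -> list[tuple[int, int]]:
    """(start, end) byte ranges of the config nvlist inside each label."""
    return [
        (off + NVLIST_OFFSET, off + NVLIST_OFFSET + NVLIST_SIZE)
        for off in zfs_label_offsets(vdev_bytes)
    ]


if __name__ == "__main__":
    one_gib = 1 << 30
    for n, (start, end) in enumerate(nvlist_ranges(one_gib)):
        print(f"label {n}: nvlist bytes {start}..{end}")
```

For a 1 GiB vdev the labels land at byte offsets 0, 262144, 1073217536 and 1073479680, so L0's nvlist occupies bytes 16384 to 131072 of the slice. Dumping the first 128 KiB of s0 with dd and looking in that window for the pool's name would be a rough, read-only check that the label is where ZFS expects it.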
What does 'zdb -l /dev/dsk/<device>s0' show for each device?

- Eric

On Tue, Aug 21, 2007 at 04:45:40PM -0500, Jeff Bachtel wrote:
> As I mentioned, I can dd the raw device and get valid data (although
> I've no way to verify that ZFS metadata is at the beginning of the
> disks).

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Tue, Aug 21, 2007 at 02:56:11PM -0700, Eric Schrock wrote:
> What does 'zdb -l /dev/dsk/<device>s0' show for each device?

bash-3.00# zdb -l /dev/dsk/c2t21000004D9600099d0s0
cannot open '/dev/dsk/c2t21000004D9600099d0s0': I/O error
bash-3.00# zdb -l /dev/dsk/c2t21000004D9600099d0s1
cannot open '/dev/dsk/c2t21000004D9600099d0s1': I/O error
bash-3.00# zdb -l /dev/dsk/c2t21000004D9600099d0s2
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

With the same results for other devices (I went to s2 just to see if there was a non I/O error response).

jeff
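Reading this output comes down to one question: did any of the four labels decode? When many LUNs need checking, that test can be done mechanically. The following is a minimal Python sketch that assumes the `zdb -l` text layout shown in this thread (other builds may print it differently). The names `label_status` and `has_usable_label` are hypothetical helpers, not part of any ZFS tool.

```python
import re

HEADER = re.compile(r"^\s*LABEL (\d+)\s*$")
FAILED = re.compile(r"^\s*failed to unpack label (\d+)\s*$")


def label_status(zdb_output: str) -> dict[int, bool]:
    """Map each label number zdb reported to True (decoded) or False (failed)."""
    status: dict[int, bool] = {}
    for line in zdb_output.splitlines():
        header = HEADER.match(line)
        if header:
            # Assume success until zdb reports a failure for this label.
            status[int(header.group(1))] = True
            continue
        failed = FAILED.match(line)
        if failed:
            status[int(failed.group(1))] = False
    return status


def has_usable_label(zdb_output: str) -> bool:
    """True if at least one of the labels zdb printed was decoded."""
    return any(label_status(zdb_output).values())
```

Given the s2 output above, `label_status` returns `{0: False, 1: False, 2: False, 3: False}` and `has_usable_label` returns `False`. An "I/O error" result like the s0 and s1 attempts yields an empty dict, which also counts as no usable label.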
There are no ZFS-recognizable labels on this device. Did you explicitly create it on slice 2? From the looks of things, it seems like your disk label is corrupt...

- Eric

On Tue, Aug 21, 2007 at 05:00:37PM -0500, Jeff Bachtel wrote:
> With the same results for other devices (I went to s2 just to see if
> there was a non I/O error response).
On Tue, Aug 21, 2007 at 03:09:47PM -0700, Eric Schrock wrote:
> There are no ZFS-recognizable labels on this device. Did you explicitly
> create it on slice 2? From the looks of things, it seems like your disk
> label is corrupt...

Not at all (to my recollection, I specified the devices as "c2t21000004D9600099d[0123etc.]" to the zpool create command).

With this OS version, format is giving lines such as:

       9. c2t21000004D9600099d0 <DEFAULT cyl 48638 alt 2 hd 255 sec 63>
          /pci@0,0/pci10de,5d@e/pci1077,142@0/fp@0,0/disk@w21000004d9600099,0

whereas, again to my recollection, previously the drive manufacturer was actually listed.

This pool was raidz2, so if I should have format do something in particular with one of the member disks, there should be some redundancy there. I did not touch or try to use these member disks during the OS upgrade and install, but is it likely the new or old installer would have done so anyway?

Jeff
jeff at cepheid.org said:
> With this OS version, format is giving lines such as:
>        9. c2t21000004D9600099d0 <DEFAULT cyl 48638 alt 2 hd 255 sec 63>
>           /pci@0,0/pci10de,5d@e/pci1077,142@0/fp@0,0/disk@w21000004d9600099,0
> whereas, again to my recollection, previously the drive manufacturer was
> actually listed.

Pardon me for butting in: What you describe is exactly what I've seen when format (or the scsi_vhci drivers, or whatever) sees a new drive/LUN without a valid label. It will fabricate up a default label, which defaults to the non-EFI flavor (SMI label, I think it's called). How you got here, I can only guess, but until you get back to the "whole-disk" EFI label that "zpool import" recognizes, you're going to get nowhere.

If the disk/LUN was really not reformatted, you might be able to restore the label by first converting the disk to EFI, and then copying a label from one of the working disks.

Of course, if the label got lost, it doesn't give one a lot of confidence about the rest of the disk's contents....

Regards,

Marion
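Marion's diagnosis hinges on the `<DEFAULT ...>` tag in Jeff's format listing. The following is a minimal Python sketch that pulls the label name and geometry out of such a line. It assumes the tag layout shown in the thread and 512-byte sectors. It also treats the name `DEFAULT` as the sign of a fabricated label purely on the strength of Marion's observation; that is not a documented guarantee. `describe_format_entry` is a hypothetical helper name.

```python
import re

# Matches the geometry tag format prints next to each disk, e.g.
# "<DEFAULT cyl 48638 alt 2 hd 255 sec 63>".
GEOMETRY = re.compile(
    r"<(?P<name>[^<>]*?)\s+cyl (?P<cyl>\d+) alt (?P<alt>\d+)"
    r" hd (?P<hd>\d+) sec (?P<sec>\d+)>"
)
SECTOR = 512  # assumed sector size


def describe_format_entry(line: str) -> dict:
    """Pull the label name and geometry out of one line of format output."""
    m = GEOMETRY.search(line)
    if m is None:
        raise ValueError("no geometry tag in line")
    cyl, hd, sec = (int(m.group(k)) for k in ("cyl", "hd", "sec"))
    name = m.group("name")
    return {
        "label_name": name,
        # Per the thread, DEFAULT appears when format made the label up itself.
        "looks_fabricated": name == "DEFAULT",
        # Capacity covered by the data cylinders (alternates excluded).
        "data_bytes": cyl * hd * sec * SECTOR,
    }
```

For Jeff's line, 48638 × 255 × 63 sectors × 512 bytes gives 400,061,168,640 bytes, roughly 400 GB. That figure is only what the synthesized label describes. It says nothing reliable about the LUN's true size or about where the old EFI slice 0 used to begin.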
> On Tue, Aug 21, 2007 at 02:56:11PM -0700, Eric Schrock wrote:
> > What does 'zdb -l /dev/dsk/<device>s0' show for each device?
>
> bash-3.00# zdb -l /dev/dsk/c2t21000004D9600099d0s0
> cannot open '/dev/dsk/c2t21000004D9600099d0s0': I/O error
> bash-3.00# zdb -l /dev/dsk/c2t21000004D9600099d0s1
> cannot open '/dev/dsk/c2t21000004D9600099d0s1': I/O error
> bash-3.00# zdb -l /dev/dsk/c2t21000004D9600099d0s2
> --------------------------------------------
> LABEL 0
> --------------------------------------------
> failed to unpack label 0

Do you know how this pool was created? Did you explicitly pass in a disk slice? Most pools are configured on the whole disk, which creates an EFI label, but that doesn't seem to be what you're using.

--
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
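Darren's question comes down to which kind of label is on the LUN. The EFI label written when a pool is created on a whole disk is a GPT label, and its header begins with the signature `EFI PART` at the start of LBA 1. The following is a minimal, read-only Python sketch of that check. It assumes 512-byte sectors. `read_first_sectors` and `has_efi_label` are hypothetical helper names, and reading a raw device node requires appropriate privileges.

```python
SECTOR = 512                 # assumed logical sector size
GPT_SIGNATURE = b"EFI PART"  # GPT header signature, stored at the start of LBA 1


def read_first_sectors(path: str, count: int = 2, sector_size: int = SECTOR) -> bytes:
    """Read (never write) the first `count` sectors of a device node or image file."""
    with open(path, "rb") as dev:
        return dev.read(count * sector_size)


def has_efi_label(first_sectors: bytes, sector_size: int = SECTOR) -> bool:
    """Return True if the GPT header signature sits at the start of LBA 1."""
    header = first_sectors[sector_size:sector_size + len(GPT_SIGNATURE)]
    return header == GPT_SIGNATURE
```

On a LUN where format has fabricated an SMI label, as Marion describes, this check would be expected to return `False`. After relabeling with `format -e`, it would be expected to return `True`.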
> If the disk/LUN was really not reformatted, you might be able to restore
> the label by first converting the disk to EFI, and then copying a label
> from one of the working disks.

Well, you might get somewhere by putting an EFI label on (because then the top of the data portion will move down the disk a bit, hopefully onto the old label). (That's certainly why I was asking how the original pool was created.)

But I don't see how copying a label will do any good. Won't that just confuse ZFS and make it think it's talking to one of the other disks?

--
Darren Dunham
On Tue, Aug 21, 2007 at 03:55:59PM -0700, Marion Hakanson wrote:
> Pardon me for butting in: What you describe is exactly what I've seen
> when format (or the scsi_vhci drivers, or whatever) sees a new drive/LUN
> without a valid label. It will fabricate up a default label, which defaults
> to the non-EFI flavor (SMI label, I think it's called). How you got here,
> I can only guess, but until you get back to the "whole-disk" EFI label that
> "zpool import" recognizes, you're going to get nowhere.
>
> If the disk/LUN was really not reformatted, you might be able to restore
> the label by first converting the disk to EFI, and then copying a label
> from one of the working disks.

I seriously owe you about... well, 1.5TB worth of restores and my social life worth of beers. This was absolutely the problem, and doing format -e and converting the label to EFI lets zdb -l on s0 return the information for the zpool. I'm in the process now of labelling all of the member disks, and then I'll see if zpool import works.

Seriously, thanks a ton for speaking up and sending off your email; this has saved me from an extremely frustrating evening.

Jeff
ddunham at taos.com said:
> But I don't see how copying a label will do any good. Won't that just
> confuse ZFS and make it think it's talking to one of the other disks?

No, the disk label doesn't contain any ZFS info, it just tells the disk drivers (scsi_vhci, in this case) where the disk slices start/end. Your typical ZFS pool will be made up of identical drives/components, so copying the label is an easy way to make sure they're all the same (that's assuming you have a working label to copy).

Regards,

Marion
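Marion's point that the disk label only says where slices start and end also explains why relabeling helped. ZFS looks for its own labels at fixed offsets from the start of the slice, so moving the slice boundary moves where ZFS looks. The following is a minimal Python sketch of that arithmetic for the two front labels (the back labels depend on where the slice ends). It assumes 512-byte sectors and a pool on slice 0. Sector 34 in the example is the first usable LBA of a GPT disk with the standard 128-entry partition array on 512-byte sectors; whether these disks' old slice 0 began exactly there is an assumption. Both function names are hypothetical.

```python
SECTOR = 512                 # assumed sector size
LABEL_SIZE = 256 * 1024      # one ZFS vdev label
NVLIST_OFFSET = 16 * 1024    # config nvlist starts 16 KiB into a label


def front_label_disk_offsets(slice_start_sector: int) -> list[int]:
    """Raw-disk byte offsets of ZFS labels L0 and L1 for a slice starting here."""
    base = slice_start_sector * SECTOR
    return [base + n * LABEL_SIZE for n in range(2)]


def slice_start_for_nvlist(nvlist_disk_offset: int) -> int:
    """Slice start sector that would make an nvlist found at this raw-disk
    offset line up as label L0's nvlist."""
    label_start = nvlist_disk_offset - NVLIST_OFFSET
    if label_start < 0 or label_start % SECTOR:
        raise ValueError("offset cannot be label 0 of a sector-aligned slice")
    return label_start // SECTOR
```

With slice 0 starting at sector 34, L0 would begin at byte 17408 of the raw LUN. Conversely, if a dump of the raw device showed an old nvlist at byte 33792, `slice_start_for_nvlist(33792)` returns 34: the start sector a restored slice 0 would need for ZFS to find that label again.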
> ddunham at taos.com said:
> > But I don't see how copying a label will do any good. Won't that just
> > confuse ZFS and make it think it's talking to one of the other disks?
>
> No, the disk label doesn't contain any ZFS info, it just tells the disk
> drivers (scsi_vhci, in this case) where the disk slices start/end.

Ahh, yes. I misread the other message as suggesting to copy a ZFS label, not the disk/EFI label. Never mind! :-)

--
Darren Dunham