Luc De Meyer
2009-Jul-23 09:53 UTC
[zfs-discuss] why is zpool import still hanging in opensolaris 2009.06 ??? no fix yet ???
Follow-up : happy end ...

It took quite some tinkering but... I have my data back...

I ended up starting without the troublesome ZFS storage array, uninstalled the iSCSI target software and re-installed it... just to have Solaris boot without complaining about missing modules...

That left me with a system that would boot as long as the storage was disconnected... Reconnecting it made the boot stop at the hostname line. Then the disk activity light would flash every second or so, forever... I then rebooted using milestone=none. That worked, even with the storage attached! So now I was sure that some software process was causing a hang (or what appeared to be a hang). In milestone none I could verify that the pool was intact: and so it was... fortunately I had not broken the pool itself... all online, with no errors to report.

I then went to milestone all, which again made the system hang with the disk activity flashing every second, forever. I think the task doing this was devfsadm. I then assumed, on a gut feeling, that somehow the system was scanning or checking the pool. I left the system overnight in a desperate attempt, because I calculated the 500 GB check would take about 8 hours if the system were *really* scanning everything. (I copied a 1 TB drive last week, which took nearly 20 hours, so I have learned that sometimes I need to wait... copying these big disks takes a *lot* of time!)

This morning I switched on the monitor and lo and behold: a login screen!!!!
The store was there!

Lesson for myself and others: you MUST wait at the hostname line: the system WILL eventually come online... but don't ask how long it takes... I hate to think how long it would take if I had a 10 TB system. (But then again, a filesystem check on an ext2 disk also takes forever...)

I re-enabled iscsitgtd and did a list: it saw one of the two targets! (Which was OK, because I remembered that I had turned off the shareiscsi flag on the second share.)
I then went ahead and connected the system back into the network and "repaired" the iSCSI target on the virtual mainframe: WORKED! Copied over the virtual disks to local store so I can at least start up these servers ASAP again.
Then set shareiscsi on the second and most important share: OK! Listed the targets: THERE, BOTH! Repaired its connection too: WORKED...!

I am copying everything away from the ZFS pools now, but my data is recovered... fortunately.

I now have mixed feelings about the ordeal: yes, Sun Solaris kept its promise: I did not lose my data. But the time and trouble it took to recover in this case (just restarting the system, for example, took an overnight wait!) is something that a few of my customers would *seriously* dislike...

But: a happy end after all... most important: data rescued, and second most important: I learned a lot in the process...

Bye
Luc De Meyer
Belgium
--
This message posted from opensolaris.org
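[For reference, a rough sketch of the command sequence described above. The pool name (tank) and the zvol name (tank/vol2) are placeholders, and the exact GRUB kernel line differs between releases; this is a sketch of the technique, not a transcript of what was typed.]

    # At the GRUB menu, edit the kernel line and append the milestone option,
    # e.g. something along the lines of:
    #   kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -m milestone=none

    # Once logged in as root with only the minimal services running:
    zpool status tank            # verify the pool is intact (all devices ONLINE)
    svcadm milestone all         # continue the normal boot; this is where the long wait happens

    # After the system finally comes up, bring the iSCSI target daemon back:
    svcadm enable iscsitgt       # the iscsitgtd SMF service
    iscsitadm list target -v     # list the targets it is currently exporting

    # Re-share the second zvol over iSCSI:
    zfs set shareiscsi=on tank/vol2
    iscsitadm list target -v     # both targets should now be listed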
Blake
2009-Jul-24 14:44 UTC
[zfs-discuss] why is zpool import still hanging in opensolaris 2009.06 ??? no fix yet ???
This sounds like a bug I hit - if you have zvols on your pool, and automatic snapshots enabled, the thousands of resultant snapshots have to be polled by devfsadm during boot, which takes a long time - several seconds per zvol. I removed the auto-snapshot property from my zvols and the slow boot stopped.

Blake

On Thu, Jul 23, 2009 at 5:53 AM, Luc De Meyer <no-reply at opensolaris.org> wrote:
> Follow-up : happy end ...
> [...]
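[A sketch of the workaround Blake describes, assuming a zvol named tank/vol2 (a placeholder) with the Time Slider automatic snapshots enabled; com.sun:auto-snapshot is the user property the auto-snapshot services check.]

    # See how many snapshots devfsadm would have to walk for this zvol at boot:
    zfs list -H -t snapshot -o name -r tank/vol2 | wc -l

    # Stop the automatic snapshot services from snapshotting this zvol:
    zfs set com.sun:auto-snapshot=false tank/vol2

Existing snapshots of the zvol can then be destroyed separately if they are no longer needed.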