Luc De Meyer
2009-Jul-23 09:53 UTC
[zfs-discuss] why is zpool import still hanging in opensolaris 2009.06 ??? no fix yet ???
Follow-up : happy end ...

It took quite some tinkering but... I have my data back...

I ended up starting without the troublesome ZFS storage array, uninstalled the iSCSI target software and re-installed it... just to have Solaris boot without complaining about missing modules...

That left me with a system that would boot as long as the storage was disconnected... Reconnecting it made the boot stop at the hostname line. Then the disk activity light would flash every second or so, forever... I then rebooted using milestone=none. That worked, even with the storage attached! So now I was sure that some software process was causing a hang (or what appeared to be a hang). In milestone none I could verify that the pool was intact: and so it was... fortunately I had not broken the pool itself... all online, with no errors to report.

I then went to milestone all, which again made the system hang with the disk activity flashing every second, forever. I think the task doing this was devfsadm. I then assumed, on a gut feeling, that somehow the system was scanning or checking the pool. I left the system overnight in a desperate attempt, because I calculated the 500 GB check would take about 8 hours if the system were *really* scanning everything. (I copied a 1 TB drive last week, which took nearly 20 hours, so I have learned that sometimes I need to wait... copying these big disks takes a *lot* of time!)

This morning I switched on the monitor and lo and behold: a login screen!!!!
The store was there!

Lesson for myself and others: you MUST wait at the hostname line: the system WILL eventually come online... but don't ask how long it takes... I hate to think how long it would take if I had a 10 TB system. (But then again, a filesystem check on an ext2 disk also takes forever...)

I re-enabled iscsitgtd and did a list: it saw one of the two targets! (Which was OK, because I remembered that I had turned off the shareiscsi flag on the second share.)
I then went ahead and connected the system back into the network and "repaired" the iSCSI target on the virtual mainframe: WORKED! Copied over the virtual disks to local store so I can at least start up these servers ASAP again.
Then set shareiscsi on the second and most important share: OK! Listed the targets: THERE, BOTH! Repaired its connection too: WORKED...!

I am copying everything away from the ZFS pools now, but my data is recovered... fortunately.

I now have mixed feelings about the ordeal: yes, Sun Solaris kept its promise: I did not lose my data. But the time and trouble it took to recover in this case (just restarting the system, for example, took an overnight wait!) is something that a few of my customers would *seriously* dislike...

But: a happy end after all... most important: data rescued, and second most important: I learned a lot in the process...

Bye
Luc De Meyer
Belgium
--
This message posted from opensolaris.org
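[For reference, a rough sketch of the command sequence described above. The pool name (tank) and the zvol name (tank/vol2) are placeholders, and the exact GRUB kernel line differs between releases; this is a sketch of the technique, not a transcript of what was typed.]

    # At the GRUB menu, edit the kernel line and append the milestone option,
    # e.g. something along the lines of:
    #   kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -m milestone=none

    # Once logged in as root with only the minimal services running:
    zpool status tank            # verify the pool is intact (all devices ONLINE)
    svcadm milestone all         # continue the normal boot; this is where the long wait happens

    # After the system finally comes up, bring the iSCSI target daemon back:
    svcadm enable iscsitgt       # the iscsitgtd SMF service
    iscsitadm list target -v     # list the targets it is currently exporting

    # Re-share the second zvol over iSCSI:
    zfs set shareiscsi=on tank/vol2
    iscsitadm list target -v     # both targets should now be listed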
Blake
2009-Jul-24 14:44 UTC
[zfs-discuss] why is zpool import still hanging in opensolaris 2009.06 ??? no fix yet ???
This sounds like a bug I hit - if you have zvols on your pool, and automatic snapshots enabled, the thousands of resultant snapshots have to be polled by devfsadm during boot, which takes a long time - several seconds per zvol. I removed the auto-snapshot property from my zvols and the slow boot stopped.

Blake

On Thu, Jul 23, 2009 at 5:53 AM, Luc De Meyer <no-reply at opensolaris.org> wrote:
> Follow-up : happy end ...
> [...]
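[A sketch of the workaround Blake describes, assuming a zvol named tank/vol2 (a placeholder) with the Time Slider automatic snapshots enabled; com.sun:auto-snapshot is the user property the auto-snapshot services check.]

    # See how many snapshots devfsadm would have to walk for this zvol at boot:
    zfs list -H -t snapshot -o name -r tank/vol2 | wc -l

    # Stop the automatic snapshot services from snapshotting this zvol:
    zfs set com.sun:auto-snapshot=false tank/vol2

Existing snapshots of the zvol can then be destroyed separately if they are no longer needed.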