Michael Kennedy
2005-Dec-15 22:26 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
I'm a proud owner of a shiny new Sun Ultra 20 Workstation (1 of 10). I ordered it with the standard 80GB SATA drive and a secondary 250GB SATA drive.

I installed the machine with Nevada 27a and proceeded to update things the way I wanted them.

I used 'zpool' to create a new data pool and assigned the 250GB SATA drive to it. I realize this isn't necessarily an optimal use case with only the one drive assigned, but hey, I want to learn how ZFS works, and what better way to do it than to use it?

So, ZFS works like a charm, exactly as the docs said it would. I'm creating filesystems, moving mountpoints, assigning quotas, etc.

Then I rebooted... and my heart sank when the BIOS POST froze right after detecting the drives, which it did successfully. After fiddling for a bit, trying to understand what was going on, I started the dreary process of elimination. Luckily for me, my first inclination was to remove the drives and add them back one by one, so the process of elimination didn't take long.

The end result was that I pulled the 250GB SATA drive and the machine POSTed. So, like a good consumer, I assumed the drive had died and proceeded to RMA it. In the meantime, I inserted another 250GB SATA drive from another Ultra 20 yet to be deployed (one of the other 9 we have), re-created my ZFS pool and filesystems... and rebooted.

Wasn't I surprised to have the same problem? Then a colleague a few cubes over stuck his head up and said, "My machine won't boot! WTF is wrong with this thing?" To which I replied, "Did you ZFS your data drive?", and his answer was "Yeah, why? Is there a problem with that?"

So, you see, a pattern is forming. If we put ZFS on a drive, the machine won't pass BIOS POST and boot. I can't find a reason for it in any SunSolve or Google searching. Does anyone here have the same experience?

Two drive models have been used, both Sun part 540-6521-01: a Hitachi Deskstar HDS722525VLSA80 and a Seagate ST3250823AS. The machine is a Sun Ultra 20 (Opteron 148, 2GB RAM) running Solaris 5.11 nv27a.

Any tips would be appreciated. I will be trying nv28 next, but my breath, she is not being held. And I'm starting to pile up SATA drives that I cannot use.

MK
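For readers following along, the setup described above boils down to commands like these. This is only a minimal sketch: the pool name "datapool", the device name c2t0d0, the mountpoint, and the quota size are illustrative, not taken from the original post.

    # create a pool on the whole 250GB data drive (substitute the real cXtYdZ name);
    # giving ZFS the whole disk is what writes the EFI label discussed later in this thread
    zpool create datapool c2t0d0

    # carve out a filesystem, move its mountpoint, and assign a quota
    zfs create datapool/home
    zfs set mountpoint=/export/home datapool/home
    zfs set quota=50g datapool/home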
Dan Price
2005-Dec-15 22:56 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
On Thu 15 Dec 2005 at 02:26PM, Michael Kennedy wrote:

> [...]
>
> Wasn't I surprised to have the same problem? Then a colleague a few
> cubes over stuck his head up and said, "My machine won't boot! WTF is
> wrong with this thing?" To which I replied, "Did you ZFS your data
> drive?", and his answer was "Yeah, why? Is there a problem with that?"

I had the same problem with my U20 --- precisely this sequence. Let's get a bug filed.

-dp

--
Daniel Price - Solaris Kernel Engineering - dp at eng.sun.com - blogs.sun.com/dp
Bill Sommerfeld
2005-Dec-15 23:13 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
On Thu, 2005-12-15 at 17:56, Dan Price wrote:

> I had the same problem with my U20 --- precisely this sequence. Let's
> get a bug filed.

sounds to me like the EFI GPT label used by ZFS when you give it the whole disk is somehow toxic to the BIOS...

- Bill
Dan Price
2005-Dec-15 23:21 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
On Fri 16 Dec 2005 at 10:22AM, Nathan Kroenert wrote:

> Silly question (I don't have a U20... :( )...
>
> Does the BIOS have boot sector monitoring (virus checking) capabilities,
> and is it possible the label ZFS uses is hurting its brain?
>
> Just a thought... :)

It has a virus protection option in the BIOS, but it is off, at least on mine. But I'll dork with the BIOS settings now that I realize it isn't a bad drive.

-dp

--
Daniel Price - Solaris Kernel Engineering - dp at eng.sun.com - blogs.sun.com/dp
Nathan Kroenert
2005-Dec-15 23:22 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
Silly question (I don't have a U20... :( )...

Does the BIOS have boot sector monitoring (virus checking) capabilities, and is it possible the label ZFS uses is hurting its brain?

Just a thought... :)

Nathan.

Dan Price wrote:
> On Thu 15 Dec 2005 at 02:26PM, Michael Kennedy wrote:
>
>> [...]
>
> I had the same problem with my U20 --- precisely this sequence. Let's
> get a bug filed.
>
> -dp
Bill Moore
2005-Dec-16 00:36 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
On Thu, Dec 15, 2005 at 06:13:47PM -0500, Bill Sommerfeld wrote:

> On Thu, 2005-12-15 at 17:56, Dan Price wrote:
> > I had the same problem with my U20 --- precisely this sequence. Let's
> > get a bug filed.
>
> sounds to me like the EFI GPT label used by ZFS when you give it the
> whole disk is somehow toxic to the BIOS...

This is a recently discovered bug (by James Gosling, no less). It seems to be, as Bill S. points out, caused by the EFI label that ZFS uses: it interacts with the BIOS RAID configuration scanning code in some evil way. A bug has been filed and is being aggressively pursued by Sun, the BIOS vendor, and the Nvidia RAID folks.

The problem should go away if you manually format the disk with a Sun VTOC label and then give ZFS a slice:

   zpool create mypool c2d0s0

The way I've worked around this is by unplugging the drive until the GRUB boot screen comes up, then hot-plugging the drive in and booting Solaris. YMMV.

--Bill
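For anyone wanting to apply that workaround end to end, the sequence looks roughly like this. It is only a sketch: the device name c2d0 follows Bill's example, and the exact format(1M) prompts vary by build.

    # Put a Sun (SMI/VTOC) label on the disk instead of the EFI label ZFS
    # writes for whole disks.  In format(1M) expert mode, relabel the disk
    # and pick the SMI label type, then size slice 0 to cover the disk:
    format -e c2d0
    #   format> label
    #     [0] SMI Label
    #     [1] EFI Label
    #   (choose the SMI label, then use "partition" to give slice 0 the
    #    whole disk and label again)

    # Build the pool on the slice rather than the whole disk, so ZFS keeps
    # the VTOC label instead of rewriting an EFI one:
    zpool create mypool c2d0s0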
Michael Kennedy
2005-Dec-16 03:50 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
Thanks for the insight, Bill. It's appreciated. I will attempt the hot-plug ritual tomorrow morning, and if the gods are smiling, I will be able to VTOC the drive and get on with my life.

ZFS is showing incredible potential, and I'm discovering more uses than any of the marketing material that has been spewing since '04 has alluded to. I can't wait for this to go GA. This is a disruptive technology. Unfortunately I'm getting ahead of myself, imagining potential uses and solving problems that don't exist... as we are all prone to do from time to time.

Thanks again for the steer.

MK
Casper.Dik at Sun.COM
2005-Dec-16 07:20 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
> I had the same problem with my U20 --- precisely this sequence. Let's
> get a bug filed.

Can you remove the drive from the boot sequence? I've seen this happen when a Tyan 2885 wanted to boot from an EFI-labelled disk.

Casper
Dan Price
2005-Dec-16 08:02 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
On Thu 15 Dec 2005 at 04:36PM, Bill Moore wrote:

> This is a recently discovered bug (by James Gosling, no less). It seems
> to be, as Bill S. points out, caused by the EFI label that ZFS uses: it
> interacts with the BIOS RAID configuration scanning code in some evil
> way. A bug has been filed and is being aggressively pursued by Sun,
> the BIOS vendor, and the Nvidia RAID folks. The problem should go away

BugID?

> if you manually format the disk with a Sun VTOC label and then give ZFS
> a slice:
>
>    zpool create mypool c2d0s0
>
> The way I've worked around this is by unplugging the drive until the
> GRUB boot screen comes up, then hot-plugging the drive in and booting
> Solaris. YMMV.

I tried that, and I just wound up with a sad machine; the second drive never got recognized, and things like format would hang for a long while before giving up.

I also filed a bug (6364104), which maybe can now be closed as a dup.

-dp

--
Daniel Price - Solaris Kernel Engineering - dp at eng.sun.com - blogs.sun.com/dp
Scott Howard
2005-Dec-16 08:43 UTC
[zfs-discuss] ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
On Fri, Dec 16, 2005 at 12:02:17AM -0800, Dan Price wrote:

> > way. A bug has been filed and is being aggressively pursued by Sun,
> > the BIOS vendor, and the Nvidia RAID folks. The problem should go away
>
> BugID?

CR 6363449, I'd guess.

  Scott
Keith Chan
2005-Dec-16 09:03 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
> > I had the same problem with my U20 --- precisely this sequence. Let's
> > get a bug filed.
>
> Can you remove the drive from the boot sequence? I've seen this
> happen when a Tyan 2885 wanted to boot from an EFI-labelled disk.

The same thing happened to me on my MSI K8N Neo4 Platinum-based system - I just set the drive type to "None".
Richard Elling
2005-Dec-16 18:07 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
Since EFI is an Intel (sponsored?) standard, why are *we* just now seeing this? Shouldn't everyone who does EFI see this?

-- richard
Cyril Plisko
2005-Dec-16 19:30 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
On 12/16/05, Richard Elling <Richard.Elling at sun.com> wrote:

> Since EFI is an Intel (sponsored?) standard, why are *we* just now seeing this? Shouldn't everyone who does EFI see this?

Richard,

EFI was designed for the IA64 architecture, and the GPT label (a.k.a. the EFI label in Solaris) is used there. Add to this the fact that a PC BIOS usually cannot cope with the EFI label (didn't we just see that :-Q), and here is what we have: most PC (UNIX/Windows) users simply have no reason to put an EFI label on their disks. ZFS just changed this.

Oh, BTW, before ZFS one couldn't put an EFI label on an [S]ATA drive at all, even in Solaris. I think that explains the situation.

--
Regards,
        Cyril
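If you want to check which kind of label a disk currently carries, here are a couple of quick checks. Again only a sketch, and the pool name "datapool" is an example:

    # A pool built on whole disks lists its vdevs without a slice suffix
    # (e.g. c2t0d0, i.e. an EFI/GPT label); a pool built on slices lists
    # them as c2t0d0s0 (VTOC/SMI label):
    zpool status datapool

    # format(1M)'s "verify" command prints the current label contents for
    # the selected disk, which shows whether it is SMI or EFI:
    format
    #   (select the disk)
    #   format> verify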
Casper.Dik at Sun.COM
2005-Dec-16 20:26 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
> EFI was designed for the IA64 architecture, and the GPT label (a.k.a. the
> EFI label in Solaris) is used there. Add to this the fact that a PC BIOS
> usually cannot cope with the EFI label (didn't we just see that :-Q), and
> here is what we have: most PC (UNIX/Windows) users simply have no reason
> to put an EFI label on their disks. ZFS just changed this.
>
> Oh, BTW, before ZFS one couldn't put an EFI label on an [S]ATA drive at
> all, even in Solaris. I think that explains the situation.

And that was a *very* recent putback; it hit the gates in the two weeks before ZFS, I believe. So the exposure of the new feature was very limited before ZFS came out.

Casper
Kyle McDonald
2005-Dec-21 17:17 UTC
[zfs-discuss] Zpool output is weird after export/import.
I had nv_28 on a SPARC machine with six 12-disk multipacks. I created a pool and several filesystems (no data yet, though). Today I ran 'zpool export' and then jumpstarted to nv_29. After booting up I ran 'zpool import' and the output looked like the below.

I'm pretty sure there wasn't anything wrong with the disks before re-jumpstarting. But what I find suspicious is that it says 'c3t2d0' is missing, and then says 'c3t2d0s0' is OK and ONLINE. Which is it?

'c0' is the boot disk controller. But there are 3 dual Ultra SCSI controllers in this box, so there really should be a 'c6' somewhere too.

Does this look fishy to anyone else?

 -Kyle


bell# zpool import
  pool: datapool0
    id: 17061535701658615450
 state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-2Q
config:

        datapool0        DEGRADED
          raidz          DEGRADED
            c1t2d0s0     ONLINE
            c2t2d0s0     ONLINE
            c3t2d0       FAULTED   cannot open
            c3t2d0s0     ONLINE
            c4t2d0s0     ONLINE
            c5t2d0s0     ONLINE
          raidz          DEGRADED
            c1t3d0s0     ONLINE
            c2t3d0s0     ONLINE
            c3t3d0       FAULTED   cannot open
            c3t3d0s0     ONLINE
            c4t3d0s0     ONLINE
            c5t3d0s0     ONLINE
          raidz          DEGRADED
            c1t4d0s0     ONLINE
            c2t4d0s0     ONLINE
            c3t4d0       FAULTED   cannot open
            c3t4d0s0     ONLINE
            c4t4d0s0     ONLINE
            c5t4d0s0     ONLINE
          raidz          DEGRADED
            c1t5d0s0     ONLINE
            c2t5d0s0     ONLINE
            c3t5d0       FAULTED   cannot open
            c3t5d0s0     ONLINE
            c4t5d0s0     ONLINE
            c5t5d0s0     ONLINE
          raidz          DEGRADED
            c1t8d0s0     ONLINE
            c2t8d0s0     ONLINE
            c3t8d0       FAULTED   cannot open
            c3t8d0s0     ONLINE
            c4t8d0s0     ONLINE
            c5t8d0s0     ONLINE
          raidz          DEGRADED
            c1t9d0s0     ONLINE
            c2t9d0s0     ONLINE
            c3t9d0       FAULTED   cannot open
            c3t9d0s0     ONLINE
            c4t9d0s0     ONLINE
            c5t9d0s0     ONLINE
          raidz          DEGRADED
            c1t10d0s0    ONLINE
            c2t10d0s0    ONLINE
            c3t10d0      FAULTED   cannot open
            c3t10d0s0    ONLINE
            c4t10d0s0    ONLINE
            c5t10d0s0    ONLINE
          raidz          DEGRADED
            c1t11d0s0    ONLINE
            c2t11d0s0    ONLINE
            c3t11d0      FAULTED   cannot open
            c3t11d0s0    ONLINE
            c4t11d0s0    ONLINE
            c5t11d0s0    ONLINE
          raidz          DEGRADED
            c1t12d0s0    ONLINE
            c2t12d0s0    ONLINE
            c3t12d0      FAULTED   cannot open
            c3t12d0s0    ONLINE
            c4t12d0s0    ONLINE
            c5t12d0s0    ONLINE
          raidz          DEGRADED
            c1t13d0s0    ONLINE
            c2t13d0s0    ONLINE
            c3t13d0      FAULTED   cannot open
            c3t13d0s0    ONLINE
            c4t13d0s0    ONLINE
            c5t13d0s0    ONLINE
          raidz          DEGRADED
            c1t14d0s0    ONLINE
            c2t14d0s0    ONLINE
            c3t14d0      FAULTED   cannot open
            c3t14d0s0    ONLINE
            c4t14d0s0    ONLINE
            c5t14d0s0    ONLINE
          raidz          DEGRADED
            c1t15d0s0    ONLINE
            c2t15d0s0    ONLINE
            c3t15d0      FAULTED   cannot open
            c3t15d0s0    ONLINE
            c4t15d0s0    ONLINE
            c5t15d0s0    ONLINE
bell#
Eric Schrock
2005-Dec-21 18:53 UTC
[zfs-discuss] Zpool output is weird after export/import.
Yes, this is probably related to:

6362672 import gets confused about overlapping slices

When you created this pool, did you use whole disks? This might also be related to:

6344272 re-think how whole disks are stored

The latter should be fixed in build 31; the former is on my short list. This may also be a new pathology. Can you recreate this? Can you send the output of 'zpool status' before exporting the pool?

Thanks.

- Eric

On Wed, Dec 21, 2005 at 12:17:33PM -0500, Kyle McDonald wrote:
> I had nv_28 on a SPARC machine with six 12-disk multipacks. I created a
> pool and several filesystems (no data yet, though). Today I ran 'zpool
> export' and then jumpstarted to nv_29. After booting up I ran 'zpool
> import' and the output looked like the below.
>
> I'm pretty sure there wasn't anything wrong with the disks before
> re-jumpstarting. But what I find suspicious is that it says 'c3t2d0' is
> missing, and then says 'c3t2d0s0' is OK and ONLINE. Which is it?
>
> 'c0' is the boot disk controller. But there are 3 dual Ultra SCSI
> controllers in this box, so there really should be a 'c6' somewhere too.
>
> Does this look fishy to anyone else?
>
> [...]

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Kyle McDonald
2005-Dec-21 19:06 UTC
[zfs-discuss] Zpool output is weird after export/import.
Eric Schrock wrote:
> Yes, this is probably related to:
>
> 6362672 import gets confused about overlapping slices
>
> When you created this pool, did you use whole disks? This might also
> be related to:
>
> 6344272 re-think how whole disks are stored
>
> The latter should be fixed in build 31; the former is on my short list.
> This may also be a new pathology. Can you recreate this? Can you send
> the output of 'zpool status' before exporting the pool?

I think I figured it out. Maybe this will help you:

One of the multipacks was confused. I power-cycled it, and ran 'disks', 'drvconfig', and 'devlinks', and c6 showed up. This c6 would have been numbered c3 if it had been found during the jumpstart, but since the disks on it were whacked it didn't get a number at all.

I think the missing c3 in the output is zpool showing me the name of the device the last time it was present. The other c3 is the name of the device that is currently in the c3 position, but which used to be in the c4 position. Since the controller was missing entirely, there wasn't any name to replace the old c3 with. Maybe zpool should print 'missing' instead?

 -Kyle
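For the record, the recovery Kyle describes is roughly this sequence. It is a sketch only: on more recent builds devfsadm(1M) replaces the older disks/drvconfig/devlinks trio, and the pool name follows his example.

    # after power-cycling the confused multipack, rebuild the device tree
    # and clean up stale /dev links
    devfsadm -Cv

    # list importable pools again (all vdevs should now show ONLINE),
    # then import by name
    zpool import
    zpool import datapool0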
Eric Schrock
2005-Dec-21 19:26 UTC
[zfs-discuss] Zpool output is weird after export/import.
On Wed, Dec 21, 2005 at 02:06:43PM -0500, Kyle McDonald wrote:

> One of the multipacks was confused. I power-cycled it, and ran 'disks',
> 'drvconfig', and 'devlinks', and c6 showed up. This c6 would have been
> numbered c3 if it had been found during the jumpstart, but since the
> disks on it were whacked it didn't get a number at all.
>
> I think the missing c3 in the output is zpool showing me the name of the
> device the last time it was present. The other c3 is the name of the
> device that is currently in the c3 position, but which used to be in the
> c4 position. Since the controller was missing entirely, there wasn't any
> name to replace the old c3 with. Maybe zpool should print 'missing' instead?

OK. That makes sense. The additional 's0' suffixes are due to the whole-disk bug, and will be fixed soon.

This could probably be made a little more explicit, but it gets difficult once you actually import the pool. If we were to import your pool, we would not know whether 'c3t0d0' was the right name of the device or not. Note that once you plugged the disk in, we would correctly open it by devid and all would be well, modulo this bug:

6364582 need to fixup paths if they've changed

Which doesn't affect correctness, but can produce confusing output when using zpool(1M). For example, what if you had:

        pool
          mirror
            c0t1d0   ONLINE
            c0t1d0   OFFLINE   cannot open
            c0t2d0   ONLINE

Is there any way to display this in a non-confusing manner? One possibility is that if there is a device with the given path, but it doesn't match the one we're expecting, then we display it differently, either with 'missing', or marking the path somehow, like "(c0t1d0)". Would any of this help?

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Casper.Dik at Sun.COM
2005-Dec-21 19:56 UTC
[zfs-discuss] Zpool output is weird after export/import.
>         pool
>           mirror
>             c0t1d0   ONLINE
>             c0t1d0   OFFLINE   cannot open
>             c0t2d0   ONLINE
>
> Is there any way to display this in a non-confusing manner? One
> possibility is that if there is a device with the given path, but it
> doesn't match the one we're expecting, then we display it differently,
> either with 'missing', or marking the path somehow, like "(c0t1d0)".
> Would any of this help?

If I may suggest *not* mentioning the device name at all? Whatever way you present it, it is going to be confusing. In this case we'd have c0t1d0 and (c0t1d0); now the user may well say "but c0t1d0 is there, it's c1t1d0 that's gone missing!". It makes much more sense to use some WWN id.

Casper
Kyle McDonald
2005-Dec-21 21:09 UTC
[zfs-discuss] Zpool output is weird after export/import.
Eric Schrock wrote:
> [...]
>
> Which doesn't affect correctness, but can produce confusing output when
> using zpool(1M). For example, what if you had:
>
>         pool
>           mirror
>             c0t1d0   ONLINE
>             c0t1d0   OFFLINE   cannot open
>             c0t2d0   ONLINE
>
> Is there any way to display this in a non-confusing manner? One
> possibility is that if there is a device with the given path, but it
> doesn't match the one we're expecting, then we display it differently,
> either with 'missing', or marking the path somehow, like "(c0t1d0)".
> Would any of this help?

I would think:

        pool
          mirror
            c0t1d0   ONLINE
            missing  OFFLINE   cannot open   (was c0t1d0)
            c0t2d0   ONLINE

would be the most useful.

 -Kyle
Eric Schrock
2005-Dec-21 21:19 UTC
[zfs-discuss] Zpool output is weird after export/import.
On Wed, Dec 21, 2005 at 04:09:14PM -0500, Kyle McDonald wrote:

> I would think:
>
>         pool
>           mirror
>             c0t1d0   ONLINE
>             missing  OFFLINE   cannot open   (was c0t1d0)
>             c0t2d0   ONLINE
>
> would be the most useful.

This makes sense. We can do this for the following cases:

1. During import, if during our device scan we never touched the disk in question.

2. For an active pool, if the path is valid, but doesn't refer to the device we expect it to be.

However, if we have an active pool whose path and devid are invalid, does it still make sense to display "(was c0t1d0)"? For example, if I unplug a USB device, or a network attached drive goes away for some reason, does it make sense to display it as if the path is somehow wrong? At this point we can't tell if the path is right or not - can/should we distinguish between these cases?

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Casper.Dik at Sun.COM
2005-Dec-21 21:22 UTC
[zfs-discuss] Zpool output is weird after export/import.
> I would think:
>
>         pool
>           mirror
>             c0t1d0   ONLINE
>             missing  OFFLINE   cannot open   (was c0t1d0)
>             c0t2d0   ONLINE
>
> would be the most useful.

I'm still not sure how useful this is; if the device was moved from one system to another, the controller number could very well be wrong. The unique identifier would be a better indicator; perhaps, if it's available, disk brand and serial.

Casper
Eric Schrock
2005-Dec-21 21:39 UTC
[zfs-discuss] Zpool output is weird after export/import.
On Wed, Dec 21, 2005 at 10:22:20PM +0100, Casper.Dik at sun.com wrote:

> I'm still not sure how useful this is; if the device was moved from
> one system to another, the controller number could very well be wrong.

Except that it may very well clue the admin in to what went wrong. If they notice, for example, that every disk that was on (former) controller 3 is missing, it suggests that they didn't quite connect controller 3 correctly, even though it is now controller 6.

> The unique identifier would be a better indicator; perhaps, if it's
> available, disk brand and serial.

I disagree. If we had a unique identifier that was in any way intelligible to the user, this would make sense. We have both a 64-bit GUID and a device ID, neither of which provides any useful input to the administrator on how to fix the problem. In the example here, how would a bunch of devids like 'SEAGATE@WWC49384598610004949993999/SS3948829' be any indication of the real problem (controller 6 was set up incorrectly)?

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Anton B. Rang
2005-Dec-22 02:51 UTC
[zfs-discuss] Re: Zpool output is weird after export/import.
Perhaps we should allow users to name disks when they're added, as well as labeling them with the internal ID? Then there'd be an identifier -- unique if the user is careful -- which was meaningful.
Robert Milkowski
2007-Jan-22 13:48 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
Is there a BIOS update for the Ultra 20 that makes it understand EFI?
Casper.Dik at Sun.COM
2007-Jan-22 13:56 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
> Is there a BIOS update for the Ultra 20 that makes it understand EFI?

Understanding EFI is perhaps asking too much, but I believe the latest BIOS no longer hangs or crashes when it encounters EFI labels on the disks it examines (it probes all disks).

Casper
Robert Milkowski
2007-Feb-06 18:48 UTC
[zfs-discuss] Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
Hello Casper,

Monday, January 22, 2007, 2:56:16 PM, you wrote:

>> Is there a BIOS update for the Ultra 20 that makes it understand EFI?

CDSC> Understanding EFI is perhaps asking too much, but I believe the
CDSC> latest BIOS no longer hangs or crashes when it encounters EFI labels
CDSC> on the disks it examines (it probes all disks).

That's what we were looking for. Somehow my friend missed it. Thank you.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Eric Haycraft
2007-Feb-12 15:27 UTC
[zfs-discuss] Re: Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
I had the same issue with ZFS killing my Ultra 20. I can confirm that flashing the BIOS fixed the issue.

http://www.sun.com/desktop/workstation/ultra20/downloads.jsp#Ultra

Eric