Jeff Bonwick wrote:
> zpool create data unreplicated A B C
>
> The extra typing would be annoying, but would make it almost impossible
> to get the wrong behavior by accident.

I think that is a very good idea from a usability viewpoint. It is better
to have to type a few more chars to explicitly say "I know ZFS isn't going
to do all the data replication" when you run zpool than to find out later
that you aren't protected (by ZFS, anyway).

--
Darren J Moffat
Hi,

so it happened...

I have a 10 disk raidz pool running Solaris 10 U2, and after a reboot the
whole pool became unavailable after apparently losing a disk drive. (The
drive is seemingly OK as far as I can tell from other commands.)

--- bootlog ---
Jul 17 09:57:38 expprd fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major
Jul 17 09:57:38 expprd EVENT-TIME: Mon Jul 17 09:57:38 MEST 2006
Jul 17 09:57:38 expprd PLATFORM: SUNW,UltraAX-i2, CSN: -, HOSTNAME: expprd
Jul 17 09:57:38 expprd SOURCE: zfs-diagnosis, REV: 1.0
Jul 17 09:57:38 expprd EVENT-ID: e2fd61f7-a03d-6279-d5a5-9b8755fa1af9
Jul 17 09:57:38 expprd DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information.
Jul 17 09:57:38 expprd AUTO-RESPONSE: No automated response will occur.
Jul 17 09:57:38 expprd IMPACT: The pool data is unavailable
Jul 17 09:57:38 expprd REC-ACTION: Run 'zpool status -x' and either attach the missing device or
Jul 17 09:57:38 expprd     restore from backup.
-------

--- zpool status -x ---
bash-3.00# zpool status -x
  pool: data
 state: FAULTED
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        UNAVAIL      0     0     0  insufficient replicas
          c1t0d0    ONLINE       0     0     0
          c1t1d0    ONLINE       0     0     0
          c1t2d0    ONLINE       0     0     0
          c1t3d0    ONLINE       0     0     0
          c2t0d0    ONLINE       0     0     0
          c2t1d0    ONLINE       0     0     0
          c2t2d0    ONLINE       0     0     0
          c2t3d0    ONLINE       0     0     0
          c2t4d0    ONLINE       0     0     0
          c1t4d0    UNAVAIL      0     0     0  cannot open
--------------

The problem as I see it is that the pool should be able to handle 1 disk
error, no? And the online, attach, replace... commands don't work when the
pool is unavailable. I've filed a case with Sun, but thought I'd ask around
here to see if anyone has experienced this before.

cheers,

//Mikael
> I have a 10 disk raidz pool running Solaris 10 U2, and after a reboot
> the whole pool became unavailable after apparently losing a disk drive.
> [...]
>         NAME        STATE     READ WRITE CKSUM
>         data        UNAVAIL      0     0     0  insufficient replicas
>           c1t0d0    ONLINE       0     0     0
> [...]
>           c1t4d0    UNAVAIL      0     0     0  cannot open
> --------------
>
> The problem as I see it is that the pool should be able to handle
> 1 disk error, no?

If it were a raidz pool, that would be correct. But according to
zpool status, it's just a collection of disks with no replication.
Specifically, compare these two commands:

    (1) zpool create data A B C

    (2) zpool create data raidz A B C

Assume each disk has 500G capacity.

The first command will create an unreplicated pool with 1.5T capacity.
The second will create a single-parity RAID-Z pool with 1.0T capacity.

My guess is that you intended the latter, but actually typed the former,
perhaps assuming that RAID-Z was always present. If so, I apologize for
not making this clearer. If you have any suggestions for how we could
improve the zpool(1M) command or documentation, please let me know.

One option -- I confess up front that I don't really like it -- would be
to make 'unreplicated' an explicit replication type (in addition to
mirror and raidz), so that you couldn't get it by accident:

    zpool create data unreplicated A B C

The extra typing would be annoying, but would make it almost impossible
to get the wrong behavior by accident.

Jeff
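A quick way to see the difference for yourself is to build throwaway pools
on small file-backed vdevs instead of real disks. This is only a sketch:
the file sizes and paths below are arbitrary, and the exact output varies
a little between releases.

    # mkfile 128m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3
    # zpool create plain /var/tmp/d1 /var/tmp/d2 /var/tmp/d3
    # zpool status plain      <- the files sit directly under the pool: no raidz vdev line
    # zfs list plain          <- available space is roughly the sum of all three files
    # zpool destroy plain
    # zpool create ztest raidz /var/tmp/d1 /var/tmp/d2 /var/tmp/d3
    # zpool status ztest      <- note the raidz vdev grouping the files
    # zfs list ztest          <- roughly one file's worth less; that space holds parity
    # zpool destroy ztest

The missing raidz line in zpool status is exactly the tell-tale in the
output posted above: the ten disks hang directly off the pool.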
Michael Schuster - Sun Microsystems  |  2006-Jul-17 08:49 UTC  |  [zfs-discuss] zpool unavailable after reboot
Jeff Bonwick wrote:
> One option -- I confess up front that I don't really like it -- would be
> to make 'unreplicated' an explicit replication type (in addition to
> mirror and raidz), so that you couldn't get it by accident:
>
>     zpool create data unreplicated A B C
>
> The extra typing would be annoying,

To address the "extra typing": would it be such a bad idea to offer the
option of ... erm ... options, thus:

    zpool create pool [-u|-z|-m|unreplicated|mirror|raidz|...] vdev ...

i.e. short flags in addition to the "long" keywords?

Michael
--
Michael Schuster                 (+49 89) 46008-2974 / x62974
visit the online support center: http://www.sun.com/osc/

Recursion, n.: see 'Recursion'
Jeff,

thanks for your answer, and I almost wish I did type it wrong (the easy
explanation being that I messed up :-) but from what I can tell I did get
it right:

--- zpool commands I ran ---
bash-3.00# grep zpool /.bash_history
zpool
zpool create data raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0
zpool list
zpool status
zpool iostat 3
zpool scrub data
zpool status
bash-3.00#
------------

The other problem I have with this is: why did it kick the disk out? I can
run all sorts of tests on the disk and it is perfectly fine... does it kick
out a random disk upon boot? ;-)

//Mikael
On Mon, 17 Jul 2006, Darren J Moffat wrote:

> Jeff Bonwick wrote:
> > zpool create data unreplicated A B C
> >
> > The extra typing would be annoying, but would make it almost impossible
> > to get the wrong behavior by accident.
>
> I think that is a very good idea from a usability viewpoint. It is
> better to have to type a few more chars to explicitly say "I know ZFS
> isn't going to do all the data replication" when you run zpool than to
> find out later you aren't protected (by ZFS anyway).

+1

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
I too have seen this recently, due to a partially failed drive. When I
physically removed the drive, ZFS figured everything out and I was back
up and running. Alas, I have been unable to recreate it. There is a bug
lurking here; if someone has a more clever way to test, we might be able
to nail it down.
 -- richard

Mikael Kjerrman wrote:
> Hi,
>
> so it happened...
>
> I have a 10 disk raidz pool running Solaris 10 U2, and after a reboot the
> whole pool became unavailable after apparently losing a disk drive. (The
> drive is seemingly OK as far as I can tell from other commands.)
>
> --- bootlog ---
> Jul 17 09:57:38 expprd fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major
> Jul 17 09:57:38 expprd EVENT-TIME: Mon Jul 17 09:57:38 MEST 2006
> Jul 17 09:57:38 expprd PLATFORM: SUNW,UltraAX-i2, CSN: -, HOSTNAME: expprd
> Jul 17 09:57:38 expprd SOURCE: zfs-diagnosis, REV: 1.0
> Jul 17 09:57:38 expprd EVENT-ID: e2fd61f7-a03d-6279-d5a5-9b8755fa1af9
> Jul 17 09:57:38 expprd DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information.
> Jul 17 09:57:38 expprd AUTO-RESPONSE: No automated response will occur.
> Jul 17 09:57:38 expprd IMPACT: The pool data is unavailable
> Jul 17 09:57:38 expprd REC-ACTION: Run 'zpool status -x' and either attach the missing device or
> Jul 17 09:57:38 expprd     restore from backup.
> -------
>
> --- zpool status -x ---
> bash-3.00# zpool status -x
>   pool: data
>  state: FAULTED
> status: One or more devices could not be opened. There are insufficient
>         replicas for the pool to continue functioning.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-D3
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         data        UNAVAIL      0     0     0  insufficient replicas
>           c1t0d0    ONLINE       0     0     0
>           c1t1d0    ONLINE       0     0     0
>           c1t2d0    ONLINE       0     0     0
>           c1t3d0    ONLINE       0     0     0
>           c2t0d0    ONLINE       0     0     0
>           c2t1d0    ONLINE       0     0     0
>           c2t2d0    ONLINE       0     0     0
>           c2t3d0    ONLINE       0     0     0
>           c2t4d0    ONLINE       0     0     0
>           c1t4d0    UNAVAIL      0     0     0  cannot open
> --------------
>
> The problem as I see it is that the pool should be able to handle 1 disk
> error, no? And the online, attach, replace... commands don't work when the
> pool is unavailable. I've filed a case with Sun, but thought I'd ask around
> here to see if anyone has experienced this before.
>
> cheers,
>
> //Mikael
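One diagnostic that might help narrow it down the next time this happens:
dump the ZFS vdev labels straight off the device that got kicked out and
compare the pool GUID and configuration against a healthy member. This is
a rough sketch, assuming whole-disk vdevs (hence slice 0) and that zdb's
label-dump option is available in your build:

    # zdb -l /dev/dsk/c1t4d0s0      <- the "missing" disk
    # zdb -l /dev/dsk/c1t0d0s0      <- a healthy member, for comparison

If the labels on c1t4d0 are readable and match the others, the problem is
more likely in how the device gets opened at boot than in the on-disk state.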
Mikael Kjerrman wrote:
> Jeff,
>
> thanks for your answer, and I almost wish I did type it wrong (the easy
> explanation being that I messed up :-) but from what I can tell I did get
> it right:
>
> --- zpool commands I ran ---
> bash-3.00# grep zpool /.bash_history
> zpool
> zpool create data raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0
> zpool list
> zpool status
> zpool iostat 3
> zpool scrub data
> zpool status
> bash-3.00#

And soon we'll store the 'zpool create' (and other subcommands run)
on-disk. I'm finishing that up:

    6343741 want to store a command history on disk

eric

> ------------
>
> The other problem I have with this is: why did it kick the disk out? I can
> run all sorts of tests on the disk and it is perfectly fine... does it kick
> out a random disk upon boot? ;-)
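Once that RFE integrates, the interface might look something like the
sketch below; the subcommand name, timestamps, and output format are
guesses here, not the final design:

    # zpool history data
    History for 'data':
    2006-07-10.14:21:35 zpool create data raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0
    2006-07-17.10:05:12 zpool scrub data

That would have settled the "did I really type raidz?" question here in one
command, independent of anyone's shell history.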
Jeff -

That sounds like a great idea...

Another idea might be to have 'zpool create' announce the 'availability'
of any given configuration, and output the single points of failure.

    # zpool create mypool a b c
    NOTICE: This pool has no redundancy.
            Without hardware redundancy (raid1 / 5),
            a single disk failure will destroy the whole pool.

    # zpool create mypool raidz a b c
    NOTICE: This pool has single disk redundancy.
            Without hardware redundancy (raid1 / 5),
            this pool can survive at most 1 disk failing.

    # zpool create mypool raidz2 a b c
    NOTICE: This pool has double disk redundancy.
            Without hardware redundancy (raid1 / 5),
            this pool can survive at most 2 disks failing.

It would be especially nice if it was able to detect silly configurations
too (like adding simple disks to a raidz or something like that, if it's
even possible) and announce the reduction in reliability.

Thoughts? :)

Nathan.

On Mon, 2006-07-17 at 18:35, Jeff Bonwick wrote:
> > I have a 10 disk raidz pool running Solaris 10 U2, and after a reboot
> > the whole pool became unavailable after apparently losing a disk drive.
> > [...]
> >         NAME        STATE     READ WRITE CKSUM
> >         data        UNAVAIL      0     0     0  insufficient replicas
> >           c1t0d0    ONLINE       0     0     0
> > [...]
> >           c1t4d0    UNAVAIL      0     0     0  cannot open
> > --------------
> >
> > The problem as I see it is that the pool should be able to handle
> > 1 disk error, no?
>
> If it were a raidz pool, that would be correct. But according to
> zpool status, it's just a collection of disks with no replication.
> Specifically, compare these two commands:
>
>     (1) zpool create data A B C
>
>     (2) zpool create data raidz A B C
>
> Assume each disk has 500G capacity.
>
> The first command will create an unreplicated pool with 1.5T capacity.
> The second will create a single-parity RAID-Z pool with 1.0T capacity.
>
> My guess is that you intended the latter, but actually typed the former,
> perhaps assuming that RAID-Z was always present. If so, I apologize for
> not making this clearer. If you have any suggestions for how we could
> improve the zpool(1M) command or documentation, please let me know.
>
> One option -- I confess up front that I don't really like it -- would be
> to make 'unreplicated' an explicit replication type (in addition to
> mirror and raidz), so that you couldn't get it by accident:
>
>     zpool create data unreplicated A B C
>
> The extra typing would be annoying, but would make it almost impossible
> to get the wrong behavior by accident.
>
> Jeff
On Tue, Jul 18, 2006 at 10:10:33AM +1000, Nathan Kroenert wrote:
> Jeff -
>
> That sounds like a great idea...
>
> Another idea might be to have 'zpool create' announce the 'availability'
> of any given configuration, and output the single points of failure.
>
>     # zpool create mypool a b c
>     NOTICE: This pool has no redundancy.
>             Without hardware redundancy (raid1 / 5),
>             a single disk failure will destroy the whole pool.
>
>     # zpool create mypool raidz a b c
>     NOTICE: This pool has single disk redundancy.
>             Without hardware redundancy (raid1 / 5),
>             this pool can survive at most 1 disk failing.
>
>     # zpool create mypool raidz2 a b c
>     NOTICE: This pool has double disk redundancy.
>             Without hardware redundancy (raid1 / 5),
>             this pool can survive at most 2 disks failing.
>
> It would be especially nice if it was able to detect silly configurations
> too (like adding simple disks to a raidz or something like that, if it's
> even possible) and announce the reduction in reliability.

FYI, zpool(1M) will already detect some variations of "silly" and force
you to use the '-f' option if you really mean it (for add and create).
Examples include using vdevs of different redundancy (raidz + mirror),
as well as using different size devices. If you have other definitions
of silly, let us know what we should be looking for.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
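For example, mixing a plain disk into a pool built from a raidz vdev
already trips that check. The transcript below is a sketch from memory:
the device names are placeholders and the exact error wording may differ
between builds.

    # zpool create tank raidz c1t0d0 c1t1d0 c1t2d0
    # zpool add tank c2t0d0
    invalid vdev specification
    use '-f' to override the following errors:
    mismatched replication level: pool uses raidz and new vdev is disk
    # zpool add -f tank c2t0d0      <- forces the add, leaving c2t0d0 unprotected

The same kind of reliability reduction Nathan describes is therefore
already flagged for add; the open question is whether create of a wholly
unreplicated pool deserves a similar speed bump.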