thr3ads.net - zfs discuss - [zfs-discuss] Uber block corruption? [Dec 2006]

Ross Hosman

2006-Dec-11 23:52 UTC

[zfs-discuss] Uber block corruption?

A while back we had a Sun engineer come to our office and talk about the
benefits of ZFS. I asked him the question "Can the uber block become
corrupt and what happens if it does?", to which he did not have the
answer but swore to me that he would get it to me. I still haven't
gotten that answer and was wondering if someone here could enlighten me?
 
 
This message posted from opensolaris.org

Jeff Victor

2006-Dec-12 00:40 UTC

[zfs-discuss] Uber block corruption?

IANA ZFS guru, but I have read explanations like this:

When ZFS reads in the uberblock, it computes the uberblock's checksum and
compares it against the stored checksum for that block.  If they don't
match, it uses another copy of the uberblock.
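
A minimal sketch of that check, in Python. SHA-256 is only a stand-in for
the real checksum, and the names `checksum` and `first_valid_copy` are
hypothetical; this illustrates the idea, not the ZFS code.

```python
import hashlib
from typing import List, Optional, Tuple


def checksum(payload: bytes) -> bytes:
    # Stand-in checksum; not a claim about what ZFS stores on disk.
    return hashlib.sha256(payload).digest()


def first_valid_copy(copies: List[Tuple[bytes, bytes]]) -> Optional[bytes]:
    """Return the payload of the first copy whose stored checksum matches."""
    for payload, stored in copies:
        if checksum(payload) == stored:
            return payload
    return None  # no copy verified


good = b"uberblock txg=100"
copies = [
    (b"uberblock txg=1O0", checksum(good)),  # payload damaged after checksumming
    (good, checksum(good)),                  # intact copy
]
result = first_valid_copy(copies)
```

Here the first copy fails verification, so the second one is used.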

Ross Hosman wrote:
> A while back we had a Sun engineer come to our office and talk about the
> benefits of ZFS. I asked him the question "Can the uber block become corrupt
> and what happens if it does?", to which he did not have the answer but swore
> to me that he would get it to me. I still haven't gotten that answer and was
> wondering if someone here could enlighten me?
> 
> 
-- 
--------------------------------------------------------------------------
Jeff VICTOR              Sun Microsystems            jeff.victor @ sun.com
OS Ambassador            Sr. Technical Specialist
Solaris 10 Zones FAQ:    http://www.opensolaris.org/os/community/zones/faq
--------------------------------------------------------------------------

Darren Dunham

2006-Dec-12 01:10 UTC

[zfs-discuss] Uber block corruption?

> A while back we had a Sun engineer come to our office and talk about
> the benefits of ZFS. I asked him the question "Can the uber block
> become corrupt and what happens if it does?", to which he did not
> have the answer but swore to me that he would get it to me. I still
> haven't gotten that answer and was wondering if someone here could
> enlighten me?
Any data can become corrupt through a variety of processes.

To reduce the chance of it affecting the integrity of the filesystem,
there are multiple copies of the UB written, each with a checksum and a
generation number.  When starting up a pool, the oldest generation copy
that checks properly will be used.  If the import can't find any valid
UB, then it's not going to have access to any data.  Think of a UFS
filesystem where all copies of the superblock are corrupt.

So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
become corrupt through something that doesn't also make all the data
also corrupt or inaccessible.

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

Robert Milkowski

2006-Dec-12 01:59 UTC

[zfs-discuss] Uber block corruption?

Hello Darren,

Tuesday, December 12, 2006, 2:10:30 AM, you wrote:
>> A while back we had a Sun engineer come to our office and talk about
>> the benefits of ZFS. I asked him the question "Can the uber block
>> become corrupt and what happens if it does?", to which he did not
>> have the answer but swore to me that he would get it to me. I still
>> haven't gotten that answer and was wondering if someone here could
>> enlighten me?
DD> Any data can become corrupt through a variety of processes.

DD> To reduce the chance of it affecting the integrity of the filesystem,
DD> there are multiple copies of the UB written, each with a checksum and a
DD> generation number.  When starting up a pool, the oldest generation copy
DD> that checks properly will be used.  If the import can't find any valid
DD> UB, then it's not going to have access to any data.  Think of a UFS
DD> filesystem where all copies of the superblock are corrupt.

Actually the latest UB, not the oldest.
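
The selection rule as corrected here (newest generation among the copies
that pass their checksum) can be sketched like this. It is an illustration
only: the `UberblockCopy` type, its field names, and the use of SHA-256 are
assumptions, not the ZFS on-disk format or API.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class UberblockCopy:
    txg: int           # generation number (hypothetical field name)
    payload: bytes     # stand-in for the uberblock contents
    stored_sum: bytes  # checksum recorded when the copy was written


def make_copy(txg: int, payload: bytes) -> UberblockCopy:
    return UberblockCopy(txg, payload, hashlib.sha256(payload).digest())


def is_valid(ub: UberblockCopy) -> bool:
    return hashlib.sha256(ub.payload).digest() == ub.stored_sum


def select_active(copies: Iterable[UberblockCopy]) -> Optional[UberblockCopy]:
    """Pick the valid copy with the highest generation, or None if none verify."""
    valid = [ub for ub in copies if is_valid(ub)]
    return max(valid, key=lambda ub: ub.txg, default=None)


copies = [
    make_copy(98, b"root@98"),
    make_copy(99, b"root@99"),
    # Newest copy, but its payload no longer matches the stored checksum.
    UberblockCopy(100, b"garbage", hashlib.sha256(b"root@100").digest()),
]
active = select_active(copies)
```

With the generation-100 copy damaged, the sketch falls back to generation 99.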

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com

Darren Dunham

2006-Dec-12 02:01 UTC

[zfs-discuss] Uber block corruption?

> DD> To reduce the chance of it affecting the integrity of the filesystem,
> DD> there are multiple copies of the UB written, each with a checksum and a
> DD> generation number.  When starting up a pool, the oldest generation copy
> DD> that checks properly will be used.  If the import can't find any valid
> DD> UB, then it's not going to have access to any data.  Think of a UFS
> DD> filesystem where all copies of the superblock are corrupt.
> 
> Actually the latest UB, not the oldest.
My *other* oldest...  yeah.

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

Casper.Dik at Sun.COM

2006-Dec-12 09:54 UTC

[zfs-discuss] Uber block corruption?

>So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
>become corrupt through something that doesn't also make all the data
>also corrupt or inaccessible.

So how does this work for data which is freed and overwritten; does
the system make sure that none of the data referenced by any of the
old ueberblocks is ever overwritten?

Casper

Robert Milkowski

2006-Dec-12 12:47 UTC

[zfs-discuss] Uber block corruption?

Hello Casper,

Tuesday, December 12, 2006, 10:54:27 AM, you wrote:
>>So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
>>become corrupt through something that doesn't also make all the data
>>also corrupt or inaccessible.

CDSC> So how does this work for data which is freed and overwritten; does
CDSC> the system make sure that none of the data referenced by any of the
CDSC> old ueberblocks is ever overwritten?

Why should it? If blocks are not in use according to the current UB, I
guess you can safely assume they are free.

-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com

Casper.Dik at Sun.COM

2006-Dec-12 14:25 UTC

[zfs-discuss] Uber block corruption?

>Hello Casper,
>
>Tuesday, December 12, 2006, 10:54:27 AM, you wrote:
>
>>>So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
>>>become corrupt through something that doesn't also make all the data
>>>also corrupt or inaccessible.
>
>
>CDSC> So how does this work for data which is freed and overwritten; does
>CDSC> the system make sure that none of the data referenced by any of the
>CDSC> old ueberblocks is ever overwritten?
>
>Why should it? If blocks are not in use according to the current UB, I
>guess you can safely assume they are free.

What if a newer UB is corrupted and you fall back to an older one?

Casper

George Wilson

2006-Dec-12 14:46 UTC

[zfs-discuss] Uber block corruption?

Also note that the UB is written to every vdev (4 per disk) so the 
chances of all UBs being corrupted is rather low.

Thanks,
George

Darren Dunham wrote:
>> DD> To reduce the chance of it affecting the integrity of the filesystem,
>> DD> there are multiple copies of the UB written, each with a checksum and a
>> DD> generation number.  When starting up a pool, the oldest generation copy
>> DD> that checks properly will be used.  If the import can't find any valid
>> DD> UB, then it's not going to have access to any data.  Think of a UFS
>> DD> filesystem where all copies of the superblock are corrupt.
>>
>> Actually the latest UB, not the oldest.
> 
> My *other* oldest...  yeah.
>

Mark Maybee

2006-Dec-12 14:49 UTC

[zfs-discuss] Uber block corruption?

Casper.Dik at Sun.COM wrote:
>>Hello Casper,
>>
>>Tuesday, December 12, 2006, 10:54:27 AM, you wrote:
>>
>>
>>>>So 'a' UB can become corrupt, but it is unlikely that 'all' UBs will
>>>>become corrupt through something that doesn't also make all the data
>>>>also corrupt or inaccessible.
>>
>>
>>CDSC> So how does this work for data which is freed and overwritten; does
>>CDSC> the system make sure that none of the data referenced by any of the
>>CDSC> old ueberblocks is ever overwritten?
>>
>>Why should it? If blocks are not in use according to the current UB, I
>>guess you can safely assume they are free.
> 
> 
> 
> What if a newer UB is corrupted and you fall back to an older one?
> 
> Casper

A block freed in transaction group N cannot be reused until transaction
group N+3; so there is no possibility of referencing an overwritten
block unless you have to back off more than two uberblocks.  At this
point, blocks that have been overwritten will show up as corrupted (bad
checksums).
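
The arithmetic behind "more than two uberblocks" can be sketched as follows.
This is one reading of the rule above, not the ZFS implementation: the
function names are hypothetical, and the sketch assumes that the transaction
group after the latest committed one may already have been writing data when
the problem occurred.

```python
REUSE_DELAY = 3  # from the message above: freed in txg N, reusable in txg N+3


def earliest_reuse_txg(freed_txg: int) -> int:
    """First transaction group allowed to overwrite a block freed in freed_txg."""
    return freed_txg + REUSE_DELAY


def rollback_may_see_overwrites(latest_committed_txg: int, target_txg: int) -> bool:
    """Could an uberblock at target_txg reference a block that was overwritten?

    A block still referenced at target_txg is freed no earlier than
    target_txg + 1, so it is overwritten no earlier than that txg plus the
    reuse delay. Assumption: txg latest_committed_txg + 1 may have been
    in flight and writing data.
    """
    earliest_overwrite = earliest_reuse_txg(target_txg + 1)
    return latest_committed_txg + 1 >= earliest_overwrite


# Backing off one or two uberblocks from txg 100 is safe under this model;
# backing off three is not.
results = {back: rollback_may_see_overwrites(100, 100 - back) for back in (1, 2, 3)}
```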

-Mark

Toby Thain

2006-Dec-12 15:18 UTC

[zfs-discuss] Uber block corruption?

On 12-Dec-06, at 9:46 AM, George Wilson wrote:
> Also note that the UB is written to every vdev (4 per disk) so the  
> chances of all UBs being corrupted is rather low.
Furthermore the time window where UBs are mutually inconsistent would
be very short, since they'd be updated together?

--Toby
>
> Thanks,
> George
>
> Darren Dunham wrote:
>>> DD> To reduce the chance of it affecting the integrity of the filesystem,
>>> DD> there are multiple copies of the UB written, each with a checksum and a
>>> DD> generation number.  When starting up a pool, the oldest generation copy
>>> DD> that checks properly will be used.  If the import can't find any valid
>>> DD> UB, then it's not going to have access to any data.  Think of a UFS
>>> DD> filesystem where all copies of the superblock are corrupt.
>>>
>>> Actually the latest UB, not the oldest.
>> My *other* oldest...  yeah.

Anton B. Rang

2006-Dec-12 16:17 UTC

[zfs-discuss] Re: Uber block corruption?

> [...] there is no possibility of referencing an overwritten
> block unless you have to back off more than two uberblocks.  At this
> point, blocks that have been overwritten will show up as corrupted (bad
> checksums).
Hmmm.  Is there some way we can warn the user to scrub their pool because we had
trouble reading an überblock?  (Maybe some FMA rules about what to do if an
überblock read fails?)
 
 

Robert Milkowski

2006-Dec-12 18:02 UTC

[zfs-discuss] Uber block corruption?

Hello Toby,

Tuesday, December 12, 2006, 4:18:54 PM, you wrote:

TT> On 12-Dec-06, at 9:46 AM, George Wilson wrote:
>> Also note that the UB is written to every vdev (4 per disk) so the  
>> chances of all UBs being corrupted is rather low.
It depends, actually: if all your vdevs are on the same array with the
write-back cache turned on, you can end up with all UBs corrupted, at
least in theory.


-- 
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                       http://milek.blogspot.com

Darren Dunham

2006-Dec-13 00:56 UTC

[zfs-discuss] Uber block corruption?

> Hello Toby,
> 
> Tuesday, December 12, 2006, 4:18:54 PM, you wrote:
> TT> On 12-Dec-06, at 9:46 AM, George Wilson wrote:
> 
> >> Also note that the UB is written to every vdev (4 per disk) so the
> >> chances of all UBs being corrupted is rather low.
> 
> It depends, actually: if all your vdevs are on the same array with the
> write-back cache turned on, you can end up with all UBs corrupted, at
> least in theory.
Do such caches respond to explicit flushes?  My understanding is that it
should try to flush between writing the front 2 and the back 2.

Not that even that would guarantee anything if there are real bugs in
the cache code, but it would improve the odds.
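
A toy model of why a flush between the two halves helps, in Python. The
`CachedDisk` class and `update_labels` function are hypothetical, and the
model assumes the worst case that every write still sitting in the cache is
torn on power loss. It only mirrors the ordering described above (front two
copies, flush, back two copies, flush), not the actual ZFS code.

```python
class CachedDisk:
    """Toy disk with a volatile write-back cache (hypothetical model)."""

    def __init__(self, labels):
        self.stable = list(labels)  # contents that survive power loss
        self.cache = {}             # slot -> value not yet on stable storage

    def write(self, slot, value):
        self.cache[slot] = value

    def flush(self):
        for slot, value in self.cache.items():
            self.stable[slot] = value
        self.cache.clear()

    def crash(self):
        # Worst case assumed here: every unflushed slot ends up torn.
        for slot in self.cache:
            self.stable[slot] = "TORN"
        self.cache.clear()
        return list(self.stable)


def update_labels(disk, new, crash_after_step=None):
    """Write slots 0-1, flush, write slots 2-3, flush; optionally crash."""
    steps = [
        lambda: [disk.write(slot, new) for slot in (0, 1)],
        disk.flush,
        lambda: [disk.write(slot, new) for slot in (2, 3)],
        disk.flush,
    ]
    for index, step in enumerate(steps):
        step()
        if crash_after_step == index:
            return disk.crash()
    return list(disk.stable)


def intact(labels):
    return [value for value in labels if value != "TORN"]


# Crash after each of the four steps in turn and record what survives.
outcomes = [
    update_labels(CachedDisk(["old"] * 4), "new", crash_after_step=step)
    for step in range(4)
]
```

In this model at least two copies are intact wherever the crash lands. If
the array ignores the flush, the model no longer applies.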

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

Anton B. Rang

2006-Dec-13 05:18 UTC

[zfs-discuss] Re: Uber block corruption?

> Also note that the UB is written to every vdev (4 per disk) so the 
> chances of all UBs being corrupted is rather low.
The chances that they're corrupted by the storage system, yes.

However, they are all sourced from the same in-memory buffer, so an undetected
in-memory error (e.g. kernel bug) will be replicated to all vdevs.
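
A small Python illustration of that point, assuming the buffer is damaged
before its checksum is computed. SHA-256 and the helper names `seal` and
`verifies` are stand-ins chosen for the sketch, not ZFS internals.

```python
import hashlib


def seal(buf: bytes):
    """Pair a buffer with a checksum computed over it as it is right now."""
    return (buf, hashlib.sha256(buf).digest())


def verifies(copy) -> bool:
    payload, stored = copy
    return hashlib.sha256(payload).digest() == stored


intended = b"root block pointer = 0x1234"
# Hypothetical: a stray write clobbers the buffer before it is checksummed.
in_memory = b"root block pointer = 0x9999"

# Every on-disk copy is sourced from the same damaged buffer.
on_disk = [seal(in_memory) for _ in range(4)]

all_verify = all(verifies(copy) for copy in on_disk)
any_correct = any(payload == intended for payload, _ in on_disk)
```

All four copies pass verification, yet none holds the intended contents, so
replication across vdevs does not help in this case.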
 
 

Darren Dunham

2006-Dec-13 15:58 UTC

[zfs-discuss] Re: Uber block corruption?

> > Also note that the UB is written to every vdev (4 per disk) so the 
> > chances of all UBs being corrupted is rather low.
> 
> The chances that they're corrupted by the storage system, yes.
> 
> However, they are all sourced from the same in-memory buffer, so an
> undetected in-memory error (e.g. kernel bug) will be replicated to all
> vdevs.
Does a scrub attempt to read/verify UBs from the disk?  Does it only
read the current generation?

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >

Richard Elling

2006-Dec-13 17:36 UTC

[zfs-discuss] Re: Uber block corruption?

Anton B. Rang wrote:
>> Also note that the UB is written to every vdev (4 per disk) so the
>> chances of all UBs being corrupted is rather low.
>
> The chances that they're corrupted by the storage system, yes.
> 
> However, they are all sourced from the same in-memory buffer, so 
> an undetected in-memory error (e.g. kernel bug) will be replicated 
> to all vdevs.
I view undetected in-memory errors from a hardware perspective,
not as a software bug.  Clearly, software bugs can exist, but
we presume testing will find these.  I'll go out on a limb and
predict that this particular code is regularly exercised :-)

For the hardware, there are some gotchas.  Most of the low-end
(e.g. buy at Fry's) systems don't have ECC memory.  I would be
very wary of using non-ECC memory in a server where data
integrity is important.  Spend the extra money, and you will be
much happier.
  -- richard

Anton B. Rang

2006-Dec-13 18:13 UTC

[zfs-discuss] Re: Re: Uber block corruption?

> I view undetected in-memory errors from a hardware perspective,
> not as a software bug.  Clearly, software bugs can exist, but
> we presume testing will find these.
Sure.  My point is simply that, given that we have a monolithic kernel, any bug
in kernel or driver code can corrupt any memory in the kernel. I once saw a UFS
superblock with an embedded TCP packet.

Hardware errors are possible as well (particularly without parity or ECC) but
IMHO less likely. But ECC memory is a very, very good idea.

Anton
 
 
