ZFS documentation lists the snapshot limit on any single file system in a pool as 2**48 snaps, and that seems to logically imply that a snap on a file system does not require an update to the pool's currently active uberblock. That is to say, if we take a snapshot of a file system in a pool, and then make any changes to that file system, the copy-on-write behavior induced by the changes will stop at some synchronization point below the uberblock (presumably at or below the DNODE that is the DSL directory for that file system). In-place updates to a DNODE that has been allocated in a single sector-sized ZFS block can be considered atomic, since the sector write will either succeed or fail totally, leaving either the old version or the new version, but not a combination of the two.

This seems sensible to me, but the description of object sets beginning on page 26 of the ZFS On-Disk Specification states that a DNODE of type DMU_OT_DNODE (the type of the DNODE that's included in the 1KB objset_phys_t structure) will have as its data an array of DNODEs allocated in 128KB blocks, and the picture (Illustration 12 in the spec) shows these blocks as containing 1024 DNODEs. Since DNODEs are 512 bytes, it would not be possible to fit the 1024 DNODEs depicted in the illustration, and if DNODEs did live in such an array then they could not be atomically updated in place. If the blocks in question were actually filled with an array of block pointers pointing to single sector-sized blocks that each held a DNODE, then this would account for the 1024 entries per 128KB block shown, since block pointers are 128 bytes (not the 512 bytes of a DNODE); but in this case wouldn't such 128KB blocks be considered indirect blocks, forcing the dn_nlevels field shown in the object set DNODE at the top left of Illustration 12 to be 2, instead of the 1 that's there?
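[Editor's note: the arithmetic behind this objection can be checked directly. A minimal sketch, using only the sizes quoted in the text above (512-byte dnodes, 128-byte block pointers, 128KB metadnode data blocks):]

```python
# Sizes as quoted from the ZFS On-Disk Specification in the message above.
DNODE_SIZE = 512              # bytes per dnode_phys_t
BLKPTR_SIZE = 128             # bytes per blkptr_t
METADNODE_BLOCK = 128 * 1024  # 128KB data block of a DMU_OT_DNODE object

dnodes_per_block = METADNODE_BLOCK // DNODE_SIZE    # dnodes that actually fit
blkptrs_per_block = METADNODE_BLOCK // BLKPTR_SIZE  # block pointers that fit

print(dnodes_per_block, blkptrs_per_block)  # 256 1024
```

So a 128KB block holds 256 dnodes, not the 1024 drawn in Illustration 12; 1024 entries per block only works out if the entries are 128-byte block pointers, which is exactly the discrepancy the message is pointing at.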
I'm further confused by the illustration's use of dotted lines to project the contents of a structure field (as seen in the projection of the metadnode field of the objset_phys_t structure found at the top of the picture) and arrows to represent pointers (as seen in the projection of the block pointer array of the DMU_OT_DNODE type dnode, also at the top of the picture), but the blocks pointed to by these block pointers seem to actually contain instances of DNODEs (as seen from the projection of one of these instances in the lower part of the picture). Should this projection be replaced by a pointer to the lower DNODE?

This message posted from opensolaris.org
> ZFS documentation lists snapshot limits on any single file system in a
> pool at 2**48 snaps, and that seems to logically imply that a snap on
> a file system does not require an update to the pool's currently
> active uberblock.

All committed changes (including snapshot creation) require a new uberblock to be written.

> That is to say, that if we take a snapshot of a file system in a pool,
> and then make any changes to that file system, the copy on write
> behavior induced by the changes will stop at some synchronization
> point below the uberblock (presumably at or below the DNODE that is
> the DSL directory for that file system). In-place updates to a DNODE
> that has been allocated in a single sector sized ZFS block can be
> considered atomic, since the sector write will either succeed or fail
> totally, leaving either the old version or the new version, but not a
> combination of the two.

There are no in-place updates. Any update to a node would also require updating its parent to make the checksum in that block consistent (and so on back up the tree). So instead a new block is written.

-- 
Darren Dunham   ddunham at taos.com
Senior Technical Consultant   TAOS   http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
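[Editor's note: the "checksum in the parent forces copy-on-write up the tree" argument can be illustrated with a toy Merkle tree. This is a hypothetical structure for illustration only, not the actual ZFS block layout: because a parent block stores its children's checksums, rewriting any block changes a checksum held by its parent, so the parent must be rewritten too, and so on up to the root.]

```python
import hashlib

class Block:
    """Toy immutable block: either leaf data or a list of (checksum, child) pairs."""
    def __init__(self, data=None, children=None):
        self.data = data
        self.children = children or []

    def checksum(self):
        h = hashlib.sha256()
        if self.data is not None:
            h.update(self.data)
        for csum, _ in self.children:   # parent covers its children's checksums
            h.update(csum)
        return h.digest()

def cow_update(block, path, new_data):
    """Rewrite the leaf at `path`, and (copy-on-write) every ancestor of it."""
    if not path:
        return Block(data=new_data)
    idx = path[0]
    new_children = list(block.children)
    new_child = cow_update(new_children[idx][1], path[1:], new_data)
    new_children[idx] = (new_child.checksum(), new_child)
    return Block(children=new_children)  # the parent is rewritten as well

# Two leaves under one root; updating leaf 0 yields a new root checksum,
# while the untouched leaf 1 is shared between the old and new trees.
a, b = Block(data=b"old"), Block(data=b"other")
root = Block(children=[(a.checksum(), a), (b.checksum(), b)])
new_root = cow_update(root, [0], b"new")
print(root.checksum() != new_root.checksum())  # True: root had to change
print(new_root.children[1][1] is b)            # True: sibling shared, not copied
```

The shared sibling is why snapshots are cheap in such a scheme: the old root keeps the old tree alive, and only the rewritten path consumes new blocks.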
Bill Moloney
2007-Feb-07 21:14 UTC
[zfs-discuss] Re: The ZFS MOS and how DNODES are stored
Thanks for the input Darren, but I'm still confused about DNODE atomicity ... it's difficult to imagine that a change made anyplace in the zpool would require copy operations all the way back up to the uberblock (e.g. if some single file in one of many file systems in a zpool was suddenly changed, making a new copy of all of the intervening objects in the tree back to the uberblock would seem to be an untenable amount of work, even though it may all be carried out in memory and not involve any IO; although if the zpool itself was under snapshot control this would have to happen) ... the DNODE implementation appears to include its own checksum field (self-checksumming), and controlling DNODEs (those that lead to descendant collections of DNODEs) are always of the known type DMU_OT_DNODE, so their block pointers do not have to checksum the DNODEs they point to (unlike all other block pointers, which do checksum the data they point to) ... this would allow for in-place updates of a DNODE, without the need to continue further up the tree ... since all objects are controlled by a DNODE, updates to an object's data can stop at its DNODE if that DNODE is not under some snapshot or clone control ... if this is not the case, then 'any' modification in the zpool would require copying up to the uberblock
Darren Dunham
2007-Feb-07 21:47 UTC
[zfs-discuss] Re: The ZFS MOS and how DNODES are stored
> Thanks for the input Darren, but I'm still confused about DNODE
> atomicity ... it's difficult to imagine that a change that is made
> anyplace in the zpool would require copy operations all the way back
> up to the uberblock (e.g. if some single file in one of many file
> systems in a zpool was suddenly changed, making a new copy of all of
> the intervening objects in the tree back to the uberblock would seem
> to be an untenable amount of work even though it may all be carried
> out in memory and not involve any IO, although if the zpool itself was
> under snapshot control this would have to happen)

How many objects need to change? Not that many.

> ... the DNODE implementation appears to include its own checksum field
> (self-checksumming), and controlling DNODEs (those that lead to
> descendant collections of DNODEs) are always of the known type
> DMU_OT_DNODE and so their block pointers do not have to checksum the
> DNODEs they point to (unlike all other block pointers that do checksum
> the data they point to) ... this would allow for in-place updates of a
> DNODE, without the need to continue further up the tree ... since all
> objects are controlled by a DNODE, updates to an object's data can
> stop at its DNODE if that DNODE is not under some snapshot or clone
> control ... if this is not the case, then 'any' modification in the
> zpool would require copying up to the uberblock

I will have to go back and look at the dnode stuff in the specification. But everything I know about it suggests that any committed change to the filesystem structure (snapshots included) will require writing a new uberblock. Certainly the uberblock is updated periodically anyway.

Perhaps someone knows an easy way to display the uberblock generation number so it can be viewed as changes are occurring?

-- 
Darren Dunham   ddunham at taos.com
Senior Technical Consultant   TAOS   http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
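[Editor's note: on watching the uberblock change: each transaction group sync writes a new uberblock, with an incremented txg number, into a rotating ring of slots in the vdev labels, and the active uberblock is the valid entry with the highest txg. A toy model of that rotation (slot count and fields simplified for illustration):]

```python
UB_RING_SLOTS = 128  # uberblock slots per label (simplified)

class Pool:
    def __init__(self):
        self.txg = 0
        self.ring = [None] * UB_RING_SLOTS

    def sync_txg(self, root_checksum):
        """Commit a txg: bump the generation and overwrite one ring slot."""
        self.txg += 1
        self.ring[self.txg % UB_RING_SLOTS] = {"txg": self.txg,
                                               "root": root_checksum}

    def active_uberblock(self):
        """The active uberblock is the valid entry with the highest txg."""
        return max((ub for ub in self.ring if ub), key=lambda ub: ub["txg"])

p = Pool()
for n in range(3):                  # three syncs -> three new uberblocks
    p.sync_txg(root_checksum=f"root-{n}")
print(p.active_uberblock()["txg"])  # 3
```

In practice, something like `zdb -u <pool>` should dump the current uberblock (txg included), so the generation number can be observed ticking as changes commit.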
Matthew Ahrens
2007-Feb-08 22:57 UTC
[zfs-discuss] Re: The ZFS MOS and how DNODES are stored
Bill Moloney wrote:
> Thanks for the input Darren, but I'm still confused about DNODE
> atomicity ... it's difficult to imagine that a change that is made
> anyplace in the zpool would require copy operations all the way back
> up to the uberblock

This is in fact what happens. However, these changes are all batched up (into a transaction group, or "txg"), so the overhead is minimal.

> the DNODE implementation appears to include its own checksum field
> (self-checksumming),

That is not the case. Only the uberblock and intent log blocks are self-checksumming.

> if this is not the case, then 'any' modification in the zpool would
> require copying up to the uberblock

That's correct, any modification requires modifying the uberblock (with the exception of intent log writes).

FYI, dnodes are not involved with the snapshot mechanism. Snapshotting happens at the DSL dataset layer, while dnodes are implemented above that in the DMU layer. Check out dsl_dataset.[ch].

--matt
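[Editor's note: the txg batching point above can be made concrete. In this deliberately simplified sketch (hypothetical names, not the actual DMU interfaces, and with all dirty leaves treated as sharing one ancestor path), modifications only dirty in-memory state, and the path back up to the uberblock is rewritten once per sync rather than once per change, so the per-change overhead amortizes away:]

```python
class TxgBatcher:
    """Accumulate dirtied leaf blocks; rewrite ancestors only at sync time."""
    def __init__(self, depth):
        self.depth = depth       # tree levels between a leaf and the uberblock
        self.dirty = set()
        self.blocks_written = 0

    def modify(self, leaf_id):
        self.dirty.add(leaf_id)  # in-memory only: no I/O yet

    def sync(self):
        # One new copy of each dirty leaf, plus one rewritten chain of
        # ancestors up to (and including) the uberblock, shared by all.
        self.blocks_written += len(self.dirty) + self.depth + 1
        self.dirty.clear()

txg = TxgBatcher(depth=4)
for i in range(1000):            # a thousand changes in one txg ...
    txg.modify(i)
txg.sync()                       # ... but only one trip up to the uberblock
print(txg.blocks_written)        # 1005: 1000 leaves + 4 ancestors + 1 uberblock
```

A thousand separate syncs would instead cost 1000 * (1 + 4 + 1) = 6000 block writes, which is the "untenable amount of work" scenario; batching is what makes the copy-to-the-root design cheap.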