I have two questions about the zfs code.

In dbuf_hold_impl():

    /*
     * If this buffer is currently syncing out, and we are are
     * still referencing it from db_data, we need to make a copy
     * of it in case we decide we want to dirty it again in this txg.
     */
    if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID &&
        dn->dn_object != DMU_META_DNODE_OBJECT &&
        db->db_state == DB_CACHED && db->db_data_pending) {
        dbuf_dirty_record_t *dr = db->db_data_pending;

        if (dr->dt.dl.dr_data == db->db_buf) {
            arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);

            dbuf_set_data(db,
                arc_buf_alloc(db->db_dnode->dn_objset->os_spa,
                db->db.db_size, db, type));
            bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data,
                db->db.db_size);
        }
    }

This piece of code copies the data to an extra buffer if the current one is
being synced to disk. My question is: why not defer this data copy to
dbuf_dirty(), where we could check db_data_pending and do the copy there?

A similar case is in dbuf_sync_leaf():

    if (dn->dn_object != DMU_META_DNODE_OBJECT) {
        /*
         * If this buffer is currently "in use" (i.e., there are
         * active holds and db_data still references it), then make
         * a copy before we start the write so that any modifications
         * from the open txg will not leak into this write.
         *
         * NOTE: this copy does not need to be made for objects only
         * modified in the syncing context (e.g. DNONE_DNODE blocks).
         */
        if (refcount_count(&db->db_holds) > 1 && *datap == db->db_buf) {
            arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);
            *datap = arc_buf_alloc(os->os_spa, blksz, db, type);
            bcopy(db->db.db_data, (*datap)->b_data, blksz);
        }
    } else {

Here it also copies the data if there are extra holds on the syncing dbuf.
What I don't understand is why it needs to do this at all, since dbuf_dirty()
can handle this scenario perfectly.

And a further question, perhaps related to the one above: why does zfs need
to release the arc buffer in dbuf_dirty()? The comment says this is needed to
protect others from reading the cached data block, but I don't know of any
calling path that hits the arc buffer except via the dmu interface. Perhaps
this is needed to protect snapshots from reading the new data?

    } else if (db->db.db_object != DMU_META_DNODE_OBJECT) {
        /*
         * Release the data buffer from the cache so that we
         * can modify it without impacting possible other users
         * of this cached data block.  Note that indirect
         * blocks and private objects are not released until the
         * syncing state (since they are only modified then).
         */
        arc_release(db->db_buf, db);
        dbuf_fix_old_data(db, tx->tx_txg);
        data_old = db->db_buf;
    }

Thanks,
Jay
-- 
This message posted from opensolaris.org
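To make the scenario concrete, the effect of the quoted dbuf_hold_impl() block
can be modeled with a few lines of self-contained user-space C. This is only a
toy sketch of the copy-before-dirty idea; the names (toy_dbuf, toy_hold) are
made up for illustration and none of this is the actual ZFS code.

/*---- toy model (not ZFS code): copy-before-dirty while a write is in flight ----*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Made-up, simplified stand-in for a dbuf: the buffer the consumer sees,
 * plus the buffer an in-flight write (the syncing txg) is using.
 */
struct toy_dbuf {
    char   *db_data;       /* what consumers read/modify (open txg) */
    char   *pending_data;  /* what the syncing txg is writing out */
    size_t size;
};

/*
 * Loosely mimics the quoted dbuf_hold_impl() logic: if the buffer the
 * consumer would get is the very buffer being written out, give the
 * consumer a private copy so a later dirty cannot corrupt the write.
 */
static void
toy_hold(struct toy_dbuf *db)
{
    if (db->pending_data != NULL && db->pending_data == db->db_data) {
        char *copy = malloc(db->size);
        memcpy(copy, db->pending_data, db->size);  /* the bcopy() */
        db->db_data = copy;                        /* the dbuf_set_data() */
    }
}

int
main(void)
{
    struct toy_dbuf db;
    db.size = 8;
    db.db_data = malloc(db.size);
    strcpy(db.db_data, "old txg");
    db.pending_data = db.db_data;   /* a write of this image is in flight */

    toy_hold(&db);
    strcpy(db.db_data, "new txg");  /* dirty it again in the open txg */

    /* The syncing txg still writes the old image, untouched. */
    printf("on disk: %s, in memory: %s\n", db.pending_data, db.db_data);
    free(db.db_data);
    free(db.pending_data);
    return (0);
}
/*---- end toy model ----*/

In the real excerpt the copy is done with arc_buf_alloc()/bcopy() and the swap
with dbuf_set_data(), but the shape of the operation is the same.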
jay xiong wrote:
> I have two questions about the zfs code.
>
> In dbuf_hold_impl():
>
> This piece of code copies the data to an extra buffer if the current one
> is being synced to disk. My question is: why not defer this data copy to
> dbuf_dirty(), where we could check db_data_pending and do the copy there?
> A similar case is in dbuf_sync_leaf():
>
> Here it also copies the data if there are extra holds on the syncing
> dbuf. What I don't understand is why it needs to do this at all, since
> dbuf_dirty() can handle this scenario perfectly.

The DMU provides an interface such that the db_data pointer will not change
from the consumer's point of view. If we did the copy in dbuf_dirty(), we
would have to change db_data (to the new copy). We cannot make a new copy
for the syncing thread, because we don't know what state it is in (e.g., it
might have already given that pointer to the i/o subsystem to write).

Note that we need to check and possibly make a copy in both places (in
dbuf_sync_leaf() and dbuf_hold_impl()). The critical section is while the
dbuf is being written out, so we must check at the beginning of that section
(dbuf_sync_leaf), and also if we want to add a hold while we are in the
critical section (dbuf_hold_impl).

Alternatively, dbuf_hold_impl() could simply wait for the dbuf to finish
being written out, but that would hurt performance.

> And a further question, perhaps related to the one above: why does zfs
> need to release the arc buffer in dbuf_dirty()? The comment says this is
> needed to protect others from reading the cached data block, but I don't
> know of any calling path that hits the arc buffer except via the dmu
> interface. Perhaps this is needed to protect snapshots from reading the
> new data?

That is correct. arc_release() essentially disassociates the arc_buf from
its blkptr (i.e., from the arc_buf_hdr, from the arc's point of view),
creating a new "copy" for the DMU to modify. There may be many arc_buf's for
one blkptr because other datasets (e.g., snapshots, clones) may reference
the same block.

--matt
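The arc_release() point can be pictured with another toy, user-space sketch.
The names toy_hdr and toy_buf are made up and only stand in loosely for
arc_buf_hdr_t and arc_buf_t: several consumers may cache the same on-disk
block, and releasing a buffer hands its owner a private, anonymous copy to
modify.

/*---- toy model (not ZFS code): what releasing a cached buffer buys you ----*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * One header represents the cached image of one on-disk block (one blkptr);
 * several consumers (live fs, snapshots, clones) may hold buffers for it.
 */
struct toy_hdr {
    char data[32];  /* the cached image of the block */
    int  refs;
};

struct toy_buf {
    struct toy_hdr *hdr;   /* NULL once "released" (anonymous) */
    char           *data;  /* shared with hdr until released */
};

static void
toy_hold(struct toy_hdr *hdr, struct toy_buf *buf)
{
    buf->hdr = hdr;
    buf->data = hdr->data;  /* read-only sharing of the cached image */
    hdr->refs++;
}

/*
 * Roughly what arc_release() accomplishes for its caller: the buffer is
 * disassociated from the shared cached block and becomes a private copy,
 * so modifying it cannot be seen by anyone else who reads that block.
 */
static void
toy_release(struct toy_buf *buf)
{
    char *copy = malloc(sizeof (buf->hdr->data));
    memcpy(copy, buf->data, sizeof (buf->hdr->data));
    buf->hdr->refs--;
    buf->hdr = NULL;   /* anonymous: no longer tied to the blkptr */
    buf->data = copy;
}

int
main(void)
{
    struct toy_hdr hdr = { "original block", 0 };
    struct toy_buf live, snap;

    toy_hold(&hdr, &live);  /* the live filesystem's dbuf */
    toy_hold(&hdr, &snap);  /* a snapshot reading the same block */

    toy_release(&live);     /* like arc_release(db->db_buf, db) in dbuf_dirty() */
    strcpy(live.data, "new data, open txg");

    printf("snapshot still sees: %s\n", snap.data);  /* "original block" */
    free(live.data);
    return (0);
}
/*---- end toy model ----*/

This is also why, per the dbuf_dirty() comment quoted earlier, the release is
deferred for indirect blocks and private objects: they are only modified in
syncing context.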
It's our Chinese New Year today, best wishes to you.

Jay

Matthew Ahrens wrote:
> jay xiong wrote:
>> I have two questions about the zfs code.
>>
>> In dbuf_hold_impl():
>>
>> This piece of code copies the data to an extra buffer if the current
>> one is being synced to disk. My question is: why not defer this data
>> copy to dbuf_dirty(), where we could check db_data_pending and do the
>> copy there? A similar case is in dbuf_sync_leaf():
>>
>> Here it also copies the data if there are extra holds on the syncing
>> dbuf. What I don't understand is why it needs to do this at all, since
>> dbuf_dirty() can handle this scenario perfectly.
>
> The DMU provides an interface such that the db_data pointer will not
> change from the consumer's point of view. If we did the copy in
> dbuf_dirty(), we would have to change db_data (to the new copy). We
> cannot make a new copy for the syncing thread, because we don't know
> what state it is in (e.g., it might have already given that pointer to
> the i/o subsystem to write).

Yes, this makes sense.

> Note that we need to check and possibly make a copy in both places (in
> dbuf_sync_leaf() and dbuf_hold_impl()). The critical section is while
> the dbuf is being written out, so we must check at the beginning of
> that section (dbuf_sync_leaf), and also if we want to add a hold while
> we are in the critical section (dbuf_hold_impl).

Understood. It would be much clearer if we added an assertion and a comment
in dbuf_hold_impl(), that is:

/*------------------------ code: dbuf_hold_impl() -----------------------*/
    /*
     * If this buffer is currently syncing out, and we are are
     * still referencing it from db_data, we need to make a copy
     * of it in case we decide we want to dirty it again in this txg.
     */
    if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID &&
        dn->dn_object != DMU_META_DNODE_OBJECT &&
        db->db_state == DB_CACHED && db->db_data_pending) {
        dbuf_dirty_record_t *dr = db->db_data_pending;

        if (dr->dt.dl.dr_data == db->db_buf) {
            arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);

+           ASSERT(refcount_count(&db->db_holds) == 1);
+           /*
+            * The dirty buffer is in the syncing state and nobody else
+            * is grabbing this dbuf, so it is safe to reset the dbuf's
+            * data here.  We must do the data copy here because
+            * otherwise we have no way to dirty this buf in case this
+            * hold is for a write.
+            */
            dbuf_set_data(db,
                arc_buf_alloc(db->db_dnode->dn_objset->os_spa,
                db->db.db_size, db, type));
            bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data,
                db->db.db_size);
        }
    }
/*------------------------ code end --------------------------*/

> Alternatively, dbuf_hold_impl() could simply wait for the dbuf to
> finish being written out, but that would hurt performance.
>
>> And a further question, perhaps related to the one above: why does
>> zfs need to release the arc buffer in dbuf_dirty()? The comment says
>> this is needed to protect others from reading the cached data block,
>> but I don't know of any calling path that hits the arc buffer except
>> via the dmu interface. Perhaps this is needed to protect snapshots
>> from reading the new data?
>
> That is correct. arc_release() essentially disassociates the arc_buf
> from its blkptr (i.e., from the arc_buf_hdr, from the arc's point of
> view), creating a new "copy" for the DMU to modify. There may be many
> arc_buf's for one blkptr because other datasets (e.g., snapshots,
> clones) may reference the same block.
>
> --matt
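The dbuf_sync_leaf() side of the same critical section can be sketched in the
same toy style (again with made-up names, not ZFS code): when the buffer is
still held at the start of the write, it is the writer that takes the private
copy, which is exactly why the consumer-visible db_data pointer never has to
move.

/*---- toy model (not ZFS code): the dbuf_sync_leaf() side of the copy ----*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_dbuf {
    char   *db_data;  /* the pointer consumers were handed; must not move */
    int    holds;     /* active holds on the dbuf */
    size_t size;
};

/*
 * Loosely mimics the quoted dbuf_sync_leaf() logic: if anyone still holds
 * the buffer when the write starts, hand the writer a private copy and
 * leave db_data alone, so later open-txg modifications cannot leak into
 * the write that is about to be issued.
 */
static char *
toy_sync_start(struct toy_dbuf *db)
{
    char *datap = db->db_data;

    if (db->holds > 1) {
        datap = malloc(db->size);               /* the arc_buf_alloc() */
        memcpy(datap, db->db_data, db->size);   /* the bcopy() */
    }
    return (datap);  /* what actually gets written out */
}

int
main(void)
{
    struct toy_dbuf db;
    db.size = 16;
    db.db_data = malloc(db.size);
    strcpy(db.db_data, "txg N image");
    db.holds = 2;  /* a consumer still holds the dbuf */

    char *write_buf = toy_sync_start(&db);
    strcpy(db.db_data, "txg N+1 edit");  /* consumer dirties it again */

    /* The write still carries the txg N image; db_data never moved. */
    printf("written: %s, in memory: %s\n", write_buf, db.db_data);
    if (write_buf != db.db_data)
        free(write_buf);
    free(db.db_data);
    return (0);
}
/*---- end toy model ----*/

Together with the earlier dbuf_hold_impl() sketch, this shows the two
checkpoints described above: copy at the start of the write if the buffer is
held, and copy at hold time if a write is already in flight.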