I have two questions about the zfs code.

In dbuf_hold_impl():

    /*
     * If this buffer is currently syncing out, and we are are
     * still referencing it from db_data, we need to make a copy
     * of it in case we decide we want to dirty it again in this txg.
     */
    if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID &&
        dn->dn_object != DMU_META_DNODE_OBJECT &&
        db->db_state == DB_CACHED && db->db_data_pending) {
        dbuf_dirty_record_t *dr = db->db_data_pending;

        if (dr->dt.dl.dr_data == db->db_buf) {
            arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);

            dbuf_set_data(db,
                arc_buf_alloc(db->db_dnode->dn_objset->os_spa,
                db->db.db_size, db, type));
            bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data,
                db->db.db_size);
        }
    }

This piece of code copies the data to an extra buffer if the current one is
being synced to disk. My question is: why not defer this data copy to
dbuf_dirty(), where we could check db_data_pending and do the copy there?

A similar case is in dbuf_sync_leaf():

    if (dn->dn_object != DMU_META_DNODE_OBJECT) {
        /*
         * If this buffer is currently "in use" (i.e., there are
         * active holds and db_data still references it), then make
         * a copy before we start the write so that any modifications
         * from the open txg will not leak into this write.
         *
         * NOTE: this copy does not need to be made for objects only
         * modified in the syncing context (e.g. DNONE_DNODE blocks).
         */
        if (refcount_count(&db->db_holds) > 1 && *datap == db->db_buf) {
            arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);
            *datap = arc_buf_alloc(os->os_spa, blksz, db, type);
            bcopy(db->db.db_data, (*datap)->b_data, blksz);
        }
    } else {

Here it also copies the data if there are extra holds on the syncing dbuf.
What I don't understand is why it needs to do this at all, since dbuf_dirty()
can handle this scenario perfectly.

And a further question, perhaps related to the one above: why does zfs need
to release the arc buffer in dbuf_dirty()? The comment says this is needed to
protect others from reading the cached data block, but I don't know of any
calling path that hits the arc buffer except via the dmu interface. Perhaps
this is needed to protect snapshots from reading the new data?

    } else if (db->db.db_object != DMU_META_DNODE_OBJECT) {
        /*
         * Release the data buffer from the cache so that we
         * can modify it without impacting possible other users
         * of this cached data block.  Note that indirect
         * blocks and private objects are not released until the
         * syncing state (since they are only modified then).
         */
        arc_release(db->db_buf, db);
        dbuf_fix_old_data(db, tx->tx_txg);
        data_old = db->db_buf;
    }

Thanks,
Jay
-- 
This message posted from opensolaris.org
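To make the scenario concrete, the effect of the quoted dbuf_hold_impl() block
can be modeled with a few lines of self-contained user-space C. This is only a
toy sketch of the copy-before-dirty idea; the names (toy_dbuf, toy_hold) are
made up for illustration and none of this is the actual ZFS code.

/*---- toy model (not ZFS code): copy-before-dirty while a write is in flight ----*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Made-up, simplified stand-in for a dbuf: the buffer the consumer sees,
 * plus the buffer an in-flight write (the syncing txg) is using.
 */
struct toy_dbuf {
    char   *db_data;       /* what consumers read/modify (open txg) */
    char   *pending_data;  /* what the syncing txg is writing out */
    size_t size;
};

/*
 * Loosely mimics the quoted dbuf_hold_impl() logic: if the buffer the
 * consumer would get is the very buffer being written out, give the
 * consumer a private copy so a later dirty cannot corrupt the write.
 */
static void
toy_hold(struct toy_dbuf *db)
{
    if (db->pending_data != NULL && db->pending_data == db->db_data) {
        char *copy = malloc(db->size);
        memcpy(copy, db->pending_data, db->size);  /* the bcopy() */
        db->db_data = copy;                        /* the dbuf_set_data() */
    }
}

int
main(void)
{
    struct toy_dbuf db;
    db.size = 8;
    db.db_data = malloc(db.size);
    strcpy(db.db_data, "old txg");
    db.pending_data = db.db_data;   /* a write of this image is in flight */

    toy_hold(&db);
    strcpy(db.db_data, "new txg");  /* dirty it again in the open txg */

    /* The syncing txg still writes the old image, untouched. */
    printf("on disk: %s, in memory: %s\n", db.pending_data, db.db_data);
    free(db.db_data);
    free(db.pending_data);
    return (0);
}
/*---- end toy model ----*/

In the real excerpt the copy is done with arc_buf_alloc()/bcopy() and the swap
with dbuf_set_data(), but the shape of the operation is the same.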
jay xiong wrote:
> I have two questions about the zfs code.
>
> In dbuf_hold_impl():
>
> This piece of code copies the data to an extra buffer if the current one
> is being synced to disk. My question is: why not defer this data copy to
> dbuf_dirty(), where we could check db_data_pending and do the copy there?
> A similar case is in dbuf_sync_leaf():
>
> Here it also copies the data if there are extra holds on the syncing
> dbuf. What I don't understand is why it needs to do this at all, since
> dbuf_dirty() can handle this scenario perfectly.

The DMU provides an interface such that the db_data pointer will not change
from the consumer's point of view. If we did the copy in dbuf_dirty(), we
would have to change db_data (to the new copy). We cannot make a new copy
for the syncing thread, because we don't know what state it is in (e.g., it
might have already given that pointer to the i/o subsystem to write).

Note that we need to check and possibly make a copy in both places (in
dbuf_sync_leaf() and dbuf_hold_impl()). The critical section is while the
dbuf is being written out, so we must check at the beginning of that section
(dbuf_sync_leaf), and also if we want to add a hold while we are in the
critical section (dbuf_hold_impl).

Alternatively, dbuf_hold_impl() could simply wait for the dbuf to finish
being written out, but that would hurt performance.

> And a further question, perhaps related to the one above: why does zfs
> need to release the arc buffer in dbuf_dirty()? The comment says this is
> needed to protect others from reading the cached data block, but I don't
> know of any calling path that hits the arc buffer except via the dmu
> interface. Perhaps this is needed to protect snapshots from reading the
> new data?

That is correct. arc_release() essentially disassociates the arc_buf from
its blkptr (i.e., from the arc_buf_hdr, from the arc's point of view),
creating a new "copy" for the DMU to modify. There may be many arc_buf's for
one blkptr because other datasets (e.g., snapshots, clones) may reference
the same block.

--matt
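The arc_release() point can be pictured with another toy, user-space sketch.
The names toy_hdr and toy_buf are made up and only stand in loosely for
arc_buf_hdr_t and arc_buf_t: several consumers may cache the same on-disk
block, and releasing a buffer hands its owner a private, anonymous copy to
modify.

/*---- toy model (not ZFS code): what releasing a cached buffer buys you ----*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * One header represents the cached image of one on-disk block (one blkptr);
 * several consumers (live fs, snapshots, clones) may hold buffers for it.
 */
struct toy_hdr {
    char data[32];  /* the cached image of the block */
    int  refs;
};

struct toy_buf {
    struct toy_hdr *hdr;   /* NULL once "released" (anonymous) */
    char           *data;  /* shared with hdr until released */
};

static void
toy_hold(struct toy_hdr *hdr, struct toy_buf *buf)
{
    buf->hdr = hdr;
    buf->data = hdr->data;  /* read-only sharing of the cached image */
    hdr->refs++;
}

/*
 * Roughly what arc_release() accomplishes for its caller: the buffer is
 * disassociated from the shared cached block and becomes a private copy,
 * so modifying it cannot be seen by anyone else who reads that block.
 */
static void
toy_release(struct toy_buf *buf)
{
    char *copy = malloc(sizeof (buf->hdr->data));
    memcpy(copy, buf->data, sizeof (buf->hdr->data));
    buf->hdr->refs--;
    buf->hdr = NULL;   /* anonymous: no longer tied to the blkptr */
    buf->data = copy;
}

int
main(void)
{
    struct toy_hdr hdr = { "original block", 0 };
    struct toy_buf live, snap;

    toy_hold(&hdr, &live);  /* the live filesystem's dbuf */
    toy_hold(&hdr, &snap);  /* a snapshot reading the same block */

    toy_release(&live);     /* like arc_release(db->db_buf, db) in dbuf_dirty() */
    strcpy(live.data, "new data, open txg");

    printf("snapshot still sees: %s\n", snap.data);  /* "original block" */
    free(live.data);
    return (0);
}
/*---- end toy model ----*/

This is also why, per the dbuf_dirty() comment quoted earlier, the release is
deferred for indirect blocks and private objects: they are only modified in
syncing context.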
It's our Chinese New Year today, best wishes to you.

Jay

Matthew Ahrens wrote:
> jay xiong wrote:
>> I have two questions about the zfs code.
>>
>> In dbuf_hold_impl():
>>
>> This piece of code copies the data to an extra buffer if the current
>> one is being synced to disk. My question is: why not defer this data
>> copy to dbuf_dirty(), where we could check db_data_pending and do the
>> copy there? A similar case is in dbuf_sync_leaf():
>>
>> Here it also copies the data if there are extra holds on the syncing
>> dbuf. What I don't understand is why it needs to do this at all, since
>> dbuf_dirty() can handle this scenario perfectly.
>
> The DMU provides an interface such that the db_data pointer will not
> change from the consumer's point of view. If we did the copy in
> dbuf_dirty(), we would have to change db_data (to the new copy). We
> cannot make a new copy for the syncing thread, because we don't know
> what state it is in (e.g., it might have already given that pointer to
> the i/o subsystem to write).

Yes, this makes sense.

> Note that we need to check and possibly make a copy in both places (in
> dbuf_sync_leaf() and dbuf_hold_impl()). The critical section is while
> the dbuf is being written out, so we must check at the beginning of
> that section (dbuf_sync_leaf), and also if we want to add a hold while
> we are in the critical section (dbuf_hold_impl).

Understood. It would be much clearer if we added an assertion and a comment
in dbuf_hold_impl(), that is:

/*------------------------ code: dbuf_hold_impl() -----------------------*/
    /*
     * If this buffer is currently syncing out, and we are are
     * still referencing it from db_data, we need to make a copy
     * of it in case we decide we want to dirty it again in this txg.
     */
    if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID &&
        dn->dn_object != DMU_META_DNODE_OBJECT &&
        db->db_state == DB_CACHED && db->db_data_pending) {
        dbuf_dirty_record_t *dr = db->db_data_pending;

        if (dr->dt.dl.dr_data == db->db_buf) {
            arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);

+           ASSERT(refcount_count(&db->db_holds) == 1);
+           /*
+            * The dirty buffer is in the syncing state and nobody else
+            * is grabbing this dbuf, so it is safe to reset the dbuf's
+            * data here.  We must do the data copy here because
+            * otherwise we have no way to dirty this buf in case this
+            * hold is for a write.
+            */
            dbuf_set_data(db,
                arc_buf_alloc(db->db_dnode->dn_objset->os_spa,
                db->db.db_size, db, type));
            bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data,
                db->db.db_size);
        }
    }
/*------------------------ code end --------------------------*/

> Alternatively, dbuf_hold_impl() could simply wait for the dbuf to
> finish being written out, but that would hurt performance.
>
>> And a further question, perhaps related to the one above: why does
>> zfs need to release the arc buffer in dbuf_dirty()? The comment says
>> this is needed to protect others from reading the cached data block,
>> but I don't know of any calling path that hits the arc buffer except
>> via the dmu interface. Perhaps this is needed to protect snapshots
>> from reading the new data?
>
> That is correct. arc_release() essentially disassociates the arc_buf
> from its blkptr (i.e., from the arc_buf_hdr, from the arc's point of
> view), creating a new "copy" for the DMU to modify. There may be many
> arc_buf's for one blkptr because other datasets (e.g., snapshots,
> clones) may reference the same block.
>
> --matt
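The dbuf_sync_leaf() side of the same critical section can be sketched in the
same toy style (again with made-up names, not ZFS code): when the buffer is
still held at the start of the write, it is the writer that takes the private
copy, which is exactly why the consumer-visible db_data pointer never has to
move.

/*---- toy model (not ZFS code): the dbuf_sync_leaf() side of the copy ----*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_dbuf {
    char   *db_data;  /* the pointer consumers were handed; must not move */
    int    holds;     /* active holds on the dbuf */
    size_t size;
};

/*
 * Loosely mimics the quoted dbuf_sync_leaf() logic: if anyone still holds
 * the buffer when the write starts, hand the writer a private copy and
 * leave db_data alone, so later open-txg modifications cannot leak into
 * the write that is about to be issued.
 */
static char *
toy_sync_start(struct toy_dbuf *db)
{
    char *datap = db->db_data;

    if (db->holds > 1) {
        datap = malloc(db->size);               /* the arc_buf_alloc() */
        memcpy(datap, db->db_data, db->size);   /* the bcopy() */
    }
    return (datap);  /* what actually gets written out */
}

int
main(void)
{
    struct toy_dbuf db;
    db.size = 16;
    db.db_data = malloc(db.size);
    strcpy(db.db_data, "txg N image");
    db.holds = 2;  /* a consumer still holds the dbuf */

    char *write_buf = toy_sync_start(&db);
    strcpy(db.db_data, "txg N+1 edit");  /* consumer dirties it again */

    /* The write still carries the txg N image; db_data never moved. */
    printf("written: %s, in memory: %s\n", write_buf, db.db_data);
    if (write_buf != db.db_data)
        free(write_buf);
    free(db.db_data);
    return (0);
}
/*---- end toy model ----*/

Together with the earlier dbuf_hold_impl() sketch, this shows the two
checkpoints described above: copy at the start of the write if the buffer is
held, and copy at hold time if a write is already in flight.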