thr3ads.net - zfs code - [zfs-code] data missing in ZIL replay [Aug 2008]


Stan Park

2008-Aug-22 16:49 UTC

[zfs-code] data missing in ZIL replay

I'm trying to do I/O using only the DMU and ZIL interfaces from the
ZFS-FUSE source. I think I've got things working except for ZIL
replays. For large writes and when slogging is not available, the itx
write state is set to WR_INDIRECT. If I understand the ZIL correctly, this
means that the actual data to be written isn't part of the object's
associated ZIL entry until the ZIL is flushed through a zil_commit()
call. In the WR_INDIRECT case, a blkptr is retrieved through a
zfs_get_data() -> dmu_sync() call and stored in the ZIL entry on disk.

When I force a crash and try to run through a replay, the
zfs_replay_write() call does this:

    char *data = (char *)(lr + 1);    /* data follows lr_write_t */

It seems to me that this only works when the WR_COPIED state is used,
since the data should be pointed to by lr->lr_blkptr in the WR_INDIRECT
case. Indeed, when running a replay of WR_INDIRECT txs, all the write
data is missing. Do WR_COPIED and WR_INDIRECT use different code paths
during replay?
--
This message posted from opensolaris.org

Neil Perrin

2008-Sep-04 01:31 UTC


[zfs-code] data missing in ZIL replay

On 08/22/08 10:49, Stan Park wrote:
> I'm trying to do I/O using only the DMU and ZIL interfaces from the
> ZFS-FUSE source. I think I've got things working except for ZIL
> replays. For large writes and when slogging is not available, the itx
> write state is set to WR_INDIRECT. If I understand the ZIL correctly, this
> means that the actual data to be written isn't part of the object's
> associated ZIL entry until the ZIL is flushed through a zil_commit()
> call. In the WR_INDIRECT case, a blkptr is retrieved through a
> zfs_get_data() -> dmu_sync() call and stored in the ZIL entry on disk.

Correct. A large block is written directly into the main pool, and its block
pointer is saved in the ZIL record that is written to the intent log. When
the txg later commits, the block is simply linked into the txg's tree of
blocks. If a crash or power failure occurs before this, then
> 
> When I force a crash and try to run through a replay, the
> zfs_replay_write() call does this:
>
>     char *data = (char *)(lr + 1);    /* data follows lr_write_t */
>
> It seems to me that this only works when the WR_COPIED state is used,
> since the data should be pointed to by lr->lr_blkptr in the WR_INDIRECT
> case. Indeed, when running a replay of WR_INDIRECT txs, all the write
> data is missing. Do WR_COPIED and WR_INDIRECT use different code paths
> during replay?

To make things easier later, the data is read from the block while the
log is being parsed to extract the log records. This data read occurs in
zil_replay_log_record(). See the code after the comment: "If this is a
TX_WRITE with a blkptr, suck in the data".
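
A minimal sketch of that idea, as a self-contained toy model rather than
the actual ZFS code. Every toy_* name below is hypothetical and invented
for illustration; the real lr_write_t, blkptr_t and zil_replay_log_record()
are more involved. The point it tries to show is that if the log parser
copies the block's contents in right behind the record header, the replay
function can use the same (lr + 1) pointer for both the copied and the
indirect case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins (assumptions), NOT the real ZFS structures. */
typedef struct toy_blkptr {
	int	valid;		/* nonzero: data lives in a pool block */
	size_t	block_id;	/* index into toy_pool below */
} toy_blkptr_t;

typedef struct toy_lr_write {
	uint64_t	length;	/* number of data bytes in this write */
	toy_blkptr_t	blkptr;	/* set only in the indirect case */
} toy_lr_write_t;

/* Pretend main pool holding one block written ahead of the txg commit. */
static const char *toy_pool[] = { "indirect payload" };

/*
 * Parser side: build the in-memory record handed to replay.
 * The data always ends up directly after the header. For a record
 * with a block pointer it is read from the pool; otherwise it is
 * the inline data that was stored in the log with the record.
 */
static toy_lr_write_t *
toy_load_record(const toy_lr_write_t *ondisk, const char *inline_data)
{
	toy_lr_write_t *lr = malloc(sizeof (*lr) + ondisk->length);
	const char *src = inline_data;

	if (lr == NULL)
		return (NULL);
	*lr = *ondisk;
	if (ondisk->blkptr.valid) {
		/* "suck in the data" from the block the pointer names */
		assert(ondisk->blkptr.block_id <
		    sizeof (toy_pool) / sizeof (toy_pool[0]));
		src = toy_pool[ondisk->blkptr.block_id];
		assert(ondisk->length <= strlen(src));
	}
	memcpy((char *)(lr + 1), src, ondisk->length);
	return (lr);
}

/* Replay side: one code path, data follows the record header. */
static void
toy_replay_write(const toy_lr_write_t *lr, char *out)
{
	const char *data = (const char *)(lr + 1);

	memcpy(out, data, lr->length);
	out[lr->length] = '\0';
}
```

In this model a caller would pass each on-disk record through
toy_load_record() first and then hand the result to toy_replay_write(),
with an output buffer of at least length + 1 bytes. If the load step were
skipped for a record with a block pointer, the bytes after the header
would not hold the write data, which matches the symptom described above.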

BTW, we developed a tool to test the ZIL (see attached script ziltest), which
you may be able to adapt to check out your work.

Neil.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ziltest
URL: <http://mail.opensolaris.org/pipermail/zfs-code/attachments/20080903/7d051bd4/attachment.ksh>
