Hi,

A basic question regarding how the ZIL works:

For asynchronous writes, will the ZIL be used?

For synchronous writes, if the I/O is small, will the whole I/O be placed in the ZIL, or will just a pointer be saved in the ZIL? What about large I/Os?

Regards,
Victor
On Jul 20, 2010, at 3:09 AM, v wrote:

> Hi,
> A basic question regarding how zil works:

The seminal blog on how the ZIL works is
http://blogs.sun.com/perrin/entry/the_lumberjack

> For asynchronous write, will zil be used?

No.

> For synchronous write, and if io is small, will the whole io be place on zil? or just the pointer be save into zil? what about large size io?

Yes :-)
In recent releases, there is better control over this behaviour. Roch sums up
the logbias property at
http://blogs.sun.com/roch/entry/synchronous_write_bias_property
 -- richard

--
Richard Elling
richard at nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of v
>
> For synchronous write, and if io is small, will the whole io be place
> on zil? or just the pointer be save into zil? what about large size io?

This one doesn't have a really clear answer. The best answer, I believe, is here:
http://opensolaris.org/jive/thread.jspa?messageID=486531&tstart=0

Scroll down about a third of the way. There is a message from perrin, as follows:

> If there's a slog then the data, regardless of size, gets written to
> the slog.
>
> If there's no slog and if the data size is greater than
> zfs_immediate_write_sz/zvol_immediate_write_sz (both default to 32K)
> then the data is written as a block into the pool and the block pointer
> written into the log record. This is the WR_INDIRECT write type.
>
> So Matt and Roy are both correct.
>
> But wait, there's more complexity!:
>
> If logbias=throughput is set we always use WR_INDIRECT.
>
> If we just wrote more than 1MB for a single zil commit and there's more
> than 2MB waiting then we start using the main pool.
>
> Clear as mud? This is likely to change again...
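The rules perrin lists can be condensed into a small decision function. This is a minimal Python sketch under stated assumptions, not the actual ZFS code: the names `LogState`, `write_type` and `log_block_target` are hypothetical, "COPIED" is only an illustrative label for data copied into the log record (the thread names just WR_INDIRECT), logbias is assumed to default to "latency", and the 32K/1MB/2MB thresholds are the values quoted in the message, which perrin himself says are likely to change.

```python
from dataclasses import dataclass

# Thresholds as quoted in the thread (assumed, may differ between releases).
IMMEDIATE_WRITE_SZ = 32 * 1024   # zfs_immediate_write_sz / zvol_immediate_write_sz
ONE_MB = 1024 * 1024


@dataclass
class LogState:
    """Hypothetical summary of the state the logging decision depends on."""
    has_slog: bool
    logbias: str = "latency"             # "latency" or "throughput"
    bytes_written_this_commit: int = 0   # log data already written in this zil commit
    bytes_waiting: int = 0               # log data still queued for this commit


def write_type(state: LogState, size: int) -> str:
    """How a synchronous write of `size` bytes is logged (sketch)."""
    if state.logbias == "throughput":
        # logbias=throughput: always write the block to the pool, log a pointer.
        return "WR_INDIRECT"
    if state.has_slog:
        # With a slog, the data goes into the log regardless of size.
        return "COPIED"
    if size > IMMEDIATE_WRITE_SZ:
        # No slog and a large write: block into the pool, pointer into the log.
        return "WR_INDIRECT"
    return "COPIED"


def log_block_target(state: LogState) -> str:
    """Where log blocks are written (sketch of the 1MB/2MB rule)."""
    if not state.has_slog:
        return "main pool"
    if (state.bytes_written_this_commit > ONE_MB
            and state.bytes_waiting > 2 * ONE_MB):
        # A large commit spills over to the main pool to spare the slog.
        return "main pool"
    return "slog"


if __name__ == "__main__":
    no_slog = LogState(has_slog=False)
    print(write_type(no_slog, 4 * 1024))    # COPIED
    print(write_type(no_slog, 128 * 1024))  # WR_INDIRECT
```

The order of the checks encodes the precedence implied by the message: logbias=throughput overrides the "slog takes everything" rule.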
Here is another very recent blog post from ConstantThinking:
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained

Very well done, a highly recommended read.

Christopher George
Founder/CTO
www.ddrdrive.com
v writes:
 > Hi,
 > A basic question regarding how zil works:
 > For asynchronous write, will zil be used?
 > For synchronous write, and if io is small, will the whole io be place on zil? or just the pointer be save into zil? what about large size io?
 >

Let me try.

ZIL : code and data structure to track system calls into a zvol or ZFS filesystem.
LOG : stable storage log managed by the ZIL, keeping track of synchronous operations.
SLOG: log device separate from the regular pool of disks; typically SSD- or NVRAM-based.

For asynchronous writes, the ZIL keeps track of those operations but does not write stable LOG records unless an fsync is issued. Of course, we recently added the ZFS property "sync": if it is set to sync=always, then there are no more asynchronous writes.

For synchronous writes, the ZIL keeps track of those operations and generates a stable LOG record. There are 2 options open to the ZIL here: either issue an I/O for a full record and another I/O that points to it, or issue a single I/O containing both the ZIL metadata and the file data.

When issuing a 1-byte synchronous write, it's intuitively best to have a single I/O with the 1 byte of new data (a partial ZFS record) and all the ZIL metadata needed to handle it. Later, during a pool TXG, the whole record will be updated in the main disk pool.

For a large synchronous write, it's best to send the whole modified records into the main disk pool and have the ZIL record only track pointers to them.

For sizes in between, the ZIL has to choose between the 2 options. That decision depends on the write size, the recordsize, the presence of log devices, the logbias setting, the current load on a given filesystem, etc. The goal is both to handle the current operation as fast as possible and to keep the SLOG device available for fast handling of synchronous writes by other threads.

So it's a fairly complex set of requirements, but it seems to be evolving in the right direction.
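To make the asynchronous/synchronous distinction concrete, here is a toy Python model of the behaviour described above. It is a sketch, not ZFS internals: every name (`ToyZil`, `write`, `fsync`, `txg_sync`) is hypothetical, and it deliberately ignores per-file commits, the indirect-vs-copied choice and log devices. All operations are tracked in memory; only a synchronous write, an fsync, or sync=always pushes them to the stable LOG, and a TXG sync makes the records unnecessary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyZil:
    """Toy model of ZIL bookkeeping; not the real ZFS data structures."""
    sync: str = "standard"                           # only "standard" and "always" modelled
    pending: list = field(default_factory=list)      # operations tracked in memory only
    stable_log: list = field(default_factory=list)   # records written to stable storage

    def write(self, data: bytes, synchronous: bool = False) -> None:
        # Every write is tracked by the ZIL, synchronous or not.
        self.pending.append(data)
        # sync=always turns every write into a synchronous one.
        if synchronous or self.sync == "always":
            self._commit()

    def fsync(self) -> None:
        # An fsync forces the tracked asynchronous operations to the LOG.
        self._commit()

    def txg_sync(self) -> None:
        # Once a TXG has written the data to the main pool, neither the
        # in-memory records nor the LOG records are needed any more.
        self.pending.clear()
        self.stable_log.clear()

    def _commit(self) -> None:
        # Write everything tracked so far as stable LOG records.
        self.stable_log.extend(self.pending)
        self.pending.clear()


if __name__ == "__main__":
    zil = ToyZil()
    zil.write(b"async")
    print(zil.stable_log)   # []
    zil.fsync()
    print(zil.stable_log)   # [b'async']
```

The model answers the first question in the thread in Roch's terms: asynchronous writes are tracked by the ZIL but reach the LOG only when something forces a commit.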
-r