Andrew.Rutz at Sun.COM
2009-Nov-25 19:55 UTC
[zfs-discuss] zfs: questions on ARC membership based on type/ordering of Reads/Writes
I am trying to understand the ARC's behavior based on different
permutations of (a)sync Reads and (a)sync Writes.

thank you, in advance

o does the data for a *sync-write* *ever* go into the ARC?
  eg, my understanding is that the data goes to the ZIL (and
  the SLOG, if present), but how does it get from the ZIL to the ZIO layer?
  eg, does it go to the ARC on its way to the ZIO ?
  o if the sync-write-data *does* go to the ARC, does it go to
    the ARC *after* it is written to the ZIL's backing-store,
    or does the data go to the ZIL and the ARC in parallel ?
  o if a sync-write's data goes to the ARC and ZIL *in parallel*,
    then does zfs prevent an ARC-hit until the data is confirmed
    to be on the ZIL's nonvolatile media (eg, disk-platter or SLOG) ?
    or could a Read get an ARC-hit on a block *before* it's written
    to zil's backing-store?

o is the DMU where the Serialization of transactions occurs?

o if an async-Write for block-X hits the Serializer before a Read
  for block-X hits the Serializer, i am assuming the Read can
  "pass" the async-Write; eg, the Read is *not* pended behind the
  async-write. however, if a Read hits the Serializer after a
  *sync*-write, then i'm assuming the Read is pended until
  the sync-write is written to the ZIL's nonvolatile media.
  o if a Read "passes" an async-write, then i'm assuming the Read
    can be satisfied by either the arc, l2arc, or disk.

o it's stated that "the L2ARC is for random-reads". however, there's
  nothing to prevent the L2ARC from containing blocks derived from
  *sequential*-reads, right ? also, blocks from async-writes can
  also live in l2arc, right? how about sync-writes ?

o is the l2arc literally simply a *larger* ARC? eg, does the l2arc
  obey the normal cache property where "everything that is in the L1$
  (eg, ARC) is also in the L2$ (eg, l2arc)" ? (I have a feeling that
  the set-theoretic intersection of ARC and L2ARC is empty (for some
  reason).)
o does the l2arc use the ARC algorithm (as the name suggests) ?

thank you,

/andrew
Solaris RPE
Richard Elling
2009-Nov-25 23:36 UTC
[zfs-discuss] zfs: questions on ARC membership based on type/ordering of Reads/Writes
On Nov 25, 2009, at 11:55 AM, Andrew.Rutz at Sun.COM wrote:
> I am trying to understand the ARC's behavior based on different
> permutations of (a)sync Reads and (a)sync Writes.
>
> thank you, in advance
>
> o does the data for a *sync-write* *ever* go into the ARC?

Always.

> eg, my understanding is that the data goes to the ZIL (and
> the SLOG, if present), but how does it get from the ZIL to the ZIO
> layer?

ZIL is effectively write-only. It is only read when the pool is imported.

> eg, does it go to the ARC on its way to the ZIO ?

ARC is the cache for buffering data.

> o if the sync-write-data *does* go to the ARC, does it go to
> the ARC *after* it is written to the ZIL's backing-store,
> or does the data go to the ZIL and the ARC in parallel ?

A sync write returns when the data is written to the ZIL. An async write
returns when the data is in the ARC, and later the unwritten contents of
the ARC are pushed to the pool when the transaction group is committed.

> o if a sync-write's data goes to the ARC and ZIL *in parallel*,
> then does zfs prevent an ARC-hit until the data is confirmed
> to be on the ZIL's nonvolatile media (eg, disk-platter or SLOG) ?
> or could a Read get an ARC-hit on a block *before* it's written
> to zil's backing-store?

In my mind, the ARC and ZIL are orthogonal.

> o is the DMU where the Serialization of transactions occurs?

Serialization?

> o if an async-Write for block-X hits the Serializer before a Read
> for block-X hits the Serializer, i am assuming the Read can
> "pass" the async-Write; eg, the Read is *not* pended behind the
> async-write. however, if a Read hits the Serializer after a
> *sync*-write, then i'm assuming the Read is pended until
> the sync-write is written to the ZIL's nonvolatile media.
> o if a Read "passes" an async-write, then i'm assuming the Read
> can be satisfied by either the arc, l2arc, or disk.

I think you are asking if write order is preserved.
The answer is yes.

> o it's stated that "the L2ARC is for random-reads". however, there's
> nothing to prevent the L2ARC from containing blocks derived from
> *sequential*-reads, right ? also, blocks from async-writes can
> also live in l2arc, right? how about sync-writes ?

Blocks which are not yet committed to the pool are locked in the ARC so
they can't be evicted. Once committed, the lock is removed.

> o is the l2arc literally simply a *larger* ARC? eg, does the l2arc
> obey the normal cache property where "everything that is in the L1$
> (eg, ARC) is also in the L2$ (eg, l2arc)" ? (I have a feeling that
> the set-theoretic intersection of ARC and L2ARC is empty (for some
> reason).

No. The L2ARC is not in the datapath between the ARC and media. Further,
data is not evicted from the ARC into the L2ARC. Rather, the L2ARC is
filled from data near the eviction ends of the MRU and MFU lists. The
movement of data to the L2ARC is throttled and grouped in sequence,
improving efficiency for devices which like large writes, such as
read-optimized flash.

Think of it this way. Data which is in the ARC is fed into the L2ARC.
If the data is later evicted from the ARC, it can still live in the
L2ARC. When the L2ARC has lower read latency than the pool's media,
it can improve performance because the data can be read from the L2ARC
instead of the pool. This fits the general definition of a cache, but
does not work the same way as multilevel CPU caches.

> o does the l2arc use the ARC algorithm (as the name suggests) ?

Yes, but it really isn't separate from the ARC, from a management point
of view. To fully understand it, you need to know about how the metadata
for each buffer in the ARC is managed. This will introduce the concept of
the ghosts, and the L2ARC is a simple extension.
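
The feed behaviour described above (buffers copied from near the eviction end, throttled, with eviction itself never writing to the L2ARC) can be sketched as a toy model. This is only an illustration, not how arc.c is written: the class `ToyCache`, its `feed_span` and `feed_limit` parameters, and the single recency list standing in for the separate MRU and MFU lists are all assumptions made for the example.

```python
from collections import OrderedDict


class ToyCache:
    """Toy model: one recency-ordered dict stands in for the ARC lists."""

    def __init__(self, arc_size, feed_span, feed_limit):
        self.arc = OrderedDict()      # first item = next eviction candidate
        self.l2arc = {}               # second-level cache device (hypothetical)
        self.arc_size = arc_size      # max buffers held in the ARC
        self.feed_span = feed_span    # buffers scanned from the eviction end
        self.feed_limit = feed_limit  # throttle: max buffers copied per pass

    def read(self, block, pool):
        """Return where the read was satisfied: 'arc', 'l2arc' or 'pool'."""
        if block in self.arc:
            self.arc.move_to_end(block)          # refresh recency on a hit
            return "arc"
        if block in self.l2arc:
            data, source = self.l2arc[block], "l2arc"
        else:
            data, source = pool[block], "pool"
        self.arc[block] = data
        while len(self.arc) > self.arc_size:
            self.arc.popitem(last=False)         # eviction only drops the buffer
        return source

    def feed(self):
        """Copy (not move) buffers near the eviction end into the L2ARC."""
        candidates = list(self.arc.items())[: self.feed_span]
        batch = [(b, d) for b, d in candidates if b not in self.l2arc]
        batch = batch[: self.feed_limit]
        self.l2arc.update(batch)
        return [b for b, _ in batch]


pool = {n: f"block-{n}" for n in range(5)}
cache = ToyCache(arc_size=3, feed_span=2, feed_limit=1)
for n in (0, 1, 2):
    cache.read(n, pool)                  # cold misses, filled from the pool
fed = cache.feed()                       # block 0 is copied to the L2ARC
in_both = set(cache.arc) & set(cache.l2arc)
cache.read(3, pool)                      # pushes block 0 out of the ARC
sources = [cache.read(0, pool), cache.read(1, pool)]
```

In this sketch block 0 sits in both caches right after the feed pass, so the intersection of the two is not empty, and it is later served from the L2ARC after the ARC has dropped it. Block 1 was evicted without ever being fed, so it has to come from the pool again.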
The comments in the source are nicely descriptive, and you might consider
reading them through once, even if you don't dive into the code itself:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c
 -- richard
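
The write-path answers in this reply (sync-write data always lands in the ARC, a sync write returns once the ZIL has it, uncommitted blocks are pinned until the transaction group commits, and the ZIL is only read at import) can be put together in a small toy model. This is a sketch under stated assumptions, not ZFS code: the class `ToyPool` and all of its method names are hypothetical, log writes are treated as instantly stable, and dropping log records at commit is a simplification.

```python
class ToyPool:
    """Toy model of the write path; every name here is illustrative."""

    def __init__(self):
        self.arc = {}        # in-memory cache: block -> data
        self.dirty = set()   # blocks not yet committed, pinned in the ARC
        self.zil = []        # intent log, append-only during normal operation
        self.disk = {}       # committed pool contents

    def write(self, block, data, sync=False):
        self.arc[block] = data               # both write types land in the ARC
        self.dirty.add(block)
        if sync:
            self.zil.append((block, data))   # stands in for a stable log write
        # the call returns here in both cases

    def read(self, block):
        if block in self.arc:                # the ZIL is never consulted
            return self.arc[block]
        data = self.disk[block]
        self.arc[block] = data
        return data

    def evict(self, block):
        if block in self.dirty:
            return False                     # uncommitted blocks stay pinned
        return self.arc.pop(block, None) is not None

    def commit_txg(self):
        for block in self.dirty:
            self.disk[block] = self.arc[block]
        self.dirty.clear()
        self.zil.clear()                     # simplification: records retired

    def crash_and_import(self):
        self.arc.clear()                     # memory contents are lost
        self.dirty.clear()
        for block, data in self.zil:         # the only time the log is read
            self.disk[block] = data
        self.zil.clear()


p = ToyPool()
p.write("a", b"async")
p.write("s", b"sync", sync=True)
hit = p.read("s")              # served from the ARC before any commit
pinned = not p.evict("s")      # cannot be evicted while uncommitted
p.crash_and_import()
survivors = dict(p.disk)       # only the sync write survives the crash
```

The model deliberately leaves out concurrency, so it says nothing about whether a read could hit the ARC while the log write is still in flight; it only shows that the read path and the log are separate structures.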
Andrew.Rutz at Sun.COM
2009-Dec-09 21:57 UTC
[zfs-discuss] Resend : zfs: questions on ARC membership based on type/ordering of Reads/Writes
hi,
i'm re-sending this because I'm hoping that someone has some answers
to the following questions. I'm working a hot Escalation on AmberRoad
and am trying to understand what's under zfs' hood.

thanks
Solaris RPE
/andrew rutz

On 11/25/09 13:55, Andrew.Rutz at Sun.COM wrote:
> I am trying to understand the ARC's behavior based on different
> permutations of (a)sync Reads and (a)sync Writes.
> [...]
Richard Elling
2009-Dec-10 00:59 UTC
[zfs-discuss] Resend : zfs: questions on ARC membership based on type/ordering of Reads/Writes
I replied... maybe I don't count anymore, boo hoo :-)
http://opensolaris.org/jive/thread.jspa?threadID=118667&tstart=15
 -- richard