Ok here's the thing ...

A customer has some big tier 1 storage, and has presented 24 LUNs (from four RAID6 groups) to an OI148 box which is acting as a kind of iSCSI/FC bridge (using some of the cool features of ZFS along the way). The OI box currently has 32GB configured for the ARC, and 4x 223GB SSDs for L2ARC. It has a dual port QLogic HBA, and is currently configured to do round-robin MPXIO over two 4Gbps links. The iSCSI traffic is over a dual 10Gbps card (rather like the one Sun used to sell).

I've just built a fresh pool, and have created 20x 100GB zvols which are mapped to iSCSI clients. I have initialised the first 20GB of each zvol with random data. I've had a lot of success with write performance (e.g. in earlier tests I had 20 parallel streams writing 100GB each at over 600MB/sec aggregate), but read performance is very poor.

Right now I'm just playing with 20 parallel streams of reads from the first 2GB of each zvol (i.e. 40GB in all). During each run, I see lots of writes to the L2ARC, but less than a quarter of that volume in reads. Yet my FC LUNs are hot with thousands of reads per second. This doesn't change from run to run. Why? Surely 20x 2GB of data (and its associated metadata) will sit nicely in 4x 223GB SSDs?

Phil
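For reference, the test layout described above could be reproduced with something along these lines; the pool name (pool0) and the volume naming are assumptions, not details from Phil's setup:

    # Sketch only: create 20x 100GB zvols and seed the first 20GB of each
    # with random data (urandom is slow; the original test may well have
    # used a faster source).
    i=1
    while [ $i -le 20 ]; do
        zfs create -V 100G pool0/vol$i
        dd if=/dev/urandom of=/dev/zvol/rdsk/pool0/vol$i bs=1024k count=20480
        i=`expr $i + 1`
    done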
I'll throw out some (possibly bad) ideas.

Is ARC satisfying the caching needs? 32 GB for ARC should almost cover the 40GB of total reads, suggesting that the L2ARC doesn't add any value for this test.

Are the SSD devices saturated from an I/O standpoint? Put another way, can ZFS put data to them fast enough? If they aren't taking writes fast enough, then maybe they can't effectively load for caching. Certainly if they are saturated for writes they can't do much for reads.

Are some of the reads sequential? Sequential reads don't go to L2ARC.

What does iostat say for the SSD units? What does arc_summary.pl (maybe spelled differently) say about the ARC / L2ARC usage? How much of the SSD units are in use as reported in zpool iostat -v?
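For anyone following along, the statistics Marty asks about can be gathered with commands along these lines on OI 148; the pool name is an assumption:

    # per-device throughput and service times, SSDs included
    iostat -xnz 5

    # ARC and L2ARC kstats (hits, misses, l2_size, l2_hits, l2_misses, ...)
    kstat -p -n arcstats

    # space in use on each cache device
    zpool iostat -v pool0 5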
On 07/06/2011 20:34, Marty Scholes wrote:
> I'll throw out some (possibly bad) ideas.

Thanks for taking the time.

> Is ARC satisfying the caching needs? [...]
>
> Are the SSD devices saturated from an I/O standpoint? [...] Certainly if they are saturated for writes they can't do much for reads.

The SSDs are barely ticking over, and can deliver almost as much throughput as the current SAN storage.

> Are some of the reads sequential? Sequential reads don't go to L2ARC.

That'll be it. I assume the L2ARC is just taking metadata. In situations such as mine, I would quite like the option of routing sequential read data to the L2ARC also.

I do notice a benefit with a sequential update (i.e. COW for each block), and I think this is because the L2ARC satisfies most of the metadata reads instead of having to read them from the SAN.

> What does iostat say for the SSD units? What does arc_summary.pl (maybe spelled differently) say about the ARC / L2ARC usage? How much of the SSD units are in use as reported in zpool iostat -v?
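For what it's worth, whether zvol data (as opposed to metadata only) is even eligible for the ARC and L2ARC is governed by the per-dataset cache policies, which can be checked like this; the dataset name is an assumption:

    # "all" (the default) caches data and metadata; "metadata" or "none"
    # would explain an L2ARC that only ever fills with metadata
    zfs get primarycache,secondarycache pool0/vol1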
You have an unbalanced setup: FC at 4Gbps vs a 10Gbps NIC. After 8b/10b encoding it is even worse, but this does not impact your benchmark yet.

Sent from my iPad
Hung-Sheng Tsao (LaoTsao) Ph.D

On Jun 7, 2011, at 5:46 PM, Phil Harman <phil.harman at gmail.com> wrote:
> [...]
On 07/06/2011 22:57, LaoTsao wrote:
> You have an unbalanced setup: FC at 4Gbps vs a 10Gbps NIC.

It's actually 2x 4Gbps (using MPXIO) vs 1x 10Gbps.

> [...]
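Back-of-the-envelope numbers for the two sides, counting only line encoding and ignoring protocol overheads above it:

    2x 4Gb FC  (8b/10b):   ~400 MB/s per link, so ~800 MB/s aggregate with MPXIO
    1x 10GbE   (64b/66b):  ~1.2 GB/s per port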
> > Are some of the reads sequential? Sequential reads don't go to L2ARC.
>
> That'll be it. I assume the L2ARC is just taking metadata. In situations
> such as mine, I would quite like the option of routing sequential read
> data to the L2ARC also.

The good news is that it is almost a certainty that actual iSCSI usage will be of a (more) random nature than your tests, suggesting higher L2ARC usage in real world application.

I'm not sure how ZFS makes the distinction between a random and a sequential read, but the more you think about it, not caching sequential requests makes sense.
On 08/06/2011 14:35, Marty Scholes wrote:
> The good news is that it is almost a certainty that actual iSCSI usage will be of a (more) random nature than your tests, suggesting higher L2ARC usage in real world application.
>
> I'm not sure how ZFS makes the distinction between a random and a sequential read, but the more you think about it, not caching sequential requests makes sense.

Yes, in most cases, but I can think of some counter examples ;)
On Jun 7, 2011, at 9:12 AM, Phil Harman wrote:
> A customer has some big tier 1 storage, and has presented 24 LUNs (from four RAID6 groups) to an OI148 box which is acting as a kind of iSCSI/FC bridge [...] The OI box currently has 32GB configured for the ARC, and 4x 223GB SSDs for L2ARC. [...]

The ARC is not big enough to hold the headers needed to track an L2ARC of that size.

> Right now I'm just playing with 20 parallel streams of reads from the first 2GB of each zvol (i.e. 40GB in all). During each run, I see lots of writes to the L2ARC, but less than a quarter of that volume in reads. Yet my FC LUNs are hot with thousands of reads per second. This doesn't change from run to run. Why?

Writes to the L2ARC devices are throttled to 8 or 16 MB/sec. If the L2ARC fill cannot keep up, the data is unceremoniously evicted.

> Surely 20x 2GB of data (and its associated metadata) will sit nicely in 4x 223GB SSDs?

On Jun 7, 2011, at 12:34 PM, Marty Scholes wrote:
> Are some of the reads sequential? Sequential reads don't go to L2ARC.

This is not a true statement. If the primarycache policy is set to the default, all data will be cached in the ARC.

> What does iostat say for the SSD units? What does arc_summary.pl (maybe spelled differently) say about the ARC / L2ARC usage? How much of the SSD units are in use as reported in zpool iostat -v?

The ARC statistics are nicely documented in arc.c and available as kstats.
 -- richard
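The throttle Richard mentions corresponds to the l2arc_write_max and l2arc_write_boost tunables in arc.c (8MB per fill interval by default, with the boost added on top while the L2ARC is still cold, hence "8 or 16 MB/sec"). A sketch of how they might be inspected and, for experimentation only, raised on an OpenSolaris-era kernel; the 64MB value is purely illustrative:

    # current values, in bytes, read from the running kernel
    echo l2arc_write_max/E | mdb -k
    echo l2arc_write_boost/E | mdb -k

    # persistent override via /etc/system (illustrative 64MB value, reboot needed):
    #   set zfs:l2arc_write_max = 0x4000000
    #   set zfs:l2arc_write_boost = 0x4000000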
> This is not a true statement. If the primarycache
> policy is set to the default, all data will
> be cached in the ARC.

Richard, you know this stuff so well that I am hesitant to disagree with you. At the same time, I have seen this myself, trying to load video files into L2ARC without success.

> The ARC statistics are nicely documented in arc.c and
> available as kstats.

And I looked in the source. My C is a little rusty, yet it appears that prefetch items are not stored in L2ARC by default. Prefetches will satisfy a good portion of sequential reads but won't go to L2ARC.
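What Marty is seeing in the source is the l2arc_noprefetch flag in arc.c, which defaults to on, i.e. prefetched (streaming) buffers are not written to the L2ARC. A hedged sketch of checking and, purely for testing, flipping it on a live OpenSolaris-era kernel:

    # 1 means prefetched (streaming) buffers are excluded from the L2ARC
    echo l2arc_noprefetch/D | mdb -k

    # allow prefetched buffers into the L2ARC until next boot (test only)
    echo l2arc_noprefetch/W 0 | mdb -kw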
On Wed, Jun 08, 2011 at 11:44:16AM -0700, Marty Scholes wrote:
> And I looked in the source. My C is a little rusty, yet it appears
> that prefetch items are not stored in L2ARC by default. Prefetches
> will satisfy a good portion of sequential reads but won't go to
> L2ARC.

Won't go to L2ARC while they're still speculative reads, maybe. Once they're actually used by the app to satisfy a good portion of the actual reads, they'll have hit stats and will.

I suspect the problem is the threshold for L2ARC writes. Sequential reads can be much faster than this rate, meaning it can take a lot of effort/time to fill. You could test by doing slow sequential reads, and see if the L2ARC fills any more for the same reads spread over a longer time.

-- Dan.
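A minimal sketch of the kind of slow sequential read Dan suggests, paced well below the L2ARC fill throttle; the pool, the zvol name and the pacing are assumptions:

    # read the first 2GB of one zvol in 64MB chunks, pausing between chunks
    # so the aggregate read rate stays far under the L2ARC write limit
    i=0
    while [ $i -lt 32 ]; do
        dd if=/dev/zvol/rdsk/pool0/vol1 of=/dev/null bs=1024k count=64 \
           iseek=`expr $i \* 64`
        sleep 5
        i=`expr $i + 1`
    done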