Hi, I am running a series of tests, and I am surprised by the results.

If I use about 10 clients, the write performance of my system is 2 GB/s, and the read performance is also 2 GB/s. These are the results when I run either reads or writes, but not both at the same time.

However, when I have 10 clients doing reads and a different 10 clients doing writes, the write performance barely drops, but the read performance drops to about 150 MB/s.

I am using the deadline scheduler. I have played around with various parameters, such as nr_requests, writes_starved, and fifo_batch, but with little impact.

Can anyone suggest why this is occurring, and how to address it? In my idealized world, the read and write performance would each be proportional to the standalone read and write performance and to the ratio of read clients to write clients.

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
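P.S. For reference, these are the knobs I have been touching on the servers. sdb here is just a placeholder for the actual OST block device, and the values are only examples:

    # current scheduler and queue depth for one block device
    cat /sys/block/sdb/queue/scheduler
    cat /sys/block/sdb/queue/nr_requests

    # deadline tunables (defaults: writes_starved=2, fifo_batch=16)
    cat /sys/block/sdb/queue/iosched/writes_starved
    cat /sys/block/sdb/queue/iosched/fifo_batch

    # e.g. favour reads more strongly by allowing more read batches
    # to be dispatched before a write batch gets its turn
    echo 4 > /sys/block/sdb/queue/iosched/writes_starved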
> If I use about 10 clients, then the write performance of my
> system is 2 GB/s, and the read performance of my system is
> also 2 GB/s. These are the results when I run either reads or
> writes, but not both at the same time.

Unfortunately these numbers are meaningless without an idea of the storage system and the access patterns.

> But, when I have 10 clients doing reads, and a different 10
> clients doing writes, the write performance barely drops, but
> the read performance drops to about 150 MB/s.

This may be entirely the right thing to happen.

> I am using the deadline scheduler.

That's a good choice in many cases to avoid starvation problems with CFQ.

> Can anyone suggest why this is occurring, and how to address
> it?

Well, depending on the storage system and access patterns, the right figure might be the 150 MB/s and the wrong figure the 2 GB/s. You are not even saying whether the 10 and 10 are reading/writing the same file(s) or different ones (which might matter a great deal for MDS/OSS/client inode sync), or how many client systems those threads run on.

Anyhow, it could be because Lustre caches writes but (usually) not reads, which often results in writes being faster than reads (though usually by less than that); or your storage system does not have the IOPS to do better; or flusher issues; or disk host adapter issues (lots have buggy elevators or caching policies); or too much multithreading. It could be that the prefetcher of the (awful) Linux page cache needs huge read-aheads to work well for sequential loads. It could be the clients caching instead, or limitations in the client networking, or in the switches you are using or their setup.

Run 'iostat -xd 1' on the OSSes to get a better idea of what is going on, and also ideally on the MDS. Also run on the OSSes and MDS:

    watch -n1 cat /proc/meminfo

and observe how 'Dirty' and 'Writeback' go.

Try also to run concurrent send/receive (using 2 'nuttcp' instances) between clients and servers to check that the networking really can run both ways simultaneously. Some (cheaper) network chipsets can only process N packets per second, where N is less than what is necessary to run both transmitter and receiver at full speed at the same time (as a rule the transmitter wins, so if you have those in your OSSes, bad news for reads).

> In my idealized world, the read and write performance should
> be proportional to the initial read and write performance, and
> the ratio of the number of read and write clients.

That's movingly innocent and optimistic. Storage and system performance is extremely anisotropic, especially with Lustre and large storage systems in general.
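P.S. To be concrete about the nuttcp test above, something along these lines; the ports and the 'oss01' name are made up, so adjust to your setup:

    # on an OSS, start two nuttcp server instances on distinct control/data ports
    nuttcp -S -P 5000 -p 5001
    nuttcp -S -P 5100 -p 5101

    # on a client, drive both directions at the same time
    nuttcp -t -P 5000 -p 5001 oss01 &   # client -> OSS (transmit)
    nuttcp -r -P 5100 -p 5101 oss01 &   # OSS -> client (receive)
    wait

If either direction collapses when both run together, the network path (NIC, driver, or switch) is part of your problem regardless of what Lustre does.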
> -----Original Message-----
> From: Peter Grandi [mailto:pg_mh at mh.to.sabi.co.UK]
> Sent: Thursday, June 02, 2011 4:10 PM
> To: Roger Spellman
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Reads Starved
>
> Unfortunately these numbers are meaningless without an idea of
> the storage system and the access patterns.

The storage system is a Dell MD3200.

> > But, when I have 10 clients doing reads, and a different 10
> > clients doing writes, the write performance barely drops, but
> > the read performance drops to about 150 MB/s.
>
> This may be entirely the right thing to happen.

Each test is doing large-block, streaming I/O. The reads and writes are accessing different files. I am running 1 thread per client. If I increase the thread count on the reads, the performance does increase, but not nearly to the level of the writes.

I had increased max_rpcs quite large, as this gives excellent performance. However, this turns out to be the cause of the read/write disparity. The system can queue up many writes, but reads are issued one at a time. By decreasing max_rpcs, the read performance increased considerably.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
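P.S. For anyone who wants to check their own setup: the setting I am referring to above is the per-OSC RPC concurrency on the clients (the standard max_rpcs_in_flight tunable, whose default is 8); the value below is just an example:

    # show the current setting on a client
    lctl get_param osc.*.max_rpcs_in_flight

    # bring it back down from a large value (example value)
    lctl set_param osc.*.max_rpcs_in_flight=8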
You didn't mention which version of Lustre you were using for the tests. If you are using 1.8.5 or earlier, it's possible you are seeing problems caused by bugzilla 23081. I imagine you were filling the page cache quickly with writes. Read pages could end up getting read and purged repeatedly because they are added to the wrong end of the LRU list, so they get dropped under memory pressure or when hitting max_cached_mb.

If you have a sequential workload, take a look at your read-ahead stats with "lctl get_param llite.*.read_ahead_stats". If you see an excessive number of "miss inside window" and "read but discarded" among your misses, that might be what you're seeing.

Jeremy

On Thu, Jun 2, 2011 at 5:02 PM, Roger Spellman <Roger.Spellman at terascala.com> wrote:
> > -----Original Message-----
> > From: Peter Grandi [mailto:pg_mh at mh.to.sabi.co.UK]
> > Sent: Thursday, June 02, 2011 4:10 PM
> > To: Roger Spellman
> > Cc: lustre-discuss at lists.lustre.org
> > Subject: Re: [Lustre-discuss] Reads Starved
> >
> > Unfortunately these numbers are meaningless without an idea of
> > the storage system and the access patterns.
>
> The storage system is a Dell MD3200.
>
> > > But, when I have 10 clients doing reads, and a different 10
> > > clients doing writes, the write performance barely drops, but
> > > the read performance drops to about 150 MB/s.
> >
> > This may be entirely the right thing to happen.
>
> Each test is doing large-block, streaming I/O. The reads and writes are
> accessing different files. I am running 1 thread per client. If I
> increase the thread count on the reads, the performance does increase,
> but not nearly to the level of the writes.
>
> I had increased max_rpcs quite large, as this gives excellent
> performance. However, this turns out to be the cause of the read/write
> disparity. The system can queue up many writes, but reads are issued
> one at a time. By decreasing max_rpcs, the read performance increased
> considerably.
>
> Roger Spellman
> Staff Engineer
> Terascala, Inc.
> 508-588-1501
> www.terascala.com
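P.S. To pull the relevant numbers together on one client, something like the following has worked for me (parameter names are the standard llite ones; the exact output format can vary between releases):

    # Lustre version on the client
    lctl get_param version

    # read-ahead statistics and the client cache limit
    lctl get_param llite.*.read_ahead_stats
    lctl get_param llite.*.max_cached_mb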