Christopher J.Walker
2009-Jun-19 11:32 UTC
[Lustre-discuss] Benchmarking lustre with iozone
I've run some benchmarks on our Lustre setup using iozone, but have stopped before reaching the full potential of our storage due to problems with iozone.

Iozone seems to hang occasionally. Looking at the source, it looks like it uses UDP packets to tell our worker nodes to start and stop the benchmarks. With debugging turned on, I saw one packet missing in the output when it had hung. Unfortunately, on a network I'm saturating with Lustre traffic, the odd UDP packet is bound to get lost...

The iozone author has suggested building a timing network to address this problem - but I can't justify the time and expense to do this.

Does anybody have a patch to fix this?

Chris

--
Dr Christopher J. Walker
Queen Mary, University of London
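The failure mode described above (a single lost start/stop datagram leaving the run hung) is the usual argument for acknowledging control messages and resending them on timeout. The following is only a minimal sketch of that idea on the loopback interface, not iozone's actual code or a patch for it; the message format (`START`, `ACK START`), the function names, and the simulated packet loss are all assumptions made for illustration.

```python
import socket
import threading


def send_with_retry(sock, message, addr, retries=5, timeout=0.2):
    """Send `message` to `addr`, resending until an ACK arrives.

    Returns the number of attempts used. Hypothetical helper, not part of iozone.
    """
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(message, addr)
        try:
            reply, _ = sock.recvfrom(64)
        except socket.timeout:
            continue  # the datagram or its ACK was lost: resend
        if reply == b"ACK " + message:
            return attempt
    raise TimeoutError(f"no ACK from {addr} after {retries} attempts")


def worker(sock, drop_first, received):
    """Hypothetical worker that ignores the first `drop_first` datagrams
    to simulate loss on a saturated network, then acknowledges one message."""
    seen = 0
    while True:
        data, addr = sock.recvfrom(64)
        seen += 1
        if seen <= drop_first:
            continue  # pretend this packet never arrived
        received.append(data)
        sock.sendto(b"ACK " + data, addr)
        return


def demo(drop_first=2):
    """Run one controller/worker exchange on 127.0.0.1."""
    worker_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    worker_sock.bind(("127.0.0.1", 0))  # let the OS pick a free port
    received = []
    thread = threading.Thread(target=worker, args=(worker_sock, drop_first, received))
    thread.start()
    controller = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        attempts = send_with_retry(controller, b"START", worker_sock.getsockname())
    finally:
        thread.join(timeout=2)
        controller.close()
        worker_sock.close()
    return attempts, received


if __name__ == "__main__":
    print(demo())
```

In a real protocol the worker would also have to tolerate duplicate commands, since a resend can arrive after the original was already processed and only its ACK was lost.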
On Jun 19, 2009 12:32 +0100, Christopher J.Walker wrote:
> I've run some benchmarks on our Lustre setup using iozone, but have
> stopped before reaching the full potential of our storage due to
> problems with iozone.
>
> Iozone seems to hang occasionally. Looking at the source, it looks like
> it uses UDP packets to tell our worker nodes to start and stop the
> benchmarks. With debugging turned on, I saw one packet missing in the
> output when it had hung. Unfortunately, on a network I'm saturating with
> Lustre traffic, the odd UDP packet is bound to get lost...
>
> The iozone author has suggested building a timing network to address
> this problem - but I can't justify the time and expense to do this.

Usually any cluster will have a separate "administrative" network from the IO/compute network to ensure that heavy IO/MPI traffic doesn't make the nodes unreachable by e.g. the job scheduler. You can run iozone over the admin network.

Cheers, Andreas

--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
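One way to try the suggestion above is to list the clients by their admin-network hostnames in the file passed to iozone's `-+m` option, so that the control traffic follows those names while the benchmark I/O still goes to the Lustre mount. The sketch below only builds that client file and a command line; the hostnames, paths, and sizes are hypothetical, and whether the control packets really stay on the admin network depends on how those names resolve at your site.

```python
def client_file_lines(admin_hosts, workdir, iozone_path):
    """Build lines for an iozone -+m client file.

    Each line has three space-separated fields: client name, working
    directory on the client, and path to the iozone executable on the client.
    """
    return [f"{host} {workdir} {iozone_path}" for host in admin_hosts]


def iozone_command(client_file, n_clients, file_size="1g", record_size="1m"):
    """Build a distributed throughput-mode command line (write and read tests)."""
    return [
        "iozone",
        "-+m", client_file,    # client list, here using admin-network names
        "-t", str(n_clients),  # number of client processes
        "-s", file_size,       # file size per process
        "-r", record_size,     # record size
        "-i", "0",             # test 0: write/rewrite
        "-i", "1",             # test 1: read/re-read
    ]


if __name__ == "__main__":
    # Hypothetical admin-network hostnames and paths.
    hosts = ["node01-admin", "node02-admin"]
    print("\n".join(client_file_lines(hosts, "/mnt/lustre/iozone", "/usr/bin/iozone")))
    print(" ".join(iozone_command("clients.txt", len(hosts))))
```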
On Fri, 2009-06-19 at 12:32 +0100, Christopher J.Walker wrote:
> With debugging turned on, I saw one packet missing in the
> output when it had hung. Unfortunately, on a network I'm saturating with
> Lustre traffic, the odd UDP packet is bound to get lost...

Why not use a multi-node benchmark whose delivery is more reliable than a protocol, itself unreliable, built on an unreliable transport? IOR would fit this bill, as it uses MPI for inter-client communications, which itself will use rsh (or equivalent) as its internode transport.

b.