Shane, I've seen a couple of references to ORNL using xdd versus sgp_dd for low-level disk performance benchmarking. Could you please summarize the differences and advise whether our engineering team and Lustre partners should be considering this alternative? Thanks, Bojanic
On Sat, 2008-05-03 at 11:54 -0700, Peter Bojanic wrote:
> I've seen a couple of references to ORNL using xdd versus sgp_dd for
> low-level disk performance benchmarking. Could you please summarize
> the differences and advise whether our engineering team and Lustre
> partners should be considering this alternative?

We originally started using xdd for testing because it has features that make it easy to synchronize runs involving multiple hosts -- this is important for the testing we've been doing against LSI's XBB-2 system and DDN's 9900. For example, the 9900 was able to hit ~1550 to 1600 MB/s against a single IB port, but each singlet topped out at ~2650 to 2700 MB/s or so when hit by two hosts. Getting realistic aggregate numbers for both systems requires that we hit them with four IO hosts or OSSes.

When run in direct IO (-dio) mode against the SCSI disk device on recent kernels, xdd takes a path very similar to Lustre's use case -- building up bio's and using submit_bio() directly, without going through the page cache and triggering the read-ahead code and its associated problems. In this mode, xdd gave us an aggregate bandwidth of ~5500 MB/s, which matched up nicely with the ~5000 MB/s we obtained from an IOR run against a Lustre filesystem on the same hardware. We saw the expected ~10% hit from the filesystem versus raw disk.

In contrast, sgp_dd gave us ~1100 MB/s from a single port, which would indicate a maximum of 4400 MB/s from the array assuming perfect scaling. That would mean the filesystem result was 113.6% of raw performance, which doesn't sit well.

That said, there are a few caveats to using xdd -- the largest being that it does not issue perfectly sequential requests when run with a queue depth greater than 1. It uses multiple threads when it wants more than one request in flight, which leads to requests that are generally ascending but not perfectly sequential. This can cause performance regressions when the array does not internally reorder requests.

It is only possible to run xdd in direct IO mode against block devices on recent kernels -- 2.6.23, I believe, is the cutoff. On older kernels it must go through the page cache, and that may cause lower performance to be measured.

Aborted shutdowns of xdd will often leave SysV semaphores orphaned, which require manual cleanup once you hit the system limit.

It looks like it should be possible to run xdd in a manner suitable for sgpdd-survey, so that we could run tests against multiple regions of the disk at the same time. I've not spent any time looking closely at that option.

I'm not sure why sgp_dd was getting lower numbers on the 2.6.24 kernel I was testing against -- there may be a performance regression with the SCSI generic devices.

Hope this helps; feel free to ask further questions.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
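
For reference, the -dio path described above amounts to opening the block device with O_DIRECT and issuing aligned transfers, so the page cache and read-ahead never get involved. A minimal sketch of that pattern, assuming a 4 KiB-aligned buffer and a placeholder device path (this is not xdd's actual code; the request size and count are arbitrary):

    /* Minimal O_DIRECT read loop against a block device -- illustrative
     * only, not xdd's actual code.  /dev/sdX, the request size, and the
     * request count are placeholders. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const char *dev = "/dev/sdX";       /* hypothetical block device */
        const size_t reqsize = 1024 * 1024; /* 1 MiB per request */
        const long nreqs = 1024;            /* 1 GiB total */
        void *buf;
        int fd;
        long i;

        /* O_DIRECT needs an aligned buffer; 4096 covers common sector sizes. */
        if (posix_memalign(&buf, 4096, reqsize)) {
            perror("posix_memalign");
            return 1;
        }

        fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        for (i = 0; i < nreqs; i++) {
            ssize_t n = pread(fd, buf, reqsize, (off_t)i * (off_t)reqsize);
            if (n != (ssize_t)reqsize) {
                fprintf(stderr, "short read at request %ld\n", i);
                break;
            }
        }

        close(fd);
        free(buf);
        return 0;
    }

With a single thread like this, the requests really are strictly sequential; the caveat about "generally ascending" offsets only appears once multiple threads are used to keep more requests in flight.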
Dave, thanks for the great response -- this could easily be elaborated into a short LCE whitepaper, btw. I look forward to hearing from Andreas, Alex and other Lustre engineers on this.

Bojanic

On 4-May-08, at 17:40, David Dillow <dillowda at ornl.gov> wrote:
> [...]
On May 04, 2008 21:45 -0300, Peter Bojanic wrote:
> Dave, thanks for the great response -- this could easily be elaborated
> into a short LCE whitepaper, btw.
>
> I look forward to hearing from Andreas, Alex and other Lustre
> engineers on this.

I haven't personally been using sgp_dd or xdd very much, but the requirement for kernels >= 2.6.23 pretty much rules this out for use at most of our customers, since the latest vendor kernel (RHEL5) is based on 2.6.18.

As for the issue of multi-threaded processes not issuing perfectly sequential IO, that is fine as well: the way we use sgp_dd already has similar issues, and the same is true of Lustre OSTs.
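
To make the "generally ascending, but not perfectly sequential" behaviour concrete: when each thread pulls its next offset from a shared counter, the offsets are assigned in order, but the threads race to submit them, so the stream the array sees is interleaved. A rough sketch of that pattern (illustrative only, not xdd's or sgp_dd's actual implementation; the device path, request size, and thread count are placeholders):

    /* Sketch of "queue depth via threads": offsets are handed out in
     * ascending order from a shared counter, but the threads race to
     * submit them, so the request stream is not strictly sequential.
     * Illustrative only -- /dev/sdX, sizes, and thread count are
     * placeholders. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NTHREADS 4                     /* "queue depth" */
    #define REQSIZE  (1024 * 1024)         /* 1 MiB per request */
    #define NREQS    1024                  /* 1 GiB total */

    static const char *dev = "/dev/sdX";   /* hypothetical block device */
    static long next_req;                  /* shared request counter */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        void *buf;
        int fd;

        (void)arg;
        fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0 || posix_memalign(&buf, 4096, REQSIZE) != 0)
            return NULL;

        for (;;) {
            long req;

            pthread_mutex_lock(&lock);
            req = next_req++;              /* offsets ascend globally ... */
            pthread_mutex_unlock(&lock);
            if (req >= NREQS)
                break;

            /* ... but submission order interleaves across threads, so the
             * device sees an ascending yet not perfectly sequential stream. */
            if (pread(fd, buf, REQSIZE, (off_t)req * REQSIZE) < 0)
                break;
        }

        free(buf);
        close(fd);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        int i;

        for (i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

Unless the array reorders requests internally, those small inversions in the stream are what can cost bandwidth relative to a strictly sequential, single-threaded run.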
Cheers, Andreas
-- 
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.