Hi list,

is it normal that a 'dd' or an 'IOR' pushing 10 MB blocks to a Lustre file system shows up with 100% CPU load in 'top'? I am asking because I can write from one client to one OST at 500 MB/s, and the CPU load is at 100% in that case. If I stripe over two OSTs (which use different OSS servers and different RAID controllers) I get 500 MB/s as well (seeing 2 x 250 MB/s on the OSTs), and the CPU load is at 100% again.

A 'dd' on my desktop pushing 10M blocks to the local disk shows 7-10% CPU load.

Are there ways to tune this behavior? Changing max_rpcs_in_flight and max_dirty_mb did not help.

Regards, Michael

--
Michael Kluge, M.Sc.
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room A 208
Phone: (+49) 351 463-34217
Fax: (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW: http://www.tu-dresden.de/zih
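For readers who want to reproduce the observation without 'dd', the following is a minimal sketch of a dd-style write loop that reports throughput together with the share of wall-clock time the writing process spent on the CPU, which is roughly what 'top' shows per process. The function name `timed_write` and the returned keys are made up for this illustration; nothing here is Lustre-specific, and the numbers will depend heavily on the file system and page cache state.

```python
import os
import tempfile
import time


def timed_write(path, block_size=10 * 1024 * 1024, count=10):
    """Write `count` zero-filled blocks of `block_size` bytes and time it.

    Hypothetical helper for illustration; returns a dict with throughput
    and the CPU time consumed by this process relative to wall time.
    """
    block = memoryview(bytes(block_size))  # like dd if=/dev/zero bs=...
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            written = 0
            while written < block_size:  # os.write may write partially
                written += os.write(fd, block[written:])
        os.fsync(fd)  # include flushing dirty pages in the timing
    finally:
        os.close(fd)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    total_mb = block_size * count / 1e6
    return {
        "mb": total_mb,
        "wall_s": wall,
        "cpu_s": cpu,
        "mb_per_s": total_mb / wall if wall > 0 else float("inf"),
        "cpu_fraction": cpu / wall if wall > 0 else 0.0,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = timed_write(os.path.join(tmp, "testfile"), count=4)
        print(result)
```

Pointing `path` at a file on the Lustre mount and at a local disk in turn gives a like-for-like comparison of the two cases described above.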
On 2010-10-20, at 8:41, Michael Kluge <Michael.Kluge at tu-dresden.de> wrote:
> is it normal, that a 'dd' or an 'IOR' pushing 10MB blocks to a lustre
> file system shows up with a 100% CPU load within 'top'? [...]

Is this client CPU or server CPU? If you are using Ethernet it will definitely be CPU hungry and can easily saturate a single core.

Cheers, Andreas
On Wednesday, October 20, 2010, Andreas Dilger wrote:
> Is this client CPU or server CPU? If you are using Ethernet it will
> definitely be CPU hungry and can easily saturate a single core.

That is normal and probably comes from the page cache; it should be about the same for lustre, ldiskfs, ext4, xfs, etc. It goes down if you specify "-odirect", but that is obviously not optimal on Lustre clients.

Cheers, Bernd
On 20.10.2010 at 18:15, Andreas Dilger wrote:
> Is this client CPU or server CPU? If you are using Ethernet it will definitely be CPU hungry and can easily saturate a single core.

It is the CPU load on the client. The dd/IOR process is using one core completely. The clients and the servers are connected via DDR IB. LNET bandwidth is at 1.8 GB/s. The servers run 1.8.3, the client runs 1.8.3 patchless.

Micha
On 2010-10-20, at 10:40, Michael Kluge <Michael.Kluge at tu-dresden.de> wrote:
> It is the CPU load on the client. The dd/IOR process is using one core completely. The clients and the servers are connected via DDR IB. LNET bandwidth is at 1.8 GB/s. Servers have 1.8.3, the client has 1.8.3 patchless.

With only a single-threaded write, saturating a CPU is somewhat unavoidable because of copy_from_user(). O_DIRECT will avoid this.

Also, disabling data checksums and debugging can help considerably. There is a patch in bugzilla to add support for h/w crc32c on Nehalem CPUs to reduce this overhead, but that is still not as fast as no checksum at all.

Cheers, Andreas
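The O_DIRECT suggestion above can be tried from a small program as well as from 'dd'. The sketch below opens a file with `os.O_DIRECT` (a Linux-specific flag) and writes from a page-aligned buffer obtained through an anonymous `mmap`, since direct I/O generally requires aligned buffers and block-sized transfers. The helper names `write_direct` and `_write_blocks` are hypothetical, and the fallback to buffered I/O is an assumption added so the sketch also runs on file systems that reject O_DIRECT.

```python
import mmap
import os
import tempfile


def _write_blocks(fd, buf, count):
    """Write `buf` to `fd` `count` times; return the bytes written."""
    total = 0
    for _ in range(count):
        total += os.write(fd, buf)
    return total


def write_direct(path, block_size=10 * 1024 * 1024, count=4):
    """Write aligned blocks, bypassing the page cache where possible.

    Returns (bytes_written, used_direct). `block_size` should be a
    multiple of the device's logical block size (commonly 512 or 4096).
    """
    base = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    buf = mmap.mmap(-1, block_size)  # anonymous mapping is page-aligned
    try:
        # First attempt uses O_DIRECT; the second is a buffered fallback.
        for flags, used_direct in ((base | direct, bool(direct)), (base, False)):
            try:
                fd = os.open(path, flags, 0o644)
            except OSError:
                if not used_direct:
                    raise
                continue  # file system refused O_DIRECT at open time
            try:
                return _write_blocks(fd, buf, count), used_direct
            except OSError:
                if not used_direct:
                    raise  # buffered write failed: a real error
            finally:
                os.close(fd)
    finally:
        buf.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_direct(os.path.join(tmp, "direct.bin"), count=2))
```

Because the data goes straight from the user buffer to the I/O path, the copy into the page cache is skipped, which is the CPU cost attributed to copy_from_user() in the message above.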
On Wednesday, October 20, 2010, Andreas Dilger wrote:
> Also, disabling data checksums and debugging can help considerably. There
> is a patch in bugzilla to add support for h/w crc32c on Nehalem CPUs to
> reduce this overhead, but still not as fast as no checksum at all.

I think checksums are only visible in ptlrpc CPU time (and mostly only for reads), but not in the user space benchmark process.

Cheers, Bernd

--
Bernd Schubert
DataDirect Networks
On 20.10.2010 at 18:53, Andreas Dilger wrote:
> If you only have a single threaded write, then this is somewhat unavoidable to saturate a CPU due to copy_from_user(). O_DIRECT will avoid this.
>
> Also, disabling data checksums and debugging can help considerably. [...]

Using O_DIRECT reduces the CPU load, but the magical limit of 500 MB/s for one thread remains. Are the CRC sums calculated on a per-thread basis? Or per stripe? Is there a way to test the checksumming speed only?

Michael
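On the question of testing checksumming speed in isolation: the thread does not name a Lustre tool for this, so the following is only a user-space approximation. It times zlib's `crc32` and `adler32` over in-memory blocks on a single core. The 1 MiB block size is an assumption chosen to resemble a bulk RPC, and `checksum_throughput` is a hypothetical helper; the kernel's checksum code may perform quite differently.

```python
import time
import zlib


def checksum_throughput(func, block_size=1024 * 1024, total_mb=64):
    """Return (MB/s, final checksum) for `func` run over `total_mb` MiB.

    `func` must accept (data, running_value), as zlib.crc32 and
    zlib.adler32 do.
    """
    block = bytes(range(256)) * (block_size // 256)  # non-trivial pattern
    n_blocks = total_mb * 1024 * 1024 // len(block)
    value = func(b"")  # correct initial value for either algorithm
    start = time.perf_counter()
    for _ in range(n_blocks):
        value = func(block, value)  # running checksum across blocks
    elapsed = time.perf_counter() - start
    mb = n_blocks * len(block) / 1e6
    return (mb / elapsed if elapsed > 0 else float("inf")), value


if __name__ == "__main__":
    for name, func in (("crc32", zlib.crc32), ("adler32", zlib.adler32)):
        rate, _ = checksum_throughput(func)
        print(f"{name}: {rate:.0f} MB/s on one core")
```

If the single-core rate measured this way is of the same order as the observed write throughput, checksumming is a plausible contributor to a per-thread limit; if it is far higher, the limit more likely lies elsewhere.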
On 20.10.2010 at 19:13, Michael Kluge wrote:
> Using O_DIRECT reduces the CPU load but the magical limit of 500 MB/s
> for one thread remains. Are the CRC sums calculated on a per thread
> base? Or stripe base? Is there a way to test the checksumming speed only?

Disabling checksums boosts the performance to 660 MB/s for a single thread. Placing 6 IOR processes on my eight-core box now gives, with some striping, 1.6 GB/s, which is close to the LNET bandwidth.

Thanks a lot again!

Michael
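The figures reported in this thread can be put side by side with a little arithmetic. The sketch below uses only the numbers quoted above (500 and 660 MB/s single-threaded, 1.6 GB/s aggregate over 6 processes, 1.8 GB/s LNET). The "implied checksum rate" rests on a loud assumption: that checksumming ran strictly in series with the rest of the write path, so its per-byte cost is the difference of the two per-byte times. The thread does not confirm that model, so treat the result as a back-of-envelope estimate.

```python
def serial_stage_rate(rate_with, rate_without):
    """Throughput of a stage whose removal lifts rate_with to rate_without.

    Assumes a strictly serial pipeline, so per-byte times simply add:
    1/rate_with = 1/rate_without + 1/stage_rate.
    """
    return 1.0 / (1.0 / rate_with - 1.0 / rate_without)


# Figures from the thread, in MB/s (GB/s taken as 1000 MB/s).
single_with_checksums = 500.0
single_without_checksums = 660.0
aggregate = 1600.0
lnet_bandwidth = 1800.0
processes = 6

speedup = single_without_checksums / single_with_checksums
implied_checksum_rate = serial_stage_rate(
    single_with_checksums, single_without_checksums
)
per_process = aggregate / processes
lnet_utilisation = aggregate / lnet_bandwidth

if __name__ == "__main__":
    print(f"single-thread speedup without checksums: {speedup:.2f}x")
    print(f"implied serial checksum rate: {implied_checksum_rate:.1f} MB/s")
    print(f"average per process at 6 writers: {per_process:.1f} MB/s")
    print(f"share of LNET bandwidth used: {lnet_utilisation:.1%}")
```

Under these assumptions the six writers average well below the single-writer rate each, which is consistent with the aggregate being bounded by the network rather than by per-process CPU cost.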