Hi,

during my Lustre tests (vers. 1.4.1, kernel 2.6.12.5 patched with bugzilla
patches, 1 MDS, 2 OSTs, Gigabit network) I find extremely low performance for
I/O with small files, e.g. when extracting a Linux kernel source tree. This is
about 20 times slower than the performance on an OST's native filesystem. The
same applies to the file creation/deletion part of bonnie++, as shown in the
example below:

Version 1.02b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ni-01-01         8G 37512  75 55839  49 37416  91 37760  98 56343  90  59.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   199   3   304  13    56   0   305   5   380  16    64   0

These figures are almost 100 times lower than on the OST's native filesystem,
which gives:

Version 1.02b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
sn-03-1          8G 37693  96 181285 66 76051  36 30826  90 194777 61 313.6   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 25232 100 +++++ +++ 21106  97 24226  99 +++++ +++ 20339 100

Do I have to live with this, or is there a way to improve it?

Thanks,

Roland
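For anyone reproducing these numbers: the output above is consistent with an
invocation roughly like the following sketch (the mount point and user are
placeholders; -s is the test size in MB, and -n 16 means 16*1024 files in the
create/delete phase):

# hypothetical invocation matching the report format above
bonnie++ -d /mnt/lustre -s 8192 -n 16 -u nobody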
>>>>> "Andreas" == Andreas Dilger <adilger@clusterfs.com> writes:>> Also, when running bonnie++, my read performance is quite low >> compared to write, even though the OSTs have equal read/write >> throughput. Do you have an idea, where this could come from? Andreas> Traditionally Lustre has been better at write than read. Ok. >> Concerning performance: When running bonnie++ on a single >> client with no I/O on any other client, I obtain only 100MB/s >> block write throughput, and 60MB/s block read throughput, even >> though each OST gives 200MB/s on the raw device. I''m using >> Infiniband TCP/IP as interconnect which gives me 300MB/s >> max. throughput. Andreas> You may consider increasing the Andreas> /proc/fs/lustre/osc/*/max_rpcs_in_flight parameter for Andreas> your clients. What value would be appropriate? Andreas> What stripe count are you using? stripe count is 0. Andreas> If you need high single-client performance a stripe Andreas> count of 4 or will likely saturate your network. Will this decrease parallel throughput? Andreas> Do you get better aggregate performance when multiple Andreas> clients are writing? One of the strengths of Lustre is Andreas> that often the aggregate performance will increase as Andreas> more clients are added. Yes I do. Andreas> We have also made several performance improvements for Andreas> newer lustre releases, and this is an ongoing process. Andreas> For some specific workloads there are tunings that will Andreas> improve things noticably, but aren''t suitable for Andreas> e.g. 1000-client HPC clusters so can''t go in by default. Well, this cluster has 170 nodes. Thanks, Roland
Thanks for all the tips. I will share when I get it working.

Steve

-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of Roland Fehrenbacher
Sent: Wednesday, August 31, 2005 9:48 AM
To: Andreas Dilger
Cc: lustre-discuss@clusterfs.com
Subject: Re: [Lustre-discuss] I/O performance on small files

[...]
On Aug 31, 2005 16:48 +0200, Roland Fehrenbacher wrote:
> During my stress tests I have occasional error messages on the OSS
> nodes like:
> [693471.980356] LustreError: 2542:0:(client.c:815:ptlrpc_expire_one_request()) @@@ timeout (sent at 1125497780, 5s ago) req@ffff81007c75f800 x485514/t0 o401->@NET_0xac1103fd_UUID:15 lens 4168/64 ref 1 fl Rpc:/0/0 rc 0/0
> [693471.985913] LustreError: 4337:0:(client.c:815:ptlrpc_expire_one_request()) @@@ timeout (sent at 1125497780, 5s ago) req@ffff81002f7fa400 x485515/t0 o401->@NET_0xac1103fd_UUID:15 lens 4168/64 ref 1 fl Rpc:/0/0 rc 0/0
> [693471.985960] LustreError: 4337:0:(recov_thread.c:396:log_commit_thread()) commit ffff810030dce000:ffff81006a657e00 drop 128 cookies: rc -110
> [693472.038458] LustreError: 2542:0:(recov_thread.c:396:log_commit_thread()) commit ffff810003322000:ffff81006a657e00 drop 128 cookies: rc -110
> [693472.400685] LustreError: 26226:0:(lib-move.c:162:lib_match_md()) 2886796053: Dropping PUT from 2886796285.12345 portal 16 match 0x7688a offset 0 length 64: no match

I believe this was resolved in a newer version of Lustre. These particular
timeouts are not serious.

> Also, when running bonnie++, my read performance is quite low compared
> to write, even though the OSTs have equal read/write throughput. Do
> you have an idea where this could come from?

Traditionally Lustre has been better at write than read.

> Concerning performance: When running bonnie++ on a single client with
> no I/O on any other client, I obtain only 100MB/s block write
> throughput, and 60MB/s block read throughput, even though each OST gives
> 200MB/s on the raw device. I'm using Infiniband TCP/IP as interconnect,
> which gives me 300MB/s max. throughput.

You may consider increasing the /proc/fs/lustre/osc/*/max_rpcs_in_flight
parameter for your clients.

What stripe count are you using? If you need high single-client performance,
a stripe count of 4 or so will likely saturate your network.

Do you get better aggregate performance when multiple clients are writing?
One of the strengths of Lustre is that often the aggregate performance will
increase as more clients are added.

We have also made several performance improvements for newer Lustre
releases, and this is an ongoing process. For some specific workloads there
are tunings that will improve things noticeably, but they aren't suitable
for e.g. 1000-client HPC clusters, so they can't go in by default.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
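A minimal sketch of applying the max_rpcs_in_flight suggestion on a client,
assuming the /proc path named above (32 is only an illustrative value, not a
recommendation from this thread):

# check the current values, then raise them on this client
cat /proc/fs/lustre/osc/*/max_rpcs_in_flight
for f in /proc/fs/lustre/osc/*/max_rpcs_in_flight; do
        echo 32 > $f
done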
On Aug 24, 2005 12:54 +0200, Roland Fehrenbacher wrote:
> during my Lustre tests (vers. 1.4.1, kernel 2.6.12.5 patched with
> bugzilla patches, 1 MDS, 2 OSTs, Gigabit network) I find extremely low
> performance for I/O with small files like in extracting a Linux kernel
> source. This is about 20 times slower than the performance on an
> OST's native filesystem. The same applies to the file
> creation/deletion part of bonnie++ as shown in the below example.
>
> Do I have to live with this, or is there a way to improve.

In general, Lustre performs best for large files and concurrent operation
of many clients. While the metadata and small file performance of a single
client is not outstanding, it can scale efficiently to thousands of clients
doing concurrent operations.

For nodes which are expected to have a lot of interactive use (e.g. login
nodes) it is possible to increase the DLM LRU size for these nodes to
reduce interactive latency. This can be done on a smallish number of nodes
(10-20) without problems, but isn't optimal for all clients in very large
clusters.

for LRU in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
        case $LRU in
        */MDC*) echo 2000 > $LRU ;;
        */OSC*) echo 1000 > $LRU ;;
        esac
done

This tuning has shown dramatic improvements for the performance of tasks
like untar/compile of a kernel which touch a lot of small files.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
>>>>> "Andreas" == Andreas Dilger <adilger@clusterfs.com> writes:Andreas> On Aug 24, 2005 12:54 +0200, Roland Fehrenbacher wrote: >> during my Lustre tests (vers. 1.4.1, kernel 2.6.12.5 patched >> with bugzilla patches, 1 MDS, 2 OSTs, Gigabit network) I find >> extremely low performance for I/O with small files like in >> extracting a Linux kernel source. This is about 20 times slower >> than the performance on an OST''s native filesystem. The same >> applies to the file creation/deletion part of bonnie++ as shown >> in the below example. >> >> Do I have to live with this, or is there a way to improve. Andreas> In general, Lustre performs best for large files and Andreas> concurrent operation of many clients. While the metadata Andreas> and small file performance of a single client is not Andreas> outstanding, it can scale efficiently to thousands of Andreas> clients doing concurrent operations. Andreas> For nodes which are expected to have a lot of interactive Andreas> use (e.g. login nodes) it is possible to increase the DLM Andreas> LRU size for these nodes to reduce interactive latency. Andreas> This can be done on a smallish number of nodes (10-20) Andreas> without problems, but isn''t optimal for all clients in Andreas> very large clusters. Andreas> for LRU in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do Andreas> case LRU in Andreas> MDC*) echo 2000 > $LRU ;; Andreas> OSC*) echo 1000 > $LRU ;; Andreas> esac Andreas> done This helped improve things indeed. Thanks for the hint. Andreas> This tuning has shown dramatic improvements for the Andreas> performance of tasks like untar/compile of a kernel which Andreas> touch a lot of small files. During my stress tests I have occasional error messages on the OSS nodes like: [693471.980356] LustreError: 2542:0:(client.c:815:ptlrpc_expire_one_request()) @@@ timeout (sent at 1125497780, 5s ago) req@ffff81007c75f800 x485514/t0 o401->@NET_0xac1103fd_UUID:15 lens 4168/64 ref 1 fl Rpc:/0/0 rc 0/0 [693471.985913] LustreError: 4337:0:(client.c:815:ptlrpc_expire_one_request()) @@@ timeout (sent at 1125497780, 5s ago) req@ffff81002f7fa400 x485515/t0 o401->@NET_0xac1103fd_UUID:15 lens 4168/64 ref 1 fl Rpc:/0/0 rc 0/0 [693471.985960] LustreError: 4337:0:(recov_thread.c:396:log_commit_thread()) commit ffff810030dce000:ffff81006a657e00 drop 128 cookies: rc -110 [693472.038458] LustreError: 2542:0:(recov_thread.c:396:log_commit_thread()) commit ffff810003322000:ffff81006a657e00 drop 128 cookies: rc -110 [693472.400685] LustreError: 26226:0:(lib-move.c:162:lib_match_md()) 2886796053: Dropping PUT from 2886796285.12345 portal 16 match 0x7688a offset 0 length 64: no match Is this serious? Also, when running bonnie++, my read performance is quite low compared to write, even though the OSTs have equal read/write throughput. Do you have an idea, where this could come from? 
Version 1.02b       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ni-01-01         8G 49056  99 87752  78 40259  98 38538  99 62344  99 231.7   4
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1392  24   950  41   907   8  1456  24   957  39  1108   9
ni-01-01,8G,49056,99,87752,78,40259,98,38538,99,62344,99,231.7,4,16,1392,24,950,41,907,8,1456,24,957,39,1108,9

Concerning performance: When running bonnie++ on a single client with no
I/O on any other client, I obtain only 100MB/s block write throughput, and
60MB/s block read throughput, even though each OST gives 200MB/s on the raw
device. I'm using Infiniband TCP/IP as interconnect, which gives me 300MB/s
max. throughput.

Thanks,

Roland
On Monday 29 August 2005 20:37, Andreas Dilger wrote:
> For nodes which are expected to have a lot of interactive use (e.g. login
> nodes) it is possible to increase the DLM LRU size for these nodes to
> reduce interactive latency. This can be done on a smallish number of
> nodes (10-20) without problems, but isn't optimal for all clients in very
> large clusters.
>
> for LRU in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
>         case $LRU in
>         */MDC*) echo 2000 > $LRU ;;
>         */OSC*) echo 1000 > $LRU ;;
>         esac
> done
>
> This tuning has shown dramatic improvements for the performance of tasks
> like untar/compile of a kernel which touch a lot of small files.

Hi, this doesn't seem to help anymore. On our lustre setup version 1.4.6.2
we see very good io performance (600MB/s using infiniband) on large files,
but small file performance is really bad. For instance untarring the linux
kernel source is 20 times slower on the lustre file system than on local
disk:

[royd@compute-1-2 c1-2]$ cd /mnt/lustre
[royd@compute-1-2 lustre]$ time tar xf /tmp/linux-2.6.9.tar

real    1m7.848s
user    0m0.203s
sys     0m32.832s

[royd@compute-1-2 lustre]$ cd /tmp
[royd@compute-1-2 tmp]$ time tar xf /tmp/linux-2.6.9.tar

real    0m2.590s
user    0m0.169s
sys     0m1.470s

I've also tried to change the max_rpcs_in_flight parameter mentioned later
in this thread, but that doesn't seem to help either.

We would like to deploy a pair of interactive nodes where we want to beef
up the small file performance. Are there any more knobs to turn?

Our setup is like this:
OSTs: 2 Nexan Satabeasts, 10TB each, in the above case 4 OSTs of 1.5TB on
each.
OSSs: 2 HP Proliant 380s (i386) with FC channels to the beasts and
infiniband to the clients. One of these is the MDS too.
100 HP rx4640s (ia64) as clients.

CentOS 4.2, with the latest errata kernel, 2.6.9-34.EL, lustre 1.4.6.2
patches and the Voltaire IBHOST stack.

Any hints are greatly appreciated.

Regards,
r.

--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone: +47 77 64 41 07, fax: +47 77 64 41 00
Roy Dragseth, High Performance Computing System Administrator
Direct call: +47 77 64 62 56. email: royd@cc.uit.no
On Thu, 22 Jun 2006, Roy Dragseth wrote:

>> This tuning has shown dramatic improvements for the performance of tasks
>> like untar/compile of a kernel which touch a lot of small files.
>
> Hi, this doesn't seem to help anymore.

Have you tried *really* large values, i.e. on the order of the number of
files in the tarball (e.g. 30000 for the MDS LRU)? I'm not sure how much
this is advisable though. ;)

--
Jean-Marc Saffroy - jean-marc.saffroy@ext.bull.net
On Jun 22, 2006 14:34 +0200, Roy Dragseth wrote:
> On our lustre setup version 1.4.6.2 we see very good io performance
> (600MB/s using infiniband) on large files, but small file performance
> is really bad. For instance untarring the linux kernel source is 20 times
> slower on the lustre file system than on local disk:
>
> [royd@compute-1-2 c1-2]$ cd /mnt/lustre
> [royd@compute-1-2 lustre]$ time tar xf /tmp/linux-2.6.9.tar
>
> real    1m7.848s
> user    0m0.203s
> sys     0m32.832s
>
> [royd@compute-1-2 lustre]$ cd /tmp
> [royd@compute-1-2 tmp]$ time tar xf /tmp/linux-2.6.9.tar
>
> real    0m2.590s
> user    0m0.169s
> sys     0m1.470s
>
> I've also tried to change the max_rpcs_in_flight parameter mentioned
> later in this thread, but that doesn't seem to help either.

How large is your tarball? My 2.6.9-34.EL kernel is 201MB, so this is
exceeding the Lustre-imposed maximum client cache size (32 MB per OSC).

To make this a fair test, in addition to increasing the lock LRU size you
should also increase the /proc/fs/lustre/osc/*/max_dirty_mb value to, say,
128MB so that the client can cache as much of the dataset locally as
possible, and then flush it out in the background. Also, it would be
prudent to do the "local" benchmark on the OSS node mounting one of the
OST filesystems temporarily (after stopping lustre of course) so that the
same disk hardware is used.

The local filesystem isn't writing all of the tarball to disk before tar
returns, and it is likely caching all of it. Lustre does more aggressive
write flushing than local filesystems, because it is undesirable to have
many GB of outstanding writes in client cache when there are thousands of
clients. When there is a smaller number of clients doing this kind of
operation, these restrictions can be removed.

> Our setup is like this:
> OSTs: 2 Nexan Satabeasts, 10TB each, in the above case 4 OSTs of 1.5TB on
> each.
> OSSs: 2 HP Proliant 380s (i386) with FC channels to the beasts and
> infiniband to the clients. One of these is the MDS too.
> 100 HP rx4640s (ia64) as clients.

Just to confirm, what is the number of stripes per file? Having more than
a single stripe on small files is pure overhead.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
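A minimal sketch of the max_dirty_mb change suggested above, run on the
client and assuming the same /proc layout (128 is the example value from the
text):

for f in /proc/fs/lustre/osc/*/max_dirty_mb; do
        echo 128 > $f      # allow up to 128MB of dirty cache per OSC
done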
On Thursday 22 June 2006 23:36, Andreas Dilger wrote:
> On Jun 22, 2006 14:34 +0200, Roy Dragseth wrote:
> > On our lustre setup version 1.4.6.2 we see very good io performance
> > (600MB/s using infiniband) on large files, but small file performance
> > is really bad. For instance untarring the linux kernel source is 20
> > times slower on the lustre file system than on local disk:
> >
> > [royd@compute-1-2 c1-2]$ cd /mnt/lustre
> > [royd@compute-1-2 lustre]$ time tar xf /tmp/linux-2.6.9.tar
> >
> > real    1m7.848s
> > user    0m0.203s
> > sys     0m32.832s
> >
> > [royd@compute-1-2 lustre]$ cd /tmp
> > [royd@compute-1-2 tmp]$ time tar xf /tmp/linux-2.6.9.tar
> >
> > real    0m2.590s
> > user    0m0.169s
> > sys     0m1.470s
> >
> > I've also tried to change the max_rpcs_in_flight parameter mentioned
> > later in this thread, but that doesn't seem to help either.
>
> How large is your tarball? My 2.6.9-34.EL kernel is 201MB, so this is
> exceeding the Lustre-imposed maximum client cache size (32 MB per OSC).

# ll -h /tmp/linux-2.6.9.tar
-rw-r--r--  1 root root 196M Jun 23 08:55 /tmp/linux-2.6.9.tar

> To make this a fair test, in addition to increasing the lock LRU size you
> should also increase the /proc/fs/lustre/osc/*/max_dirty_mb value to,
> say, 128MB so that the client can cache as much of the dataset locally
> as possible, and then flush it out in the background. Also, it would
> be prudent to do the "local" benchmark on the OSS node mounting one
> of the OST filesystems temporarily (after stopping lustre of course)
> so that the same disk hardware is used.

I mounted one of the devices as /lshared1 and reran on the OSS; it shows
an unpacking time about the same as before:

# cd /lshared1
# time tar xf /tmp/linux-2.6.9.tar

real    0m2.557s
user    0m0.170s
sys     0m1.677s

Increasing max_dirty_mb didn't help either:

# cat /proc/fs/lustre/osc/*/max_dirty_mb
256
256
256
256
256
256
256
256

$ cd /mnt/lustre
$ time tar xf /tmp/linux-2.6.9.tar

real    0m42.357s
user    0m0.199s
sys     0m21.379s

> The local filesystem isn't writing all of the tarball to disk before
> tar returns, and it is likely caching all of it. Lustre does more
> aggressive write flushing than local filesystems, because it is
> undesirable to have many GB of outstanding writes in client cache when
> there are thousands of clients. When there is a smaller number of
> clients doing this kind of operation, these restrictions can be removed.

Yes, but even if I include the sync time it still runs around 10 times
faster against local disk than over lustre:

# time bash -c "tar xf /tmp/linux-2.6.9.tar ; sync"

real    0m4.363s
user    0m0.168s
sys     0m1.933s

> > Our setup is like this:
> > OSTs: 2 Nexan Satabeasts, 10TB each, in the above case 4 OSTs of
> > 1.5TB on each.
> > OSSs: 2 HP Proliant 380s (i386) with FC channels to the beasts and
> > infiniband to the clients. One of these is the MDS too.
> > 100 HP rx4640s (ia64) as clients.
>
> Just to confirm, what is the number of stripes per file? Having more
> than a single stripe on small files is pure overhead.

The default is one stripe per file:

lmc -m $CONFIG --add lov --lov lov-work --mds mds-work --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

We run the MDS filesystem on the same arrays that host the OSTs, but
moving the MDS to a ramfs, e.g. /dev/shm, doesn't seem to affect
performance at all.

But it seems to me like the overhead is in the file creation, as this
little experiment shows: first we collect the directory structure in the
linux tarball, then all the filenames.
# cd /tmp
# tar xf /tmp/linux-2.6.9.tar
# find linux-2.6.9 -type d > /tmp/linuxsrcdirs.txt
# find linux-2.6.9 -type f > /tmp/linuxsrcfiles.txt

Creating the dir structure is really fast, creating the files is really
slow:

# cd /mnt/lustre
# time bash -c "cat /tmp/linuxsrcdirs.txt | xargs mkdir"

real    0m0.651s
user    0m0.005s
sys     0m0.260s

# time bash -c "cat /tmp/linuxsrcfiles.txt | xargs touch"

real    0m32.470s
user    0m0.135s
sys     0m16.546s

So, in this case 32 of the 42 seconds seems to be spent in creating the
files.

Attached you'll find the script used to create the filesystem; maybe it is
something obvious I'm doing wrong?

Regards,
r.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: createLustreStripe.sh.zip
Type: application/x-zip
Size: 1018 bytes
Desc: not available
Url : http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20060623/4e5bced5/createLustreStripe.sh-0001.bin
Roy Dragseth wrote:
> # cd /tmp
> # tar xf /tmp/linux-2.6.9.tar
> # find linux-2.6.9 -type d > /tmp/linuxsrcdirs.txt
> # find linux-2.6.9 -type f > /tmp/linuxsrcfiles.txt
>
> Creating the dir structure is really fast, creating the files is really
> slow:
>
> # cd /mnt/lustre
> # time bash -c "cat /tmp/linuxsrcdirs.txt | xargs mkdir"
> real    0m0.651s
> user    0m0.005s
> sys     0m0.260s
>
> # time bash -c "cat /tmp/linuxsrcfiles.txt | xargs touch"
> real    0m32.470s
> user    0m0.135s
> sys     0m16.546s
>
> So, in this case 32 of the 42 seconds seems to be spent in creating the
> files.

One problem with this test is that there is over an order of magnitude
more files than directories in the linux source tree. While I don't
contend that this will shed more light on your problem, it would be better
to do an apples-to-apples test.

--
David Vasil <dmvasil@ornl.gov>
Oak Ridge National Laboratory, NCCS Division
High Performance Computing Systems Administrator
On Friday 23 June 2006 14:12, David Vasil wrote:
> One problem with this test is that there is over an order of magnitude
> more files than directories in the linux source tree. While I don't
> contend that this will shed more light on your problem, it would be
> better to do an apples-to-apples test.

It was intended more as a breakdown of events concerning the metadata than
a comparison between the speed of directory creation and file creation.

r.

--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone: +47 77 64 41 07, fax: +47 77 64 41 00
Roy Dragseth, High Performance Computing System Administrator
Direct call: +47 77 64 62 56. email: royd@cc.uit.no
Hi Roy,

Boy, it's a pleasure to work through this with you. IIRC we _should_ be
able to do well on unpacking, but it has been a while. However, until we
have our metadata writeback cache (2008?), we cannot win over a local file
system. In a local file system the cache flushes write the newly created
files to disk in huge batches.

Let's first look at an order of magnitude issue here. I suspect you have
about 15,000 files perhaps? So the creation data below shows that you
create about 500/sec. On big machines we see up to ~14,000 creates /
second, on smaller systems maybe a few 1000. If we assume 2000, that means
you'll still be sitting in file creations for 7.5 seconds. What kind of an
MDS do you have?

First, let's turn debugging off completely, on all nodes please:

echo 0 > /proc/sys/portals/debug

Then try again.

The files have objects on the OST which are supposed to be pre-created,
and it shouldn't interfere too much with performance. Normal file creation
performance is on par with directory creation. But something there is
clearly awry.

Could you see what "create_count" is set to in the OSC
/proc/fs/lustre/osc/*/... directories? Putting something larger in there
may help.

Let me also ask you, how wide are you striping the files (lfs getstripe
<unpackdir>)? For best performance you want to set the stripe count on a
subdirectory to 1:

lfs setstripe <unpack-dir> 4194304 -1 1

Try these (one at a time please) and let us know how you are doing. It may
unfortunately take a few more iterations to get this right.

- Peter -

> -----Original Message-----
> From: lustre-discuss-bounces@clusterfs.com
> [mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of Roy Dragseth
> Sent: Friday, June 23, 2006 3:10 AM
> To: lustre-discuss@clusterfs.com
> Subject: Re: [Lustre-discuss] I/O performance on small files
>
> [...]
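Taken together, Peter's three suggestions might look like the following
client-side sketch (the value 128 for create_count is illustrative, and
repeating the debug setting on the MDS/OSS nodes is left to whatever remote
shell is at hand):

# 1. turn off Lustre/Portals debugging (repeat on the MDS and OSS nodes)
echo 0 > /proc/sys/portals/debug

# 2. inspect, and optionally raise, the object pre-creation batch size
cat /proc/fs/lustre/osc/*/create_count
for f in /proc/fs/lustre/osc/*/create_count; do echo 128 > $f; done

# 3. single-stripe the unpack directory: 4MB stripe size,
#    default starting OST (-1), stripe count 1
lfs setstripe /mnt/lustre/unpack-dir 4194304 -1 1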
Roy,

Besides kernel untar, I'd suggest the "fileop" metadata benchmark, which
gives operations/second for various metadata operations. It's part of the
IOzone benchmark suite. Sample output is given below.

/mnt/scratch_256/fileop 60

 --------------------------------------
 |              Fileop                 |
 |        $Revision: 1.20 $            |
 |                                     |
 |                 by                  |
 |                                     |
 |             Don Capps               |
 --------------------------------------

mkdir:   Dirs =    3660 Total Time = 0.281478405 seconds
         Avg mkdir(s)/sec   =    13002.77 ( 0.000076907 seconds/op)
         Best mkdir(s)/sec  =    72315.59 ( 0.000013828 seconds/op)
         Worst mkdir(s)/sec =       30.96 ( 0.032297134 seconds/op)

rmdir:   Dirs =    3660 Total Time = 0.320229292 seconds
         Avg rmdir(s)/sec   =    11429.31 ( 0.000087494 seconds/op)
         Best rmdir(s)/sec  =    67650.06 ( 0.000014782 seconds/op)
         Worst rmdir(s)/sec =      163.94 ( 0.006099939 seconds/op)

create:  Files =  216000 Total Time = 32.995336533 seconds
         Avg create(s)/sec   =     6546.38 ( 0.000152756 seconds/op)
         Best create(s)/sec  =    72315.59 ( 0.000013828 seconds/op)
         Worst create(s)/sec =        0.55 ( 1.808913946 seconds/op)

write:   Files =  216000 Total Time = 1.749092102 seconds
         Avg write(s)/sec   =   123492.64 ( 0.000008098 seconds/op)
         Best write(s)/sec  =   167772.16 ( 0.000005960 seconds/op)
         Worst write(s)/sec =      495.55 ( 0.002017975 seconds/op)

close:   Files =  216000 Total Time = 38.892731667 seconds
         Avg close(s)/sec   =     5553.74 ( 0.000180059 seconds/op)
         Best close(s)/sec  =   209715.20 ( 0.000004768 seconds/op)
         Worst close(s)/sec =        0.38 ( 2.653007030 seconds/op)

stat:    Files =  216000 Total Time = 0.476946831 seconds
         Avg stat(s)/sec   =   452880.67 ( 0.000002208 seconds/op)
         Best stat(s)/sec  =  1048576.00 ( 0.000000954 seconds/op)
         Worst stat(s)/sec =    16644.06 ( 0.000060081 seconds/op)

access:  Files =  216000 Total Time = 0.510522604 seconds
         Avg access(s)/sec   =   423095.86 ( 0.000002364 seconds/op)
         Best access(s)/sec  =  1048576.00 ( 0.000000954 seconds/op)
         Worst access(s)/sec =     1709.17 ( 0.000585079 seconds/op)

chmod:   Files =  216000 Total Time = 17.839460373 seconds
         Avg chmod(s)/sec   =    12107.99 ( 0.000082590 seconds/op)
         Best chmod(s)/sec  =   262144.00 ( 0.000003815 seconds/op)
         Worst chmod(s)/sec =        0.45 ( 2.227634907 seconds/op)

readdir: Files =    3600 Total Time = 0.052351236 seconds
         Avg readdir(s)/sec   =    68766.28 ( 0.000014542 seconds/op)
         Best readdir(s)/sec  =    83886.08 ( 0.000011921 seconds/op)
         Worst readdir(s)/sec =    11491.24 ( 0.000087023 seconds/op)

link:    Files =  216000 Total Time = 42.742334366 seconds
         Avg link(s)/sec   =     5053.54 ( 0.000197881 seconds/op)
         Best link(s)/sec  =    91180.52 ( 0.000010967 seconds/op)
         Worst link(s)/sec =        0.50 ( 1.982422113 seconds/op)

unlink:  Files =  216000 Total Time = 31.314268112 seconds
         Avg unlink(s)/sec   =     6897.81 ( 0.000144973 seconds/op)
         Best unlink(s)/sec  =   113359.57 ( 0.000008821 seconds/op)
         Worst unlink(s)/sec =        0.21 ( 4.793473005 seconds/op)

delete:  Files =  216000 Total Time = 55.935692072 seconds
         Avg delete(s)/sec   =     3861.58 ( 0.000258962 seconds/op)
         Best delete(s)/sec  =    29537.35 ( 0.000033855 seconds/op)
         Worst delete(s)/sec =        0.65 ( 1.544568062 seconds/op)

>>> Roy Dragseth <Roy.Dragseth@cc.uit.no> 6/23/2006 6:40 AM >>>
[...]
Kumaran,

One might want to pick up the latest version from the IOzone web site.
The current version is Revision 1.37. The new version has many new nifty
features:

 --------------------------------------
 |              Fileop                 |
 |        $Revision: 1.37 $            |
 |                                     |
 |                 by                  |
 |                                     |
 |             Don Capps               |
 --------------------------------------

fileop [-f # ] [-l # -u #] [-s Y] [-t] [-v] [-e] [-b] [-w]

     -f #  Force factor. X^3 files will be created and removed.
     -l #  Lower limit on the value of the Force factor. (optional)
     -u #  Upper limit on the value of the Force factor. (optional)
     -s #  Sets filesize for the create/write. (optional)
     -t    Verbose output option. (optional)
     -v    Version information. (optional)
     -e    Excel importable format. (optional)
     -b    Output best case results. (optional)
     -w    Output worst case results. (optional)

The structure of the file tree is: X number of Level 1 directories, with X
number of level 2 directories, with X number of files in each of the level
2 directories.

Example: fileop 2

                dir_1                           dir_2
               /     \                         /     \
          sdir_1      sdir_2             sdir_1      sdir_2
          /    \      /    \             /    \      /    \
     file_1 file_2 file_1 file_2    file_1 file_2 file_1 file_2

Each file will be created, and then Y bytes is written to the file.

Enjoy,
Don Capps

----- Original Message -----
From: "Kumaran Rajaram" <krajaram@lnxi.com>
To: "Roy Dragseth" <Roy.Dragseth@cc.uit.no>; <lustre-discuss@clusterfs.com>
Sent: Friday, June 23, 2006 10:35 AM
Subject: Re: [Lustre-discuss] I/O performance on small files

> [...]
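Going by the usage text above, a small run against the Lustre mount might
look like the following sketch (the force factor 10 gives 10^3 = 1000 files;
the -s value is illustrative, and its units are not stated in the usage
text):

cd /mnt/lustre
fileop -f 10 -s 1024 -t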
On Friday 23 June 2006 15:15, Peter J. Braam wrote:
> Hi Roy,
>
> Boy, it's a pleasure to work through this with you. IIRC we _should_ be
> able to do well on unpacking, but it has been a while. However, until
> we have our metadata writeback cache (2008?), we cannot win over a local
> file system. In a local file system the cache flushes write the newly
> created files to disk in huge batches.

Just to make things clear, I'm not whining, I just want to know if I've
done something stupid ;-) We're planning to deploy lustre both for a
scratch file area (huge files, lots of io) and a home area (not so much
io, but possibly lots of tar unpacks and compiles).

Just to make a comparison with our current NFS setup I ran the same tests,
and it turns out that lustre isn't that much worse in this case than nfs
(although the nfs system was busy during the tests). The file creation
test took 27 secs on nfs in comparison to 32 secs on lustre:

$ cd ~/tmp
$ time bash -c "cat /tmp/linuxsrcdirs.txt | xargs mkdir"

real    0m0.619s
user    0m0.008s
sys     0m0.083s

$ time bash -c "cat /tmp/linuxsrcfiles.txt | xargs touch"

real    0m27.775s
user    0m0.107s
sys     0m2.298s

$ rm -rf linux-2.6.9/
$ time tar xf /tmp/linux-2.6.9.tar

real    0m34.403s
user    0m0.175s
sys     0m4.133s

> Let's first look at an order of magnitude issue here. I suspect you
> have about 15,000 files perhaps?

Yup:

$ wc -l /tmp/linuxsrcfiles.txt
16448 /tmp/linuxsrcfiles.txt

> So the creation data below shows that you create about 500/sec. On big
> machines we see up to ~14,000 creates / second, on smaller systems
> maybe a few 1000. If we assume 2000, that means you'll still be sitting
> in file creations for 7.5 seconds. What kind of an MDS do you have?

The MDS is running on a dual cpu 3.4GHz/2GB RAM HP Proliant DL380; this
machine is also serving as one of two OSSs. We have two of these, and the
plan is to run one MDS on each of them (also using them as OSSs) with
failover. The MDS storage is placed on the same storage as the OSTs, but
moving it to another file system doesn't seem to matter.

> First, let's turn debugging off completely, on all nodes please:
>
> echo 0 > /proc/sys/portals/debug

Hey, hey, now we're talking! Turning off debugging on both the client and
the MDSs and OSSs brings the time down from 32 to 14 secs:

$ cd /mnt/lustre
$ rm -rf linux-2.6.9/
$ time bash -c "cat /tmp/linuxsrcdirs.txt | xargs mkdir"

real    0m0.301s
user    0m0.009s
sys     0m0.064s

$ time bash -c "cat /tmp/linuxsrcfiles.txt | xargs touch"

real    0m14.346s
user    0m0.107s
sys     0m2.989s

(Still slapping my forehead for not thinking of this one...)

> Then try again.
>
> The files have objects on the OST which are supposed to be pre-created,
> and it shouldn't interfere too much with performance. Normal file
> creation performance is on par with directory creation. But something
> there is clearly awry.
>
> Could you see what "create_count" is set to in the OSC
> /proc/fs/lustre/osc/*/... directories? Putting something larger in
> there may help.

$ cat /proc/fs/lustre/osc/*/create_count
32
32
32
32
32
32
32
32

Increasing these numbers doesn't give any change. Nor do the changes
suggested earlier in this thread. I've tried the following:

/proc/fs/lustre/osc/*/create_count = 128
/proc/fs/lustre/osc/*/max_rpcs_in_flight = 32
/proc/fs/lustre/ldlm/namespaces/MDC*/lru_size = 2000
/proc/fs/lustre/ldlm/namespaces/OSC*/lru_size = 1000

> Let me also ask you, how wide are you striping the files (lfs getstripe
> <unpackdir>)?
> For best performance you want to set the stripe count on a
> subdirectory to 1:
>
> lfs setstripe <unpack-dir> 4194304 -1 1

The default stripe count on the file system is 1 and the stripe size is
1MB:

lmc -m $CONFIG --add lov --lov lov-work --mds mds-work --stripe_sz 1048576 --stripe_cnt 0 --stripe_pattern 0

so that should be ok.

> Try these (one at a time please) and let us know how you are doing. It
> may unfortunately take a few more iterations to get this right.

Turning off debugging gives us a significant increase in performance over
our current NFS setup. With this setting we see a file creation rate of
around 1000 per sec; do you suggest that this could be further increased?
I'll be happy to test any ideas you might have.

We are now seeing a significant performance increase over our current nfs
setup: IO for large files has increased from 50MB/s to >600MB/s, and the
file creation rate has doubled. Besides, the lustre system doesn't seem to
fall over and die as soon as a few clients start hitting it, as our
current nfs setup does. This was an important design goal :-)

Best regards and have a nice weekend,
r.
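For reference, the client-side knobs discussed in this thread, collected
into one sketch (the values are the ones quoted above; the loop form is just
one way of applying them, not necessarily how Roy did it):

echo 0 > /proc/sys/portals/debug
for f in /proc/fs/lustre/osc/*/create_count; do echo 128 > $f; done
for f in /proc/fs/lustre/osc/*/max_rpcs_in_flight; do echo 32 > $f; done
for f in /proc/fs/lustre/ldlm/namespaces/MDC*/lru_size; do echo 2000 > $f; done
for f in /proc/fs/lustre/ldlm/namespaces/OSC*/lru_size; do echo 1000 > $f; done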
Hi Roy,

Can this be faster? Certainly running an OST on the MDS is not going to
help, but I think you have the bulk of the performance now. More tuning is
possible when you have multiple systems using the MDS simultaneously -
Andreas pointed out that the higher numbers I mentioned can absolutely
only be achieved with more than one client.

In a year or two we will have a writeback cache - at that point we hope to
get closer to the local file system situation.

Best wishes,

- Peter -