Hello, I am new to Lustre and wanted to run a simple small-file copy test between 2 virtual machines, from the MDT/OST server to the client's local disk. I realize small-file performance is never fast, but this seems particularly slow considering the data is all buffered in memory with little to no disk activity.

Setup Info
- Version is 2.4.50
- Average file size is small, < 10 KB
- The amount of data being copied is about 250 MB
- The VMs are on separate hosts

Performance
- 7 minutes over a gigabit network
- NFS takes only 3 minutes

Observations
- iostat on the OST/MDT is usually 0% during the copy, so I assume it is all buffered
- Additional network traffic is minimal
- CPU load on the VMs is 15-20% during the copy
- RPC stats on the client show only 1 RPC in flight at a time; max_rpcs_in_flight is set to 64. Is that expected behavior for a copy?

Here is a snapshot of rpc_stats early during the copy:

                        read                     write
  pages per rpc    rpcs   %  cum %   |   rpcs   %  cum %
  1:               1653  90     90   |      0   0      0
  2:                164   8     98   |      0   0      0
  4:                  7   0     99   |      0   0      0
  8:                  3   0     99   |      0   0      0
  16:                 3   0     99   |      0   0      0
  32:                 5   0     99   |      0   0      0
  64:                 0   0     99   |      0   0      0
  128:                1   0    100   |      0   0      0

                        read                     write
  rpcs in flight   rpcs   %  cum %   |   rpcs   %  cum %
  0:                  0   0      0   |      0   0      0
  1:               1836 100    100   |      0   0      0

                        read                     write
  offset           rpcs   %  cum %   |   rpcs   %  cum %
  0:               1836 100    100   |      0   0      0

As I am new, any suggestions for what to look for or improve would be greatly appreciated.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
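[For anyone reproducing this: the counters above come from the client's osc layer and can be read with lctl. A minimal sketch follows; the parameter names are the usual ones on Lustre 2.x clients (verify with `lctl list_param osc.*` on your build), and it is guarded so it exits cleanly on a machine without Lustre.]

```shell
# Sketch: inspect client-side RPC tuning on a Lustre 2.x client.
# Assumes the standard osc.* parameter names; verify on your version.
show_rpc_tuning() {
    if command -v lctl >/dev/null 2>&1; then
        lctl get_param osc.*.max_rpcs_in_flight   # per-OSC cap on concurrent RPCs
        lctl get_param osc.*.max_pages_per_rpc    # upper bound on RPC size
        lctl set_param osc.*.rpc_stats=clear      # zero the histogram
        # ... run the test copy here, then re-read the histogram:
        lctl get_param osc.*.rpc_stats
    else
        echo "lctl not found; run this on a Lustre client"
    fi
}
out=$(show_rpc_tuning)
printf '%s\n' "$out"
```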
"test between 2 virtual machines from MDT/OST server to client's local disk."

Andrew,

I'm confused by the description of your test. Can you clarify?

--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division

From: lustre-discuss-bounces-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org [mailto:lustre-discuss-bounces@lists.lustre.org] On Behalf Of Andrew Mast
Sent: Friday, June 21, 2013 3:42 PM
To: lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
Subject: [Lustre-discuss] Slow Copy (Small Files) 1 RPC In Flight?
Hello!

On Jun 21, 2013, at 5:42 PM, Andrew Mast wrote:
> Hello, I am new to Lustre and wanted to run a simple small-file copy test between 2 virtual machines, from the MDT/OST server to the client's local disk.
> I realize small-file performance is never fast, but this seems particularly slow considering the data is all buffered in memory with little to no disk activity.
>
> RPC stats on the client show only 1 RPC in flight at a time; max_rpcs_in_flight is set to 64. Is that expected behavior for a copy?

Well, it seems you are reading from Lustre, and small files at that. So Lustre reads a single file at a time (I assume you copy with something like cp, single-threadedly), and readahead does not come into play because the file size is smaller than 1 RPC. So before we are done with a single file, we cannot guess there'd be another request for the next file. That's why you have only one RPC in flight.

Also, the Lustre metadata protocol is somewhat heavier than NFS, which would explain why it's slower than NFS. The situation should improve once you start trying bigger files.

Bye,
Oleg
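[A back-of-envelope sketch of the serialization cost Oleg describes. The file count follows from the thread (about 250 MB at roughly 10 KB per file); the RPCs-per-file and round-trip figures are illustrative assumptions, not measurements from this thread.]

```shell
# Rough cost of fully serialized small-file reads over the network.
files=25000          # ~250 MB / ~10 KB per file (from the thread)
rpcs_per_file=5      # ASSUMED: lookup/open, lock, read, close, etc.
rtt_ms=1             # ASSUMED: per-RPC round trip on a gigabit LAN
total_s=$(( files * rpcs_per_file * rtt_ms / 1000 ))
echo "estimated serialized RPC latency: ${total_s}s"
```

Even with these optimistic guesses, pure round-trip latency alone accounts for minutes, the same order of magnitude as the 7-minute copy observed, which is why concurrency (not bandwidth) is the lever here.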
Hi Brett,

Sorry, I think my choice of wording was not correct.

One VM holds the metadata and the objects. I guess that would mean it is the OSS and MDS? Another VM is the client. It has mounted the Lustre filesystem and also has some local disks. The test is just to use cp to read data to local disk.

Thanks,
Andy

On Fri, Jun 21, 2013 at 3:22 PM, Lee, Brett <brett.lee-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> "test between 2 virtual machines from MDT/OST server to client's local disk."
>
> Andrew,
>
> I'm confused by the description of your test. Can you clarify?
>
> --
> Brett Lee
> Sr. Systems Engineer
> Intel High Performance Data Division
Oleg,

Very clear, thank you for the explanation; I misunderstood readahead. Yes, the 1 GB and 10 GB file transfer tests were on par with NFS.

Our use case is typically compiling and find/grep through large (30 GB) amounts of source code, so it seems we are stuck with small files.

Andy

On Fri, Jun 21, 2013 at 3:42 PM, Drokin, Oleg <oleg.drokin-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Well, it seems you are reading from Lustre, and small files at that.
> So Lustre reads a single file at a time (I assume you copy with
> something like cp, single-threadedly), and readahead does not come
> into play because the file size is smaller than 1 RPC. So before we
> are done with a single file, we cannot guess there'd be another
> request for the next file. That's why you have only one RPC in flight.
>
> Also, the Lustre metadata protocol is somewhat heavier than NFS, which
> would explain why it's slower than NFS. The situation should improve
> once you start trying bigger files.
>
> Bye,
> Oleg
Hello!

On Jun 21, 2013, at 9:07 PM, Andrew Mast wrote:
> Very clear, thank you for the explanation; I misunderstood readahead. Yes, the 1 GB and 10 GB file transfer tests were on par with NFS.
>
> Our use case is typically compiling and find/grep through large (30 GB) amounts of source code, so it seems we are stuck with small files.

Generally this sort of workload is pretty bad for network filesystems due to the large amount of synchronous RPC traffic that you cannot easily predict. You can get a certain speedup by doing several copies in parallel (e.g. one copy per top-level subtree, or whatever), as then you'll at least get concurrent RPCs.

I know some people try to combat this by running a block device on top of the network filesystem and then running some sort of local fs (say, ext4) on top of that block device (loopback based). That allows readahead and caching to work much better, and so on. But this is not without limitations either: only a single node can have this filesystem-file mounted at any one time.

If you do not have any significant writes to this fileset (if any at all), but a lot of consecutive reads/greps, you might want to just store the entire workset as a tar file that you read and unpack locally on a client (should be pretty fast), say to a ramfs (you need tons of RAM, of course), and then do the searches there. Also not ideal, but at least the network filesystem would then be doing what it is best suited for: large transfers.

If you can come up with some other way of storing a large number of smaller files in a single large combined file that you then access with special tools (like, I dunno, a fuse-tarfs or whatever, assuming those don't read unneeded data but just skip over it, or something more specific to your case), this might be a winner too.

Bye,
Oleg
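[A rough sketch of the first and third suggestions above (parallel per-subtree copies, and packing the tree into one tar file to unpack locally). It uses throwaway local directories as a stand-in for the Lustre mount; all paths are illustrative.]

```shell
# Build a small stand-in source tree (in place of the Lustre mount).
src=/tmp/lustre_demo_src; dst=/tmp/lustre_demo_dst
rm -rf "$src" "$dst"
mkdir -p "$src/a" "$src/b" "$dst"
for i in 1 2 3; do echo data > "$src/a/f$i"; echo data > "$src/b/f$i"; done

# 1) Parallel copy: one cp per top-level subtree, so several file reads
#    (and hence several RPCs) are in flight at once.
find "$src" -mindepth 1 -maxdepth 1 -type d | xargs -P 4 -I{} cp -r {} "$dst/"

# 2) Pack-and-ship: store the tree as one tar file so the network
#    filesystem does a single large streaming read, then unpack locally
#    (e.g. to a ramfs) before searching.
tar -C "$src" -cf /tmp/lustre_demo.tar .
mkdir -p "$dst/unpacked"
tar -C "$dst/unpacked" -xf /tmp/lustre_demo.tar
```

On a real Lustre mount the parallel variant only helps up to max_rpcs_in_flight concurrent requests per OSC, and the tar variant assumes the workset is read-mostly, as Oleg notes above.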