Hello,

is it possible to optimize Lustre so that it supports really large directories (with 30k small files in them)? We have 8 physical clients which process jpeg files stored on a Lustre volume, and sooner or later I get client freezes - an ls in a Lustre directory waits forever. Is there something I could do to improve performance?

The Lustre server is Build Version:
1.8.0-19700101010000-PRISTINE-.usr.src.lustre-prod.linux-2.6.22.19-2.6.22.19

The Lustre client is Build Version:
1.6.7.1-19700101010000-PRISTINE-.scratch.xhejtman.suse-2.6.22.17-0.1-2.6.22.17-0.1-xen-lustre

I got the following messages on the client:

Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service stable-MDT0000 via nid x.x.x.x@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 2 previous similar messages
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 37 previous similar messages
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped 37 previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 8s
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 13s
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 18s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with x.x.x.x@tcp. The mds_connect operation failed with -16
Lustre: Request x112815827 sent from stable-OST0001-osc-ffff8802855b7800 to NID x.x.x.x@tcp 100s ago has timed out (limit 100s).
Lustre: Skipped 9 previous similar messages
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection to service stable-OST0001 via nid x.x.x.x@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
LustreError: 128:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 128:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection restored to service stable-OST0001 using nid x.x.x.x@tcp.
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 23s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 166-1: MGCx.x.x.x@tcp: Connection to service MGS via nid x.x.x.x@tcp was lost; in progress operations using this service will fail.
Lustre: MGCx.x.x.x@tcp: Reactivating import
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 28s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 33s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with x.x.x.x@tcp. The mds_connect operation failed with -16
LustreError: Skipped 5 previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 38s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 3158:0:(events.c:66:request_out_callback()) @@@ type 4, status -5 req@ffff8801002dd800 x112816084/t0 o103->stable-OST0001_UUID@10.0.0.1@o2ib:17/18 lens 648/256 e 0 to 1 dl 1253177767 ref 2 fl Rpc:N/0/0 rc 0/0
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 194 previous similar messages

--
Lukáš Hejtmánek
On Thu, 2009-09-17 at 13:28 +0200, Lukas Hejtmanek wrote:
> Hello,

Hi,

> is it possible to optimize Lustre so that it supports really large directories
> (with 30k small files in it)?

We already have optimizations for large directories (i.e. HTREE indexing, etc.).

> We have 8 physical clients which process jpeg
> files stored on Lustre volume and I get sooner or later client freezes - ls in
> Lustre directory waits forever. Is there something I could do to improve
> performance?

I suspect you don't really have a "performance" issue.

> I got the following messages on the client:
> Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service
> stable-MDT0000 via nid x.x.x.x@tcp was lost; in progress operations using
> this service will wait for recovery to complete.

So this immediately points to the MDT. The above message is saying that the connection to the MDT was lost. The question becomes "why?". What did the MDS log around the time the above message was logged on the client?

FWIW, it's much better to look at the syslog for this sort of thing than dmesg, as the syslog provides timing context.

b.
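For example, one could pull the MDS-side messages from the same time window as the client disconnect (a minimal sketch; /var/log/messages and the timestamp pattern are only placeholders, adjust them to your syslog setup):

    # on the MDS -- syslog path and time window are assumptions
    grep -i lustre /var/log/messages | less
    grep 'Sep 17 13:' /var/log/messages | grep -i lustre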
On Thursday 17 September 2009, Lukas Hejtmanek wrote:
> Hello,
>
> is it possible to optimize Lustre so that it supports really large
> directories (with 30k small files in it)? We have 8 physical clients which
> process jpeg files stored on Lustre volume and I get sooner or later client
> freezes - ls in Lustre directory waits forever.

Just to be clear, is that a color ls, an "ls -l" (anything that stats all files), or just a regular ls? The former should be a lot slower, since a stat requires the file size, and the file size requires Lustre to get information from all involved OSTs...

/Peter

> Is there something I could
> do to improve performance?
...
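A quick way to see which case applies is to time a bare readdir against a listing that stats every entry (a sketch only; the directory path below is a placeholder):

    cd /mnt/lustre/jpegdir                     # placeholder path
    time /bin/ls --color=never > /dev/null     # readdir only
    time /bin/ls -l > /dev/null                # readdir + stat of every file (MDS + OSTs)

If only the second command is slow, the cost is the per-file stat traffic rather than the directory size itself.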
Hello!

On Sep 17, 2009, at 7:28 AM, Lukas Hejtmanek wrote:
> LustreError: 11-0: an error occurred while communicating with x.x.x.x@tcp.
> The mds_connect operation failed with -16
> Lustre: Request x112815827 sent from stable-OST0001-osc-ffff8802855b7800 to
> NID x.x.x.x@tcp 100s ago has timed out (limit 100s).

This looks like your OSTs are overloaded (do you get any "slow ..." messages in the logs there? watchdog triggers?), dragging down the MDS with them (it is trying to do e.g. creates, which are slow, and so the client times out from the MDS as well; though you did not show it in your log, we see the MDS refuses the client connection because it thinks it is still processing a request from this client). The spurious eviction is addressed by adaptive timeouts (enabled by default in 1.8). If you bring down the load on the OSTs (read this list, recently there were several methods discussed, like bringing down the number of service threads), that should help.

> LustreError: 166-1: MGCx.x.x.x@tcp: Connection to service MGS via nid
> x.x.x.x@tcp was lost; in progress operations using this service will
> fail.
> Lustre: MGCx.x.x.x@tcp: Reactivating import

Now this is unexpected, and I do not see a timeout, so I do not know what actually happened there.

Bye,
    Oleg
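To look for the "slow ..." messages and watchdog triggers mentioned above, something like the following on each OSS should be enough (assuming syslog ends up in /var/log/messages; adjust the path as needed):

    # on each OSS -- log path is an assumption
    grep -Ei 'slow|watchdog' /var/log/messages | less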
On Sep 17, 2009 13:28 +0200, Lukas Hejtmanek wrote:
> is it possible to optimize Lustre so that it supports really large directories
> (with 30k small files in it)? We have 8 physical clients which process jpeg
> files stored on Lustre volume and I get sooner or later client freezes - ls in
> Lustre directory waits forever. Is there something I could do to improve
> performance?

We regularly test directories with 1M files in them, so I don't consider 30k files to be a large directory. If you are always using small files there are different things you can do to optimize performance, such as using RAID-1+0 instead of RAID-5/6 on the OSTs.

> The lustre server is Build Version:
> 1.8.0-19700101010000-PRISTINE-.usr.src.lustre-prod.linux-2.6.22.19-2.6.22.19
>
> The lustre client is Build Version:
> 1.6.7.1-19700101010000-PRISTINE-.scratch.xhejtman.suse-2.6.22.17-0.1-2.6.22.17-0.1-xen-lustre

Using a more common kernel (e.g. RHEL5.3) means a lot more people are testing the same code as you are.

> Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service
> stable-MDT0000 via nid x.x.x.x@tcp was lost; in progress operations using
> this service will wait for recovery to complete.

As others mentioned, this seems like a network or server problem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Thu, Sep 17, 2009 at 04:17:54PM -0400, Oleg Drokin wrote:
> If you bring down the load on the OSTs (read this list, recently there were
> several methods discussed like bringing down number of service threads)
> that should help.

Thanks. I have a question regarding the number of service threads:

parm: oss_num_create_threads:number of OSS create threads to start (int)
parm: ost_num_threads:number of OST service threads to start (deprecated) (int)
parm: oss_num_threads:number of OSS service threads to start (int)

I saw a recommendation to limit ost_num_threads, but it is deprecated. Should I limit oss_num_threads instead?

--
Lukáš Hejtmánek
Hello!

On Sep 22, 2009, at 7:10 AM, Lukas Hejtmanek wrote:
> On Thu, Sep 17, 2009 at 04:17:54PM -0400, Oleg Drokin wrote:
>> If you bring down the load on the OSTs (read this list, recently there were
>> several methods discussed like bringing down number of service threads)
>> that should help.
> Thanks. I got a question regarding number of service threads.
>
> parm: ost_num_threads:number of OST service threads to start (deprecated) (int)
> parm: oss_num_threads:number of OSS service threads to start (int)
>
> I saw recommendation to limit ost_num_threads but it is deprecated. Should
> I limit oss_num_threads instead?

Yes. (They are the same thing anyway.)

Bye,
    Oleg
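For reference, oss_num_threads is a parameter of the ost module (as the parm listing above shows), so it can be set in /etc/modprobe.conf (or a file under /etc/modprobe.d/) on the OSS and takes effect the next time the module is loaded; the value below is only an example, not a recommendation:

    # /etc/modprobe.conf on the OSS -- 64 is an example value, tune for your hardware
    options ost oss_num_threads=64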
On Tue, Sep 22, 2009 at 10:04:57AM -0400, Oleg Drokin wrote:
> > I saw recommendation to limit ost_num_threads but it is deprecated. Should
> > I limit oss_num_threads instead?
>
> Yes. (they are the same thing anyway)

Thanks Oleg. One more question: is this limit per kernel module or per OST mount? E.g., I have 1 physical server that hosts 2 OSTs - OST0 and OST1. Will this limit be per OST0, per OST1, or the sum for both?

--
Lukáš Hejtmánek
Hello!

On Sep 23, 2009, at 7:47 AM, Lukas Hejtmanek wrote:
>>> I limit oss_num_threads instead?
>> Yes. (they are the same thing anyway)
> Thanks Oleg. One more question, this limit is per kernel module or per OST
> mount? E.g., I have 1 physical server that hosts 2 OSTs - OST0, OST1.
> This limit will be per OST0, OST1, or sum for both?

This is the total number of OST service threads on that node (well, in fact it is multiplied by 2 due to the different types of service served). The service threads serve all OSTs on that OSS.

Bye,
    Oleg
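If you want to see what is actually running, the service threads can be counted on the OSS with ps; this assumes the threads follow the usual ll_ost* kernel-thread naming (regular and bulk I/O threads being the two service types counted separately):

    # on the OSS -- thread naming is an assumption
    ps ax | grep -c '[l]l_ost'       # all OST service threads
    ps ax | grep -c '[l]l_ost_io'    # of which bulk I/O threads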