Hello,

is it possible to optimize Lustre so that it supports really large directories (with 30k small files in them)? We have 8 physical clients which process jpeg files stored on a Lustre volume, and sooner or later I get client freezes - an ls in a Lustre directory waits forever. Is there something I could do to improve performance?

The Lustre server is Build Version:
1.8.0-19700101010000-PRISTINE-.usr.src.lustre-prod.linux-2.6.22.19-2.6.22.19

The Lustre client is Build Version:
1.6.7.1-19700101010000-PRISTINE-.scratch.xhejtman.suse-2.6.22.17-0.1-2.6.22.17-0.1-xen-lustre

I got the following messages on the client:

Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service stable-MDT0000 via nid x.x.x.x@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 2 previous similar messages
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 37 previous similar messages
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped 37 previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 8s
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 13s
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 18s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with x.x.x.x@tcp. The mds_connect operation failed with -16
Lustre: Request x112815827 sent from stable-OST0001-osc-ffff8802855b7800 to NID x.x.x.x@tcp 100s ago has timed out (limit 100s).
Lustre: Skipped 9 previous similar messages
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection to service stable-OST0001 via nid x.x.x.x@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
LustreError: 128:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 128:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection restored to service stable-OST0001 using nid x.x.x.x@tcp.
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 23s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 166-1: MGCx.x.x.x@tcp: Connection to service MGS via nid x.x.x.x@tcp was lost; in progress operations using this service will fail.
Lustre: MGCx.x.x.x@tcp: Reactivating import
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 28s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 33s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicating with x.x.x.x@tcp. The mds_connect operation failed with -16
LustreError: Skipped 5 previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection()) stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency to 38s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous similar message
LustreError: 3158:0:(events.c:66:request_out_callback()) @@@ type 4, status -5 req@ffff8801002dd800 x112816084/t0 o103->stable-OST0001_UUID@10.0.0.1@o2ib:17/18 lens 648/256 e 0 to 1 dl 1253177767 ref 2 fl Rpc:N/0/0 rc 0/0
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 194 previous similar messages

--
Lukáš Hejtmánek
On Thu, 2009-09-17 at 13:28 +0200, Lukas Hejtmanek wrote:
> Hello,

Hi,

> is it possible to optimize Lustre so that it supports really large directories
> (with 30k small files in it)?

We already have optimizations for large directories (i.e. HTREE indexing, etc.).

> We have 8 physical clients which process jpeg
> files stored on Lustre volume and I get sooner or later client freezes - ls in
> Lustre directory waits forever. Is there something I could do to improve
> performance?

I suspect you don't really have a "performance" issue.

> I got the following messages on the client:
> Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service
> stable-MDT0000 via nid x.x.x.x@tcp was lost; in progress operations using
> this service will wait for recovery to complete.

So this immediately points to the MDT. The above message is saying that the connection to the MDT was lost. The question becomes "why?". What did the MDS log around the time the above message was logged on the client?

FWIW, it's much better to look at the syslog for this sort of thing than dmesg, as the syslog provides timing context.

b.
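For example, one could pull the MDS-side messages from the same time window as the client disconnect (a minimal sketch; /var/log/messages and the timestamp pattern are only placeholders, adjust them to your syslog setup):

    # on the MDS -- syslog path and time window are assumptions
    grep -i lustre /var/log/messages | less
    grep 'Sep 17 13:' /var/log/messages | grep -i lustre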
On Thursday 17 September 2009, Lukas Hejtmanek wrote:
> Hello,
>
> is it possible to optimize Lustre so that it supports really large
> directories (with 30k small files in it)? We have 8 physical clients which
> process jpeg files stored on Lustre volume and I get sooner or later client
> freezes - ls in Lustre directory waits forever.

Just to be clear, is that a color ls, an "ls -l" (anything that stats all files), or just a regular ls? The former should be a lot slower, since a stat requires the file size, and the file size requires Lustre to get information from all involved OSTs...

/Peter

> Is there something I could
> do to improve performance?
...
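A quick way to see which case applies is to time a bare readdir against a listing that stats every entry (a sketch only; the directory path below is a placeholder):

    cd /mnt/lustre/jpegdir                     # placeholder path
    time /bin/ls --color=never > /dev/null     # readdir only
    time /bin/ls -l > /dev/null                # readdir + stat of every file (MDS + OSTs)

If only the second command is slow, the cost is the per-file stat traffic rather than the directory size itself.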
Hello!

On Sep 17, 2009, at 7:28 AM, Lukas Hejtmanek wrote:
> LustreError: 11-0: an error occurred while communicating with x.x.x.x@tcp.
> The mds_connect operation failed with -16
> Lustre: Request x112815827 sent from stable-OST0001-osc-ffff8802855b7800 to
> NID x.x.x.x@tcp 100s ago has timed out (limit 100s).

This looks like your OSTs are overloaded (do you get any "slow ..." messages in the logs there? watchdog triggers?), dragging down the MDS with them (it is trying to do e.g. creates, which are slow, and so the client times out from the MDS as well; though you did not show it in your log, we see the MDS refuses the client connection because it thinks it is still processing a request from this client). The spurious eviction is addressed by adaptive timeouts (enabled by default in 1.8). If you bring down the load on the OSTs (read this list, recently there were several methods discussed, like bringing down the number of service threads), that should help.

> LustreError: 166-1: MGCx.x.x.x@tcp: Connection to service MGS via nid
> x.x.x.x@tcp was lost; in progress operations using this service will
> fail.
> Lustre: MGCx.x.x.x@tcp: Reactivating import

Now this is unexpected, and I do not see a timeout, so I do not know what actually happened there.

Bye,
    Oleg
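To look for the "slow ..." messages and watchdog triggers mentioned above, something like the following on each OSS should be enough (assuming syslog ends up in /var/log/messages; adjust the path as needed):

    # on each OSS -- log path is an assumption
    grep -Ei 'slow|watchdog' /var/log/messages | less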
On Sep 17, 2009 13:28 +0200, Lukas Hejtmanek wrote:
> is it possible to optimize Lustre so that it supports really large directories
> (with 30k small files in it)? We have 8 physical clients which process jpeg
> files stored on Lustre volume and I get sooner or later client freezes - ls in
> Lustre directory waits forever. Is there something I could do to improve
> performance?

We regularly test directories with 1M files in them, so I don't consider 30k files to be a large directory. If you are always using small files there are different things you can do to optimize performance, such as using RAID-1+0 instead of RAID-5/6 on the OSTs.

> The lustre server is Build Version:
> 1.8.0-19700101010000-PRISTINE-.usr.src.lustre-prod.linux-2.6.22.19-2.6.22.19
>
> The lustre client is Build Version:
> 1.6.7.1-19700101010000-PRISTINE-.scratch.xhejtman.suse-2.6.22.17-0.1-2.6.22.17-0.1-xen-lustre

Using a more common kernel (e.g. RHEL5.3) means a lot more people are testing the same code as you are.

> Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service
> stable-MDT0000 via nid x.x.x.x@tcp was lost; in progress operations using
> this service will wait for recovery to complete.

As others mentioned, this seems like a network or server problem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Thu, Sep 17, 2009 at 04:17:54PM -0400, Oleg Drokin wrote:
> If you bring down the load on the OSTs (read this list, recently there were
> several methods discussed like bringing down number of service threads)
> that should help.

Thanks. I have a question regarding the number of service threads:

parm: oss_num_create_threads:number of OSS create threads to start (int)
parm: ost_num_threads:number of OST service threads to start (deprecated) (int)
parm: oss_num_threads:number of OSS service threads to start (int)

I saw a recommendation to limit ost_num_threads, but it is deprecated. Should I limit oss_num_threads instead?

--
Lukáš Hejtmánek
Hello!

On Sep 22, 2009, at 7:10 AM, Lukas Hejtmanek wrote:
> On Thu, Sep 17, 2009 at 04:17:54PM -0400, Oleg Drokin wrote:
>> If you bring down the load on the OSTs (read this list, recently there were
>> several methods discussed like bringing down number of service threads)
>> that should help.
> Thanks. I got a question regarding number of service threads.
>
> parm: ost_num_threads:number of OST service threads to start (deprecated) (int)
> parm: oss_num_threads:number of OSS service threads to start (int)
>
> I saw recommendation to limit ost_num_threads but it is deprecated. Should
> I limit oss_num_threads instead?

Yes. (They are the same thing anyway.)

Bye,
    Oleg
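For reference, oss_num_threads is a parameter of the ost module (as the parm listing above shows), so it can be set in /etc/modprobe.conf (or a file under /etc/modprobe.d/) on the OSS and takes effect the next time the module is loaded; the value below is only an example, not a recommendation:

    # /etc/modprobe.conf on the OSS -- 64 is an example value, tune for your hardware
    options ost oss_num_threads=64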
On Tue, Sep 22, 2009 at 10:04:57AM -0400, Oleg Drokin wrote:
> > I saw recommendation to limit ost_num_threads but it is deprecated. Should
> > I limit oss_num_threads instead?
>
> Yes. (they are the same thing anyway)

Thanks Oleg. One more question: is this limit per kernel module or per OST mount? E.g., I have 1 physical server that hosts 2 OSTs - OST0 and OST1. Will this limit be per OST0, per OST1, or the sum for both?

--
Lukáš Hejtmánek
Hello!

On Sep 23, 2009, at 7:47 AM, Lukas Hejtmanek wrote:
>>> I limit oss_num_threads instead?
>> Yes. (they are the same thing anyway)
> Thanks Oleg. One more question, this limit is per kernel module or per OST
> mount? E.g., I have 1 physical server that hosts 2 OSTs - OST0, OST1.
> This limit will be per OST0, OST1, or sum for both?

This is the total number of OST service threads on that node (well, in fact it is multiplied by 2 due to the different types of service served). The service threads serve all OSTs on that OSS.

Bye,
    Oleg
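If you want to see what is actually running, the service threads can be counted on the OSS with ps; this assumes the threads follow the usual ll_ost* kernel-thread naming (regular and bulk I/O threads being the two service types counted separately):

    # on the OSS -- thread naming is an assumption
    ps ax | grep -c '[l]l_ost'       # all OST service threads
    ps ax | grep -c '[l]l_ost_io'    # of which bulk I/O threads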