Gianluca Tresoldi
2010-Nov-19 11:15 UTC
[Lustre-discuss] Performance Issue with lustre 2.0
Hi everyone,

I'm testing Lustre 2.0 in a pre-production environment and have run into some problems.

My test environment consists of:
1 MDS/MGS (not combined) with 48 GB of RAM and 16 Xeon E5520 cores
1 OSS with the same characteristics
1 client
Both the MDS and the OSS are connected to shared iSCSI storage, using different volumes.

I also have a production environment running Lustre 1.8.2 with 2 file systems, served by:
2 MDS/MGS, where each server is active for one file system and passive for the other
2 OSS, with the same configuration as the MDS
50 clients running the Apache web server

I ran an 8-hour stress test in both environments, using Apache logs to replay real traffic. These are the results:

Lustre 1.8.2
Average load: 0.8 on the MDS, 0.2 on the OSS; network load on the MDS (on a gigabit NIC dedicated to Lustre traffic) about 50,000 Kb/s.

Lustre 2.0.0
Average load: 2.8 on the MDS, 0.1 on the OSS; network load on the MDS (on a gigabit NIC dedicated to Lustre traffic) about 180,000 Kb/s.

On Lustre 2.0 the network traffic is much higher than on Lustre 1.8.2. Considering that the 1.8.2 system was also carrying production traffic in addition to the stress test, I think there is a problem.

With tcpdump I saw that when I run "ls -l" several times on a client connected to Lustre 1.8.2, the client only sometimes sends packets to the MDS; a client connected to Lustre 2.0 sends them every time.

I think the 2.0 client doesn't cache file attributes.

Could someone help me?
Thanks in advance and have a very nice day. ;)

P.S. Sorry for my imperfect English.

--
Gianluca Tresoldi
SysAdmin, Tuttogratis Italia Spa
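For anyone wanting to reproduce the network-load comparison above, here is a minimal sketch of how the rate on a dedicated NIC could be sampled on a Linux server. It is not the tool used for the figures in this message (that tool is not named). The helper names and the interface name are hypothetical, and it assumes "Kb/s" means kilobits per second with receive and transmit added together.

```python
import time

def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def rate_kbit_per_s(before, after, seconds):
    """Combined rx+tx rate in kilobits per second between two samples."""
    delta_bytes = (after[0] - before[0]) + (after[1] - before[1])
    return delta_bytes * 8 / 1000 / seconds

def sample(iface, interval=10.0, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart (hypothetical helper)."""
    with open(path) as f:
        first = parse_net_dev(f.read())[iface]
    time.sleep(interval)
    with open(path) as f:
        second = parse_net_dev(f.read())[iface]
    return rate_kbit_per_s(first, second, interval)
```

Calling sample("eth1") on the MDS during the stress test (with "eth1" replaced by whatever the Lustre-facing interface is actually called) would give one number per run to compare between the 1.8.2 and 2.0.0 setups.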
Gianluca,

if you don't plan to run a mix of 1.8 and 2.0 clients, you can apply this small patch, which should reduce MDS traffic.

diff --git a/lustre/llite/dcache.c b/lustre/llite/dcache.c
index 22a929d..b133956 100644
--- a/lustre/llite/dcache.c
+++ b/lustre/llite/dcache.c
@@ -427,9 +427,6 @@ int ll_revalidate_it(struct dentry *de, int lookup_flags,
         ll_frob_intent(&it, &lookup_it);
         LASSERT(it);

-        if (it->it_op == IT_LOOKUP && !(de->d_flags & DCACHE_LUSTRE_INVALID))
-                GOTO(out_sa, rc = 1);
-
         ll_prepare_mdc_op_data(&op_data, parent, de->d_inode,
                                de->d_name.name, de->d_name.len, 0, NULL);

The patch adds more caching on the client side, but it has some interoperability issues in mixed environments, which need to be fixed on the MDS side.
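One rough way to check from the client whether attribute lookups are being served from cache, before and after a change like this, is to time several back-to-back stat passes over the same directory, much like running "ls -l" repeatedly. The sketch below is only an illustration with hypothetical function names; it uses nothing Lustre-specific, and the timings are only a hint, so watching the MDS traffic with tcpdump at the same time remains the more direct evidence.

```python
import os
import time

def stat_pass(directory):
    """lstat every entry in `directory` once; return (entries, seconds)."""
    start = time.perf_counter()
    count = 0
    for name in os.listdir(directory):
        os.lstat(os.path.join(directory, name))
        count += 1
    return count, time.perf_counter() - start

def repeated_stat_passes(directory, passes=5):
    """Run several passes back to back and return the per-pass timings."""
    return [stat_pass(directory)[1] for _ in range(passes)]
```

If the client caches attributes, the passes after the first would be expected to be noticeably faster and to generate little or no traffic to the MDS. If every pass takes about as long as the first, the client is probably revalidating each entry with the MDS every time.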