Hi,

We recently switched from a NetApp FAS250 with NFS to a dual-head FAS270c with iSCSI + Lustre. We are running Lustre 1.4.7.3 both on the Lustre OSS/MDS and on about 20 client machines, all RHEL 4ES Up4 boxes.

Our setup is pretty straightforward, with both NetApp heads active, each serving one LUN over iSCSI. We have a pair of Lustre boxes, each seeing both LUNs as the same devices and in the same order (using the multipath driver). Each Lustre box serves as an OSS for one of the LUNs and a failover OSS for the other. For a failover MDS setup, we've put the MDS partition on the iSCSI fabric rather than on local storage. One Lustre box acts as both an OSS and the cluster FS MDS.

We've been trying to improve Lustre performance for our Web serving environment (millions of small files). Naturally, the workload is mostly read ops, with occasional write-intensive cron jobs (rsyncs for cold backups, publishing processes, etc.).

After going through the 2006 mailing list archives and Bugzilla:

https://bugzilla.clusterfs.com/show_bug.cgi?id=10265
https://bugzilla.clusterfs.com/show_bug.cgi?id=6252

I've come up with the following to run on our OSTs/MDS and clients, as appropriate:

echo 0 > /proc/sys/portals/debug

# Set the lock LRU sizes, matching the MDC/OSC namespace name within each path.
for LRU in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
    case "$LRU" in
        */MDC*) echo 2500 > "$LRU" ;;
        */OSC*) echo 1000 > "$LRU" ;;
    esac
done

for i in `find /proc/fs/lustre -name max_read_ahead_mb`; do
    echo 4 > $i
done

for i in `find /proc/fs/lustre -name max_read_ahead_whole_mb`; do
    echo 1 > $i
done

Does this seem appropriate for a web serving environment? Are there tunables that I am not using correctly? Comments would be much appreciated.

Thank you & best,
Zlatin

Zlatin Ivanov
Systems Administrator
New York Magazine
444 Madison Ave, 4th Fl
New York, NY 10022
212.508.0521
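As a quick sanity check after running those commands, the values can be read back with a loop like the one below. This is only a minimal sketch; it relies solely on the same /proc paths used in the commands above.

# Print the current LDLM LRU sizes and readahead limits on a node,
# so the applied tunables can be verified.
for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size \
         `find /proc/fs/lustre -name max_read_ahead_mb` \
         `find /proc/fs/lustre -name max_read_ahead_whole_mb`; do
    [ -r "$f" ] && echo "$f: `cat $f`"
done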
Hi,

We have a similar setup. The biggest performance improvement we achieved was with:

echo 0 > /proc/sys/portals/debug

Static HTML and images work fine; it's the PHP applications that we find hit Lustre the hardest. Are you having any specific performance problems?

Ivanov, Zlatin wrote:
> Does this seem appropriate for a web serving environment? Are there
> tunables that I am not using correctly? Comments would be much
> appreciated.
Hi Adam,

We have already been running with "echo 0 > /proc/sys/portals/debug" across all Lustre boxes: MDS, OSSs, clients.

Dear Lustre community: please jump to the bottom for the specific questions. Thank you.

Since we switched to Lustre 2 weeks ago, we've noticed the following:

- The site and its various components seem to load faster.

- However, measurements through Keynote, a metrics and monitoring company, report that on average pages now load a bit slower. The _average_ page load time in recent weeks and months used to be 1.7 - 2.1 sec; for the past 2 weeks it has been 2.0 - 2.4 sec. Traffic levels have remained unchanged.

- When we crawl the Lustre file system, say for indexing or regular-expression substitution purposes, processing about 1MM files takes ~50 min over NFS and ~2 hrs on Lustre, for an identical set of files.

We used to mount NFS with:

rw,rsize=8192,wsize=8192,soft,intr,async,nodev

We mount Lustre with:

- defaults,_netdev on most clients
- defaults,_netdev,flock on a couple of special clients running applications, like Subversion, that require exclusive file locks
- defaults,_netdev,ro on a few special clients

Additionally, we mount the OSTs on the OSSs with:

fstype ldiskfs
mountfsoptions extents,mballoc

On the MDS:

ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_mds-prod/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_mds-prod/lru_size

cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
1000
1000

On the clients:

ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
/proc/fs/lustre/ldlm/namespaces/MDC_lustre1_mds-prod_MNT_client-prod-f7f23600/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_MNT_client-prod-f7f23600/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_MNT_client-prod-f7f23600/lru_size

cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
400
400
400

/proc/fs/lustre/llite/fs0/max_read_ahead_mb
4

/proc/fs/lustre/llite/fs0/max_read_ahead_whole_mb
1

Specific questions:

1) I am not sure about max_read_ahead_whole_mb - it's unclear to me whether 0 or 1 should be preferred here.

2) Should I prefer 2500 for /proc/fs/lustre/ldlm/namespaces/MDC*/lru_size and 1000 or 1500 for /proc/fs/lustre/ldlm/namespaces/OSC*/lru_size?

3) If yes, should I be setting this across the board, on the clients only, or on the MDS/OSSs only?

4) In general, should I still stick to avoiding mounting with flock unless explicitly required?

Thank you very much,
Zlatin
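For illustration, the three client mount flavors described above would look roughly like the following. This is only a sketch using the Lustre 1.4 zeroconf client mount form (<mds_host>:/<mds_name>/<client_profile>); the host, profile name, and mount point (mds-prod, client-prod, /lustre) are guesses inferred from the namespace strings above, not the actual configuration.

# Hypothetical client mount commands matching the fstab options listed above;
# adjust host, profile, and mount point to the real values.
mount -t lustre -o defaults,_netdev       mds-prod:/mds-prod/client-prod /lustre   # most clients
mount -t lustre -o defaults,_netdev,flock mds-prod:/mds-prod/client-prod /lustre   # clients needing exclusive locks (e.g. Subversion)
mount -t lustre -o defaults,_netdev,ro    mds-prod:/mds-prod/client-prod /lustre   # read-only clients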
On Dec 28, 2006 16:18 -0500, Ivanov, Zlatin wrote:
> - When we crawl the Lustre file system, say for indexing or
> regular-expression substitution purposes, processing about 1MM files
> takes ~50 min over NFS and ~2 hrs on Lustre, for an identical set of
> files.

What is your average file size? Lustre isn't really tuned for the millions-of-small-files case yet.

> ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_mds-prod/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_mds-prod/lru_size
> cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
> 1000
> 1000

lru_size has no meaning on the MDS node.

> On the clients:
>
> ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
> /proc/fs/lustre/ldlm/namespaces/MDC_lustre1_mds-prod_MNT_client-prod-f7f23600/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_MNT_client-prod-f7f23600/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_MNT_client-prod-f7f23600/lru_size
> cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
> 400
> 400
> 400

If you are re-using many files, I would suggest increasing these to at least 5000, 2500, 2500, if not 2-5x that. Since you have a very small cluster, the number of locks held by the clients isn't going to hurt the servers, and it will allow you to cache much more content on the clients.

> /proc/fs/lustre/llite/fs0/max_read_ahead_mb
> 4
>
> /proc/fs/lustre/llite/fs0/max_read_ahead_whole_mb
> 1
>
> 1) I am not sure about max_read_ahead_whole_mb - it's unclear to me
> whether 0 or 1 should be preferred here.

This means "files smaller than 1MB will be read in their entirety on the first read". There isn't really any point in making this smaller.

> 2) Should I prefer 2500 for /proc/fs/lustre/ldlm/namespaces/MDC*/lru_size
> and 1000 or 1500 for /proc/fs/lustre/ldlm/namespaces/OSC*/lru_size?

At least, yes.

> 3) If yes, should I be setting this across the board, on the clients
> only, or on the MDS/OSSs only?

On the clients only.

> 4) In general, should I still stick to avoiding mounting with flock
> unless explicitly required?

Well, there are occasional problems with the flock code, but if you are not enabling it consistently across your cluster, it means that some nodes may not cooperate in the locking correctly. I'd enable it across the board; if flock is unused on some nodes, then no harm done.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
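Putting those suggestions into runnable form, a client-side sketch might look like the loop below. The values are the minimums mentioned above (he suggests going 2-5x higher if memory allows), and it is meant for the clients only, since lru_size has no meaning on the servers.

# Run on each Lustre client only; raise the LDLM lock LRU sizes so clients
# can cache more metadata (MDC) and data (OSC) locks.
for LRU in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
    case "$LRU" in
        */MDC*) echo 5000 > "$LRU" ;;   # metadata client locks
        */OSC*) echo 2500 > "$LRU" ;;   # object storage client locks
    esac
done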