Hi,

We recently switched from a NetApp FAS250 with NFS to a dual-head FAS270c with iSCSI + Lustre. We are running Lustre 1.4.7.3 both on the Lustre OSS/MDS and on about 20 client machines, all RHEL 4ES Up4 boxes.

Our setup is pretty straightforward, with both NetApp heads active, each serving one LUN over iSCSI. We have a pair of Lustre boxes, each seeing both LUNs as the same devices and in the same order (using the multipath driver). Each Lustre box serves as an OSS for one of the LUNs and a failover OSS for the other. For a failover MDS setup, we've put the MDS partition on the iSCSI fabric rather than on local storage. One Lustre box acts as both an OSS and the cluster FS MDS.

We've been trying to improve Lustre performance for our Web serving environment (millions of small files). Naturally, the workload is mostly read ops, with occasional write-intensive cron jobs (rsyncs for cold backups, publishing processes, etc.).

After going through the 2006 mailing list archives and Bugzilla:

https://bugzilla.clusterfs.com/show_bug.cgi?id=10265
https://bugzilla.clusterfs.com/show_bug.cgi?id=6252

I've come up with the following to run on our OSTs/MDS and clients, as appropriate:

echo 0 > /proc/sys/portals/debug

# Set the lock LRU sizes, matching the MDC/OSC namespace name within each path.
for LRU in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
    case "$LRU" in
        */MDC*) echo 2500 > "$LRU" ;;
        */OSC*) echo 1000 > "$LRU" ;;
    esac
done

for i in `find /proc/fs/lustre -name max_read_ahead_mb`; do
    echo 4 > $i
done

for i in `find /proc/fs/lustre -name max_read_ahead_whole_mb`; do
    echo 1 > $i
done

Does this seem appropriate for a web serving environment? Are there tunables that I am not using correctly? Comments would be much appreciated.

Thank you & best,
Zlatin

Zlatin Ivanov
Systems Administrator
New York Magazine
444 Madison Ave, 4th Fl
New York, NY 10022
212.508.0521
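As a quick sanity check after running those commands, the values can be read back with a loop like the one below. This is only a minimal sketch; it relies solely on the same /proc paths used in the commands above.

# Print the current LDLM LRU sizes and readahead limits on a node,
# so the applied tunables can be verified.
for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size \
         `find /proc/fs/lustre -name max_read_ahead_mb` \
         `find /proc/fs/lustre -name max_read_ahead_whole_mb`; do
    [ -r "$f" ] && echo "$f: `cat $f`"
done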
Hi,

We have a similar setup. The biggest performance improvement we achieved was with:

echo 0 > /proc/sys/portals/debug

Static HTML and images work fine; it's the PHP applications that we find hit Lustre the hardest. Are you having any specific performance problems?

Ivanov, Zlatin wrote:
> Does this seem appropriate for a web serving environment? Are there
> tunables that I am not using correctly? Comments would be much
> appreciated.
Hi Adam,

We have already been running with "echo 0 > /proc/sys/portals/debug" across all Lustre boxes: MDS, OSSs, clients.

Dear Lustre community: please jump to the bottom for the specific questions. Thank you.

Since we switched to Lustre 2 weeks ago, we've noticed the following:

- The site and its various components seem to load faster.

- However, measurements through Keynote, a metrics and monitoring company, report that on average pages now load a bit slower. The _average_ page load time in recent weeks and months used to be 1.7 - 2.1 sec; for the past 2 weeks it has been 2.0 - 2.4 sec. Traffic levels have remained unchanged.

- When we crawl the Lustre file system, say for indexing or regular-expression substitution purposes, processing about 1MM files takes ~50 min over NFS and ~2 hrs on Lustre, for an identical set of files.

We used to mount NFS with:

rw,rsize=8192,wsize=8192,soft,intr,async,nodev

We mount Lustre with:

- defaults,_netdev on most clients
- defaults,_netdev,flock on a couple of special clients running applications, like Subversion, that require exclusive file locks
- defaults,_netdev,ro on a few special clients

Additionally, we mount the OSTs on the OSSs with:

fstype ldiskfs
mountfsoptions extents,mballoc

On the MDS:

ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_mds-prod/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_mds-prod/lru_size

cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
1000
1000

On the clients:

ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
/proc/fs/lustre/ldlm/namespaces/MDC_lustre1_mds-prod_MNT_client-prod-f7f23600/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_MNT_client-prod-f7f23600/lru_size
/proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_MNT_client-prod-f7f23600/lru_size

cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
400
400
400

/proc/fs/lustre/llite/fs0/max_read_ahead_mb
4

/proc/fs/lustre/llite/fs0/max_read_ahead_whole_mb
1

Specific questions:

1) I am not sure about max_read_ahead_whole_mb - it's unclear to me whether 0 or 1 should be preferred here.

2) Should I prefer 2500 for /proc/fs/lustre/ldlm/namespaces/MDC*/lru_size and 1000 or 1500 for /proc/fs/lustre/ldlm/namespaces/OSC*/lru_size?

3) If yes, should I be setting this across the board, on the clients only, or on the MDS/OSSs only?

4) In general, should I still stick to avoiding mounting with flock unless explicitly required?

Thank you very much,
Zlatin
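For illustration, the three client mount flavors described above would look roughly like the following. This is only a sketch using the Lustre 1.4 zeroconf client mount form (<mds_host>:/<mds_name>/<client_profile>); the host, profile name, and mount point (mds-prod, client-prod, /lustre) are guesses inferred from the namespace strings above, not the actual configuration.

# Hypothetical client mount commands matching the fstab options listed above;
# adjust host, profile, and mount point to the real values.
mount -t lustre -o defaults,_netdev       mds-prod:/mds-prod/client-prod /lustre   # most clients
mount -t lustre -o defaults,_netdev,flock mds-prod:/mds-prod/client-prod /lustre   # clients needing exclusive locks (e.g. Subversion)
mount -t lustre -o defaults,_netdev,ro    mds-prod:/mds-prod/client-prod /lustre   # read-only clients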
On Dec 28, 2006 16:18 -0500, Ivanov, Zlatin wrote:
> - When we crawl the Lustre file system, say for indexing or
> regular-expression substitution purposes, processing about 1MM files
> takes ~50 min over NFS and ~2 hrs on Lustre, for an identical set of
> files.

What is your average file size? Lustre isn't really tuned for the millions-of-small-files case yet.

> ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_mds-prod/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_mds-prod/lru_size
> cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
> 1000
> 1000

lru_size has no meaning on the MDS node.

> On the clients:
>
> ls -1 /proc/fs/lustre/ldlm/namespaces/*/lru_size
> /proc/fs/lustre/ldlm/namespaces/MDC_lustre1_mds-prod_MNT_client-prod-f7f23600/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost1p_MNT_client-prod-f7f23600/lru_size
> /proc/fs/lustre/ldlm/namespaces/OSC_lustre1_ost2p_MNT_client-prod-f7f23600/lru_size
> cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
> 400
> 400
> 400

If you are re-using many files, I would suggest increasing these to at least 5000, 2500, 2500, if not 2-5x that. Since you have a very small cluster, the number of locks held by the clients isn't going to hurt the servers, and it will allow you to cache much more content on the clients.

> /proc/fs/lustre/llite/fs0/max_read_ahead_mb
> 4
>
> /proc/fs/lustre/llite/fs0/max_read_ahead_whole_mb
> 1
>
> 1) I am not sure about max_read_ahead_whole_mb - it's unclear to me
> whether 0 or 1 should be preferred here.

This means "files smaller than 1MB will be read in their entirety on the first read". There isn't really any point in making this smaller.

> 2) Should I prefer 2500 for /proc/fs/lustre/ldlm/namespaces/MDC*/lru_size
> and 1000 or 1500 for /proc/fs/lustre/ldlm/namespaces/OSC*/lru_size?

At least, yes.

> 3) If yes, should I be setting this across the board, on the clients
> only, or on the MDS/OSSs only?

On the clients only.

> 4) In general, should I still stick to avoiding mounting with flock
> unless explicitly required?

Well, there are occasional problems with the flock code, but if you are not enabling it consistently across your cluster, it means that some nodes may not cooperate in the locking correctly. I'd enable it across the board; if flock is unused on some nodes, then no harm done.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
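Putting those suggestions into runnable form, a client-side sketch might look like the loop below. The values are the minimums mentioned above (he suggests going 2-5x higher if memory allows), and it is meant for the clients only, since lru_size has no meaning on the servers.

# Run on each Lustre client only; raise the LDLM lock LRU sizes so clients
# can cache more metadata (MDC) and data (OSC) locks.
for LRU in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
    case "$LRU" in
        */MDC*) echo 5000 > "$LRU" ;;   # metadata client locks
        */OSC*) echo 2500 > "$LRU" ;;   # object storage client locks
    esac
done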