Hi,

I wonder what's the current status of Lustre Lite, is it still
supported? We're looking for a CFS for an embedded system without a lot
of nodes. Lustre sounds a bit too heavyweight for that.

Lin Shen
Cisco Systems
On Oct 27, 2006 11:37 -0700, Lin Shen (lshen) wrote:
> I wonder what's the current status of Lustre Lite, is it still
> supported? We're looking for a CFS for an embedded system without a lot
> of nodes. Lustre sounds a bit too heavyweight for that.

How much RAM is in the embedded system? With tuning of various buffer
sizes and thread counts, the amount of RAM used on a client could be
reduced substantially (at some cost in performance), depending on just
how tight the RAM is. Also, it would likely be possible to shrink the
code in a number of ways by adding #ifdefs for different functionality
you might not need:

- disable flock
- disable quotas
- disable O_DIRECT and the associated IO path
- make the debugging macros no-ops (a sketch of this follows below)

We'd likely accept patches to make these configure options if it is
done cleanly.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
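On the last list item above: a minimal userspace sketch of compiling
debug macros down to no-ops with an #ifdef. The macro names mirror
Lustre's CDEBUG/CERROR, but the bodies here are invented for
illustration and are not the kernel implementations.

/* Hedged sketch: compile trace macros out with an #ifdef while
 * keeping error messages. Not the actual Lustre definitions. */
#include <stdio.h>

#ifdef MINIMAL_DEBUG
/* No-op: the compiler discards the arguments entirely. */
#define CDEBUG(mask, fmt, ...) do { } while (0)
#else
#define CDEBUG(mask, fmt, ...) \
        fprintf(stderr, "[%s:%d] " fmt, __FILE__, __LINE__, ##__VA_ARGS__)
#endif

/* Errors survive even in the minimal build. */
#define CERROR(fmt, ...) \
        fprintf(stderr, "ERROR [%s:%d] " fmt, __FILE__, __LINE__, ##__VA_ARGS__)

int main(void)
{
        CDEBUG(0x1, "entering %s\n", "main"); /* dropped if MINIMAL_DEBUG */
        CERROR("this message always survives\n");
        return 0;
}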
Lin Shen (lshen) wrote:
> Hi,
>
> I wonder what's the current status of Lustre Lite, is it still
> supported? We're looking for a CFS for an embedded system without a
> lot of nodes. Lustre sounds a bit too heavyweight for that.

Lustre Lite is just the historical name of the client software -- there
is no reduced-footprint Lustre client.
On Oct 30, 2006 10:20 -0800, Lin Shen (lshen) wrote:
> Thanks for getting back to me. The CFS would be on a series of
> platforms, so the RAM number varies. The low end could be as little as
> 512MB and that's for the whole system including routing, VoIP etc.

The lustre clients (IO nodes) on the BG/L system (#1 supercomputer)
"only" have 1GB of RAM and they do IO for up to 128 compute clients at
a time. Lustre as a whole (MDS + 5 OSTs + 2 client mounts) can run
inside a 64MB UML.

The lustre code already has internal tunables that adjust the default
buffer size, thread count, etc based on the amount of RAM on the node.
The only issue is that by restricting the amount of memory that lustre
uses it might negatively impact the performance, because the client
cannot keep enough IO in flight to saturate the network.

> What's the footprint of Lustre and how much lower could it be reduced
> to with the code optimization you mentioned? Maybe we can cut it even
> more w/o metadata server cluster etc.

The clustered metadata isn't in the released Lustre code, and even if
it were available there wouldn't be a requirement to use it.

With debugging symbols the lustre modules are a whopping 3-8MB apiece,
and you need about 10 of them loaded to have a functioning client. If
you strip out the debugging symbols you lose about 70% of that.

If you _really_ wanted to slim down the modules you could make the
CDEBUG() macro a no-op for anything except error messages, at the
expense of making debugging much harder. That could be mitigated (and
I'd be thrilled to see it) by having systemtap scripts to replace the
CDEBUG trace/debug functionality.

In terms of Lustre buffers and such, on small clients this can be
limited to a few MB in total. On larger clients it grows into the 10s
of MB, plus of course whatever the VM gives the filesystem for data
cache. Lustre has its own tunable limits on how much dirty data a
client can have, as older kernels didn't do a good job with that.

> And I don't think the number of nodes will ever exceed a dozen.

The number of clients doesn't impact the amount of memory on other
clients. Lustre is strictly a client-server filesystem in that regard.
It does impact memory usage on the server, which would be more
important if you also want the server to live inside this environment.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
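As a footnote to the dirty-data limits mentioned above, a small sketch
of setting a client tunable by writing to /proc. The path convention
(/proc/fs/lustre/osc/<target>/max_dirty_mb) follows the Lustre 1.x
layout, but treat the exact path as an assumption to verify on your
installation.

/* Hedged sketch: cap a client-side tunable such as the dirty-data
 * cache by writing a value into a /proc file. */
#include <stdio.h>
#include <stdlib.h>

static int set_tunable(const char *path, const char *value)
{
        FILE *f = fopen(path, "w");

        if (f == NULL) {
                perror(path);
                return -1;
        }
        fprintf(f, "%s\n", value);      /* e.g. "4" for a 4MB cap */
        return fclose(f);
}

int main(int argc, char *argv[])
{
        if (argc != 3) {
                fprintf(stderr, "usage: %s <proc-path> <value>\n", argv[0]);
                return EXIT_FAILURE;
        }
        /* e.g. /proc/fs/lustre/osc/<target>/max_dirty_mb (assumed path) */
        return set_tunable(argv[1], argv[2]) ? EXIT_FAILURE : EXIT_SUCCESS;
}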
On Nov 01, 2006 10:27 -0800, Lin Shen (lshen) wrote:
> Yes, the server will be running in the embedded environment as well.

With only tens of clients there shouldn't be a problem with 512MB of
RAM.

> A few more things to clarify:
>
> 1. The definition of a client. Is it one client per OS instance?

Correct.

> 2. In our system, there will potentially be a wide range of storage
> media such as disk array, hard disk, flash and USB devices and we want
> to share all of them. Is it feasible? I'm having a hard time picturing
> sharing among those devices with very different performance capability
> and reliability.

This isn't really what Lustre was designed to do. It currently assumes
uniform IO performance (and with 1.4 it is also best to have the same
size devices). That isn't to say you cannot have Lustre running on
different block devices, but rather (a) it isn't very efficient to use
on small devices because of overhead, and (b) aggregating all of these
devices into a single filesystem wouldn't work very well.

> 3. We also want the system to share a database. Is this related to
> CFS?

You should be able to run a database on top of Lustre, but this is not
a common mode of operation for our current customers and I don't know
how well it functions.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
On Fri, 3 Nov 2006, Andreas Dilger wrote:
> You should be able to run a database on top of Lustre, but this is not
> a common mode of operation for our current customers and I don't know
> how well it functions.

Maybe poor performance is what limits adoption of Lustre for
transactional workloads: I heard some stories of users trying
database-like access (i.e. updates of many small files) on Lustre, with
mixed or terrible results.

NFS gives much better results with small files, but it does not scale
well, and it lacks good POSIX conformance (e.g. wrt caching).

Too bad we don't have the best of both worlds!

--
Jean-Marc Saffroy - jean-marc.saffroy@ext.bull.net
Hi Jean-Marc,

Do you have a test on which NFS performs much better? We'd like to know
more about that.

If you go back on this list, you will see that Lustre can handsomely
beat NFS at unpacking archives, for example. So I am curious.

- Peter -
Jean-Marc,

Perhaps a quick run of "fileop -f 100" from the Iozone suite might help
identify which meta-data operations are problematic?

Enjoy,
Don Capps
On Sat, 4 Nov 2006, Peter J. Braam wrote:
> Do you have a test on which NFS performs much better? We'd like to
> know more about that.

One test I can mention is postmark, originally written by Netapp folks,
now available from Debian:
http://packages.debian.org/stable/utils/postmark

In the test, postmark was configured to perform 10k "transactions" on
10k files between 500B and 22kB.

This was run with up to 16 processes on *one* client connecting to
various server configs on several interconnects and storage systems,
running 1.4.6 and 1.6beta4.

For convenience, I wrote the attached wrapper script:
$ ssh $client postmark.sh -n 10000 -t 10000 -j $numprocs > $log

The metric was the number of transactions per second (sum of column 3
across all lines of $log). NFS reached 800-1000 t/s with 2-8 processes
(then performance dropped) over GigE, while Lustre peaked at 200 t/s
with 2 processes on a tiny GigE cluster (1 MDS, 1 OSS serving 1 OST). A
bigger cluster could not give more than 900 t/s.

Back then I suspected that the test was probably network bound (because
of high interrupt and packet rates vs. low CPU and disk usage), but I
could not find a good measurement. A tool showing metadata RPC
statistics (such as LMT maybe, but I didn't know it then) would have
been useful.

> If you go back on this list, you will see that Lustre can handsomely
> beat NFS at unpacking archives, for example. So I am curious.

I suppose you refer to the thread "I/O performance on small files" (end
of June). IIRC the recipes mentioned there had been used in the tests.

Now I no longer have time to spend on these issues, but ideas for
improvement would still be interesting.

--
Jean-Marc Saffroy - jean-marc.saffroy@ext.bull.net

Attachment: postmark.sh (application/x-sh, 2599 bytes)
http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20061103/983ead4e/postmark.sh
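For readers without postmark at hand, a rough C stand-in for the
small-file transaction pattern it exercises (create + write + delete
counted as one transaction); this is an illustrative approximation, not
postmark itself, and the sizes mirror the 500B-22kB range above.

/* Micro-benchmark sketch: time N small-file create/write/delete
 * "transactions" and report transactions per second.
 * Compile with -lrt on older glibc for clock_gettime(). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define NFILES  1000
#define MINSIZE 500
#define MAXSIZE 22528

int main(void)
{
        static char buf[MAXSIZE];
        char name[64];
        struct timespec t0, t1;
        double elapsed;
        int i;

        memset(buf, 'x', sizeof(buf));
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (i = 0; i < NFILES; i++) {
                size_t size = MINSIZE + rand() % (MAXSIZE - MINSIZE);
                int fd;

                snprintf(name, sizeof(name), "pm_%d.dat", i);
                fd = open(name, O_CREAT | O_WRONLY | O_TRUNC, 0644);
                if (fd < 0 || write(fd, buf, size) != (ssize_t)size) {
                        perror(name);
                        return EXIT_FAILURE;
                }
                close(fd);
                unlink(name);   /* create+write+delete = 1 transaction */
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f transactions/sec\n", NFILES / elapsed);
        return 0;
}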
On Nov 04, 2006 02:58 +0100, Jean-Marc Saffroy wrote:
> Back then I suspected that the test was probably network bound
> (because of high interrupt and packet rates vs. low CPU and disk
> usage), but I could not find a good measurement. A tool showing
> metadata RPC statistics (such as LMT maybe, but I didn't know it then)
> would have been useful.

You could use llstat.pl to show MDS RPC stats, or are you thinking of
something else?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Your email explains a fair amount, and here are two reasons
contributing to this.

One is the mdc semaphore - our clients are effectively single-threading
metadata updates. It will be a little while before we get to this; it
is not that hard.

The second is that many operations (but not the most critical
open/create/read/write) have too many RPCs - this one we have a good
plan for, and I think it might be fixed quite soon.

Thanks!

- Peter -
On Sat, 4 Nov 2006, 'Andreas Dilger' wrote:
> You could use llstat.pl to show MDS RPC stats, or are you thinking of
> something else?

No, actually that's exactly what I had in mind. :-) Thanks for the tip!

--
Jean-Marc Saffroy - jean-marc.saffroy@ext.bull.net
> > 2. In our system, there will potentially be a wide range of storage
> > media such as disk array, hard disk, flash and USB devices and we
> > want to share all of them. Is it feasible? I'm having a hard time
> > picturing sharing among those devices with very different
> > performance capability and reliability.
>
> This isn't really what Lustre was designed to do. It currently assumes
> uniform IO performance (and with 1.4 it is also best to have the same
> size devices). That isn't to say you cannot have Lustre running on
> different block devices, but rather (a) it isn't very efficient to use
> on small devices because of overhead, and (b) aggregating all of these
> devices into a single filesystem wouldn't work very well.

How hard would it be to make Lustre work reasonably well with hybrid
storage media? Are you aware of any other Open Source or Commercial CFS
that does this?

Thanks
lin
On Nov 11, 2006 15:04 -0800, Lin Shen (lshen) wrote:
> In talking to the application team, they think there are a number of
> reasons that NFS is not good enough. Do you think Lustre can
> adequately address the following issues?
>
> 1) NFS adds too many data copies and complex marshaling of messages.
> Particularly for virtual memory paging, this is a performance killer.

Lustre supports O_DIRECT and RDMA network transfers (zero-copy send
only for TCP). With O_DIRECT IO and, say, InfiniBand, it is possible to
have no data copies (except RDMA over the network) all the way to the
disk.

> 2) NFS directory traversal is very slow because every path element
> requires message exchanges between client and server. This means that
> administrative tasks like backups are really expensive.

Lustre is currently the same in this regard.

> 3) NFS locking has always been problematic, sometimes with deadlock
> cases, sometimes problems with recovery after node failure.
> Simultaneous access of a single file from multiple nodes has
> additional caching and other coherency problems.

Lustre has full data coherency between clients.

> 4) NFS doesn't provide any notion of raw device access, so the
> optimizations created in storage layers running on top of NFS
> (databases, specialized file systems, etc.) don't work as expected.

Is this O_DIRECT, or something else?

> 5) The layering of volume management under NFS doesn't really work. If
> the storage media is spread across several nodes, with the NFS server
> located on a single node, the read/write requests have to move from
> client to server to media node. With a cluster file system, the
> read/write requests should always be from client to media node.

This is the case with Lustre - client IO always goes directly (and
only) to the storage node(s) that contain the data. A single file can
be striped over multiple storage nodes.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
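A minimal sketch of the O_DIRECT path from point 1: the buffer must be
allocated with suitable alignment (4096 bytes is a common requirement,
but verify it for the target filesystem).

/* Hedged sketch: write one aligned block with O_DIRECT, bypassing the
 * page cache so data moves straight from this buffer to disk/network. */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096

int main(int argc, char *argv[])
{
        void *buf;
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return EXIT_FAILURE;
        }

        fd = open(argv[1], O_CREAT | O_WRONLY | O_DIRECT, 0644);
        if (fd < 0) {
                perror("open");
                return EXIT_FAILURE;
        }

        /* O_DIRECT requires an aligned buffer, length, and offset. */
        if (posix_memalign(&buf, ALIGN, ALIGN)) {
                perror("posix_memalign");
                return EXIT_FAILURE;
        }
        memset(buf, 'x', ALIGN);

        if (write(fd, buf, ALIGN) != ALIGN) {
                perror("write");
                return EXIT_FAILURE;
        }

        free(buf);
        close(fd);
        return 0;
}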