Hi all,

I need to prepare a small report on "NFS vs. Lustre". I could find a lot of resources about Lustre vs. CXFS, GPFS, and GFS, but little comparing Lustre directly with NFS. Can you please provide a few tips, URLs, etc.?

cheers,
__
tharindu
Lustre is a parallel filesystem; NFS is not. The advantage of NFS is that it is native to many Unix systems and widely available. The advantage of Lustre is its performance. GPFS is a parallel filesystem very similar to Lustre, but it is backed by IBM and runs on AIX and Linux. It is good, but costly. CXFS and GFS work similarly: you need a shared block device such as a SAN or a NetApp target (iSCSI). They are not really about performance; they are mostly for high availability.

What are you trying to solve? We may be able to help.

On Wed, Aug 26, 2009 at 6:11 AM, Tharindu Rukshan Bamunuarachchi <tharindub at millenniumit.com> wrote:
> hi All,
>
> I need to prepare small report on "NFS vs. Lustre".
>
> I could find lot of resources about Lustre vs. (CXFS, GPFS, GFS).
>
> Can you guys please provide few tips, URLs, etc.
> [...]
You seem to be correct. Nobody ever seems to contrast NFS with these super file system solutions. That is interesting.

It's Saturday, the family is out running around. I have time to think about this question. Unfortunately for you, I do this more for myself, which means this is going to be a stream-of-consciousness thing far more than a well-organized discussion. Sorry.

I'd begin by motivating both NFS and Lustre. Why do they exist? What problems do they solve?

NFS first.

Way back in the day, ethernet and the concept of a workstation got popular. There were many tools to copy files between machines but few ways to share a name space, that is, to have the directory hierarchy and its content directly accessible to an application on a foreign machine. This made file sharing awkward. The model was to copy the file or files to the workstation where the work was going to be done, do the work, and copy the results back to some, hopefully, well-maintained central machine.

There *were* solutions to this at the time. I recall an attractive alternative called RFS (I believe) from the Bell Labs folks, via some place in England if I'm remembering right; it's been a looong time, after all. It had issues, though. The nastiest issue for me was that if a client went down, the service side would freeze, at least partially. Since this could happen willy-nilly, depending on the user's wishes and how well the power button on his workstation was protected, together with the power cord and ethernet connection, this freezing of service for any amount of time was difficult to accept. This was so even in a rather small collection of machines.

The problem with RFS (?) and its cousins was that they were all stateful. The service side depended on state that was held at the client. If the client went down, the service side couldn't continue without a whole lot of recovery, timeouts, etc. It was a very *annoying* problem.

In the latter half of the 1980s (am I remembering right?) SUN proposed an open protocol called NFS. An implementation using this protocol could do most everything RFS (?) could, but it didn't suffer the service-side hangs. It couldn't; it was stateless. If the client went down, the server just didn't care. If the server went down, the client had the opportunity to either give up on the local operation, usually with an error returned, or wait. It was always up to the user, and for client failures the annoyance was limited to the user(s) on that client.

SUN also wisely desired the protocol to be ubiquitous. They published it. They wanted *everyone* to adopt it. More, they would help competitors: SUN held interoperability bake-a-thons to help with this.

It looks like they succeeded, all around :)

Let's sum up, then. The goals for NFS were:

1) Share a local file system name space across the network.
2) Do it in a robust, resilient way. Pesky FS issues because some user kicked the cord out of his workstation were unacceptable.
3) Make it ubiquitous. SUN was a workstation vendor. They sold servers, but almost everyone had a VAX in their back pocket where they made the infrastructure investment. SUN needed the high-value machines to support this protocol.

Now Lustre. Lustre has a weird story and I'm not going to go into all of it.
The shortest relevant part is that while there was at least one solution that DOE/NNSA felt was acceptable, GPFS, it was not available on anything other than an IBM platform, and because DOE/NNSA had a semi-formal policy of buying from different vendors at each of the three labs, we were kind of stuck. Other file systems, existing and imminent at the time, were examined, but they were all distributed file systems and we needed IO *bandwidth*. We needed lots, and lots, of bandwidth.

We also needed that ubiquitous thing that SUN had as one of their goals. We didn't want to pay millions of dollars for another GPFS. We felt that would only be painting ourselves into a corner. Whatever we did, the result *had* to be open. It also had to be attractive to smaller sites, as we wanted to turn loose of the thing at some point. If it was attractive for smaller machines, we felt we would win in the long term as, eventually, the cost to further and maintain this thing was spread across the community.

As far as technical goals, I guess we just wanted GPFS, but open. More though, we wanted it to survive in our platform roadmaps for at least a decade. The actual technical requirements for the contract that DOE/NNSA executed with HP (CFS was the sub-contractor responsible for development) can be found here:

<http://www-cs-students.stanford.edu/~trj/SGS_PathForward_SOW.pdf>

LLNL used to host this but it's no longer there? Oh well, hopefully this link will be good for a while, at least.

I'm just going to jump to the end and sum the goals up:

1) It must do *everything* NFS can. We relaxed the stateless thing, though; see the next item for why.
2) It must support full POSIX semantics: last writer wins, POSIX locks, etc.
3) It must support all of the transports we are interested in.
4) It must be scalable, in that we can cheaply attach storage and both performance (reading *and* writing) and capacity within a single mounted file system increase in direct proportion.
5) We wanted it to be easy, administratively. Our goal was that it be no harder than NFS to set up and maintain. We were involving too many folks with PhDs in the operation of our machines at the time. Before you yell FAIL, I'll say we did try. I'll also say we didn't make CFS responsible for this part of the task. Don't blame them overly much, OK?
6) We recognized we were asking for a stateful system; we wanted to mitigate that by having some focus on resiliency. These were big machines and clients died all the time.
7) While not in the SOW, we structured the contract to accomplish some future form of wide acceptance. We wanted it to be ubiquitous.

That's a lot of goals! For the technical ones, the main ones are all pretty much structured to ask two things of what became Lustre. First, give us everything NFS functionally does but go far beyond it in performance. Second, give us everything NFS functionally does but make it completely equivalent to a local file system, semantically.

There's a little more we have to consider. NFS4 is a different beast than NFS2 or NFS3. NFS{2,3} had some serious issues that became more prominent as time went by. First, security: it had none. Folks had bandaged on some different things to try to cure this, but they weren't standard across platforms. Second, it couldn't do the full POSIX-required semantics. That was attacked with the NFS lock protocols, but it was such an afterthought it will always remain problematic.
Third, new authorization possibilities, introduced by Microsoft and then POSIX and called ACLs, had no way of being supported.

NFS4 addresses those by:

1) Introducing state. It can do full POSIX now without the lock servers. Lots of resiliency mechanisms were introduced to offset the downside of this, too.
2) Formalizing and offering standardized authentication headers.
3) Introducing ACLs that map to equivalents in POSIX and Microsoft.

Strengths and Weaknesses of the Two
-----------------------------------

NFS4 does most everything Lustre can, with one very important exception: IO bandwidth.

Both seem able to deliver metadata performance at roughly the same speeds. File create, delete, and stat rates are about the same. NetApp seems to have a partial enhancement. They bought the Spinnaker goodies some time back and have deployed that technology, and redirection too(?), within their servers. The good thing about that is that two users in different directories *could* leverage two servers independently and, so, scale metadata performance. It's not guaranteed, but at least there is the possibility. If the two users are in the same directory, it's not much different, though, I'm thinking. Someone correct me if I'm wrong?

Both can offer full POSIX now. It's nasty in both cases but, yes, in theory you can export mail directory hierarchies with locking.

The NFS client and server are far easier to set up and maintain. The tools to debug issues are advanced. While the Lustre folks have done much to improve this area, NFS is just leaps and bounds ahead. It's easier to deal with NFS than Lustre. Just far, far easier, still.

NFS is just built in to everything. My TV has it, for heck's sake. Lustre is, seemingly, always an add-on. It's also a moving target. We're constantly futzing with it, upgrading, and patching. Lustre might be compilable most everywhere we care about, but building it isn't trivial. The supplied modules are great but, still, moving targets in that we wait for SUN to catch up to the vendor-supplied changes that affect Lustre. Given Lustre's size and interaction with other components in the OS, that happens far more frequently than desired. NFS just plain wins the ubiquity argument at present.

NFS IO performance does *not* scale. It's still an in-band protocol. The data is carried in the same message as the request and is, practically, limited in size. Reads are more scalable than writes: a popular file segment can be satisfied from the cache on reads, but that develops issues at some point. For writes, NFS3 and NFS4 help in that they directly support write-behind, so that a client doesn't have to wait for data to go to disk, but it's just not enough. If one streams data to/from the store, it can be larger than the cache. A client that might read a file already made "hot", but at a very different rate, just loses. A client that is writing is always looking for free memory to buffer content. Again, put too many of these together simultaneously and performance descends to the native speed of the attached back-end store, and that store can only get so big.

Lustre IO performance *does* scale. It uses a third-party transfer. Requests are made to the metadata server and IO moves directly between the affected storage component(s) and the client. The more storage components, the less possibility of contention between clients and the more data can be accepted/supplied per unit time.

NFS4 has a proposed extension, called pNFS, to address this problem. It just introduces the third-party data transfers that Lustre enjoys.
If and when that is a standard, and is well supported by clients and vendors, the really big technical difference will virtually disappear. It's been a long time coming, though. It's still not there. Will it ever be, really?

The answer to the NFS vs. Lustre question comes down to the workload for a given application, then, since they do have overlap in their solution space. If I were asked to look at a platform and recommend a solution, I would worry about IO bandwidth requirements. If the platform in question were read-mostly and, practically, never needed sustained read or write bandwidth, NFS would be an easy choice. I'd even think hard about NFS if the platform created many files but all were very small; today's filers have very respectable IOPS rates. If it came down to IO bandwidth, I'm still on the parallel file system bandwagon. NFS just can't deal with that at present, and I do still have the folks in house to manage the administrative burden.

Done. That was useful for me. I think five years ago I might have opted for Lustre in the "create many small files" case, where I would consider NFS today, so re-examining the motivations, relative strengths, and weaknesses of both was useful. As I said, I did this more as a self-exercise than anything else, but I hope you can find something useful here, too. The family is back from their errands, too :) Best wishes and good luck.

		--Lee

On Wed, 2009-08-26 at 04:11 -0600, Tharindu Rukshan Bamunuarachchi wrote:
> hi All,
>
> I need to prepare small report on "NFS vs. Lustre".
>
> I could find lot of resources about Lustre vs. (CXFS, GPFS, GFS).
>
> Can you guys please provide few tips, URLs, etc.
> [...]
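As a concrete illustration of the striping behind the bandwidth scaling Lee describes, here is a minimal sketch of an application asking Lustre to spread a new file across several OSTs before writing to it. It assumes the llapi_file_create() call from liblustreapi as described in the Lustre manual of that era; the header name, exact prototype, and the /mnt/lustre path are assumptions that vary by Lustre version and site, so treat it as illustrative only.

/* Minimal sketch: create a file striped across 4 OSTs with a 1 MiB
 * stripe size, then write to it through the normal POSIX interface.
 * Assumes llapi_file_create() from liblustreapi; header name and exact
 * prototype differ between Lustre releases, and /mnt/lustre is a
 * hypothetical mount point. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <lustre/liblustreapi.h>   /* <lustre/lustreapi.h> on newer releases */

int main(void)
{
    const char *path = "/mnt/lustre/scratch/bigfile";

    /* stripe size 1 MiB, any starting OST (-1), 4 stripes, default pattern */
    int rc = llapi_file_create(path, 1 << 20, -1, 4, 0);
    if (rc != 0) {
        fprintf(stderr, "llapi_file_create: %s\n", strerror(-rc));
        return EXIT_FAILURE;
    }

    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* Large sequential writes from here on are split across the 4 OSTs,
     * so aggregate bandwidth grows with the number of storage targets. */
    static char buf[1 << 20];
    memset(buf, 0, sizeof(buf));
    for (int i = 0; i < 64; i++) {
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            close(fd);
            return EXIT_FAILURE;
        }
    }
    close(fd);
    return EXIT_SUCCESS;
}

In practice the same layout is usually set administratively with lfs setstripe on a directory, so applications need no Lustre-specific code at all.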
Lee,

Thanks for posting this. I found the background and perspective very interesting.

John

John K. Dawson
jkdawson at gmail.com
612-860-2388

On Aug 29, 2009, at 12:56 PM, Lee Ward wrote:
> You seem to be correct. Nobody ever seems to contrast NFS with these
> super file system solutions. That is interesting.
> [...]
Well said. This should be on the Wiki :-)

On Sat, Aug 29, 2009 at 2:15 PM, John K. Dawson <jkdawson at gmail.com> wrote:
> Lee,
>
> Thanks for posting this. I found the background and perspective very
> interesting.
>
> John
> [...]
Hi!

On Sat, Aug 29, 2009 at 11:56:40AM -0600, Lee Ward wrote:
> NFS4 addresses those by:
>
> 1) Introducing state. It can do full POSIX now without the lock servers.
> Lots of resiliency mechanisms were introduced to offset the downside of
> this, too.

NFS4 implementations are able to handle POSIX advisory locks, but unlike Lustre, they don't support full POSIX filesystem semantics. For example, NFS4 still follows the traditional NFS close-to-open cache consistency model, whereas with Lustre, individual write()s are atomic and become immediately visible to all clients.

Regards,

Daniel.
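For readers unfamiliar with what "POSIX advisory locks" means in practice, here is a small generic sketch (not taken from the thread, file path hypothetical) of the fcntl() byte-range locking that both NFSv4 and Lustre can enforce across clients.

/* Generic sketch of a POSIX advisory byte-range lock with fcntl().
 * Both NFSv4 and Lustre can honor such locks across clients; the path
 * below is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/shared/mailbox", O_RDWR);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    struct flock fl = {
        .l_type   = F_WRLCK,   /* exclusive write lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,         /* 0 = lock the whole file */
    };

    /* F_SETLKW blocks until the lock is granted. */
    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        perror("fcntl(F_SETLKW)");
        close(fd);
        return EXIT_FAILURE;
    }

    /* ... read/modify/write the locked region here ... */

    fl.l_type = F_UNLCK;       /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return EXIT_SUCCESS;
}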
On Sun, Aug 30, 2009 at 10:51:41PM +0200, Daniel Kobras wrote:
> NFS4 implementations are able to handle POSIX advisory locks, but unlike
> Lustre, they don't support full POSIX filesystem semantics. For example,
> NFS4 still follows the traditional NFS close-to-open cache consistency
> model, whereas with Lustre, individual write()s are atomic and become
> immediately visible to all clients.

NFSv4 can't handle O_APPEND, and has those close-to-open semantics. Those are the two large departures from POSIX in NFSv4.

NFSv4.1 also adds metadata/data separation and data distribution, much like Lustre, but with the same POSIX semantics departures mentioned above. Also, NFSv4.1's "pNFS" concept doesn't have room for "capabilities" (in the distributed filesystem sense, not in the Linux capabilities sense), which means that OSSes and MDSes have to communicate to get permissions to be enforced. There are also differences with respect to recovery, etcetera.

One thing about NFS is that it's meant to be neutral w.r.t. the type of filesystem it shares. So NFSv4, for example, has features for dealing with filesystems that don't have a notion of persistent inode number. Whereas Lustre has its own on-disk format and therefore can't be used to share just any type of filesystem.

Nico
--
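To make the O_APPEND point concrete, here is a small generic sketch (file name hypothetical) of the append-mode logging pattern. On a local filesystem or Lustre, each O_APPEND write lands atomically at the current end of file; the NFS protocol has no atomic append, so clients emulate it and concurrent appenders can interleave or clobber each other's records.

/* Generic sketch: several processes appending records to a shared log.
 * With O_APPEND, a local filesystem or Lustre places each write
 * atomically at the current end of file.  NFS clients only emulate
 * append (look up the size, then write at that offset), so concurrent
 * appenders over NFS can overwrite each other.  The path is
 * hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int log_line(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    char buf[256];
    int len = snprintf(buf, sizeof(buf), "%ld: %s\n", (long)getpid(), msg);
    ssize_t n = write(fd, buf, (size_t)len);   /* atomic append locally/Lustre */

    close(fd);
    return (n == len) ? 0 : -1;
}

int main(void)
{
    return log_line("/mnt/shared/app.log", "job finished") == 0 ? 0 : 1;
}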
On Sun, 2009-08-30 at 16:12 -0500, Nicolas Williams wrote:
> One thing about NFS is that it's meant to be neutral w.r.t. the type of
> filesystem it shares. So NFSv4, for example, has features for dealing
> with filesystems that don't have a notion of persistent inode number.
> Whereas Lustre has its own on-disk format and therefore can't be used to
> share just any type of filesystem.

You have "stumbled on to" an interesting, significant difference between NFS and Lustre. NFS is a protocol for sharing an existing filesystem. Lustre is a filesystem -- so much so, in fact, that NFS can even share it out.

b.
On Sun, Aug 30, 2009 at 08:16:52PM -0400, Brian J. Murrell wrote:
> You have "stumbled on to" an interesting, significant difference between
> NFS and Lustre. NFS is a protocol for sharing an existing filesystem.
> Lustre is a filesystem -- so much so, in fact, that NFS can even share it
> out.

Indeed. pNFS is not really a protocol for sharing generic, pre-existing filesystems anymore either. The moment you want to distribute the filesystem itself, you can no longer just substitute any filesystem into an implementation of the protocol.

(Yes, I understand that when Lustre was layered above the VFS one could conceivably have changed the underlying fs, though that didn't work out, if only for practical reasons. But even then, one couldn't have used the underlying fs directly, not in a meaningful way.)

Nico
--
Interesting discussion of NFS vs. Lustre, even if they are so different in aims...

[ ... ]

lee> 3) It must support all of the transports we are interested in.

Except for some corner cases (which an HEP site might well have), today that tends to reduce to the classic Ethernet/IP pair...

lee> 4) It must be scalable, in that we can cheaply attach
lee> storage and both performance (reading *and* writing) and
lee> capacity within a single mounted file system increase in
lee> direct proportion.

I suspect that scalability is more of a dream, as to me it involves more requirements, including scalable backup (not so easy) and scalable 'fsck' (not so easy). These are easier with Lustre because it does not provide "a single mounted file system" but a single mounted *namespace*, which is a very different thing, even if largely equivalent for most users.

[ ... ]

lee> NFS4 does most everything Lustre can with one very
lee> important exception, IO bandwidth. [ ... ] Lustre IO
lee> performance *does* scale. It uses a third-party transfer.

That can be summarized by saying that Lustre is a parallel distributed metafilesystem, while NFS is a protocol used to access what usually is something not distributed and an actual filesystem. The limitations of the NFS protocol can be overcome and, as you say, pNFS turns it into a parallel distributed metafilesystem too:

lee> NFS4 has a proposed extension, called pNFS, to address this
lee> problem. It just introduces the third-party data transfers
lee> that Lustre enjoys. If and when that is a standard, and is
lee> well supported by clients and vendors, the really big
lee> technical difference will virtually disappear. It's been a
lee> long time coming, though. It's still not there. Will it
lee> ever be, really?

My impression is that it is a lot more real than it was only a couple of years ago, and here is an amusing mashup:

  http://FT.ORNL.gov/pubs-archive/ipdps2009-wyu-final.pdf

  "Parallel NFS (pNFS) is an emergent open standard for parallelizing
  data transfer over a variety of I/O protocols. Prototypes of pNFS are
  actively being developed by industry and academia to examine its
  viability and possible enhancements. In this paper, we present the
  design, implementation, and evaluation of lpNFS, a Lustre-based
  parallel NFS. [ ... ] Our initial performance evaluation shows that
  the performance of lpNFS is comparable to that of original Lustre."

lee> Done. That was useful for me. I think five years ago I
lee> might have opted for Lustre in the "create many small
lee> files" case, where I would consider NFS today,

Looks optimistic to me -- I don't see any good solution to the "create many small files" case, at least as to shared storage. For smaller situations I am looking, out of interest, at some other distributed filesystems, which are a bit more researchy but seem fairly reliable already.
Hi!

On Sun, Aug 30, 2009 at 04:12:11PM -0500, Nicolas Williams wrote:
> NFSv4 can't handle O_APPEND, and has those close-to-open semantics.
> Those are the two large departures from POSIX in NFSv4.

Along these lines, it's probably worth mentioning commit-on-close as well, an area where NFS (v3 and v4, optionally relaxed when using write delegations) is more strict than POSIX. This is to make sure that NFS still has the possibility to notify the user about errors when trying to save their data.

Lustre's standard config follows POSIX and allows dirty client-side caches after close(). Performance improves as a result, of course, but in case something goes wrong on the net or the server, users potentially lose data just like on any local POSIX filesystem. The difference is that users tend to notice when their local machine crashes. It's much easier to miss a remote server or a switch going down, and hence suffer from silent data loss. (Admins will typically notice, e.g. via eviction messages in the logs, but have a hard time telling which files had been affected.) The solution is to fsync() all valuable data on a POSIX filesystem, but that's not necessarily within reach for an average end user.

Regards,

Daniel.
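A generic sketch of the fsync() pattern Daniel refers to: force data to stable storage and check the return codes of fsync() and close() so write-back errors are not silently dropped. The path and payload are hypothetical.

/* Generic sketch: write data and push it to stable storage before
 * declaring success.  Checking fsync() and close() return values is
 * what surfaces write-back errors that would otherwise be silent.
 * The path and payload are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int save_result(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {                       /* handle short writes */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {                   /* push dirty data to stable storage */
        close(fd);
        return -1;
    }
    return close(fd);                       /* close() can report errors too */
}

int main(void)
{
    const char msg[] = "valuable result\n";
    if (save_result("/mnt/lustre/results.txt", msg, sizeof(msg) - 1) != 0) {
        perror("save_result");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}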
On Mon, 2009-08-31 at 21:56 +0200, Daniel Kobras wrote:
> Hi!

Hi,

> Lustre's standard config follows POSIX and allows dirty client-side
> caches after close(). Performance improves as a result, of course, but
> in case something goes wrong on the net or the server, users potentially
> lose data just like on any local POSIX filesystem.

I don't think this is true. This is something that I am only peripherally knowledgeable about, and I am sure somebody like Andreas or Johann can correct me if/where I go wrong...

You are right that there is an opportunity for a client to write to an OST and get its write(2) call returned before data goes to physical disk. But Lustre clients know that, and therefore they keep the state needed to replay that write(2) to the server until the server sends back a commit callback. The commit callback is what tells the client that the data actually went to physical media and that it can now purge any state required to replay that transaction.

Until that commit callback is received, the client holds on to whatever state it would need to do that write(2) all over again, for exactly the case you cite, which is the server going down before the data goes to physical media.

It is this data that the client is caching until the commit callback is received that is used by the recovery mechanisms that start when a target comes back on-line.

Hope that clarifies things, and further, I hope my understanding is correct, as is my explanation.

b.
Brian J. Murrell wrote:
>> Lustre's standard config follows POSIX and allows dirty client-side
>> caches after close(). Performance improves as a result, of course, but
>> in case something goes wrong on the net or the server, users potentially
>> lose data just like on any local POSIX filesystem.

Yes, this is the case on server failure, but I think the true similarity between Lustre and a locally mounted filesystem lies in the failure of a client holding dirty pages. Please correct me if I'm wrong, but data loss will occur should the client fail after close() but prior to the set of dirty pages being committed on the OST.

paul

> I don't think this is true. This is something that I am only
> peripherally knowledgeable about, and I am sure somebody like Andreas or
> Johann can correct me if/where I go wrong...
> [...]
Yes, the semantics are similar to a local filesystem: data can be lost after close() in several cases, including:

1) the client crashes before the server has written the data to disk (data that made it to the server should be written, but that is asynchronous),
2) the server returns an error to the client (EIO, e.g. due to errors on the OST),
3) the client is evicted by the server (e.g. due to communication issues) before writing data to disk, or
4) the server reboots and recovery fails (e.g., in 1.6.x, a _different_ client does not reconnect to replay transactions).

With version-based recovery in 1.8, clients might still be able to replay some transactions even if another client crashed/rebooted while the server was down.

fsync() is the best way to ensure data is on disk, for both Lustre and a local filesystem.

Kevin

Brian J. Murrell wrote:
> You are right that there is an opportunity for a client to write to an
> OST and get its write(2) call returned before data goes to physical
> disk. But Lustre clients know that, and therefore they keep the state
> needed to replay that write(2) to the server until the server sends back
> a commit callback.
> [...]
On Mon, Aug 31, 2009 at 04:50:02PM -0400, Paul Nowoczynski wrote:
> Yes this is the case on server failure but I think the true similarity
> between lustre and a locally mounted filesystem lies in the failure of a
> client holding dirty pages. [...]

The client will have DLM locks outstanding if it has dirty data, so the
client's death can be used to detect that its open, dirty files are now
potentially corrupted.

Client death with dirty data is not all that different from process death
with dirty data in user-land. Think of an application that does write(2),
write(2), close(2), _exit(2), but dies between the writes. Compare that to a
client that dies after flushing the first of those writes but before flushing
the second, though after the application has called close(2). Nothing special
is usually done in the first case, even though, if the process did have byte
range locks outstanding, the OS could flag the affected file as potentially
corrupted. I don't think Lustre actually does anything to mark the files
that it could detect as potentially corrupted.

Some applications can recover automatically -- think of databases such as
SQLite3, or of plain log files. Other applications might well be affected.
Since corruption detection in this case is heuristic, and since the impact
will vary by application, I don't think there's an easy answer as to what
Lustre ought to do about it. Ideally we could track the "potentially corrupt"
status as an advisory meta-data item that could be fetched with a
stat(2)-like system call, and have applications reset it when they recover.

Nico
Nicolas Williams wrote:
> Client death with dirty data is not all that different from process
> death with dirty data in user-land. Think of an application that does
> write(2), write(2), close(2), _exit(2), but dies between writes.

It is very different; with a user application crash, all the writes issued up
to that point will be completed by the kernel. With a node crash, there are
no guarantees about what made it to disk since the last fsync() -- a later
write may be partly flushed before an earlier write, so the node could crash
after the second write made it to disk but before the first one does. But
this is also true for a local filesystem.

Kevin
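The ordering point matters for applications in which one write depends on another. As a rough sketch (the file layout and path are invented, and none of this is Lustre-specific), an fsync() between the two writes is what guarantees the dependent record is on disk before the header that advertises it:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical data file: an 8-byte record count, then records. */
        int fd = open("/mnt/lustre/records.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Step 1: write the new record after the header. */
        const char record[] = "record payload";
        if (pwrite(fd, record, sizeof(record), 8) < 0) { perror("pwrite"); return 1; }

        /* Without this fsync(), a node crash could leave the updated header
         * on disk while the record it points to never made it. */
        if (fsync(fd) != 0) { perror("fsync"); return 1; }

        /* Step 2: only now publish the record by updating the header. */
        uint64_t nrecords = 1;
        if (pwrite(fd, &nrecords, sizeof(nrecords), 0) < 0) { perror("pwrite"); return 1; }
        if (fsync(fd) != 0) { perror("fsync"); return 1; }

        close(fd);
        return 0;
    }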
On Mon, Aug 31, 2009 at 03:50:33PM -0600, Kevin Van Maren wrote:
> Nicolas Williams wrote:
> > Client death with dirty data is not all that different from process
> > death with dirty data in user-land. [...]
>
> It is very different; with a user application crash, all the writes to
> that point will be completed by the kernel.

But not the writes that the application hadn't made yet and was still working
on putting together at the time it died. If those never-made writes are
related to writes that did get made, then you may have a problem.

For example, consider an RDBMS. Say you begin a transaction, do some
INSERTs/UPDATEs/DELETEs, then COMMIT. This will almost certainly require
multiple write(2)s (even for a DB that uses COW principles). Now suppose that
somewhere in the middle the process doing the writes dies. There should be
some undo/redo log somewhere, and on restart the RDBMS must recover from the
partially finished transaction.
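The undo/redo log Nicolas mentions follows the same discipline: record what you intend to do, make the log record durable, and only then touch the data. The sketch below is a bare-bones illustration of that write-ahead pattern; the file names and "log format" are invented, and a real database does far more (checksums, torn-write detection, replay on startup):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Append a buffer to a file and force it to disk before returning. */
    static int durable_append(const char *path, const char *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    int main(void)
    {
        /* 1. Log the intent and make the log entry durable first. */
        const char intent[] = "BEGIN; UPDATE balance SET value=42; COMMIT;\n";
        if (durable_append("journal.log", intent, sizeof(intent) - 1) != 0) {
            perror("journal");
            return 1;
        }

        /* 2. Only then apply the change to the data file.  If the process or
         *    the node dies here, restart code can redo the logged change. */
        const char newdata[] = "balance=42\n";
        if (durable_append("data.db", newdata, sizeof(newdata) - 1) != 0) {
            perror("data");
            return 1;
        }

        /* 3. A real system would now mark the journal entry as applied. */
        return 0;
    }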
Hi!

On Mon, Aug 31, 2009 at 04:34:58PM -0400, Brian J. Murrell wrote:
> You are right that there is an opportunity for a client to write to an
> OST and get its write(2) call returned before data goes to physical
> disk.  But Lustre clients know that, and therefore they keep the state
> needed to replay that write(2) to the server until the server sends back
> a commit callback. [...]

Lustre can recover from certain error conditions just fine, of course, but it
still cannot recover gracefully from others. Think of double failures or,
more likely, connectivity problems to a subset of hosts. For instance, if an
Ethernet switch goes down for a few minutes while IB stays available, all
Ethernet-connected clients will get evicted. Users won't necessarily notice
that there was a problem, but they have just potentially lost data. VBR
(version-based recovery) makes data loss less likely in this case, but the
possibility is still there. I'd suspect you'll always be able to construct
similar corner cases as long as the networked filesystem allows dirty caches
after close().

Regards,

Daniel.
Greetings all,

I'd like to throw in my 2c as well. I'm not a Lustre dev, just a sysadmin who
manages a small (<10TB) Lustre data store. For some background, we're using
it in a web hosting environment for a particularly large set of websites.
We're also considering it for use as a storage backend to a cluster of VPS
servers.

Our "default" choice for new clusters is usually NFS, for many of the reasons
mentioned already: pretty good read performance, good use of client- and
server-side caching with no extra work, and above all it's *extremely* simple
to maintain. You can install 2 machines with completely stock Linux distros,
and the odds are both of them will support being an NFS server *and* client,
and will talk to each other with only minimal effort.

Our problems with NFS: occasionally we need better locking support than NFS
delivers. Often capacity scalability is a concern (if you planned for it, you
can grow the NFS-exported volume to some extent). Scaling out to many clients
(frontend web servers, in our case) is sometimes a problem, although
realistically we just don't need that many frontends very often.

The downside to Lustre is the complexity. Initial setup is much simpler than,
say, Red Hat GFS or OCFS2, but still *vastly* more complicated than NFS, due
in large part to the ubiquity of NFS. If NFS breaks (and it rarely does for
us), the fix is usually pretty simple. If Lustre breaks... well, let's just
say I don't like being the guy on call. It could be worse, but it's no
picnic. We've had a lot more downtime with our *redundant* Lustre cluster
than we ever did with the standalone NFS servers it replaced.

Documentation-wise, a lot of NFS documentation is extremely dated, and what
used to be good advice often isn't anymore. My personal opinion is that the
Red Hat GFS documentation is an utter disaster: it looks great from 50,000
feet but is nigh-impossible to implement without much head-bashing. You may
have found that really nice article in Red Hat Magazine about NFS vs. GFS
scalability. Looks cool, doesn't it? We tried that and gave up a week later
when we just couldn't make it stable -- yes, we could make it work, but it
would have been a *constant* headache. Lustre, on the other hand, has pretty
good documentation. The admin guide is beefy and detailed, and has a lot of
good info. Some of it feels dated (1.6 vs. 1.8), but all in all I'm happy
with it.

Redundancy is a problem: you can sorta do HA-NFS, but it's not particularly
pretty, and it's not conveniently active-active. Lustre has some redundancy
abilities, although none of them are what I'd call "native". To me, native
failover redundancy would mean Lustre handles the data
migration/synchronization and the actual failover. Lustre supports multiple
targets for the same data and will try each of them if one isn't working...
but it's up to *you* to make sure the data is actually *available* in both
places. We use DRBD for this, and Heartbeat to handle the failover. It mostly
works, but I'm not really happy with it. Still, it's no worse than what NFS
offers, and sometimes better.

You can easily do a LOT of disk space on one server if needed. I've seen a
25TB array on one server (Dell MD1000s + Windows!), and *heard* of as much as
67TB on one server (not NFS though). I really don't know how well NFS handles
arrays that size, but it should at least function. Of course, with Lustre,
you can still do that much on one server, *plus* more servers with that much
too.

There's also staffing to consider.
Being so much simpler, NFS wins here because you don't need as highly trained
staff to deal with it. NFS probably costs less from a personnel standpoint:
Lustre admins are rarer, and therefore probably command higher salaries, and
it's not obvious that you would need fewer of them. At some point a manager
will have to decide whether the technological benefits of Lustre outweigh the
extra staffing costs to maintain it (if there actually are any such costs).

All in all, neither is really ideal, and they have different strengths. If
you need to be 24/7 and not a lot of your staff is going to have time to
become proficient with a complicated storage subsystem like Lustre, you're
probably better off with NFS. If you really need better scalability or
POSIX-ness, and can stand the administrative overhead, Lustre works.

I guess the proof is in the pudding: we're not planning on migrating en masse
from NFS to Lustre. We're sticking with NFS as our default choice, at least
for the time being.

Happy sysadmin-ing,
Jake

On Wed, Aug 26, 2009 at 3:11 AM, Tharindu Rukshan Bamunuarachchi
<tharindub at millenniumit.com> wrote:
> [...]