thr3ads.net - Lustre devel - [Lustre-devel] Flush on file close [Apr 2010]

If this information is useful, please help other people find it:
Share via:

Andrew Perepechko

2010-Apr-19 18:30 UTC

[Lustre-devel] Flush on file close

Some applications expect non-zero errno on close() for any errors that may 
happen during flushing dirty cached data/metadata even though linux manual 
page for close(2) suggests that fsync(2) should be used prior to close(2) in 
order to detect problems like those.

Since syncing may degrade performance to a large extent, what do you think is 
the best/most convenient/least intrusive way to switch to that behaviour? 
Should it be a mount option for the client or anything else?

Andrew.

Oleg Drokin

2010-Apr-19 22:31 UTC

head link

[Lustre-devel] Flush on file close

Hello!

On Apr 19, 2010, at 2:30 PM, Andrew Perepechko wrote:> Since syncing may degrade performance to a large extent, what do you think
is
> the best/most convenient/least intrusive way to switch to that behaviour? 
> Should it be a mount option for the client or anything else?
Personally I think this is a non-issue for us.
There is no mad dash in other filesystems to implement this, aside from NFS
where this
is what is guaranteed.
As long as we don''t lose data after close (which we usually
don''t anyway) we are in the clear.
If we do (eviction, double failure) then there was some system exception that is
kind of
similar to disk failure for regular fs.
Apps doing this "close after each write" should be fixed anyway
because now they incur
additional overhead of two extra unneeded sync RPCs (open + close) and adding
more overhead
to this is not going to help even if made optional.

Bye,
    Oleg

Andreas Dilger

2010-Apr-20 02:34 UTC

head link

[Lustre-devel] Flush on file close

One thing we can do to improve this situation a bit is to return any  
previous write error codes at close time.

Cheers, Andreas

On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko at Sun.COM>
wrote:
> Some applications expect non-zero errno on close() for any errors  
> that may
> happen during flushing dirty cached data/metadata even though linux  
> manual
> page for close(2) suggests that fsync(2) should be used prior to  
> close(2) in
> order to detect problems like those.
>
> Since syncing may degrade performance to a large extent, what do you  
> think is
> the best/most convenient/least intrusive way to switch to that  
> behaviour?
> Should it be a mount option for the client or anything else?
>
> Andrew.
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

Andreas Dilger

2010-Apr-20 04:04 UTC

head link

[Lustre-devel] Flush on file close

On 2010-04-19, at 12:30, Andrew Perepechko wrote:> Some applications expect non-zero errno on close() for any errors  
> that may
> happen during flushing dirty cached data/metadata even though linux  
> manual
> page for close(2) suggests that fsync(2) should be used prior to  
> close(2) in
> order to detect problems like those.
>
> Since syncing may degrade performance to a large extent, what do you  
> think is
> the best/most convenient/least intrusive way to switch to that  
> behaviour?
> Should it be a mount option for the client or anything else?

If the application is expecting the "flush all outstanding data"  
semantics of fsync() on close, it should just call fsync() and use the  
return code of that instead of overloading close() to do it.  There  
will be no worse performance by waiting for fsync() to complete, than  
waiting for a close that is flushing all of the data.

In contrast, if an application is NOT expecting synchronous flushing  
on close, then making every close flush all data WILL be a major  
performance impact.  Better to have applications do what they mean,  
and system calls do what they are supposed to, instead of adding in  
extra semantics that not every application wants.

Also, as Oleg wrote, if an application is doing close/open when it is  
really just trying to flush the data to disk (which I saw some traces  
of recently) then this is even WORSE than just doing an fsync().

Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.

Oleg Drokin

2010-Apr-20 05:16 UTC

head link

[Lustre-devel] Flush on file close

Actually this is already being done. We do set AS_error (or something like
that).
There is only one exception where on eviction we forgot to implement this, I
think.

On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:
> One thing we can do to improve this situation a bit is to return any  
> previous write error codes at close time.
> 
> Cheers, Andreas
> 
> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko at
Sun.COM>
> wrote:
> 
>> Some applications expect non-zero errno on close() for any errors  
>> that may
>> happen during flushing dirty cached data/metadata even though linux  
>> manual
>> page for close(2) suggests that fsync(2) should be used prior to  
>> close(2) in
>> order to detect problems like those.
>> 
>> Since syncing may degrade performance to a large extent, what do you  
>> think is
>> the best/most convenient/least intrusive way to switch to that  
>> behaviour?
>> Should it be a mount option for the client or anything else?
>> 
>> Andrew.
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

Mitchell Erblich

2010-Apr-20 07:56 UTC

head link

[Lustre-devel] Flush on file close

Possible Suggestion,

			NFS allows async writes and then commits.

			I would almost suggest this..

			As yes, close''s are not expected to fail.
			What would you do / should you do if a close failed?

			Mitchell Erblich

			

			
On Apr 19, 2010, at 10:16 PM, Oleg Drokin wrote:
> Actually this is already being done. We do set AS_error (or something like
that).
> There is only one exception where on eviction we forgot to implement this,
I think.
> 
> On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:
> 
>> One thing we can do to improve this situation a bit is to return any  
>> previous write error codes at close time.
>> 
>> Cheers, Andreas
>> 
>> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko at
Sun.COM>
>> wrote:
>> 
>>> Some applications expect non-zero errno on close() for any errors  
>>> that may
>>> happen during flushing dirty cached data/metadata even though linux
>>> manual
>>> page for close(2) suggests that fsync(2) should be used prior to  
>>> close(2) in
>>> order to detect problems like those.
>>> 
>>> Since syncing may degrade performance to a large extent, what do
you
>>> think is
>>> the best/most convenient/least intrusive way to switch to that  
>>> behaviour?
>>> Should it be a mount option for the client or anything else?
>>> 
>>> Andrew.
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

Oleg Drokin

2010-Apr-20 19:46 UTC

head link

[Lustre-devel] Flush on file close

Hello!

On Apr 20, 2010, at 3:56 AM, Mitchell Erblich wrote:
> 			NFS allows async writes and then commits.
It''s all internal within NFS. There is no way for the client to say
"these writes are async", "now do commit" aside from
doing manual syncs for commits and using some sort of sync io open flags
to force "sync" writes.
Of course additional level of confusion is the "async writes" you
reference
have nothing to do with real write system calls. In fact write system calls
put data in memory and data might stay there for who knows how long
(or until close) and only when NFS client decides (or VFS forces it to)
to write the data to the server is when these "async writes" are used,
Lustre has this too.
> 			I would almost suggest this..
It''s already implemented sort of NFS-style except the close-to-open.
> 			As yes, close''s are not expected to fail.
Huh?!
> 			What would you do / should you do if a close failed?
Well, you assume the file content is no longer correct and write it again, I
guess.
Or if you are like 99% of people writing programs out there, you just bail out
with an error ;)

Bye,
    Oleg

Oleg Drokin

2010-Apr-20 19:57 UTC

head link

[Lustre-devel] Flush on file close

> It''s all internal within NFS. There is no way for the client to
say
> "these writes are async", "now do commit" aside from
I meant to say "no way for the application" here.

Nicolas Williams

2010-Apr-20 20:27 UTC

head link

[Lustre-devel] Flush on file close

POSIX behavior is that close(2) can fail, so if you care, you should
check its error status.  POSIX behavior is also that close(2) doesn''t
imply an fsync(2) (much less a sync(2)).

NFS clients implement fsync(2)-on-close, but that''s another story.

However, when writes deferred at close(2) time fail on a local
filesystem...  chances are that subsequent I/O will just fail.  Or at
least that''s probably what many users will expect.  But does POSIX
require that?  I don''t have it handy, but I''m pretty sure the
answer is
"no".  With Lustre we could also have a close(2) whose deferred writes
fail long after the process that could handle the failure is gone.

Databases and so on should always fsync(2) when they need to ensure
consistent on-disk state.  Many apps that write(2) don''t need to
fsync(2) (think of writes to log files).

Nico
--

Nicolas Williams

2010-Apr-20 20:43 UTC

head link

[Lustre-devel] Flush on file close

On Tue, Apr 20, 2010 at 03:27:40PM -0500, Nicolas Williams
wrote:> However, when writes deferred at close(2) time fail on a local
> filesystem...  chances are that subsequent I/O will just fail.  Or at
> least that''s probably what many users will expect.  But does POSIX
> require that?  I don''t have it handy, but I''m pretty sure
the answer is
> "no".  With Lustre we could also have a close(2) whose deferred
writes
> fail long after the process that could handle the failure is gone.
To go out on a complete limb :) what we really need is a variant of
close(2) where eventual failure can be caught, even when the process
that called it has exit()ed.  Something like:

int close_or_spawn(int fd, <posix_spawn()-like arguments>);

or

int close_xid(int fd, uint64_t xid);   /*
                                        * Where some daemon(s) reads a
					* log of sucessful/failed close
					* XIDs and takes action as
					* necessary.
					*/

I prefer something along the lines of close_xid().

Adoption of new APIs in the context of Lustre is probably more
realistic than in the general case, but it''d still be slow.

So we''re still left with: you''d better fsync(2) explicitly
before
close()ing if you want to make sure that you don''t lose data.

Nico
--

Andreas Dilger

2010-Apr-21 21:04 UTC

head link

[Lustre-devel] Flush on file close

On 2010-04-19, at 23:16, Oleg Drokin wrote:> Actually this is already being done. We do set AS_error (or something like
that).  There is only one exception where on eviction we forgot to implement
this, I think.
One thing I see in both the 1.x and 2.x client code is that even though we track
the errors from async file IO on the file descriptor, we don''t actually
return them to the application at close time:

ll_file_release()
{
        if (lsm)
                lov_test_and_clear_async_rc(lsm);
        lli->lli_async_rc = 0;     

It looks like this should be something like:

        if (lsm)
                rc2 = lov_test_and_clear_async_rc(lsm);
        if (rc2 == 0)
                rc2 = lli->lli_async_rc;     
        lli->lli_async_rc = 0;
   > On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:
> 
>> One thing we can do to improve this situation a bit is to return any  
>> previous write error codes at close time.
>> 
>> Cheers, Andreas
>> 
>> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko at
Sun.COM>
>> wrote:
>> 
>>> Some applications expect non-zero errno on close() for any errors  
>>> that may
>>> happen during flushing dirty cached data/metadata even though linux
>>> manual
>>> page for close(2) suggests that fsync(2) should be used prior to  
>>> close(2) in
>>> order to detect problems like those.
>>> 
>>> Since syncing may degrade performance to a large extent, what do
you
>>> think is
>>> the best/most convenient/least intrusive way to switch to that  
>>> behaviour?
>>> Should it be a mount option for the client or anything else?
>>> 
>>> Andrew.
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> 

Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.

Alexey Lyashkov

2010-Apr-22 05:15 UTC

head link

[Lustre-devel] Flush on file close

Oleg, say about code:

                lock_page(page);
                wait_on_page_writeback(page);
                if (rc < 0) {
                        CERROR("writepage inode %lu(%p) of page %p "
                               "failed: %d\n",
mapping->host->i_ino,
                               mapping->host, page, rc);
                        if (rc == -ENOSPC)
                                set_bit(AS_ENOSPC, &mapping->flags);
                        else
                                set_bit(AS_EIO, &mapping->flags);
                }

in lock cancel handler and IO completion.
This better way to tell linux kernel about hit a error, because lov async error
isn''t help in case O_SYNC write (and some other - as i remember).
int wait_on_page_writeback_range(struct address_space *mapping,
                                pgoff_t start, pgoff_t end)
...
        /* Check for outstanding write errors */
        if (test_and_clear_bit(AS_ENOSPC, &mapping->flags))
                ret = -ENOSPC;
        if (test_and_clear_bit(AS_EIO, &mapping->flags))
                ret = -EIO;

        return ret;


 
On Apr 22, 2010, at 00:04, Andreas Dilger wrote:
> On 2010-04-19, at 23:16, Oleg Drokin wrote:
>> Actually this is already being done. We do set AS_error (or something
like that).  There is only one exception where on eviction we forgot to
implement this, I think.
> 
> One thing I see in both the 1.x and 2.x client code is that even though we
track the errors from async file IO on the file descriptor, we don''t
actually return them to the application at close time:
> 
> ll_file_release()
> {
>        if (lsm)
>                lov_test_and_clear_async_rc(lsm);
>        lli->lli_async_rc = 0;     
> 
> It looks like this should be something like:
> 
>        if (lsm)
>                rc2 = lov_test_and_clear_async_rc(lsm);
>        if (rc2 == 0)
>                rc2 = lli->lli_async_rc;     
>        lli->lli_async_rc = 0;
> 
>> On Apr 19, 2010, at 10:34 PM, Andreas Dilger wrote:
>> 
>>> One thing we can do to improve this situation a bit is to return
any
>>> previous write error codes at close time.
>>> 
>>> Cheers, Andreas
>>> 
>>> On 2010-04-19, at 12:30, Andrew Perepechko <Andrew.Perepechko at
Sun.COM>
>>> wrote:
>>> 
>>>> Some applications expect non-zero errno on close() for any
errors
>>>> that may
>>>> happen during flushing dirty cached data/metadata even though
linux
>>>> manual
>>>> page for close(2) suggests that fsync(2) should be used prior
to
>>>> close(2) in
>>>> order to detect problems like those.
>>>> 
>>>> Since syncing may degrade performance to a large extent, what
do you
>>>> think is
>>>> the best/most convenient/least intrusive way to switch to that
>>>> behaviour?
>>>> Should it be a mount option for the client or anything else?
>>>> 
>>>> Andrew.
>>>> _______________________________________________
>>>> Lustre-devel mailing list
>>>> Lustre-devel at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>> _______________________________________________
>>> Lustre-devel mailing list
>>> Lustre-devel at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>> 
> 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer, Lustre Group
> Oracle Corporation Canada Inc.
> 
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

Lustre devel - Apr 2010 - Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close

[Lustre-devel] Flush on file close