Hello,

I'm wondering how the Lustre client handles recovery when an OST restarts with the abort_recov flag set.

Let's say a client has pages to flush to an OST, but the OST is stopped and then restarts with -o abort_recov. There is no recovery, so either:

1- the client retakes its extent locks and then retries flushing its pages
or
2- the client can no longer flush, drops the I/O, and returns an error to the caller.

If #2, what if the process has already closed the file?
What if the file is still open and the process tries to do another I/O: will it get an error for the earlier failed I/O?

Is abort_recov used only at first start, or does the OST keep applying this flag, until it is stopped, to any other recovery-like mechanisms?

Thanks

--
Aurelien Degremont

On 2010-07-12, at 04:10, Aurelien Degremont wrote:
> I'm wondering how Lustre client handles recovery when OST restarts with abort_recov flag set.
>
> Let's say a client has page to flush to OST, but OST is stopped, then restarts with -o abort_recov. There is no recovery, so:
> 1- client retakes extent locks and then re-try to flush its pages
> or
> 2- client cannot flush anymore and drop the i/o, returns an error to the caller.

When the client is evicted, it drops all of its locks for that OST, and any unwritten pages for those files are discarded. While I know Lustre will save errors from async write RPCs into the file descriptor (for later write calls or fsync), I don't know if we save any IO error into the file descriptor if we discard pages due to eviction. I think only errors due to currently in-flight RPCs that are aborted due to client eviction are returned.

This is the same for "-o abort_recov" or if the client is evicted for other reasons (failed lock callbacks, or failed recovery even if abort_recovery is not used).

> If #2, what if the process has already closed the file ?
> What if the file is still opened and the process try to do another I/O, will it have an error for the former bad i/o?

If the file is not closed yet, then fsync or a later write will return an earlier error. If the file descriptor is closed then there is no way to return that error. That is true for local filesystems as well.

> abort_recov is used only at first start, or the OST uses this flag until it is stopped for any other recovery-like mechanisms?

The "abort_recov" mount option is equivalent to:

    lctl --device {ost dev} abort_recovery

It only affects the initial startup recovery, and is ignored afterward.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

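Since a deferred write error can only be returned while the file descriptor is still open, an application that cares about its data has to check the return values of write(), fsync() and close(). Below is a minimal sketch of that pattern in plain POSIX C. The helper name write_and_sync is hypothetical, and nothing here is Lustre-specific; it only shows where a saved error could surface.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Lustre): write len bytes to path and
 * return 0 on success, or -errno for the first failure observed in
 * open(), write(), fsync() or close().
 */
int write_and_sync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    int rc = 0;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            rc = -errno;        /* may also carry an earlier async error */
            break;
        }
        if (n == 0) {
            rc = -EIO;          /* no progress, give up */
            break;
        }
        done += (size_t)n;
    }

    /* fsync() is the explicit point where a deferred error can be reported. */
    if (rc == 0 && fsync(fd) < 0)
        rc = -errno;

    /* close() is the last chance; after it the error has nowhere to go. */
    if (close(fd) < 0 && rc == 0)
        rc = -errno;

    return rc;
}
```

A caller would treat any negative return as "the data may not be on stable storage" and rewrite it, rather than assuming a successful write() means the pages reached the OST.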
Andreas Dilger wrote:
> While I know Lustre will save errors from async write RPCs into the file descriptor
> (for later write calls or fsync), I don't know if we save any IO error into the file
> descriptor if we discard pages due to eviction. I think only errors due to currently
> in-flight RPCs that are aborted due to client eviction are returned.

Sounds like a bug to me?

That means: if a process writes data on a client, that data goes to the page cache, and not yet to the OST if there is no local memory pressure. If the client is evicted at that moment, those pages are dropped. Then the client reconnects and the process writes other data. Those I/Os are successful, so the client never notices that some previous I/O failed?

Am I correct?

--
Aurelien Degremont
CEA

On 2010-07-15, at 02:05, Aurelien Degremont wrote:
> Andreas Dilger wrote:
>> While I know Lustre will save errors from async write RPCs into the file descriptor (for later write calls or fsync), I don't know if we save any IO error into the file descriptor if we discard pages due to eviction. I think only errors due to currently in-flight RPCs that are aborted due to client eviction are returned.
>
> Sounds like a bug to me? That means, if a process write data on a client, those data goes to page cache. Not yet to OST if there is no local memory pressure. At that moment, if the client is evicted, those pages are dropped. Then client reconnect, the process writes other data. Those I/O are successful, client has missed that some previous I/O failed?

I would agree.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

On 07/15/2010 11:19 AM, Andreas Dilger wrote:
> On 2010-07-15, at 02:05, Aurelien Degremont wrote:
>> Andreas Dilger wrote:
>>> While I know Lustre will save errors from async write RPCs into
>>> the file descriptor (for later write calls or fsync), I don't know
>>> if we save any IO error into the file descriptor if we discard
>>> pages due to eviction. I think only errors due to currently
>>> in-flight RPCs that are aborted due to client eviction are
>>> returned.

If the async write fails due to eviction then writepage() will store -ESHUTDOWN in the inode info's lli_async_rc member.

>> Sounds like a bug to me? That means, if a process write data on a
>> client, those data goes to page cache. Not yet to OST if there is
>> no local memory pressure. At that moment, if the client is evicted,
>> those pages are dropped. Then client reconnect, the process writes
>> other data. Those I/O are successful, client has missed that some
>> previous I/O failed?

I filed a bug because the async errors weren't being reported; see https://bugzilla.lustre.org/show_bug.cgi?id=22360. It looks like this is addressed in 1.8.4. From then on, they should be reported on the next call to close() for that inode; but note that the error need not go to the process whose writes were lost. Tant pis!

--
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu
(512) 471-9304

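To make the "error need not go to the process whose writes were lost" point concrete, here is a small userspace model of a per-inode saved error. It is only a sketch: struct toy_inode and both functions are hypothetical names, not Lustre code, and it assumes that the saved value is handed to whichever caller asks first and is then cleared.

```c
#include <errno.h>

/* Toy stand-in for the inode info; async_rc plays the role of lli_async_rc. */
struct toy_inode {
    int async_rc;   /* 0, or a negative errno saved by a failed async write */
};

/*
 * Called when an asynchronous write fails, e.g. with -ESHUTDOWN after
 * an eviction. Assumption: the first error is kept, later ones are dropped.
 */
void toy_record_async_error(struct toy_inode *ino, int rc)
{
    if (ino->async_rc == 0)
        ino->async_rc = rc;
}

/*
 * Called from the next fsync()/close() on this inode, by whatever
 * process gets there first. Returns the saved error once, then clears it.
 */
int toy_take_async_error(struct toy_inode *ino)
{
    int rc = ino->async_rc;

    ino->async_rc = 0;
    return rc;
}
```

Because the error lives on the inode rather than on a particular open file, two processes sharing the file race for it: the first one to sync or close sees -ESHUTDOWN, the second sees 0.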
On Jul 15, 2010, at 21:57, John Hammond wrote:
> On 07/15/2010 11:19 AM, Andreas Dilger wrote:
>> On 2010-07-15, at 02:05, Aurelien Degremont wrote:
>>> Andreas Dilger wrote:
>>>> While I know Lustre will save errors from async write RPCs into
>>>> the file descriptor (for later write calls or fsync), I don't know
>>>> if we save any IO error into the file descriptor if we discard
>>>> pages due to eviction. I think only errors due to currently
>>>> in-flight RPCs that are aborted due to client eviction are
>>>> returned.
>
> If the async write fails due to eviction then writepage() will store
> -ESHUTDOWN in the inode info's lli_async_rc member.

Not sure. Look at ll_ap_completion for the correct error reporting:

        } else {
                if (cmd & OBD_BRW_READ) {
                        llap->llap_defer_uptodate = 0;
                }
                SetPageError(page);
                if (rc == -ENOSPC)
                        set_bit(AS_ENOSPC, &page->mapping->flags);
                else
                        set_bit(AS_EIO, &page->mapping->flags);
        }

But that code path is never called if the client has dirty data for which async I/O has not started yet. In that case, the client cancels the locks it owns with the local and discard flags set, so ll_page_removal_cb is called with the discard flag set and no error bit is set in the mapping.

--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com
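The difference between the two paths can be sketched with a small userspace model. This is not Lustre or kernel code: struct toy_mapping, the TOY_AS_* bits and both functions are hypothetical stand-ins for the mapping flags, ll_ap_completion and ll_page_removal_cb, written only to show why a discarded dirty page leaves nothing for a later fsync() to report.

```c
#include <errno.h>

/* Toy equivalents of the AS_EIO / AS_ENOSPC bits in mapping->flags. */
#define TOY_AS_EIO    (1u << 0)
#define TOY_AS_ENOSPC (1u << 1)

struct toy_mapping {
    unsigned int flags;         /* error bits a later fsync() would check */
    unsigned int dirty_pages;   /* pages cached but not yet sent */
};

/* Models the completion path: an RPC was sent and finished with rc. */
void toy_ap_completion(struct toy_mapping *m, int rc)
{
    if (m->dirty_pages > 0)
        m->dirty_pages--;
    if (rc == 0)
        return;
    if (rc == -ENOSPC)
        m->flags |= TOY_AS_ENOSPC;
    else
        m->flags |= TOY_AS_EIO;
}

/*
 * Models the removal callback with the discard flag set: the page is
 * dropped before any RPC was started, and no error bit is recorded.
 */
void toy_page_removal_discard(struct toy_mapping *m)
{
    if (m->dirty_pages > 0)
        m->dirty_pages--;
}
```

In this model a page lost through toy_page_removal_discard() leaves flags at 0, which matches the behaviour described above: the data is gone, but the mapping carries no trace of it.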