Hi there,

I am working on a new client-side RPC engine that uses the per-stripe radix
tree to select pages, with the aim of minimizing RPC fragmentation. This
should allow us to consume grant space more intelligently and to support
blocksize > pagesize (e.g. for ext4 bigalloc).

For historical reasons (Lustre was initially developed for 2.4 kernels),
the 1.8 client holds the page lock over bulk write RPCs. Some basic support
for PG_writeback was added back in 2007 (see bugzilla ticket 11710), but
the page lock is still held until RPC completion. Like the 1.8 client, the
new client I/O stack introduced in 2.0 also keeps pages locked over
transfer. I'm estimating the effort involved in implementing full
PG_writeback support in CLIO. Does anybody have any technical concerns
about this change?

Thanks in advance.

Cheers,
Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
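A minimal sketch of the page-selection idea described above, assuming an
invented helper build_rpc_batch() over a per-stripe radix tree keyed by
page index; this is illustrative only, not the actual engine:

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/radix-tree.h>

    /*
     * Gang-lookup pages from a per-stripe radix tree and cut the batch
     * at the first hole in the index range, so that each bulk RPC
     * covers one contiguous file extent.
     */
    static unsigned int build_rpc_batch(struct radix_tree_root *stripe_tree,
                                        pgoff_t start, struct page **pages,
                                        unsigned int max_pages)
    {
            unsigned int found, i;

            found = radix_tree_gang_lookup(stripe_tree, (void **)pages,
                                           start, max_pages);
            /* stop at the first gap so the RPC is not fragmented */
            for (i = 1; i < found; i++)
                    if (pages[i]->index != pages[i - 1]->index + 1)
                            break;
            return min(i, found);   /* pages to send as one bulk write */
    }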
On Wednesday, August 10, 2011 at 18:23, Johann Lombardi wrote:
> Hi there,

Hi Johann,

[sorry, I hit the send button accidentally a few minutes ago]

> I'm estimating the effort involved in implementing full PG_writeback
> support in CLIO. Does anybody have any technical concerns about this
> change?

The reasons to use the same lock for page-in and page-out in CLIO were:

 * portability: Solaris, Windows, and pretty much every kernel around
   use the same lock, and

 * simplicity.

I don't think there are any serious problems with splitting the lock; one
has to be careful to check all the places where a page is assumed to be
"owned" by the IO and to make certain the lock is taken, if necessary.

Nikita.
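A minimal sketch of the split-lock protocol under discussion, using the
standard Linux page-flag calls; queue_bulk_write_rpc() is a made-up
stand-in for the real ptlrpc hand-off:

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* hypothetical hand-off to the RPC engine, not a real function */
    void queue_bulk_write_rpc(struct page *page);

    /* start of transfer, ->writepage() style: the page arrives locked */
    static int start_page_write_sketch(struct page *page)
    {
            set_page_writeback(page);  /* the transfer now "owns" the data */
            unlock_page(page);         /* readers may lock the page again  */
            queue_bulk_write_rpc(page);
            return 0;
    }

    /* bulk RPC completion callback */
    static void page_write_done_sketch(struct page *page, int rc)
    {
            if (rc)
                    SetPageError(page);
            end_page_writeback(page);  /* wakes wait_on_page_writeback() */
    }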
Another problem I can think of is the page checksum: if a page changes
again during the transfer, a checksum mismatch will be detected on the
server side.

On Wed, 2011-08-10 at 18:46 +0400, Nikita Danilov wrote:
> I don't think there are any serious problems with splitting the lock;
> one has to be careful to check all the places where a page is assumed
> to be "owned" by the IO and to make certain the lock is taken, if
> necessary.
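To make the race concrete, here is a sketch of a client-side bulk
checksum assuming a plain crc32 over the page payload; the helper name is
invented and Lustre's real checksum code differs in detail:

    #include <linux/crc32.h>
    #include <linux/highmem.h>
    #include <linux/mm.h>

    /* checksum the pages of one bulk write before the transfer; if a
     * writer redirties a page after this point but before the data
     * goes out on the wire, the server's checksum over the received
     * bytes no longer matches */
    static u32 bulk_checksum_sketch(struct page **pages, int npages)
    {
            u32 cksum = ~0U;
            int i;

            for (i = 0; i < npages; i++) {
                    void *kaddr = kmap(pages[i]);

                    cksum = crc32_le(cksum, kaddr, PAGE_SIZE);
                    kunmap(pages[i]);
            }
            return cksum;  /* carried in the RPC; the server recomputes */
    }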
On 2011-08-11, at 12:21 PM, Jinshan Xiong wrote:
> Another problem I can think of is the page checksum: if a page changes
> again during the transfer, a checksum mismatch will be detected on the
> server side.

Actually, the kernel now uses PG_writeback to protect the page from being
modified while it is being written to disk (or, in our case, sent to the
network). This was recently fixed for ext4 and other filesystems so that
they can run properly on devices that support T10-DIF checksums. Otherwise
the disk keeps reporting checksum errors for files that were modified
during IO.

Lustre has a workaround for the case where the page is mmapped and is
modified during RPC sending (the only case today where the page can be
modified during IO), but it would be better not to have this workaround at
all. In this case the OSS detects the checksum error and the client
resends like any other data corruption, but there is a flag in the RPC
that silences the error messages that would otherwise be printed.

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
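A sketch of the kernel-side rule Andreas describes, modeled loosely on
what a filesystem's ->page_mkwrite() handler ends up doing to keep pages
stable during I/O; all fs-specific work is elided:

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* an mmap writer faults; before the page may be redirtied, wait
     * for any writeback (disk I/O, or for Lustre a bulk RPC) to
     * finish so the in-flight data stays stable */
    static int page_mkwrite_sketch(struct vm_area_struct *vma,
                                   struct vm_fault *vmf)
    {
            struct page *page = vmf->page;

            lock_page(page);
            wait_on_page_writeback(page);
            set_page_dirty(page);
            unlock_page(page);
            return 0;
    }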
On Thu, 2011-08-11 at 13:38 -0600, Andreas Dilger wrote:
> Actually, the kernel now uses PG_writeback to protect the page from
> being modified while it is being written to disk (or, in our case,
> sent to the network). This was recently fixed for ext4 and other
> filesystems so that they can run properly on devices that support
> T10-DIF checksums. Otherwise the disk keeps reporting checksum errors
> for files that were modified during IO.

This really means PG_writeback is useless, then. The most important
benefit of PG_writeback is that it allows writing to a page while it is
still being flushed.
On Thu, Aug 11, 2011 at 02:40:23PM -0700, Jinshan Xiong wrote:
> This really means PG_writeback is useless, then. The most important
> benefit of PG_writeback is that it allows writing to a page while it
> is still being flushed.

The benefit of PG_writeback is that one can check whether or not a page is
under writeback and decide either to wait for the writeback to complete
(via wait_on_page_writeback()) or to skip the page. A good example is the
Linux kernel writeback code, which waits for writeback to complete for
WB_SYNC_ALL (a data-integrity flush), but not for WB_SYNC_NONE (just
regular memory-cleaning writeback).

Cheers,
Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
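A simplified rendering of that policy, modeled on write_cache_pages() in
mm/page-writeback.c; the real function handles many more cases:

    #include <linux/mm.h>
    #include <linux/pagemap.h>
    #include <linux/writeback.h>

    /* the caller has found a dirty page and holds a reference on it */
    static int writeback_one_page_sketch(struct page *page,
                                         struct writeback_control *wbc)
    {
            lock_page(page);
            if (PageWriteback(page)) {
                    if (wbc->sync_mode != WB_SYNC_NONE) {
                            /* data-integrity flush: wait for the I/O */
                            wait_on_page_writeback(page);
                    } else {
                            /* memory cleaning: just skip the busy page */
                            unlock_page(page);
                            return 0;
                    }
            }
            if (!clear_page_dirty_for_io(page)) {
                    unlock_page(page);  /* someone already cleaned it */
                    return 0;
            }
            /* ->writepage() unlocks the page itself */
            return page->mapping->a_ops->writepage(page, wbc);
    }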
> For historical reasons (Lustre was initially developed for 2.4
> kernels), the 1.8 client holds the page lock over bulk write RPCs.
> Some basic support for PG_writeback was added back in 2007 (see
> bugzilla ticket 11710), but the page lock is still held until
> RPC completion.

Alright, let me ask some potentially dumb questions, no doubt due to my
lack of understanding of the Linux VM system:

- I'm missing the connection between using PG_writeback and selecting
  pages to minimize RPC fragmentation (I mean, I understand why you want
  to minimize RPC fragmentation, I just don't quite understand how
  PG_writeback helps with that). I looked at bugzilla ticket 11710, but
  that seems to be about how using PG_writeback is important for fsync()
  support.

- As I understand it, you drop PG_locked and set PG_writeback during the
  actual write operation. Correct?

- How are you planning on implementing this? Via a new state in the
  cl_page state machine, with PG_writeback set in the new state? A new
  page lock at the cl_page level?

As far as technical concerns go ... as long as code that directly uses
PG_writeback and/or PageWriteback() stays in the llite directory, I don't
_think_ it will affect portability that much. I believe I can simply fake
it under MacOS X.

--Ken
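By way of illustration, a sketch of that layering point: the hook names
below are invented, not the real CLIO vtable; the idea is only that the
PG_writeback manipulation can be confined to the Linux-specific layer
while the generic cl_page machinery stays platform-neutral:

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* only the Linux (llite/vvp) layer touches the VM page flag; other
     * ports can make these no-ops or use a private state bit instead */
    static void vvp_page_write_start_sketch(struct page *vmpage)
    {
            set_page_writeback(vmpage);
            unlock_page(vmpage);
    }

    static void vvp_page_write_done_sketch(struct page *vmpage)
    {
            end_page_writeback(vmpage);
    }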