Hi there,

I am working on a new client-side RPC engine that uses the per-stripe radix
tree to select pages, with the aim of minimizing RPC fragmentation. This
should allow us to consume grant space more intelligently and to support
blocksize > pagesize (e.g. for ext4 bigalloc).

For historical reasons (Lustre was initially developed for 2.4 kernels),
the 1.8 client holds the page lock over bulk write RPCs. Some basic support
for PG_writeback was added back in 2007 (see bugzilla ticket 11710), but
the page lock is still held until RPC completion. Like the 1.8 client, the
new client I/O stack introduced in 2.0 also keeps pages locked over
transfer. I'm estimating the effort involved in implementing full
PG_writeback support in CLIO. Does anybody have any technical concerns
about this change?

Thanks in advance.

Cheers,
Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
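A minimal sketch of the page-selection idea described above, assuming an
invented helper build_rpc_batch() over a per-stripe radix tree keyed by
page index; this is illustrative only, not the actual engine:

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/radix-tree.h>

    /*
     * Gang-lookup pages from a per-stripe radix tree and cut the batch
     * at the first hole in the index range, so that each bulk RPC
     * covers one contiguous file extent.
     */
    static unsigned int build_rpc_batch(struct radix_tree_root *stripe_tree,
                                        pgoff_t start, struct page **pages,
                                        unsigned int max_pages)
    {
            unsigned int found, i;

            found = radix_tree_gang_lookup(stripe_tree, (void **)pages,
                                           start, max_pages);
            /* stop at the first gap so the RPC is not fragmented */
            for (i = 1; i < found; i++)
                    if (pages[i]->index != pages[i - 1]->index + 1)
                            break;
            return min(i, found);   /* pages to send as one bulk write */
    }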
On Wednesday, August 10, 2011 at 18:23, Johann Lombardi wrote:
> Hi there,

Hi Johann,

[sorry, I hit the send button accidentally a few minutes ago]

> I'm estimating the effort involved in implementing full PG_writeback
> support in CLIO. Does anybody have any technical concerns about this
> change?

The reasons to use the same lock for page-in and page-out in CLIO were:

 * portability: Solaris, Windows, and pretty much every kernel around
   use the same lock, and

 * simplicity.

I don't think there are any serious problems with splitting the lock; one
has to be careful to check all the places where a page is assumed to be
"owned" by the IO and to make certain the lock is taken, if necessary.

Nikita.
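A minimal sketch of the split-lock protocol under discussion, using the
standard Linux page-flag calls; queue_bulk_write_rpc() is a made-up
stand-in for the real ptlrpc hand-off:

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* hypothetical hand-off to the RPC engine, not a real function */
    void queue_bulk_write_rpc(struct page *page);

    /* start of transfer, ->writepage() style: the page arrives locked */
    static int start_page_write_sketch(struct page *page)
    {
            set_page_writeback(page);  /* the transfer now "owns" the data */
            unlock_page(page);         /* readers may lock the page again  */
            queue_bulk_write_rpc(page);
            return 0;
    }

    /* bulk RPC completion callback */
    static void page_write_done_sketch(struct page *page, int rc)
    {
            if (rc)
                    SetPageError(page);
            end_page_writeback(page);  /* wakes wait_on_page_writeback() */
    }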
Another problem I can think of is the page checksum: if a page changes
again during the transfer, a checksum mismatch will be detected on the
server side.

On Wed, 2011-08-10 at 18:46 +0400, Nikita Danilov wrote:
> I don't think there are any serious problems with splitting the lock;
> one has to be careful to check all the places where a page is assumed
> to be "owned" by the IO and to make certain the lock is taken, if
> necessary.
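To make the race concrete, here is a sketch of a client-side bulk
checksum assuming a plain crc32 over the page payload; the helper name is
invented and Lustre's real checksum code differs in detail:

    #include <linux/crc32.h>
    #include <linux/highmem.h>
    #include <linux/mm.h>

    /* checksum the pages of one bulk write before the transfer; if a
     * writer redirties a page after this point but before the data
     * goes out on the wire, the server's checksum over the received
     * bytes no longer matches */
    static u32 bulk_checksum_sketch(struct page **pages, int npages)
    {
            u32 cksum = ~0U;
            int i;

            for (i = 0; i < npages; i++) {
                    void *kaddr = kmap(pages[i]);

                    cksum = crc32_le(cksum, kaddr, PAGE_SIZE);
                    kunmap(pages[i]);
            }
            return cksum;  /* carried in the RPC; the server recomputes */
    }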
On 2011-08-11, at 12:21 PM, Jinshan Xiong wrote:
> Another problem I can think of is the page checksum: if a page changes
> again during the transfer, a checksum mismatch will be detected on the
> server side.

Actually, the kernel now uses PG_writeback to protect the page from being
modified while it is being written to disk (or, in our case, sent to the
network). This was recently fixed for ext4 and other filesystems so that
they can run properly on devices that support T10-DIF checksums. Otherwise
the disk keeps reporting checksum errors for files that were modified
during IO.

Lustre has a workaround for the case where the page is mmapped and is
modified during RPC sending (the only case today where the page can be
modified during IO), but it would be better not to have this workaround at
all. In this case the OSS detects the checksum error and the client
resends like any other data corruption, but there is a flag in the RPC
that silences the error messages that would otherwise be printed.

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
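A sketch of the kernel-side rule Andreas describes, modeled loosely on
what a filesystem's ->page_mkwrite() handler ends up doing to keep pages
stable during I/O; all fs-specific work is elided:

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* an mmap writer faults; before the page may be redirtied, wait
     * for any writeback (disk I/O, or for Lustre a bulk RPC) to
     * finish so the in-flight data stays stable */
    static int page_mkwrite_sketch(struct vm_area_struct *vma,
                                   struct vm_fault *vmf)
    {
            struct page *page = vmf->page;

            lock_page(page);
            wait_on_page_writeback(page);
            set_page_dirty(page);
            unlock_page(page);
            return 0;
    }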
On Thu, 2011-08-11 at 13:38 -0600, Andreas Dilger wrote:
> Actually, the kernel now uses PG_writeback to protect the page from
> being modified while it is being written to disk (or, in our case,
> sent to the network). This was recently fixed for ext4 and other
> filesystems so that they can run properly on devices that support
> T10-DIF checksums. Otherwise the disk keeps reporting checksum errors
> for files that were modified during IO.

This really means PG_writeback is useless, then. The most important
benefit of PG_writeback is that it allows writing to a page while it is
still being flushed.
On Thu, Aug 11, 2011 at 02:40:23PM -0700, Jinshan Xiong wrote:
> This really means PG_writeback is useless, then. The most important
> benefit of PG_writeback is that it allows writing to a page while it
> is still being flushed.

The benefit of PG_writeback is that one can check whether or not a page is
under writeback and decide either to wait for the writeback to complete
(via wait_on_page_writeback()) or to skip the page. A good example is the
Linux kernel writeback code, which waits for writeback to complete for
WB_SYNC_ALL (a data-integrity flush), but not for WB_SYNC_NONE (just
regular memory-cleaning writeback).

Cheers,
Johann

--
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
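A simplified rendering of that policy, modeled on write_cache_pages() in
mm/page-writeback.c; the real function handles many more cases:

    #include <linux/mm.h>
    #include <linux/pagemap.h>
    #include <linux/writeback.h>

    /* the caller has found a dirty page and holds a reference on it */
    static int writeback_one_page_sketch(struct page *page,
                                         struct writeback_control *wbc)
    {
            lock_page(page);
            if (PageWriteback(page)) {
                    if (wbc->sync_mode != WB_SYNC_NONE) {
                            /* data-integrity flush: wait for the I/O */
                            wait_on_page_writeback(page);
                    } else {
                            /* memory cleaning: just skip the busy page */
                            unlock_page(page);
                            return 0;
                    }
            }
            if (!clear_page_dirty_for_io(page)) {
                    unlock_page(page);  /* someone already cleaned it */
                    return 0;
            }
            /* ->writepage() unlocks the page itself */
            return page->mapping->a_ops->writepage(page, wbc);
    }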
> For historical reasons (Lustre was initially developed for 2.4
> kernels), the 1.8 client holds the page lock over bulk write RPCs.
> Some basic support for PG_writeback was added back in 2007 (see
> bugzilla ticket 11710), but the page lock is still held until
> RPC completion.

Alright, let me ask some potentially dumb questions, no doubt due to my
lack of understanding of the Linux VM system:

- I'm missing the connection between using PG_writeback and selecting
  pages to minimize RPC fragmentation (I mean, I understand why you want
  to minimize RPC fragmentation, I just don't quite understand how
  PG_writeback helps with that). I looked at bugzilla ticket 11710, but
  that seems to be about how using PG_writeback is important for fsync()
  support.

- As I understand it, you drop PG_locked and set PG_writeback during the
  actual write operation. Correct?

- How are you planning on implementing this? Via a new state in the
  cl_page state machine, with PG_writeback set in the new state? A new
  page lock at the cl_page level?

As far as technical concerns go ... as long as code that directly uses
PG_writeback and/or PageWriteback() stays in the llite directory, I don't
_think_ it will affect portability that much. I believe I can simply fake
it under MacOS X.

--Ken
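By way of illustration, a sketch of that layering point: the hook names
below are invented, not the real CLIO vtable; the idea is only that the
PG_writeback manipulation can be confined to the Linux-specific layer
while the generic cl_page machinery stays platform-neutral:

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* only the Linux (llite/vvp) layer touches the VM page flag; other
     * ports can make these no-ops or use a private state bit instead */
    static void vvp_page_write_start_sketch(struct page *vmpage)
    {
            set_page_writeback(vmpage);
            unlock_page(vmpage);
    }

    static void vvp_page_write_done_sketch(struct page *vmpage)
    {
            end_page_writeback(vmpage);
    }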