Hello,

We have discussed the implementation of the new readahead in CLIO. Here I am just sending the design document out to ask for comments.

We have already had several inputs from Z (aka bzzz) and other engineers. I'm not going to copy those ideas here directly, because I'm afraid of distorting something, so please reply to this email with your ideas - thanks in advance.

Jay

-- 
Good good study, day day up
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ra.tar
Type: application/x-tar
Size: 194560 bytes
Desc: not available
Url: http://lists.lustre.org/pipermail/lustre-devel/attachments/20100120/d07ce14d/attachment-0001.tar
UP!!

Here is a PDF version of the design document. I am also attaching a picture, because the pictures in the PDF are not clear.

Thanks,
Jay

jay wrote:
> Hello,
>
> We have discussed the implementation of new readahead in CLIO. Here I
> just send the design document out to ask for comments.
>
> We have already had several inputs from Z (aka bzzz) and other
> engineers. I'm not going to copy those ideas here directly, because
> I'm afraid to distort something, so please reply to this email to show
> your ideas - thanks in advance.
>
> Jay
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel

-- 
Good good study, day day up
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readahead.pdf
Type: application/pdf
Size: 198695 bytes
Desc: not available
Url: http://lists.lustre.org/pipermail/lustre-devel/attachments/20100122/9c461a12/attachment-0001.pdf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lazy-readahead-1.jpg
Type: image/jpeg
Size: 142721 bytes
Desc: not available
Url: http://lists.lustre.org/pipermail/lustre-devel/attachments/20100122/9c461a12/attachment-0001.jpg
Alexey Lyashkov
2010-Jan-23 07:09 UTC
[Lustre-devel] proposal on implementing a new readahead in clio
>> We have an idea to spawn a per file readahead thread for each process,
>> and this thread can be used to issue the readahead RPC async.

Do I understand correctly: you suggest spawning one new thread per open file? So if a client has 10 processes, and each process has 100 files open, you need to spawn 1000 new threads?

On Fri, 2010-01-22 at 18:53 +0800, jay wrote:
> UP!!
>
> Here is a pdf version of design document. Also I'm attaching a picture
> because the pictures in pdf is not clear.
>
> Thanks,
> Jay

-- 
Alexey Lyashkov <alexey.lyashkov at clusterstor.com>
ClusterStor
Alexey Lyashkov wrote:
>> We have an idea to spawn a per file readahead thread for each process,
>> and this thread can be used to issue the readahead RPC async.
>
> I correctly understand: you suggest a spawn one new thread per open
> file?
> so if client have 10 processes, and each process is open 100 files, you
> need spawn 1000 new threads?

No - per-process readahead, or some system-wide readahead thread pool. This works because most of those threads are sleeping, and it takes little time to issue the readahead requests. The idea behind the scheme is to issue the readahead RPCs asynchronously.

BTW, I'm not going to implement what you mentioned on Linux, because I don't think it is a good idea, as I said in the design doc. However, we HAVE to have an async thread pool to implement readahead for Windows. Windows doesn't have an interface for issuing async read requests, and it lacks a mechanism for page locks or anything similar - what a pity!

Jay

-- 
Good good study, day day up
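A minimal userspace sketch of the system-wide readahead thread pool described above, using pthreads. All names here (ra_pool, ra_submit, issue_rpc) are invented for illustration and are not the real llite code; issue_rpc() is a stub standing in for building and firing an OST_READ RPC:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct ra_request { long offset, length; struct ra_request *next; };

struct ra_pool {
    pthread_mutex_t    lock;
    pthread_cond_t     nonempty;
    struct ra_request *head, *tail;
    int                shutdown;
    atomic_long        issued;     /* readahead RPCs issued so far */
};

static void issue_rpc(struct ra_pool *p, struct ra_request *r)
{
    /* A real client would build an OST_READ RPC here and send it;
     * the reader that submitted 'r' has already returned to userspace. */
    atomic_fetch_add(&p->issued, 1);
    free(r);
}

static void *ra_worker(void *arg)
{
    struct ra_pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (!p->head && !p->shutdown)
            pthread_cond_wait(&p->nonempty, &p->lock);
        struct ra_request *r = p->head;
        if (!r) {                      /* shutdown and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        p->head = r->next;
        if (!p->head)
            p->tail = NULL;
        pthread_mutex_unlock(&p->lock);
        issue_rpc(p, r);
    }
}

static void ra_submit(struct ra_pool *p, long off, long len)
{
    struct ra_request *r = malloc(sizeof(*r));
    r->offset = off;
    r->length = len;
    r->next   = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->tail)
        p->tail->next = r;
    else
        p->head = r;
    p->tail = r;
    pthread_cond_signal(&p->nonempty);
    pthread_mutex_unlock(&p->lock);    /* submitter never blocks on the RPC */
}

/* Demo: two workers drain four requests; returns the number of RPCs issued. */
static long ra_demo(void)
{
    struct ra_pool p = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                         NULL, NULL, 0, 0 };
    pthread_t tid[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, ra_worker, &p);
    for (int i = 0; i < 4; i++)
        ra_submit(&p, i * 1048576L, 1048576L);
    pthread_mutex_lock(&p.lock);
    p.shutdown = 1;
    pthread_cond_broadcast(&p.nonempty);
    pthread_mutex_unlock(&p.lock);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&p.issued);
}
```

The point of the sketch is only the submit path: ra_submit() queues and returns immediately, so the reading process never waits for the RPC itself.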
Alexey Lyashkov
2010-Jan-24 09:18 UTC
[Lustre-devel] proposal on implementing a new readahead in clio
On Sun, 2010-01-24 at 09:01 +0800, jay wrote:
> Alexey Lyashkov wrote:
>>> We have an idea to spawn a per file readahead thread for each process,
>>> and this thread can be used to issue the readahead RPC async.
>>
>> I correctly understand: you suggest a spawn one new thread per open
>> file?
>> so if client have 10 processes, and each process is open 100 files, you
>> need spawn 1000 new threads?
>
> No, per process readahead, or some system readahead thread pool, this is
> because most of those threads are sleeping, and it consumes little time
> to issue readahead requests. The idea behind the scheme is to issue
> readahead rpcs async.

The first case is the same as what I said (I think) - 10 processes reading from their own files, so 1000 new threads will be spawned. In the second case, you will lose readahead requests on a heavily loaded client.

> BTW, I'm not going to implement what you mentioned in linux, because I
> don't think this is a good idea, as what I said in design doc. However,
> we HAVE to have an async thread pool to implement readahead for windows.
> Windows doesn't have an interface of issuing async read request, lack of
> a mechanism to have page lock or similar things - what a pity!

Hmm... it looks like I don't understand the problem. Currently the Linux client uses ->readpage() to generate an OST_READ RPC and sends it via ptlrpcd-io. Why not generate this RPC directly for Windows? Or do you mean asynchronously updating the VM cache?

-- 
Alexey Lyashkov <alexey.lyashkov at clusterstor.com>
ClusterStor
Nicolas Williams
2010-Jan-25 04:05 UTC
[Lustre-devel] proposal on implementing a new readahead in clio
On Sun, Jan 24, 2010 at 09:01:46AM +0800, jay wrote:
> Alexey Lyashkov wrote:
>> I correctly understand: you suggest a spawn one new thread per open
>> file?
>> so if client have 10 processes, and each process is open 100 files, you
>> need spawn 1000 new threads?
>
> No, per process readahead, or some system readahead thread pool, this is
> because most of those threads are sleeping, and it consumes little time
> to issue readahead requests. The idea behind the scheme is to issue
> readahead rpcs async.

Sleeping threads do consume memory resources, and context switches between them do add cache pressure. The readahead work should all be async, in which case you need no more readahead threads than you have CPUs.

> BTW, I'm not going to implement what you mentioned in linux, because I
> don't think this is a good idea, as what I said in design doc. However,
> we HAVE to have an async thread pool to implement readahead for windows.
> Windows doesn't have an interface of issuing async read request, lack of
> a mechanism to have page lock or similar things - what a pity!

But surely you can still do the readaheads asynchronously. Say you think that block N of some file will be needed soon: so you issue the read ahead of time. You'll need to place the data somewhere, and hopefully that will be somewhere that the host OS's VFS sub-system (Windows in your case) can either provide or accept -- if not, you'll need to do a copy later, but you're still able to send the read request, and process the reply, asynchronously.

Nico
--
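The staging-buffer idea above - prefetch block N into a private buffer, copy it out later if the host VM can't accept the pages directly - can be sketched like this. All names (ra_cache, ra_prefetch, ra_copy_out) are invented for illustration; ra_prefetch() stands in for the async read completion:

```c
#include <string.h>
#include <stdbool.h>

#define RA_SLOTS 8
#define RA_BLKSZ 4096

struct ra_slot  { long block; bool valid; char data[RA_BLKSZ]; };
struct ra_cache { struct ra_slot slot[RA_SLOTS]; };

/* Stand-in for the async read completion: stage block N's data in a
 * private buffer, since the host VFS can't accept our pages directly. */
static void ra_prefetch(struct ra_cache *c, long block, const char *src)
{
    struct ra_slot *s = &c->slot[block % RA_SLOTS];
    s->block = block;
    memcpy(s->data, src, RA_BLKSZ);
    s->valid = true;
}

/* Called from the read path: copy staged data out if we have it.
 * Returns true on a readahead hit. */
static bool ra_copy_out(struct ra_cache *c, long block, char *dst)
{
    struct ra_slot *s = &c->slot[block % RA_SLOTS];
    if (!s->valid || s->block != block)
        return false;              /* miss: caller reads synchronously */
    memcpy(dst, s->data, RA_BLKSZ);
    s->valid = false;              /* slot is single-use */
    return true;
}
```

The extra memcpy is exactly the copy Nico mentions as the price of not being able to hand pages to the host VM; the read itself and its completion still run fully asynchronously.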
Nico and shadow,

Since you have the same question about Windows, I am replying to both of you in one email. I have also got Matt involved - he is a Windows expert.

Alexey Lyashkov wrote:
> first case is same as i say (i think) - 10 processes reading from own
> files, so will be spawn 1000 new threads.
> in second case you will be lost readahead requests on hardloaded client.

Nod - that's why I'm not going to do it on Linux, as the design doc said - didn't you see section 8.1? :-)

> hm.. looks i don't understand problem. Currently linux client is using
> ->readpage() to generate OST_READ RPC and sending via ptlrpcd-io.
> Why isn't generate this RPC directly for Windows? Or you mean about
> update asynchronous update VM cache ?

The problem is that we would have to wait for the RPC (which may contain only readahead pages) to finish before we could return to user space. You may ask why we can't do that; the answer is that we should pipeline the readahead requests, instead of reading one large chunk of data to implement readahead. The problem with Windows is that it lacks interfaces to manipulate pages. I'm not a Windows expert - please ask Matt if you have Windows-specific questions.

-- 
Good good study, day day up
Matt Wu
2010-Jan-25 06:55 UTC
[Lustre-devel] proposal on implementing a new readahead in clio
We need to do readahead asynchronously, but the Windows kernel doesn't give us an easy solution. Here are the issues for Windows readahead:

1, The Windows kernel (VM) doesn't provide kernel drivers with an equivalent of grab_cache_page_nowait_gfp() to allocate an empty/invalid page. So in ll_readpage() it's too late for WNC to grab more pages for readahead.
2, The routines provided by the Windows kernel to allocate page cache are synchronous: they won't return until the requested pages are fetched.

So we plan to start a thread pool and dispatch the readahead requests to these threads instead of blocking the user thread.

We can group the threads in several ways:

1, Request per random thread, without any specific order: we just start a fixed number of threads and queue each readahead request to any thread of the pool. This is the decision we made during the WNC readahead meeting last week.
2, Thread per file (file), or thread per open instance (fd).
3, Thread per OST: we need to divide each readahead request into several requests which are stripe-boundary aligned.

regards,
matt

On 2010/1/25 12:05, Nicolas Williams wrote:
> Sleeping threads do consume memory resources, and context switches
> between them do add cache pressure. The read ahead work should all be
> async, in which case you need no more readahead threads than you have
> CPUs.
>
> But surely you can still do the readaheads asynchronously. Say you
> think that block N of some file will be needed soon: so you issue the
> read ahead of time. You'll need to place the data somewhere, and
> hopefully that will be somewhere that the host OS's VFS sub-system
> (Windows in your case) can either provide or accept -- if not you'll
> need to do a copy later, but you're still able to send the read request,
> and process the reply, asynchronously.
>
> Nico
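The stripe-boundary split that option 3 needs can be sketched as a pure function over a simple RAID-0-style layout (stripe_size bytes per OST, round-robin over stripe_count OSTs). The names are illustrative, not the real llite/lov API:

```c
#include <stddef.h>

struct ra_chunk { long offset, length; int ost; };

/* Split the byte range [off, off+len) into chunks that never cross a
 * stripe boundary, tagging each chunk with the OST it maps to under a
 * round-robin RAID-0 layout. Fills 'out' (capacity 'max'); returns the
 * number of chunks produced. */
static size_t ra_split_by_stripe(long off, long len,
                                 long stripe_size, int stripe_count,
                                 struct ra_chunk *out, size_t max)
{
    size_t n = 0;
    long end = off + len;

    while (off < end && n < max) {
        /* End of the stripe containing 'off'. */
        long stripe_end = (off / stripe_size + 1) * stripe_size;
        long chunk_end  = stripe_end < end ? stripe_end : end;

        out[n].offset = off;
        out[n].length = chunk_end - off;
        out[n].ost    = (int)((off / stripe_size) % stripe_count);
        n++;
        off = chunk_end;
    }
    return n;
}
```

For example, with a 1 MB stripe size over 2 OSTs, a 2 MB readahead starting at offset 512K splits into three chunks (512K to OST 0, 1 MB to OST 1, 512K to OST 0), each of which can then be queued to its per-OST thread.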
Andreas Dilger
2010-Jan-25 07:23 UTC
[Lustre-devel] proposal on implementing a new readahead in clio
On 2010-01-24, at 23:55, Matt Wu wrote:
> We can group the threads by several ways:
> 1, request per random thread, without any specify order. we just start a
> fixed number of threads and queue the readahead request to any thread of
> the thread pool.
> this is the decision we made during WNC readahead meeting last week.
> 2, thread per file (file) or thread per open instance (fd)
> 3, thread per ost, we need divide the readahead request to several which
> are stripe boundary aligned.

In order to keep the readahead pages local to the NUMA node that the userspace thread is running on, I'd recommend at most a single readahead thread per core. That way, when a readahead thread allocates pages, they will be on the right NUMA node.

Cheers, Andreas
-- 
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Nicolas Williams
2010-Jan-25 15:34 UTC
[Lustre-devel] proposal on implementing a new readahead in clio
On Mon, Jan 25, 2010 at 12:23:03AM -0700, Andreas Dilger wrote:
> On 2010-01-24, at 23:55, Matt Wu wrote:
>> We can group the threads by several ways:
>> 1, request per random thread, without any specify order. we just
>> start a fixed number of threads and queue the readahead request to
>> any thread of the thread pool. this is the decision we made during
>> WNC readahead meeting last week.
>> 2, thread per file (file) or thread per open instance (fd)
>> 3, thread per ost, we need divide the readahead request to several
>> which are stripe boundary aligned.
>
> In order to keep the readahead pages local to the NUMA node that the
> userspace thread is running on, I'd recommend at most a single
> readahead thread per core. That way, when the readahead thread is
> allocating pages they will be on the right NUMA node.

That was my recommendation as well, but if I understand Matt correctly, the Windows VFS makes it impossible to do readahead asynchronously, which is why Matt suggests having many threads. I have no clue as to the relevant Windows kernel APIs, but if Matt's right about Windows, then color me surprised.

Assuming that's correct, and that there's no reasonable way around the problem, then I'd recommend having a pool with some number of threads (say, 3 * CPUs), with readaheads done only when there are threads available in the pool.

Nico
--
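The "only when there are threads available" admission policy above - never block the reader, just drop the readahead when the pool is busy - reduces to a lock-free idle-worker counter. A sketch with invented names (readahead is only a hint, so a dropped request costs nothing worse than a later synchronous read):

```c
#include <stdatomic.h>
#include <stdbool.h>

struct ra_limiter { atomic_int idle; };   /* free workers, e.g. 3 * ncpus */

/* Accept the readahead only if a worker is free; never block the reader. */
static bool ra_try_acquire(struct ra_limiter *l)
{
    int n = atomic_load(&l->idle);
    while (n > 0) {
        /* On failure the CAS reloads 'n', so we retry with the fresh value. */
        if (atomic_compare_exchange_weak(&l->idle, &n, n - 1))
            return true;           /* a worker slot is ours */
    }
    return false;                  /* pool busy: drop this readahead */
}

/* Called by a worker when its readahead RPC has been issued. */
static void ra_release(struct ra_limiter *l)
{
    atomic_fetch_add(&l->idle, 1);
}
```

A reader would call ra_try_acquire() before queueing a request; on false it simply skips the readahead, which keeps a heavily loaded client from piling up stale requests.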
Alex Zhuravlev
2010-Jan-26 10:02 UTC
[Lustre-devel] proposal on implementing a new readahead in clio
Hi,

I think we could help a lot if you restructured the proposal a bit: first of all, describe the algorithm without implementation details, probably using the notion of events: a read event extending the window, I/O to get data ahead, readahead I/O completion, hit/miss, etc. Then map these events to specific code paths, and explain what kind of information or mechanism the layers are missing to implement the algorithm.

z.

On 1/20/10 3:37 PM, jay wrote:
> Hello,
>
> We have discussed the implementation of new readahead in CLIO. Here I
> just send the design document out to ask for comments.
>
> We have already had several inputs from Z(aka bzzz) and other engineers.
> I'm not going to copy those ideas here directly, because I'm afraid to
> distort something, so please reply this email to show your ideas -
> thanks in advance.
>
> Jay