Hi Eric,
On 2015/11/14 13:23, Eric Ren wrote:
> Hi Joseph,
>
>>>> > >> 2. ocfs2cmt does periodically commit.
>>>> > >>
>>>> > >> One case that can lead to a long downconvert is that it
>>>> > >> indeed has too much work to do. I am not sure if there are
>>>> > >> any other cases or a code bug.
>>> > > OK, I'm not familiar with ocfs2cmt. Could I bother you to
>>> > > explain what ocfs2cmt is used for, its relation to R/W, and
>>> > > why down-conversion can be triggered when it commits?
>> > Sorry, the above explanation is not right and may mislead you.
>> >
>> > jbd2/xxx (previously called kjournald2?) commits periodically;
>> > the default interval is 5s and can be set with the mount option
>> > "commit=".
>> >
>> > ocfs2cmt does the checkpoint. It can be woken up by:
>> > a) unblocking a lock during downconvert; if jbd2/xxx has already
>> > done the commit, ocfs2cmt won't actually be woken up because it
>> > has already been checkpointed. So ocfs2cmt works with jbd2/xxx.
> OK, thanks for your knowledge;-)
>> > b) evict inode and then do downconvert.
> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
> work? Does b) have something to do with a)? And what's the meaning of
> "evict inode"?
> Actually, I can hardly understand the idea of b).
You can go through the code flow:
iput->iput_final->evict->evict_inode->ocfs2_evict_inode
->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint
This happens when a node no longer uses the inode (but has not
deleted it) and frees its related lockres.
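
To make the idea of b) concrete, here is a minimal user-space sketch
of it. It is not the kernel code: every toy_* name and field below is
hypothetical, and comparing transaction ids is only my simplified
assumption of how "already checkpointed" is decided. What it shows is
that dropping the last reference forces a checkpoint only if the
inode's metadata has not been checkpointed yet, and only after that
can the lockres be freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; these are not the kernel's types. */
struct toy_journal {
    unsigned long trans_id;     /* id of the currently open transaction */
    unsigned long checkpoints;  /* number of checkpoints run so far */
};

struct toy_inode {
    unsigned long last_trans;   /* last transaction that touched this inode */
    int refcount;               /* references still held on this node */
};

/* Assumption: an inode counts as checkpointed once the journal has
 * moved past the last transaction that modified it. */
static bool toy_fully_checkpointed(const struct toy_journal *j,
                                   const struct toy_inode *ino)
{
    return j->trans_id > ino->last_trans;
}

/* Stands in for waking the checkpoint thread and waiting for it. */
static void toy_start_checkpoint(struct toy_journal *j)
{
    j->trans_id++;
    j->checkpoints++;
}

/* Checkpoint only when needed, as in case a) where the wakeup is
 * skipped if the work has already been done. */
static void toy_checkpoint_inode(struct toy_journal *j,
                                 const struct toy_inode *ino)
{
    if (!toy_fully_checkpointed(j, ino))
        toy_start_checkpoint(j);
    assert(toy_fully_checkpointed(j, ino));
}

/* Drop one reference. Returns true when this was the last one, i.e.
 * the inode was evicted and its lock resources may now be freed. */
static bool toy_iput(struct toy_journal *j, struct toy_inode *ino)
{
    assert(ino->refcount > 0);
    if (--ino->refcount > 0)
        return false;           /* still in use, nothing to do */
    toy_checkpoint_inode(j, ino);
    return true;
}
```

In this model an inode last modified in the currently open transaction
costs one checkpoint when it is evicted, while an inode modified in an
older transaction is evicted without any extra work.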
Thanks,
Joseph
>> >
>>>>> > >>> Could you describe this case in more detail?
>>>>>> > >>>> And it seemed reasonable because it had to.
>>>>>> > >>>>
>>>>>> > >>>> Node 1 wrote the file, and node 2 read it. Since you used
>>>>>> > >>>> buffered I/O, after node 1 had finished writing, the data
>>>>>> > >>>> might still be in the page cache.
>>>>> > >>> Sorry, I cannot understand the relationship between "still
>>>>> > >>> in page cache" and "so...downconvert".
>>>>>> > >>>> So node 1 should downconvert first, then node 2's read
>>>>>> > >>>> could continue.
>>>>>> > >>>> That was why you said it seemed
>>>>>> > >>>> ocfs2_inode_lock_with_page spent most
>>>>> > >>> Actually, it surprises me more that it took such a long
>>>>> > >>> time than that it took *most* of the time compared to the
>>>>> > >>> "readpage" stuff ;-)
>>>>>> > >>>> time. More specifically, it was ocfs2_inode_lock after
>>>>>> > >>>> trying nonblock lock and returning -EAGAIN.
>>>>> > >>> You mean the read process would repeatedly try the nonblock
>>>>> > >>> lock until the write process's down-conversion completes?
>>>> > >> No, after the nonblock lock returns -EAGAIN, it will unlock
>>>> > >> the page and then call ocfs2_inode_lock and
>>>> > >> ocfs2_inode_unlock. And ocfs2_inode_lock will
>>> > > Yes.
>>>> > >> wait until downconvert completion in another node.
>>> > > Another node, the one the read or write process is running on?
>> > Yes, the node blocks my request.
>> > For example, node 1 has EX, then node 2 wants to get PR; it should
>> > wait for node 1 to downconvert first.
> OK~
>
> Thanks,
> Eric