Gang He
2015-Dec-08 03:21 UTC
[Ocfs2-devel] Buffer read will get starvation in case reading/writing the same file from different nodes concurrently
Hello Guys,

There is an issue from a customer, who is complaining that buffered reads sometimes take too long (1 - 10 seconds) when reading/writing the same file from different nodes concurrently.
Using the demo code from the customer, we can also reproduce this issue in-house (running the test program on a SLES11SP4 OCFS2 cluster); the issue can also be reproduced on openSUSE 13.2 (newer code), but in direct-io mode the issue disappears.
Based on my investigation, the root cause is that buffered I/O uses the cluster lock differently from direct I/O, and I do not know why buffered I/O uses the cluster lock this way.
The code details are as below.

In aops.c:

 281 static int ocfs2_readpage(struct file *file, struct page *page)
 282 {
 283         struct inode *inode = page->mapping->host;
 284         struct ocfs2_inode_info *oi = OCFS2_I(inode);
 285         loff_t start = (loff_t)page->index << PAGE_CACHE_SHIFT;
 286         int ret, unlock = 1;
 287
 288         trace_ocfs2_readpage((unsigned long long)oi->ip_blkno,
 289                              (page ? page->index : 0));
 290
 291         ret = ocfs2_inode_lock_with_page(inode, NULL, 0, page);   <<== this line
 292         if (ret != 0) {
 293                 if (ret == AOP_TRUNCATED_PAGE)
 294                         unlock = 0;
 295                 mlog_errno(ret);
 296                 goto out;
 297         }

In dlmglue.c:

2442 int ocfs2_inode_lock_with_page(struct inode *inode,
2443                                struct buffer_head **ret_bh,
2444                                int ex,
2445                                struct page *page)
2446 {
2447         int ret;
2448
2449         ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);   <<== here, why use NONBLOCK mode to get the cluster lock? This way the reading I/O can get starved.
2450         if (ret == -EAGAIN) {
2451                 unlock_page(page);
2452                 if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
2453                         ocfs2_inode_unlock(inode, ex);
2454                 ret = AOP_TRUNCATED_PAGE;
2455         }
2456
2457         return ret;
2458 }

If you know the background behind this code, please tell us: why not take the lock in blocking mode when reading a page, so that the reading I/O gets the page fairly when there is concurrent writing I/O from the other node?
Second, I tried to change that line from OCFS2_LOCK_NONBLOCK to 0 (i.e. switch to the blocking way); the reading I/O is then no longer blocked for so long (which would address the customer's complaint), but a new problem arises: sometimes the reading I/O and the writing I/O hit a deadlock (why a deadlock? I am still looking into it).

Thanks
Gang
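For anyone who wants to try this, below is a minimal sketch of the reader side of such a reproducer. The customer's actual demo code is not attached here, so this is only an approximation of its shape: run this buffered reader against a file on the shared OCFS2 volume on one node while another node keeps overwriting the same file in a loop, and every read that stalls for more than one second is reported.

/* reader.c - rough approximation (not the customer's demo) of the reader
 * side of the reproducer.  Buffered (non-O_DIRECT) reads of a file on the
 * shared OCFS2 volume; prints every read that takes longer than one second.
 * Run a simple writer loop against the same file from another node.
 */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[4096];
	struct timespec t0, t1;
	double sec;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file on ocfs2>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);	/* buffered I/O path, no O_DIRECT */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (;;) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (pread(fd, buf, sizeof(buf), 0) < 0) {
			perror("pread");
			break;
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		sec = (t1.tv_sec - t0.tv_sec) +
		      (t1.tv_nsec - t0.tv_nsec) / 1e9;
		if (sec > 1.0)
			printf("slow buffered read: %.2f seconds\n", sec);
	}
	close(fd);
	return 0;
}

Build with "gcc -o reader reader.c -lrt" (the -lrt is needed for clock_gettime on older glibc such as SLES11's).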
Joseph Qi
2015-Dec-08 03:55 UTC
[Ocfs2-devel] Buffer read will get starvation in case reading/writing the same file from different nodes concurrently
Hi Gang,

Eric and I have discussed this case before. Using NONBLOCK here is because there is a lock inversion between the inode lock and the page lock. You can refer to the comments of ocfs2_inode_lock_with_page for details.
Actually, I have found that NONBLOCK mode is only used in lock-inversion cases.

Thanks,
Joseph

On 2015/12/8 11:21, Gang He wrote:
> 2449         ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);   <<== here, why use NONBLOCK mode to get the cluster lock? This way the reading I/O can get starved.
> [...]
> If you know the background behind this code, please tell us: why not take the lock in blocking mode when reading a page, so that the reading I/O gets the page fairly when there is concurrent writing I/O from the other node?
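To make the inversion concrete, here is a small user-space sketch (plain pthreads, not ocfs2 code) of the two lock orders involved and of the trylock-and-retry pattern that ocfs2_inode_lock_with_page() follows. Thread A roughly stands in for ocfs2_readpage() (page lock first, then the inode/cluster lock); thread B roughly stands in for the write-out/downconvert side (inode lock first, then the page lock). If A blocked on the inode lock while still holding the page lock, the two could deadlock, which is presumably what the OCFS2_LOCK_NONBLOCK -> 0 change above runs into.

/* abba.c - illustration only, not ocfs2 code: ABBA lock inversion and the
 * trylock / back-off / retry pattern used to avoid it.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;

static void *reader(void *arg)		/* roughly: ocfs2_readpage() */
{
	int i;

	for (i = 0; i < 100000; i++) {
		pthread_mutex_lock(&page_lock);		/* like lock_page() */
		while (pthread_mutex_trylock(&inode_lock)) {
			/* Blocking here while holding page_lock could deadlock
			 * against the writer; back off instead (the equivalent
			 * of returning AOP_TRUNCATED_PAGE and retrying). */
			pthread_mutex_unlock(&page_lock);
			pthread_mutex_lock(&inode_lock);   /* wait our turn... */
			pthread_mutex_unlock(&inode_lock); /* ...then retry */
			pthread_mutex_lock(&page_lock);
		}
		/* "read the page" */
		pthread_mutex_unlock(&inode_lock);
		pthread_mutex_unlock(&page_lock);
	}
	return NULL;
}

static void *writer(void *arg)		/* roughly: write-out/downconvert side */
{
	int i;

	for (i = 0; i < 100000; i++) {
		pthread_mutex_lock(&inode_lock);	/* cluster lock first */
		pthread_mutex_lock(&page_lock);		/* then the page lock */
		/* "write back / invalidate the page" */
		pthread_mutex_unlock(&page_lock);
		pthread_mutex_unlock(&inode_lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, reader, NULL);
	pthread_create(&b, NULL, writer, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("done without deadlock\n");
	return 0;
}

Build with "gcc -o abba abba.c -pthread". The back-off keeps the lock ordering safe, but it also means a busy writer can win the race again and again before the reader gets another attempt, which looks a lot like the starvation reported here.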