Hi All,

These days I am investigating an issue with OCFS2 rebooting unexpectedly
in some real-world use cases. The problem occurs when the network status
goes south, when the disk IO load is too high, etc. I suspect it may be
caused by ocfs2 fencing itself when its BIO reads/writes cannot be
scheduled and processed quickly enough, or by something similar happening
in the network heartbeat thread.

I am now trying to reproduce this problem locally. In the meantime, I'd
like to ping you guys with some rough ideas for improving the disk IO
heartbeat, to see whether they sound reasonable or not.

Firstly, if an OCFS2 node is suffering from heavy disk IO, how about
fixing the bio reads/writes so that these IO requests cannot be preempted
by other requests? E.g., o2hb_issue_node_write() currently submits its
bio with WRITE only, 'submit_bio(WRITE, bio)'. If we change the flag to
WRITE_SYNC, or even combine the request with REQ_FUA, we might get the
highest priority for the disk IO request.

Secondly, the comment at the bio allocation in o2hb_setup_one_bio()
indicates that we could pre-allocate the bio instead of acquiring one
each time, but I have not seen any code in the kernel doing such a
thing. :( How about creating a private bio set for each o2hb_region, so
that we can allocate out of it? That may be faster than allocating from
the global bio set. Also, does it make sense to create a memory pool on
each o2hb_region, so that we can have contiguous pages bound to those
bios? Rough sketches of both ideas follow below my signature.

Any comments are appreciated!

Thanks,
-Jeff
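P.S. To make the first idea concrete, the change would be something like
this (untested, against a tree where submit_bio() still takes the rw
flags as its first argument):

--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ static int o2hb_issue_node_write(struct o2hb_region *reg,
 	atomic_inc(&write_wc->wc_num_reqs);
-	submit_bio(WRITE, bio);
+	submit_bio(WRITE_SYNC, bio);

(or WRITE_FUA instead of WRITE_SYNC, if we also want the write forced to
stable media rather than just marked synchronous.)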
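For the second idea, here is a rough sketch of a private bio set plus a
page pool per region. The hr_bio_set/hr_page_pool fields and the pool
sizes are made up for illustration; bioset_create(), bioset_free(),
mempool_create_page_pool() and bio_alloc_bioset() are the existing
block-layer APIs:

	#include <linux/bio.h>
	#include <linux/mempool.h>

	/*
	 * Assumed new fields in struct o2hb_region:
	 *	struct bio_set *hr_bio_set;
	 *	mempool_t *hr_page_pool;
	 */
	static int o2hb_region_alloc_pools(struct o2hb_region *reg)
	{
		/* private bio_set, so heartbeat bios never wait on the
		 * global fs_bio_set under memory pressure */
		reg->hr_bio_set = bioset_create(2, 0);
		if (!reg->hr_bio_set)
			return -ENOMEM;

		/* pre-allocated order-0 pages to back the heartbeat slots */
		reg->hr_page_pool = mempool_create_page_pool(16, 0);
		if (!reg->hr_page_pool) {
			bioset_free(reg->hr_bio_set);
			reg->hr_bio_set = NULL;
			return -ENOMEM;
		}
		return 0;
	}

o2hb_setup_one_bio() would then allocate with
'bio = bio_alloc_bioset(GFP_ATOMIC, 16, reg->hr_bio_set)' instead of
going through the global set.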
On 8/22/2012 7:17 AM, Jie Liu wrote:
> Firstly, if an OCFS2 node is suffering from heavy disk IO, how about
> fixing the bio reads/writes so that these IO requests cannot be
> preempted by other requests? E.g., o2hb_issue_node_write() currently
> submits its bio with WRITE only, 'submit_bio(WRITE, bio)'. If we
> change the flag to WRITE_SYNC, or even combine the request with
> REQ_FUA, we might get the highest priority for the disk IO request.

This was submitted before by Noboru Iwamatsu and acked by Sunil and Tao,
but somehow didn't get merged:
https://oss.oracle.com/pipermail/ocfs2-devel/2011-December/008438.html
Hi Jeff,

On 08/22/2012 10:17 PM, Jie Liu wrote:
> Firstly, if an OCFS2 node is suffering from heavy disk IO, how about
> fixing the bio reads/writes so that these IO requests cannot be
> preempted by other requests? E.g., o2hb_issue_node_write() currently
> submits its bio with WRITE only, 'submit_bio(WRITE, bio)'. If we
> change the flag to WRITE_SYNC, or even combine the request with
> REQ_FUA, we might get the highest priority for the disk IO request.

Yes, a plain WRITE is far too easy to preempt in CFQ since it sits in
the async queue, so if there are a lot of reads or sync writes, the
heartbeat write will be delayed a lot. As Srini said in another e-mail,
WRITE_SYNC should help, since it gives the bio the chance to be treated
as a sync write, the same as a read. Please check that first to see
whether it is OK.

I guess the final solution will be WRITE_FUA; I see btrfs uses it to
write out the superblock. It is handled differently by the underlying
block layer, so it will not sit in the elevator queue. It should work,
but I am not sure whether we need to be so aggressive. Maybe some tests
will show.

Thanks,
Tao
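P.S. For reference, this is roughly how the write flags are composed in
include/linux/fs.h in a 3.x-era tree (paraphrased from memory, so check
your own tree); WRITE_FUA is WRITE_SYNC plus the FUA bit that the
block-layer flush machinery handles specially:

	#define WRITE_SYNC	(WRITE | REQ_SYNC | REQ_NOIDLE)
	#define WRITE_FLUSH	(WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH)
	#define WRITE_FUA	(WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FUA)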