piaojun
2018-May-09 08:50 UTC
[Ocfs2-devel] [PATCH] ocfs2: submit another bio if current bio is full
Hi Changwei,

On 2018/5/8 23:57, Changwei Ge wrote:
> Hi Jun,
>
> Sorry for the late reply; I have been very busy these days.
>
> On 04/16/2018 11:44 AM, piaojun wrote:
>> Hi Changwei,
>>
>> Do you mean that if the slot number exceeds 16, as with 'mkfs.ocfs2 -N 17',
>> you still let it go rather than return an error?
>
> If your assumption is right, do you mean that ocfs2 slots can't exceed 16?
>
> If you return an error once slots exceed 16, mkfs will never succeed.
>
> So if we can be sure that the bio is full in the current iteration, we
> should go on to the next iteration, allocate a new bio, add pages to it
> and continue.
>
> And your patch makes my ocfs2-test fail.
>
> Thanks,
> Changwei
>
>>
>> thanks,
>> Jun
>>
>> On 2018/4/13 13:51, Changwei Ge wrote:
>>> If cluster scale exceeds 16 nodes, bio will be full and bio_add_page()

Sorry for misunderstanding your fix. Do you mean that the node count is
large enough that its slots cannot be covered by 16 pages, for example 129
nodes?

"one page could cover 8 node's slots"

thanks,
Jun

>>> returns 0 when adding pages to bio. Returning -EIO to o2hb_read_slots()
>>> from o2hb_setup_one_bio() loses the chance to allocate more bios to
>>> represent the whole heartbeat region.
>>>
>>> So o2hb_read_slots() fails.
>>>
>>> In my test, making the fs fails when starting the o2cb service.
>>>
>>> Attach error log:
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0
>>> (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0, bi_sector 8192
>>> (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5
>>> (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5
>>> (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5
>>>
>>> Fixes: ba16ddfbeb9d ("ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio")
>>>
>>> Signed-off-by: Changwei Ge <ge.changwei at h3c.com>
>>> ---
>>>  fs/ocfs2/cluster/heartbeat.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>> index 91a8889abf9b..2809e29d612d 100644
>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>> @@ -540,11 +540,12 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>      struct bio *bio;
>>>      struct page *page;
>>>
>>> +#define O2HB_BIO_VECS 16
>>>      /* Testing has shown this allocation to take long enough under
>>>       * GFP_KERNEL that the local node can get fenced. It would be
>>>       * nicest if we could pre-allocate these bios and avoid this
>>>       * all together. */
>>> -    bio = bio_alloc(GFP_ATOMIC, 16);
>>> +    bio = bio_alloc(GFP_ATOMIC, O2HB_BIO_VECS);
>>>      if (!bio) {
>>>          mlog(ML_ERROR, "Could not alloc slots BIO!\n");
>>>          bio = ERR_PTR(-ENOMEM);
>>> @@ -570,7 +571,10 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>               current_page, vec_len, vec_start);
>>>
>>>          len = bio_add_page(bio, page, vec_len, vec_start);
>>> -        if (len != vec_len) {
>>> +        if (len == 0 && current_page == O2HB_BIO_VECS) {
>>> +            /* bio is full now. */
>>> +            goto bail;
>>> +        } else if (len != vec_len) {
>>>              mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>>                   "page %p, len %d, vec_len %u, vec_start %u, "
>>>                   "bi_sector %llu\n", current_page, page, len,
>>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
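For readers following the control flow the patch is after, here is a minimal userspace sketch of it. This is not the kernel code: `struct fake_bio`, `fake_bio_add_page()`, `setup_one_bio()` and `read_slots()` are hypothetical stand-ins, and the sketch assumes only what the log shows, namely that adding a page returns 0 once the 16 vecs are used. Unlike the patch, which compares `current_page` with `O2HB_BIO_VECS`, the sketch checks the bio's own fill count.

```c
#include <assert.h>

#define O2HB_BIO_VECS 16   /* same capacity the patch names */
#define PAGE_BYTES 4096u   /* matches vec_len in the log */

/* Hypothetical stand-in for struct bio: it only counts attached pages. */
struct fake_bio {
    unsigned int vcnt;      /* pages attached so far */
    unsigned int max_vecs;  /* capacity fixed at allocation time */
};

/* Mimics the behaviour seen in the log: returns len on success and
 * 0 once the bio has no free vec left. */
static unsigned int fake_bio_add_page(struct fake_bio *bio, unsigned int len)
{
    if (bio->vcnt >= bio->max_vecs)
        return 0;
    bio->vcnt++;
    return len;
}

/* Fill one bio starting at *current_page. Returns 0 on success, which
 * includes the "bio is full" case, and -5 (-EIO) on any other short add. */
static int setup_one_bio(struct fake_bio *bio, unsigned int *current_page,
                         unsigned int total_pages)
{
    bio->vcnt = 0;
    bio->max_vecs = O2HB_BIO_VECS;
    assert(bio->max_vecs > 0);  /* otherwise the caller could never progress */

    while (*current_page < total_pages) {
        unsigned int len = fake_bio_add_page(bio, PAGE_BYTES);

        if (len == 0 && bio->vcnt == bio->max_vecs)
            break;              /* full: leave the rest for the next bio */
        if (len != PAGE_BYTES)
            return -5;          /* a genuine failure */
        (*current_page)++;
    }
    return 0;
}

/* Caller loop: keep building bios until every page is covered.
 * Returns the number of bios used, or a negative error. */
static int read_slots(unsigned int total_pages)
{
    unsigned int current_page = 0;
    int nr_bios = 0;

    while (current_page < total_pages) {
        struct fake_bio bio;
        int status = setup_one_bio(&bio, &current_page, total_pages);

        if (status < 0)
            return status;
        nr_bios++;
    }
    return nr_bios;
}
```

With the 17 pages seen in the log (page 0 to page 16), `read_slots(17)` in this model uses two bios instead of failing with -5 on the 17th page.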
Changwei Ge
2018-May-09 09:06 UTC
[Ocfs2-devel] [PATCH] ocfs2: submit another bio if current bio is full
Hi Jun,

On 2018/5/9 16:50, piaojun wrote:
> Hi Changwei,
>
> On 2018/5/8 23:57, Changwei Ge wrote:
>> Hi Jun,
>>
>> Sorry for the late reply; I have been very busy these days.
>>
>> On 04/16/2018 11:44 AM, piaojun wrote:
>>> Hi Changwei,
>>>
>>> Do you mean that if the slot number exceeds 16, as with 'mkfs.ocfs2 -N 17',
>>> you still let it go rather than return an error?
>> If your assumption is right, do you mean that ocfs2 slots can't exceed 16?
>>
>> If you return an error once slots exceed 16, mkfs will never succeed.
>>
>> So if we can be sure that the bio is full in the current iteration, we
>> should go on to the next iteration, allocate a new bio, add pages to it
>> and continue.
>>
>> And your patch makes my ocfs2-test fail.
>>
>> Thanks,
>> Changwei
>>
>>> thanks,
>>> Jun
>>>
>>> On 2018/4/13 13:51, Changwei Ge wrote:
>>>> If cluster scale exceeds 16 nodes, bio will be full and bio_add_page()
> Sorry for misunderstanding your fix. Do you mean that the node count is
> large enough that its slots cannot be covered by 16 pages, for example 129
> nodes?
>
> "one page could cover 8 node's slots"

It has nothing to do with how many slots a page can hold. It's about how
many vecs a bio can have. For your reference, bio_alloc() sets the maximum
number of vecs to 16 in o2hb_setup_one_bio() as a precondition.

Thanks,
Changwei

>
> thanks,
> Jun
>
>>>> returns 0 when adding pages to bio. Returning -EIO to o2hb_read_slots()
>>>> from o2hb_setup_one_bio() loses the chance to allocate more bios to
>>>> represent the whole heartbeat region.
>>>>
>>>> So o2hb_read_slots() fails.
>>>>
>>>> In my test, making the fs fails when starting the o2cb service.
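To put numbers on the two limits discussed in this exchange, here is a rough arithmetic sketch. It takes Jun's figure of 8 slots per page at face value and the 16-vec bio capacity from the patch. The helper names are hypothetical and do not exist in heartbeat.c.

```c
#define SLOTS_PER_PAGE 8   /* figure quoted by Jun; treated as an assumption */
#define O2HB_BIO_VECS 16   /* vec capacity passed to bio_alloc() in the patch */

/* Integer division rounding up. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

/* Pages needed to hold the given number of heartbeat slots. */
static unsigned int pages_for_slots(unsigned int slots)
{
    return div_round_up(slots, SLOTS_PER_PAGE);
}

/* Bios needed when each bio can carry at most O2HB_BIO_VECS pages. */
static unsigned int bios_for_pages(unsigned int pages)
{
    return div_round_up(pages, O2HB_BIO_VECS);
}
```

Under these assumptions, Jun's example of 129 slots needs 17 pages, one more than a single 16-vec bio can carry, so two bios are required. The page count at which this happens depends on slots per page, while the limit that triggers the failure is the per-bio vec count, which is the distinction Changwei draws.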