Jeremy Fitzhardinge
2009-Feb-04 19:55 UTC
[Xen-devel] [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries
From: Jens Axboe <jens.axboe@oracle.com>

On occasion, the request will apparently have more segments than we fit
into the ring. Jens says:

> The second problem is that the block layer then appears to create one
> too many segments, but from the dump it has rq->nr_phys_segments ==
> BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
> xen-blkfront not handling the merging on its own. It should check that
> the new page doesn't form part of the previous page. The
> rq_for_each_segment() iterates all single bits in the request, not dma
> segments. The "easiest" way to do this is to call blk_rq_map_sg() and
> then iterate the mapped sg list. That will give you what you are
> looking for.
>
> Here's a test patch, compiles but otherwise untested. I spent more
> time figuring out how to enable XEN than to code it up, so YMMV!
> Probably the sg list wants to be put inside the ring and only
> initialized on allocation, then you can get rid of the sg on stack and
> sg_init_table() loop call in the function. I'll leave that, and the
> testing, to you.

[Moved sg array into info structure, and initialize once. -J]

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 drivers/block/xen-blkfront.c |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

===================================================================
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -40,6 +40,7 @@
 #include <linux/hdreg.h>
 #include <linux/cdrom.h>
 #include <linux/module.h>
+#include <linux/scatterlist.h>

 #include <xen/xenbus.h>
 #include <xen/grant_table.h>
@@ -82,6 +83,7 @@
        enum blkif_state connected;
        int ring_ref;
        struct blkif_front_ring ring;
+       struct scatterlist sg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
        unsigned int evtchn, irq;
        struct request_queue *rq;
        struct work_struct work;
@@ -204,12 +206,11 @@
        struct blkfront_info *info = req->rq_disk->private_data;
        unsigned long buffer_mfn;
        struct blkif_request *ring_req;
-       struct req_iterator iter;
-       struct bio_vec *bvec;
        unsigned long id;
        unsigned int fsect, lsect;
-       int ref;
+       int i, ref;
        grant_ref_t gref_head;
+       struct scatterlist *sg;

        if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
                return 1;
@@ -238,12 +239,13 @@
        if (blk_barrier_rq(req))
                ring_req->operation = BLKIF_OP_WRITE_BARRIER;

-       ring_req->nr_segments = 0;
-       rq_for_each_segment(bvec, req, iter) {
-               BUG_ON(ring_req->nr_segments == BLKIF_MAX_SEGMENTS_PER_REQUEST);
-               buffer_mfn = pfn_to_mfn(page_to_pfn(bvec->bv_page));
-               fsect = bvec->bv_offset >> 9;
-               lsect = fsect + (bvec->bv_len >> 9) - 1;
+       ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
+       BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
+
+       for_each_sg(info->sg, sg, ring_req->nr_segments, i) {
+               buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg)));
+               fsect = sg->offset >> 9;
+               lsect = fsect + (sg->length >> 9) - 1;
                /* install a grant reference. */
                ref = gnttab_claim_grant_reference(&gref_head);
                BUG_ON(ref == -ENOSPC);
@@ -254,16 +256,12 @@
                                buffer_mfn,
                                rq_data_dir(req) );

-               info->shadow[id].frame[ring_req->nr_segments] =
-                               mfn_to_pfn(buffer_mfn);
-
-               ring_req->seg[ring_req->nr_segments] =
+               info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn);
+               ring_req->seg[i] =
                                (struct blkif_request_segment) {
                                        .gref = ref,
                                        .first_sect = fsect,
                                        .last_sect = lsect };
-
-               ring_req->nr_segments++;
        }

        info->ring.req_prod_pvt++;
@@ -622,6 +620,8 @@
        SHARED_RING_INIT(sring);
        FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);

+       sg_init_table(info->sg, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+
        err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
        if (err < 0) {
                free_page((unsigned long)sring);
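
For readers following the reasoning above: the difference between iterating
raw bio_vecs and iterating the mapped sg list comes down to a
physical-contiguity (merge) check. The fragment below is only an illustrative
sketch of that check, not code from the patch or from blkfront; the helper
name and its use of page_to_phys() are assumptions made for the example.

#include <linux/bio.h>  /* struct bio_vec */
#include <asm/io.h>     /* page_to_phys() */

/*
 * Illustrative sketch only: the kind of merge check a driver iterating
 * raw bio_vecs would have to do by hand.  Two adjacent bio_vecs that are
 * physically contiguous count as a single DMA segment; blk_rq_map_sg()
 * coalesces such bio_vecs for the caller, which is why the segment count
 * it returns can only be less than or equal to the raw bio_vec count.
 */
static bool bvecs_physically_contiguous(const struct bio_vec *prev,
                                        const struct bio_vec *cur)
{
        phys_addr_t prev_end  = page_to_phys(prev->bv_page) +
                                prev->bv_offset + prev->bv_len;
        phys_addr_t cur_start = page_to_phys(cur->bv_page) + cur->bv_offset;

        return prev_end == cur_start;
}

Skipping this check (as the old rq_for_each_segment() loop did) counts every
bio_vec as its own ring segment, which is how the driver could overrun
BLKIF_MAX_SEGMENTS_PER_REQUEST even when the block layer's own segment count
was within the limit.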
Greg Harris
2009-Feb-06 20:47 UTC
[Xen-devel] Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries
It looks like this patch applied against 2.6.28.2 resolves the issue for us -- our spinner has been running for two days now without a panic. We've been testing this by spinning around an installer which creates file systems across two disks and then installs a base Linux system and some additional software. Previously we were running into these panics towards the end of file-system creation.

Will this patch be part of the next upstream kernel release?

Thank you all for your help in tracking down this issue.

---
Greg Harris
System Administrator
MetaCarta, Inc.
(O) +1 (617) 301-5530
(M) +1 (781) 258-4474
Jeremy Fitzhardinge
2009-Feb-06 21:16 UTC
[Xen-devel] Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries
Greg Harris wrote:
> It looks like this patch applied against 2.6.28.2 resolves the issue for
> us -- our spinner has been running for two days now without a panic. We've
> been testing this by spinning around an installer which creates file
> systems across two disks and then installs a base Linux system and some
> additional software. Previously we were running into these panics towards
> the end of file-system creation.

Thanks very much for testing.

> Will this patch be part of the next upstream kernel release?

Yes, and it should go into the next stable kernel too.

    J
Jan Beulich
2009-Feb-09 09:52 UTC
[Xen-devel] Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries
>>> Jeremy Fitzhardinge <jeremy@goop.org> 06.02.09 22:16 >>>
> Greg Harris wrote:
>> It looks like this patch applied against 2.6.28.2 resolves the issue for
>> us -- our spinner has been running for two days now without a panic.
>> We've been testing this by spinning around an installer which creates
>> file systems across two disks and then installs a base Linux system and
>> some additional software. Previously we were running into these panics
>> towards the end of file-system creation.
>
> Thanks very much for testing.
>
>> Will this patch be part of the next upstream kernel release?
>
> Yes, and it should go into the next stable kernel too.

But isn't this just reducing the likelihood of hitting the problem? Without understanding why the entry limit gets exceeded (and fixing that), the potential for this to happen again is still there.

Of course, the change made here has its own benefits, so I fully agree it should be applied (though I wonder whether it's indeed a -stable candidate).

Jan
Jeremy Fitzhardinge
2009-Feb-09 17:35 UTC
Re: [Xen-devel] Re: [TEST PATCH] xen/blkfront: use blk_rq_map_sg to generate ring entries
Jan Beulich wrote:
> But isn't this just reducing the likelihood of hitting the problem?
> Without understanding why the entry limit gets exceeded (and fixing that),
> the potential for this to happen again is still there.

I had the same thought, but Jens says it fixes the real problem, so that satisfies me ;)  Greg reported that simply decreasing the max segments to one less than the ring size was sufficient to clear up the symptoms, which indicates it was never submitting more than one extra segment.

Jens writes:

> The second problem is that the block layer then appears to create one
> too many segments, but from the dump it has rq->nr_phys_segments ==
> BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
> xen-blkfront not handling the merging on its own. It should check that
> the new page doesn't form part of the previous page.

which suggests that blkfront was omitting a merging check the driver should perform, and that this omission results in the problem. I don't really understand why the upper layers can't do this merge check, but I'm sure Jens has a good reason.

> Of course, the change made here has its own benefits, so I fully agree
> it should be applied (though I wonder whether it's indeed a -stable
> candidate).

Well, it appears to be the proper fix to a real bug observed in the field, which makes it an ideal -stable candidate.

    J
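
As background for the -stable discussion: the reason the new
BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST) is expected to
hold is that blkfront already advertises its per-request segment limit to the
block layer when it sets up the queue, so once segments are counted the same
way on both sides (via blk_rq_map_sg()), the block layer should never merge a
request past that limit. The fragment below is a simplified sketch of that
queue setup, written from memory of the 2.6.28-era xlvbd_init_blk_queue() and
not quoted from this thread; the function name and the exact set of limits
shown are assumptions for illustration.

#include <linux/blkdev.h>
#include <xen/interface/io/blkif.h>     /* BLKIF_MAX_SEGMENTS_PER_REQUEST */

/* Sketch of the queue limits blkfront advertises (2.6.28-era block API). */
static void xlvbd_queue_limits_sketch(struct request_queue *rq)
{
        /* Each segment in a request is at most one aligned page. */
        blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
        blk_queue_max_segment_size(rq, PAGE_SIZE);

        /*
         * Ensure a merged request fits in a single blkif ring slot, so
         * blk_rq_map_sg() cannot return more sg entries than the ring
         * request can describe.
         */
        blk_queue_max_phys_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
        blk_queue_max_hw_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
}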