kraxel@suse.de
2006-Oct-20 08:38 UTC
[Xen-devel] [patch] Add support for barriers to blk{back, front} drivers.
protocol changes:

 * There is a new operation (BLKIF_OP_WRITE_BARRIER) to pass on barrier
   requests.
 * There is a new state (BLKIF_RSP_EOPNOTSUPP) to indicate unsupported
   operations (barrier writes may fail depending on the underlying
   block device).
 * A new xenstore node named "feature-barrier" indicates the backend is
   able to handle barrier writes.  The value can be 1 (all is fine) or
   0 (underlying block device doesn't support barriers).

blkback changes:

Add the "feature-barrier" node to indicate barrier support, and pass
incoming barrier requests to the block layer using
submit_bio(WRITE_BARRIER, bio).  Some error handling fixes to properly
pass through barrier write failures, so the frontend can turn off
barriers then.

blkfront changes:

Check whether the backend sets "feature-barrier"; if present, switch to
QUEUE_ORDERED_DRAIN mode.  Send off barrier requests to the backend
using the new BLKIF_OP_WRITE_BARRIER operation.  Also some error
handling for the EOPNOTSUPP case.

Background: Barriers are needed to make journaling filesystems work
reliably.  For some requests they need ordering constraints to make the
transactions work correctly.  Barrier requests are used to pass that
ordering information to the block layer and/or to the device, so the
constraints are obeyed when reordering requests for better write
performance.
Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
---
 linux-2.6-xen-sparse/drivers/xen/blkback/blkback.c   |   41 ++++++++++++++-----
 linux-2.6-xen-sparse/drivers/xen/blkback/common.h    |    3 +
 linux-2.6-xen-sparse/drivers/xen/blkback/vbd.c       |    2
 linux-2.6-xen-sparse/drivers/xen/blkback/xenbus.c    |   31 ++++++++++++++
 linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c |   30 +++++++++++--
 linux-2.6-xen-sparse/drivers/xen/blkfront/block.h    |    2
 linux-2.6-xen-sparse/drivers/xen/blkfront/vbd.c      |   20 ++++++++-
 xen/include/public/io/blkif.h                        |   10 ++--
 8 files changed, 117 insertions(+), 22 deletions(-)

Index: build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/blkback.c
===================================================================
--- build-64-unstable-11724.orig/linux-2.6-xen-sparse/drivers/xen/blkback/blkback.c
+++ build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/blkback.c
@@ -189,9 +189,9 @@ static void fast_flush_area(pending_req_

 static void print_stats(blkif_t *blkif)
 {
-	printk(KERN_DEBUG "%s: oo %3d | rd %4d | wr %4d\n",
+	printk(KERN_DEBUG "%s: oo %3d | rd %4d | wr %4d | br %4d\n",
 	       current->comm, blkif->st_oo_req,
-	       blkif->st_rd_req, blkif->st_wr_req);
+	       blkif->st_rd_req, blkif->st_wr_req, blkif->st_br_req);
 	blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
 	blkif->st_rd_req = 0;
 	blkif->st_wr_req = 0;
@@ -241,12 +241,17 @@ int blkif_schedule(void *arg)
  * COMPLETION CALLBACK -- Called as bh->b_end_io()
  */

-static void __end_block_io_op(pending_req_t *pending_req, int uptodate)
+static void __end_block_io_op(pending_req_t *pending_req, int error)
 {
 	/* An error fails the entire request. */
-	if (!uptodate) {
-		DPRINTK("Buffer not up-to-date at end of operation\n");
+	if (error) {
+		DPRINTK("Buffer not up-to-date at end of operation, error=%d\n", error);
 		pending_req->status = BLKIF_RSP_ERROR;
+		if (pending_req->operation == BLKIF_OP_WRITE_BARRIER && error == -EOPNOTSUPP) {
+			pending_req->status = BLKIF_RSP_EOPNOTSUPP;
+			blkback_barrier(pending_req->blkif->be, 0);
+			printk("blkback: write barrier op failed, not supported\n");
+		}
 	}

 	if (atomic_dec_and_test(&pending_req->pendcnt)) {
@@ -262,7 +267,7 @@ static int end_block_io_op(struct bio *b
 {
 	if (bio->bi_size != 0)
 		return 1;
-	__end_block_io_op(bio->bi_private, !error);
+	__end_block_io_op(bio->bi_private, error);
 	bio_put(bio);
 	return error;
 }
@@ -319,6 +324,9 @@ static int do_block_io_op(blkif_t *blkif
 		blkif->st_rd_req++;
 		dispatch_rw_block_io(blkif, req, pending_req);
 		break;
+	case BLKIF_OP_WRITE_BARRIER:
+		blkif->st_br_req++;
+		/* fall through */
 	case BLKIF_OP_WRITE:
 		blkif->st_wr_req++;
 		dispatch_rw_block_io(blkif, req, pending_req);
@@ -340,7 +348,6 @@ static void dispatch_rw_block_io(blkif_t
 			pending_req_t *pending_req)
 {
 	extern void ll_rw_block(int rw, int nr, struct buffer_head * bhs[]);
-	int operation = (req->operation == BLKIF_OP_WRITE) ? WRITE : READ;
 	struct gnttab_map_grant_ref map[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	struct phys_req preq;
 	struct {
@@ -349,6 +356,22 @@ static void dispatch_rw_block_io(blkif_t
 	unsigned int nseg;
 	struct bio *bio = NULL, *biolist[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 	int ret, i, nbio = 0;
+	int operation;
+
+	switch (req->operation) {
+	case BLKIF_OP_READ:
+		operation = READ;
+		break;
+	case BLKIF_OP_WRITE:
+		operation = WRITE;
+		break;
+	case BLKIF_OP_WRITE_BARRIER:
+		operation = WRITE_BARRIER;
+		break;
+	default:
+		operation = 0; /* make gcc happy */
+		BUG();
+	}

 	/* Check that number of segments is sane. */
 	nseg = req->nr_segments;
@@ -364,7 +387,7 @@ static void dispatch_rw_block_io(blkif_t

 	pending_req->blkif     = blkif;
 	pending_req->id        = req->id;
-	pending_req->operation = operation;
+	pending_req->operation = req->operation;
 	pending_req->status    = BLKIF_RSP_OKAY;
 	pending_req->nr_pages  = nseg;

@@ -380,7 +403,7 @@ static void dispatch_rw_block_io(blkif_t
 		preq.nr_sects += seg[i].nsec;

 		flags = GNTMAP_host_map;
-		if ( operation == WRITE )
+		if ( operation != READ )
 			flags |= GNTMAP_readonly;
 		gnttab_set_map_op(&map[i], vaddr(pending_req, i), flags,
 				  req->seg[i].gref, blkif->domid);
Index: build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/common.h
===================================================================
--- build-64-unstable-11724.orig/linux-2.6-xen-sparse/drivers/xen/blkback/common.h
+++ build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/common.h
@@ -87,6 +87,7 @@ typedef struct blkif_st {
 	int                 st_rd_req;
 	int                 st_wr_req;
 	int                 st_oo_req;
+	int                 st_br_req;

 	wait_queue_head_t waiting_to_free;

@@ -131,4 +132,6 @@ void blkif_xenbus_init(void);
 irqreturn_t blkif_be_int(int irq, void *dev_id, struct pt_regs *regs);
 int blkif_schedule(void *arg);

+int blkback_barrier(struct backend_info *be, int state);
+
 #endif /* __BLKIF__BACKEND__COMMON_H__ */
Index: build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/xenbus.c
===================================================================
--- build-64-unstable-11724.orig/linux-2.6-xen-sparse/drivers/xen/blkback/xenbus.c
+++ build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/xenbus.c
@@ -91,11 +91,13 @@ static void update_blkif_status(blkif_t
 VBD_SHOW(oo_req, "%d\n", be->blkif->st_oo_req);
 VBD_SHOW(rd_req, "%d\n", be->blkif->st_rd_req);
 VBD_SHOW(wr_req, "%d\n", be->blkif->st_wr_req);
+VBD_SHOW(br_req, "%d\n", be->blkif->st_br_req);

 static struct attribute *vbdstat_attrs[] = {
 	&dev_attr_oo_req.attr,
 	&dev_attr_rd_req.attr,
 	&dev_attr_wr_req.attr,
+	&dev_attr_br_req.attr,
 	NULL
 };

@@ -165,6 +167,31 @@ static int blkback_remove(struct xenbus_
 	return 0;
 }

+int blkback_barrier(struct backend_info *be, int state)
+{
+	struct xenbus_device *dev = be->dev;
+	struct xenbus_transaction xbt;
+	int err;
+
+	do {
+		err = xenbus_transaction_start(&xbt);
+		if (err) {
+			xenbus_dev_fatal(dev, err, "starting transaction");
+			return -1;
+		}
+
+		err = xenbus_printf(xbt, dev->nodename, "feature-barrier",
+				    "%d", state);
+		if (err) {
+			xenbus_transaction_end(xbt, 1);
+			xenbus_dev_fatal(dev, err, "writing feature-barrier");
+			return -1;
+		}
+
+		err = xenbus_transaction_end(xbt, 0);
+	} while (err == -EAGAIN);
+	return 0;
+}

 /**
  * Entry point to this code when a new device is created.  Allocate the basic
@@ -201,6 +228,10 @@ static int blkback_probe(struct xenbus_d
 	if (err)
 		goto fail;

+	err = blkback_barrier(be, 1);
+	if (err)
+		goto fail;
+
 	err = xenbus_switch_state(dev, XenbusStateInitWait);
 	if (err)
 		goto fail;
Index: build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c
===================================================================
--- build-64-unstable-11724.orig/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c
+++ build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c
@@ -316,6 +316,12 @@ static void connect(struct blkfront_info
 		return;
 	}

+	err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
+			    "feature-barrier", "%lu", &info->feature_barrier,
+			    NULL);
+	if (err)
+		info->feature_barrier = 0;
+
 	err = xlvbd_add(sectors, info->vdevice, binfo, sector_size, info);
 	if (err) {
 		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
@@ -544,11 +550,14 @@ static int blkif_queue_request(struct re
 	info->shadow[id].request = (unsigned long)req;

 	ring_req->id = id;
-	ring_req->operation = rq_data_dir(req) ?
-		BLKIF_OP_WRITE : BLKIF_OP_READ;
 	ring_req->sector_number = (blkif_sector_t)req->sector;
 	ring_req->handle = info->handle;

+	ring_req->operation = rq_data_dir(req) ?
+		BLKIF_OP_WRITE : BLKIF_OP_READ;
+	if (blk_barrier_rq(req))
+		ring_req->operation = BLKIF_OP_WRITE_BARRIER;
+
 	ring_req->nr_segments = 0;
 	rq_for_each_bio (bio, req) {
 		bio_for_each_segment (bvec, bio, idx) {
@@ -645,6 +654,7 @@ static irqreturn_t blkif_int(int irq, vo
 	RING_IDX i, rp;
 	unsigned long flags;
 	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+	int uptodate;

 	spin_lock_irqsave(&blkif_io_lock, flags);

@@ -669,19 +679,27 @@ static irqreturn_t blkif_int(int irq, vo

 		ADD_ID_TO_FREELIST(info, id);

+		uptodate = (bret->status == BLKIF_RSP_OKAY);
 		switch (bret->operation) {
+		case BLKIF_OP_WRITE_BARRIER:
+			if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
+				printk("blkfront: %s: write barrier op failed\n",
+				       info->gd->disk_name);
+				uptodate = -EOPNOTSUPP;
+				info->feature_barrier = 0;
+				xlvbd_barrier(info);
+			}
+			/* fall through */
 		case BLKIF_OP_READ:
 		case BLKIF_OP_WRITE:
 			if (unlikely(bret->status != BLKIF_RSP_OKAY))
 				DPRINTK("Bad return from blkdev data "
 					"request: %x\n", bret->status);

-			ret = end_that_request_first(
-				req, (bret->status == BLKIF_RSP_OKAY),
+			ret = end_that_request_first(req, uptodate,
 						     req->hard_nr_sectors);
 			BUG_ON(ret);
-			end_that_request_last(
-				req, (bret->status == BLKIF_RSP_OKAY));
+			end_that_request_last(req, uptodate);
 			break;
 		default:
 			BUG();
Index: build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkfront/block.h
===================================================================
--- build-64-unstable-11724.orig/linux-2.6-xen-sparse/drivers/xen/blkfront/block.h
+++ build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkfront/block.h
@@ -126,6 +126,7 @@ struct blkfront_info
 	struct gnttab_free_callback callback;
 	struct blk_shadow shadow[BLK_RING_SIZE];
 	unsigned long shadow_free;
+	int feature_barrier;

 	/**
 	 * The number of people holding this device open.  We won't allow a
@@ -152,5 +153,6 @@ extern void do_blkif_request (request_qu
 int xlvbd_add(blkif_sector_t capacity, int device,
 	      u16 vdisk_info, u16 sector_size, struct blkfront_info *info);
 void xlvbd_del(struct blkfront_info *info);
+int xlvbd_barrier(struct blkfront_info *info);

 #endif /* __XEN_DRIVERS_BLOCK_H__ */
Index: build-64-unstable-11724/xen/include/public/io/blkif.h
===================================================================
--- build-64-unstable-11724.orig/xen/include/public/io/blkif.h
+++ build-64-unstable-11724/xen/include/public/io/blkif.h
@@ -29,8 +29,9 @@
 #endif
 #define blkif_sector_t uint64_t

-#define BLKIF_OP_READ           0
-#define BLKIF_OP_WRITE          1
+#define BLKIF_OP_READ           0
+#define BLKIF_OP_WRITE          1
+#define BLKIF_OP_WRITE_BARRIER  2

 /*
  * Maximum scatter/gather segments per request.
@@ -61,8 +62,9 @@ struct blkif_response {
 };
 typedef struct blkif_response blkif_response_t;

-#define BLKIF_RSP_ERROR       -1 /* non-specific 'error' */
-#define BLKIF_RSP_OKAY         0 /* non-specific 'okay'  */
+#define BLKIF_RSP_EOPNOTSUPP  -2 /* operation not supported (can happen on barrier writes) */
+#define BLKIF_RSP_ERROR       -1 /* non-specific 'error' */
+#define BLKIF_RSP_OKAY         0 /* non-specific 'okay'  */

 /*
  * Generate blkif ring structures and types.
Index: build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkfront/vbd.c
===================================================================
--- build-64-unstable-11724.orig/linux-2.6-xen-sparse/drivers/xen/blkfront/vbd.c
+++ build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkfront/vbd.c
@@ -257,6 +257,10 @@ xlvbd_alloc_gendisk(int minor, blkif_sec
 	}

 	info->rq = gd->queue;
+	info->gd = gd;
+
+	if (info->feature_barrier)
+		xlvbd_barrier(info);

 	if (vdisk_info & VDISK_READONLY)
 		set_disk_ro(gd, 1);
@@ -267,8 +271,6 @@ xlvbd_alloc_gendisk(int minor, blkif_sec
 	if (vdisk_info & VDISK_CDROM)
 		gd->flags |= GENHD_FL_CD;

-	info->gd = gd;
-
 	return 0;

 out:
@@ -316,3 +318,17 @@ xlvbd_del(struct blkfront_info *info)
 	blk_cleanup_queue(info->rq);
 	info->rq = NULL;
 }
+
+int
+xlvbd_barrier(struct blkfront_info *info)
+{
+	int err;
+
+	err = blk_queue_ordered(info->rq,
+		info->feature_barrier ? QUEUE_ORDERED_DRAIN : QUEUE_ORDERED_NONE, NULL);
+	if (err)
+		return err;
+	printk("blkfront: %s: barriers %s\n",
+	       info->gd->disk_name, info->feature_barrier ? "enabled" : "disabled");
+	return 0;
+}
Index: build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/vbd.c
===================================================================
--- build-64-unstable-11724.orig/linux-2.6-xen-sparse/drivers/xen/blkback/vbd.c
+++ build-64-unstable-11724/linux-2.6-xen-sparse/drivers/xen/blkback/vbd.c
@@ -104,7 +104,7 @@ int vbd_translate(struct phys_req *req,
 	struct vbd *vbd = &blkif->vbd;
 	int rc = -EACCES;

-	if ((operation == WRITE) && vbd->readonly)
+	if ((operation != READ) && vbd->readonly)
 		goto out;

 	if (unlikely((req->sector_number + req->nr_sects) > vbd_sz(vbd)))

--

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Pratt
2006-Oct-20 20:35 UTC
RE: [Xen-devel] [patch] Add support for barriers to blk{back, front} drivers.
> Background: Barriers are needed to make journaling filesystems work
> reliably.  For some requests they need ordering constraints to make
> the transactions work correctly.  Barrier requests are used to pass
> that ordering information to the block layer and/or to the device, so
> the constraints are obeyed when reordering requests for better write
> performance.

Just so I'm clear, this is just an optimization, right? -- linux does
the right thing without blk device barrier support, it just gives the
opportunity for more block scheduling flexibility (which is certainly a
good thing)

Out of interest, have you any benchmarks showing the benefits of
barrier support? I'd be very interested to see them.

Also, have you thought how this would work with blktap? Does the AIO
interface allow ordering constraints to be communicated to the kernel?

Thanks,
Ian
Ian Pratt
2006-Oct-23 11:42 UTC
RE: [Xen-devel] [patch] Add support for barriers to blk{back, front} drivers.
> > Out of interest, have you any benchmarks showing the benefits of
> > barrier support? I'd be very interested to see them.
>
> It's about correctness, not about performance.  I've talked with Jens
> Axboe about it a while ago.  Barriers are *required* for journaling
> filesystems to work reliably.

I just don't believe that. If the underlying device doesn't support
barriers, Linux should just stop issuing requests until it has the
completion notification back for *all* the outstanding IOs to the
device, and then start issuing the new batch of IOs. I'd be incredibly
surprised if this is not what Linux does today for devices that don't
support barriers. [NB: you still have the issue of disks that support
write caching and lie and send the completion before data has hit the
disk, but there's nothing you can do about that from the OS]

Barriers should just be an optimization that gives greater scheduling
flexibility. I'd certainly be interested to see some benchmarks with
and without the barrier support in blkfront/back.

> > Also, have you thought how this would work with blktap? Does the
> > AIO interface allow ordering constraints to be communicated to the
> > kernel?
>
> Have a look at Documentation/block/barrier.txt for some background
> information.
>
> It shouldn't be a big problem to implement barriers with blktap too.
> I think it can't be simply passed down to the block layer (like
> blkback does).  But it can be implemented with aio_fsync().  If a
> barrier request comes in: sync, submit the write request, sync again,
> then go ahead with the next request.

Linux's notion of a barrier is pretty odd in that it does appear to
require the second fsync, at least according to barrier.txt.  That's
more restrictive than the notion of barrier I've seen in other OSes.

> Some extra care might be needed for the disk format metadata (cow
> images come to mind ...).

There's already care taken to ensure that metadata updates happen with
the correct ordering with respect to data writes. Adding barriers at
the data level should be totally orthogonal.

Ian
Gerd Hoffmann
2006-Oct-23 12:27 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
Ian Pratt wrote:
>> It's about correctness, not about performance.  I've talked with
>> Jens Axboe about it a while ago.  Barriers are *required* for
>> journaling filesystems to work reliably.
>
> I just don't believe that. If the underlying device doesn't support
> barriers, Linux should just stop issuing requests until it has the
> completion notification back for *all* the outstanding IOs to the
> device, and then start issuing the new batch of IOs.

That is what actually happens; that isn't enough in all cases though.

> I'd be incredibly surprised if this is not what Linux does today for
> devices that don't support barriers. [NB: you still have the issue of
> disks that support write caching and lie and send the completion
> before data has hit the disk, but there's nothing you can do about
> that from the OS]

The write caching case is exactly the problematic one.  If you turn
off the disk's write caching you are fine, I think.  With write caching
enabled, journaling is safe only together with barriers.  The kernel
can either use request tagging, so the drive itself makes sure the
write order is correct, or explicitly ask the drive to flush the write
cache, depending on what the hardware supports.

> I'd certainly be interested to see some benchmarks with and without
> the barrier support in blkfront/back.

I'll go run some ...

cheers,
  Gerd

-- 
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg
Gerd Hoffmann
2006-Oct-25 16:52 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
Gerd Hoffmann wrote:
> Ian Pratt wrote:
>> I'd certainly be interested to see some benchmarks with and without
>> the barrier support in blkfront/back.
>
> I'll go run some ...

I did some bonnie++ runs, each test three times. The complete set of results, plus the scripts I've used to run the tests, is at http://www.suse.de/~kraxel/bb/. Handle the scripts with care, there is a mkfs call in there ;)

I've picked two numbers (sequential block writes, sequential file creation) which should involve a bunch of barrier requests for journal updates.

The machine is an Intel devel box with hyperthreading, 1GB of memory, and a sata disk. I've used the ext3 filesystem, freshly created on a lvm volume, mounted with barrier={0,1} and data=journal (to stress journaling and barriers ...).

Legend:
  wce0     - write cache enable = off (hdparm -W0 /dev/sda)
  wce1     - write cache enable = on  (hdparm -W1 /dev/sda)
  barrier0 - barriers off (mount -o barrier=0)
  barrier1 - barriers on

Here are the dom0 results (three runs per configuration):

  name                  blkout  create
  ---------------------------------------
  dom0-wce0-barrier1     12614    1841
                         12598     975
                         12575    1102
  dom0-wce0-barrier0     12656    1320
                         12569    1629
                         12669    1313
  dom0-wce1-barrier1     18457    2801
                         18395    3316
                         18560    2948
  dom0-wce1-barrier0     19224    3190
                         18880    3079
                         19121    3555

With write caching disabled barriers don't make a big difference, it's in the noise. Not surprising. With write caching enabled barriers make things a bit slower. But keep in mind that journaling filesystems can NOT work reliably without barriers when write caching is enabled. The small performance penalty simply is the price you have to pay for filesystem integrity. The other way to make sure the journaling works ok is to disable write caching. As you can see this costs even more performance.

domU results (guest with 256 MB memory; separate table because the results are not directly comparable with the dom0 ones):
  name                  blkout  create
  ---------------------------------------
  domU-wce0-barrier1      9247     866
                          9167    1265
                          9591    1131
  domU-wce0-barrier0      9413     882
                          9414    1082
                          9113     942
  domU-wce1-barrier1     21134    4428
                         19782    3507
                         19813    3810
  domU-wce1-barrier0     20065    3411
                         20451    4342
                         20312    4672

Looks similar, but a bit more noisy, so there isn't a clear difference visible for barriers being on/off ...

cheers,
Gerd
Ian Pratt
2006-Oct-27 13:46 UTC
RE: [Xen-devel] [patch] Add support for barriers to blk{back, front} drivers.
> With write caching disabled barriers don't make a big difference,
> it's in the noise. Not surprising. With write caching enabled
> barriers make things a bit slower. But keep in mind that journaling
> filesystems can NOT work reliably without barriers when write caching
> is enabled. The small performance penalty simply is the price you
> have to pay for filesystem integrity. The other way to make sure the
> journaling works ok is to disable write caching. As you can see this
> costs even more performance.

What's the current logic for determining whether the backend advertises barrier support, and whether the frontend chooses to use it?

I guess the backend could always advertise it, provided it did the appropriate queue drain whenever it encountered a barrier and the underlying block device didn't support barriers.

Thanks,
Ian
Gerd Hoffmann
2006-Oct-30 08:21 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
Ian Pratt wrote:
> What's the current logic for determining whether the backend
> advertises barrier support, and whether the frontend chooses to use
> it?

Backend: the "feature-barrier" node in xenstore. It means the backend understands the new BLKIF_OP_WRITE_BARRIER operation. The node can be either 1 or 0, depending on whether the underlying block device actually supports barriers or not. Initially it is '1' unconditionally; the only way to figure out whether the underlying block device supports barriers is to actually submit one and see if it works. If a barrier write fails with -EOPNOTSUPP the backend changes the node to '0'. The error is also propagated to the frontend so it knows barriers don't work (and can in turn propagate that up to the filesystem driver); the new BLKIF_RSP_EOPNOTSUPP error code is needed for that.

Frontend: It simply submits barrier writes ;) The backend takes care that the new error code is used for barrier writes only (it should never ever happen for normal writes anyway), so old frontends which don't know about barriers (and thus never submit barrier write requests) should never ever see the new error code.

> I guess the backend could always advertise it, provided it did the
> appropriate queue drain whenever it encountered a barrier and the
> underlying block device didn't support barriers.

The filesystems do some best-effort handling when barriers are not available anyway (which works ok for the non-write-caching case). IMO the best way to handle non-working barriers is to simply let the filesystems know, which is exactly what the patch implements: it passes through the capabilities of the underlying block device to the domU instead of trying to fake something which isn't there.

cheers,
Gerd
Ian Pratt
2006-Oct-30 12:15 UTC
RE: [Xen-devel] [patch] Add support for barriers to blk{back, front} drivers.
> Backend: the "feature-barrier" node in xenstore. It means the backend
> understands the new BLKIF_OP_WRITE_BARRIER operation. The node can be
> either 1 or 0, depending on whether the underlying block device
> actually supports barriers or not. Initially it is '1'
> unconditionally; the only way to figure out whether the underlying
> block device supports barriers is to actually submit one and see if
> it works. If a barrier write fails with -EOPNOTSUPP the backend
> changes the node to '0'.

What block devices currently support barriers? I assume device mapper does (if the underlying devices do), but loop presumably doesn't.

> The error is also propagated to the frontend so it knows barriers
> don't work (and can in turn propagate that up to the filesystem
> driver); the new BLKIF_RSP_EOPNOTSUPP error code is needed for that.

Do you do a probe at backend initialisation time to avoid telling the frontend that barriers are supported and then having to tell it they are not the first time it tries to use one? Although Linux may be happy with that, it's not a clean interface.

It would be good if someone who knows the *BSD/Solaris block stack could offer some input here.

Thanks,
Ian

> Frontend: It simply submits barrier writes ;) The backend takes care
> that the new error code is used for barrier writes only (it should
> never ever happen for normal writes anyway), so old frontends which
> don't know about barriers (and thus never submit barrier write
> requests) should never ever see the new error code.
>
>> I guess the backend could always advertise it, provided it did the
>> appropriate queue drain whenever it encountered a barrier and the
>> underlying block device didn't support barriers.
>
> The filesystems do some best-effort handling when barriers are not
> available anyway (which works ok for the non-write-caching case).
> IMO the best way to handle non-working barriers is to simply let the
> filesystems know, which is exactly what the patch implements: it
> passes through the capabilities of the underlying block device to the
> domU instead of trying to fake something which isn't there.
>
> cheers,
> Gerd
Gerd Hoffmann
2006-Oct-30 12:43 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
Hi,

> What block devices currently support barriers? I assume device mapper
> does (if the underlying devices do), but loop presumably doesn't.

ide does, sata does (for ide it also depends on the disk's capabilities), and lvm on top of these does too. loop doesn't; I have a patch in the queue though:

http://www.suse.de/~kraxel/patches/kraxel-unstable-zweiblum-hg11870-quilt/loop-barrier.diff

>> The error is also propagated to the frontend so it knows barriers
>> don't work (and can in turn propagate that up to the filesystem
>> driver); the new BLKIF_RSP_EOPNOTSUPP error code is needed for that.
>
> Do you do a probe at backend initialisation time to avoid telling the
> frontend that barriers are supported and then having to tell it they
> are not the first time it tries to use one? Although Linux may be
> happy with that, it's not a clean interface.

Well, there is no way around that. The only way to probe this reliably is to actually submit a barrier request to the hardware and see if it works ok (due to some buggy drives which claim to support the required features but choke when such a request is submitted). There are likely cases where it is possible to figure it out in advance and avoid the "submit barrier req and see it fail" roundtrip. Haven't investigated that. I'd prefer to have *one* code path only; if you have two, the unlikely one tends to get much less tested ...

cheers,
Gerd
Gerd Hoffmann
2006-Nov-08 15:38 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
Hi,

ping. What is the status of this? Any remaining issues to be solved?

cheers,
Gerd
Ian Pratt
2006-Nov-09 01:04 UTC
RE: [Xen-devel] [patch] Add support for barriers to blk{back, front} drivers.
> -----Original Message-----
> From: Gerd Hoffmann [mailto:kraxel@suse.de]
> Sent: 08 November 2006 15:38
> To: Gerd Hoffmann
> Cc: Ian Pratt; xen-devel@lists.xensource.com; ian.pratt@cl.cam.ac.uk
> Subject: Re: [Xen-devel] [patch] Add support for barriers to
> blk{back,front} drivers.
>
> ping. What is the status of this? Any remaining issues to be solved?

I'd like to hear from the Solaris and BSD folks whether the Linux notion of a barrier is stronger than the one used by those OSes (I know at least one OS that treats barriers as being between requests rather than implicitly on both sides of a request as in Linux), and whether they feel that this would actually make any practical performance difference.

It would also be good to know whether the other OSes could cope with Linux's odd way of determining whether barriers are supported (i.e. send requests assuming they are and get failures if not).

Ian
John Levon
2006-Nov-09 01:15 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back, front} drivers.
On Thu, Nov 09, 2006 at 01:04:19AM -0000, Ian Pratt wrote:
>> ping. What is the status of this? Any remaining issues to be solved?
>
> I'd like to hear from the Solaris and BSD folks whether the linux
> notion of a barrier is stronger than used by those OSes (I know at
> least one OS that treats barriers as being between requests rather
> than implicitly both sides of a request as in Linux) and whether they
> feel that this would actually make any practical performance
> difference.

Sorry we didn't reply earlier; this fell off our radar temporarily. We've looked at the proposal and it looks like it would work for Solaris and our front/back drivers.

> It would also be good to know whether the other OSes could cope with
> linux's odd way of determining whether barriers are supported (i.e.
> send requests assuming they are and get failures if not).

This should work too, for the reasons Gerd mentioned, or so I'm informed.

regards
john
Keir Fraser
2006-Nov-09 09:00 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
What happens (even on native Linux) if you have, say, a RAID array in which some of the discs support barriers and others don't? For example, you have a mirror set where one of the discs supports barriers but the others don't: what status gets returned? Or you have a RAID-0 set, so that some barrier writes can succeed and others can't: must the filesystem handle barrier-write failures at any time?

 -- Keir

On 8/11/06 3:38 pm, "Gerd Hoffmann" <kraxel@suse.de> wrote:

> Hi,
>
> ping. What is the status of this? Any remaining issues to be solved?
>
> cheers,
> Gerd
Gerd Hoffmann
2006-Nov-09 12:48 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
Keir Fraser wrote:
> What happens (even on native Linux) if you have, say, a RAID array in
> which some of the discs support barriers and others don't?

The raid0 driver doesn't support barriers in the first place. Not sure about the other raid drivers.

> Or you have a RAID-0 set, so that some barrier writes can succeed and
> others can't:

I think that can never ever happen. To maintain correct barrier semantics for the complete raid0 it is not enough to simply pass down the barrier request to the disk which happens to hold the block in question. You have to somehow synchronize the disks as well.

> must the filesystem handle barrier_write failures at any time?

The linux filesystems do. The usual way to handle the error is to submit the request again as a normal request and turn off the barrier flag so the filesystem driver won't use barrier requests again (entering fallback mode). It doesn't matter whether that happens on the very first or on any subsequent barrier write request.

As far as I know barriers are always an all-or-nothing thingy. Either they work or they don't. The only problem is that there is no reliable way to figure that out without actually submitting a barrier request. I don't like it that much either, but the existing hardware forces us to handle it that way. Life sucks sometimes ...

cheers,
Gerd
Keir Fraser
2006-Nov-09 13:47 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
On 9/11/06 12:48, "Gerd Hoffmann" <kraxel@suse.de> wrote:

>> What happens (even on native Linux) if you have, say, a RAID array
>> in which some of the discs support barriers and others don't?
>
> The raid0 driver doesn't support barriers in the first place. Not
> sure about the other raid drivers.

Does this mean journalling filesystems cannot run reliably on top of RAID-0? That sounds a bit concerning!

What about when running on top of LVM, where an LV is stitched together from bits of various PVs? Some may support barriers, some may not. Running on top of LVM is the default for most distros, so surely it must have a story on write barriers?

 -- Keir
Gerd Hoffmann
2006-Nov-09 14:15 UTC
Re: [Xen-devel] [patch] Add support for barriers to blk{back,front} drivers.
Keir Fraser wrote:
> On 9/11/06 12:48, "Gerd Hoffmann" <kraxel@suse.de> wrote:
>
>>> What happens (even on native Linux) if you have, say, a RAID array
>>> in which some of the discs support barriers and others don't?
>>
>> The raid0 driver doesn't support barriers in the first place. Not
>> sure about the other raid drivers.
>
> Does this mean journalling filesystems cannot run reliably on top of
> RAID-0? That sounds a bit concerning!

Without barrier support available, linux filesystems fall back to just waiting until the requests (which would have been submitted as barrier requests) are finished before submitting the next ones, to make sure the ordering is fine. That works ok as long as the disk doesn't do write caching, so you'd better turn write caching off in that case.

> What about when running on top of LVM, where an LV is stitched
> together from bits of various PVs? Some may support barriers, some
> may not. Running on top of LVM is the default for most distros, so
> surely it must have a story on write barriers?

lvm works ok for me; almost all my domU disks are on lvm. I have just one disk in the system though, so I'm not sure what happens with multiple PVs. I'd expect barriers to work fine as long as your logical volume is not spread over multiple physical volumes and the underlying physical volume can handle barriers.

cheers,
Gerd