Stefano Stabellini
2010-Nov-24 13:08 UTC
[Xen-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
Hi all,
this patch can be applied to both qemu-xen and qemu and adds support
for empty write barriers to xen_disk.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index 38b5fbf..94af001 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -182,6 +182,10 @@ static int ioreq_parse(struct ioreq *ioreq)
         ioreq->prot = PROT_WRITE; /* to memory */
         break;
     case BLKIF_OP_WRITE_BARRIER:
+        if (!ioreq->req.nr_segments) {
+            ioreq->presync = 1;
+            return 0;
+        }
         if (!syncwrite)
             ioreq->presync = ioreq->postsync = 1;
         /* fall through */
@@ -306,7 +310,7 @@ static int ioreq_runio_qemu_sync(struct ioreq *ioreq)
     int i, rc, len = 0;
     off_t pos;
 
-    if (ioreq_map(ioreq) == -1)
+    if (ioreq->req.nr_segments && ioreq_map(ioreq) == -1)
         goto err;
     if (ioreq->presync)
         bdrv_flush(blkdev->bs);
@@ -330,6 +334,8 @@ static int ioreq_runio_qemu_sync(struct ioreq *ioreq)
         break;
     case BLKIF_OP_WRITE:
     case BLKIF_OP_WRITE_BARRIER:
+        if (!ioreq->req.nr_segments)
+            break;
         pos = ioreq->start;
         for (i = 0; i < ioreq->v.niov; i++) {
             rc = bdrv_write(blkdev->bs, pos / BLOCK_SIZE,
@@ -387,7 +393,7 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
 {
     struct XenBlkDev *blkdev = ioreq->blkdev;
 
-    if (ioreq_map(ioreq) == -1)
+    if (ioreq->req.nr_segments && ioreq_map(ioreq) == -1)
         goto err;
 
     ioreq->aio_inflight++;
@@ -404,6 +410,8 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
     case BLKIF_OP_WRITE:
     case BLKIF_OP_WRITE_BARRIER:
         ioreq->aio_inflight++;
+        if (!ioreq->req.nr_segments)
+            break;
         bdrv_aio_writev(blkdev->bs, ioreq->start / BLOCK_SIZE,
                         &ioreq->v, ioreq->v.size / BLOCK_SIZE,
                         qemu_aio_complete, ioreq);

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Gerd Hoffmann
2010-Nov-24 14:23 UTC
[Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On 11/24/10 14:08, Stefano Stabellini wrote:
> this patch can be applied to both qemu-xen and qemu and adds support
> for empty write barriers to xen_disk.

Looks good.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>

cheers,
  Gerd
Kevin Wolf
2010-Nov-24 16:29 UTC
[Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
Am 24.11.2010 15:23, schrieb Gerd Hoffmann:
> On 11/24/10 14:08, Stefano Stabellini wrote:
> > this patch can be applied to both qemu-xen and qemu and adds support
> > for empty write barriers to xen_disk.
>
> Looks good.
>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>

Thanks, applied to the block branch.

Kevin
Stefano Stabellini
2010-Nov-24 16:46 UTC
[Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On Wed, 24 Nov 2010, Kevin Wolf wrote:
> > On 11/24/10 14:08, Stefano Stabellini wrote:
> > > this patch can be applied to both qemu-xen and qemu and adds support
> > > for empty write barriers to xen_disk.
> >
> > Looks good.
> >
> > Acked-by: Gerd Hoffmann <kraxel@redhat.com>
>
> Thanks, applied to the block branch.

thank you!
Christoph Hellwig
2010-Nov-24 16:58 UTC
[Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
I had the discussion with Jeremy in Boston before, but let's repeat it
here:

 - is there actually any pre-existing xen backend that properly
   implements empty barriers?  Back then we couldn't find any.
 - if this is a new concept to Xen please do not define an empty
   barrier primitive, but a new flush cache primitive.  That one maps
   natively to the qemu I/O layer, and with recent Linux, NetBSD,
   Windows, or Solaris guests will be a lot faster than a barrier
   which drains the queue.

Note that what your patch implements is actually a rather inefficient
implementation of the latter.  You do none of the queue draining which
the in-kernel blkback implementation does by submitting the old-style
barrier bio.  While most filesystems do not care, you introduce a
quite subtle chance of data corruption for reiserfs, or ext4 with
asynchronous journal commits, on pre-2.6.37 kernels.
Jeremy Fitzhardinge
2010-Nov-24 18:18 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On 11/24/2010 08:58 AM, Christoph Hellwig wrote:
> I had the discussion with Jeremy in Boston before, but let's repeat it
> here:
>
>  - is there actually any pre-existing xen backend that properly
>    implements empty barriers?  Back then we couldn't find any.
>  - if this is a new concept to Xen please do not define an empty
>    barrier primitive, but a new flush cache primitive.  That one maps
>    natively to the qemu I/O layer, and with recent Linux, NetBSD,
>    Windows, or Solaris guests will be a lot faster than a barrier
>    which drains the queue.
>
> Note that what your patch implements is actually a rather inefficient
> implementation of the latter.  You do none of the queue draining which
> the in-kernel blkback implementation does by submitting the old-style
> barrier bio.  While most filesystems do not care, you introduce a
> quite subtle chance of data corruption for reiserfs, or ext4 with
> asynchronous journal commits, on pre-2.6.37 kernels.

Yeah, I agree.  I think, semantically, empty WRITE_BARRIERs are
supposed to work, as evidenced by the effort blkback makes in trying
to specifically support them.  I haven't traced through to find out
why it EIOs them regardless - it seems to be coming from deeper in the
block subsystem (is there a "no payload" flag or something missing?).
But in the future, I think defining an unordered FLUSH operator like
Linux wants is a useful thing to do and implement (especially since it
amounts to standardising the ?BSD extension).  I'm not sure of their
precise semantics (esp WRT ordering), but I think it's already OK.

(BTW, in case it wasn't clear, we're seriously considering - but not
yet committed to - using qemu as the primary PV block backend for Xen
instead of submitting the existing blkback code for upstream.  We
still need to do some proper testing and measuring to make sure it
stacks up OK, and work out how it would fit together with the rest of
the management stack.  But so far it looks promising.)

    J
Christoph Hellwig
2010-Nov-24 18:44 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On Wed, Nov 24, 2010 at 10:18:40AM -0800, Jeremy Fitzhardinge wrote:
> Linux wants is a useful thing to do and implement (especially since it
> amounts to standardising the ?BSD extension).  I'm not sure of their
> precise semantics (esp WRT ordering), but I think it's already OK.

The nice bit is that a pure flush does not imply any ordering at all.
Which is how the current qemu driver implements the barrier requests
anyway, so that needs some fixing.

> (BTW, in case it wasn't clear, we're seriously considering - but not yet
> committed to - using qemu as the primary PV block backend for Xen
> instead of submitting the existing blkback code for upstream.  We still
> need to do some proper testing and measuring to make sure it stacks up
> OK, and work out how it would fit together with the rest of the
> management stack.  But so far it looks promising.)

Good to know.  Besides the issue with barriers mentioned above there's
a few things that need addressing in xen_disk, if you (or Stefano or
Daniel) are interested:

 - remove the syncwrite tunable, as this is handled by the underlying
   posix I/O code if needed by using O_DSYNC, which is a lot more
   efficient.
 - check whatever the issue with the use_aio codepath is and make it
   the default.  It should help the performance a lot.
 - make sure to use bdrv_aio_flush for cache flushes in the aio
   codepath; currently it still uses plain synchronous flushes.

> J
---end quoted text---
Stefano Stabellini
2010-Nov-24 19:55 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On Wed, 24 Nov 2010, Christoph Hellwig wrote:
> On Wed, Nov 24, 2010 at 10:18:40AM -0800, Jeremy Fitzhardinge wrote:
> > Linux wants is a useful thing to do and implement (especially since it
> > amounts to standardising the ?BSD extension).  I'm not sure of their
> > precise semantics (esp WRT ordering), but I think it's already OK.
>
> The nice bit is that a pure flush does not imply any ordering at all.
> Which is how the current qemu driver implements the barrier requests
> anyway, so that needs some fixing.
>
> > (BTW, in case it wasn't clear, we're seriously considering - but not yet
> > committed to - using qemu as the primary PV block backend for Xen
> > instead of submitting the existing blkback code for upstream.  We still
> > need to do some proper testing and measuring to make sure it stacks up
> > OK, and work out how it would fit together with the rest of the
> > management stack.  But so far it looks promising.)
>
> Good to know.  Besides the issue with barriers mentioned above there's
> a few things that need addressing in xen_disk, if you (or Stefano or
> Daniel) are interested:
>
>  - remove the syncwrite tunable, as this is handled by the underlying
>    posix I/O code if needed by using O_DSYNC, which is a lot more
>    efficient.
>  - check whatever the issue with the use_aio codepath is and make it
>    the default.  It should help the performance a lot.
>  - make sure to use bdrv_aio_flush for cache flushes in the aio
>    codepath; currently it still uses plain synchronous flushes.

all very good suggestions, I am adding them to my todo list, but
Daniel is very welcome to contribute as well :)
Ian Jackson
2010-Nov-25 19:30 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
Christoph Hellwig writes ("Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk"):
> On Wed, Nov 24, 2010 at 10:18:40AM -0800, Jeremy Fitzhardinge wrote:
> > Linux wants is a useful thing to do and implement (especially since it
> > amounts to standardising the ?BSD extension).  I'm not sure of their
> > precise semantics (esp WRT ordering), but I think it's already OK.
>
> The nice bit is that a pure flush does not imply any ordering at all.
> Which is how the current qemu driver implements the barrier requests
> anyway, so that needs some fixing.

Thanks for your comments.  Does that mean, though, that Stefano's
patch is actually making the situation worse, or simply that it isn't
making it as good as it should be?

If it's an improvement it should be applied even if it's not perfect,
and it sounds to me like we don't want to wait for the more
complicated discussion to finish and produce code?

Ian.
Jeremy Fitzhardinge
2010-Nov-25 19:46 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On 11/25/2010 11:30 AM, Ian Jackson wrote:
> Christoph Hellwig writes ("Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk"):
> > On Wed, Nov 24, 2010 at 10:18:40AM -0800, Jeremy Fitzhardinge wrote:
> > > Linux wants is a useful thing to do and implement (especially since it
> > > amounts to standardising the ?BSD extension).  I'm not sure of their
> > > precise semantics (esp WRT ordering), but I think it's already OK.
> >
> > The nice bit is that a pure flush does not imply any ordering at all.
> > Which is how the current qemu driver implements the barrier requests
> > anyway, so that needs some fixing.
>
> Thanks for your comments.  Does that mean, though, that Stefano's
> patch is actually making the situation worse, or simply that it isn't
> making it as good as it should be?

The latter.  There's a question over whether WRITE_BARRIER really
supports empty barriers, since it appears that none of the existing
backends implement it correctly - but on the other hand, the kernel
blkback code does *try* to implement it, even though it fails.  This
change makes empty WRITE_BARRIERs work in qemu, which is helpful
because the upstream blkfront tries to use them.

But WRITE_BARRIER is fundamentally suboptimal for Linux's needs
because it is a fully ordered barrier operation.  What Linux needs is
a simple FLUSH operation which just makes sure that previously
completed writes are fully flushed out of any caches and buffers and
are really on durable storage.  It has no ordering requirements, so it
doesn't prevent subsequent writes from being handled while the flush
is going on.  Christoph is therefore recommending that we add a
specific FLUSH operation to the protocol with these properties so that
we can achieve the best performance.  But if the backend lacks FLUSH,
we still need a reliable WRITE_BARRIER.

(But it would be very sad if, in practice, most backends in the field
fail empty WRITE_BARRIER operations, leaving guests with no mechanism
to force writes to stable storage.  In that case I guess we'll need to
look at some other hacky thing to try and make it work...)

    J
Christoph Hellwig
2010-Nov-25 23:30 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On Thu, Nov 25, 2010 at 11:46:40AM -0800, Jeremy Fitzhardinge wrote:
> The latter.  There's a question over whether WRITE_BARRIER really
> supports empty barriers, since it appears that none of the existing
> backends implement it correctly - but on the other hand, the kernel
> blkback code does *try* to implement it, even though it fails.  This
> change makes empty WRITE_BARRIERs work in qemu, which is helpful
> because the upstream blkfront tries to use them.

So far so good.

> But WRITE_BARRIER is fundamentally suboptimal for Linux's needs
> because it is a fully ordered barrier operation.  What Linux needs is
> a simple FLUSH operation which just makes sure that previously
> completed writes are fully flushed out of any caches and buffers and
> are really on durable storage.  It has no ordering requirements, so it
> doesn't prevent subsequent writes from being handled while the flush
> is going on.
>
> Christoph is therefore recommending that we add a specific FLUSH
> operation to the protocol with these properties so that we can achieve
> the best performance.  But if the backend lacks FLUSH, we still need a
> reliable WRITE_BARRIER.

The problem is that qemu currently implements WRITE_BARRIER
incorrectly, empty or not.  The Linux barrier primitive, which appears
to extend 1:1 to Xen, implies ordering semantics, which the blkback
implementation implements by translating the write barrier back to a
bio with the barrier bit set.  But the qemu backend does not impose
any ordering, so it gives you different behaviour from blkback.  Is
there any formal specification of the Xen block protocol?

In the end the empty WRITE_BARRIER after this patch is equivalent to a
flush, which is fine for that particular command.  The problem is that
a small minority of Linux filesystems actually relied on the ordering
semantics of a non-empty WRITE_BARRIER command, which qemu doesn't
respect.

But the patch doesn't make anything worse by also accepting empty
barrier writes, so I guess it's fine.  My initial reply was just
supposed to be a reminder about the big dragon lurking here.
Kevin Wolf
2010-Nov-26 11:03 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
Am 24.11.2010 19:44, schrieb Christoph Hellwig:
> On Wed, Nov 24, 2010 at 10:18:40AM -0800, Jeremy Fitzhardinge wrote:
> > Linux wants is a useful thing to do and implement (especially since it
> > amounts to standardising the ?BSD extension).  I'm not sure of their
> > precise semantics (esp WRT ordering), but I think it's already OK.
>
> The nice bit is that a pure flush does not imply any ordering at all.
> Which is how the current qemu driver implements the barrier requests
> anyway, so that needs some fixing.
>
> > (BTW, in case it wasn't clear, we're seriously considering - but not yet
> > committed to - using qemu as the primary PV block backend for Xen
> > instead of submitting the existing blkback code for upstream.  We still
> > need to do some proper testing and measuring to make sure it stacks up
> > OK, and work out how it would fit together with the rest of the
> > management stack.  But so far it looks promising.)
>
> Good to know.  Besides the issue with barriers mentioned above there's
> a few things that need addressing in xen_disk, if you (or Stefano or
> Daniel) are interested:
>
>  - remove the syncwrite tunable, as this is handled by the underlying
>    posix I/O code if needed by using O_DSYNC, which is a lot more
>    efficient.
>  - check whatever the issue with the use_aio codepath is and make it
>    the default.  It should help the performance a lot.
>  - make sure to use bdrv_aio_flush for cache flushes in the aio
>    codepath; currently it still uses plain synchronous flushes.

I don't think the synchronous flushes even do what they're supposed to
do.  Shouldn't ioreq->postsync do the flush after the request has
completed instead of doing it as soon as it has been submitted?

Let alone that, as you mentioned above, it doesn't implement barrier
semantics at all.

Oh, and it would be very nice if the return values of
bdrv_aio_readv/writev and bdrv_flush were checked.  They can return
errors.

Kevin
Christoph Hellwig
2010-Nov-26 13:29 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On Fri, Nov 26, 2010 at 12:03:35PM +0100, Kevin Wolf wrote:
> I don't think the synchronous flushes even do what they're supposed to
> do.  Shouldn't ioreq->postsync do the flush after the request has
> completed instead of doing it as soon as it has been submitted?

Indeed, that's yet another bug in the implementation.
Stefano Stabellini
2010-Nov-26 15:25 UTC
Re: [Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
On Fri, 26 Nov 2010, Kevin Wolf wrote:
> Am 24.11.2010 19:44, schrieb Christoph Hellwig:
> > Good to know.  Besides the issue with barriers mentioned above there's
> > a few things that need addressing in xen_disk, if you (or Stefano or
> > Daniel) are interested:
> >
> >  - remove the syncwrite tunable, as this is handled by the underlying
> >    posix I/O code if needed by using O_DSYNC, which is a lot more
> >    efficient.
> >  - check whatever the issue with the use_aio codepath is and make it
> >    the default.  It should help the performance a lot.
> >  - make sure to use bdrv_aio_flush for cache flushes in the aio
> >    codepath; currently it still uses plain synchronous flushes.
>
> I don't think the synchronous flushes even do what they're supposed to
> do.  Shouldn't ioreq->postsync do the flush after the request has
> completed instead of doing it as soon as it has been submitted?

I noticed that; it is one of the reasons I disabled aio for xen_disk
in qemu-xen.  That, and the fact that the aio implementation in
qemu-xen is still thread based.

In fact I am waiting for Xen support to be in upstream qemu before
doing any benchmarks, because I expect the performance to be better.

> Let alone that, as you mentioned above, it doesn't implement barrier
> semantics at all.
>
> Oh, and it would be very nice if the return values of
> bdrv_aio_readv/writev and bdrv_flush were checked.  They can return
> errors.

Indeed, my todo list is growing...
Ian Jackson
2010-Dec-14 18:43 UTC
[Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk
Gerd Hoffmann writes ("[Xen-devel] Re: [Qemu-devel] [PATCH] qemu and qemu-xen: support empty write barriers in xen_disk"):
> On 11/24/10 14:08, Stefano Stabellini wrote:
> > this patch can be applied to both qemu-xen and qemu and adds support
> > for empty write barriers to xen_disk.
>
> Looks good.
>
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>

I have applied this patch to qemu-xen, thanks.

Ian.