Stefano Stabellini
2013-Jun-27 18:16 UTC
[PATCH v2] xen_disk: support "direct-io-safe" backend option
Support backend option "direct-io-safe". This is documented as follows in the Xen backend specification: * direct-io-safe * Values: 0/1 (boolean) * Default Value: 0 * * The underlying storage is not affected by the direct IO memory * lifetime bug. See: * http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html * * Therefore this option gives the backend permission to use * O_DIRECT, notwithstanding that bug. * * That is, if this option is enabled, use of O_DIRECT is safe, * in circumstances where we would normally have avoided it as a * workaround for that bug. This option is not relevant for all * backends, and even not necessarily supported for those for * which it is relevant. A backend which knows that it is not * affected by the bug can ignore this option. * * This option doesn''t require a backend to use O_DIRECT, so it * should not be used to try to control the caching behaviour. Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the default flags passed to the qemu block layer. The original proposal for a "cache" backend option has been dropped because it was believed too wide, especially considering that at the moment the backend doesn''t have a way to tell the toolstack that it is capable of supporting it. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index 247f32f..727f433 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -93,6 +93,7 @@ struct XenBlkDev { char *type; char *dev; char *devtype; + bool directiosafe; const char *fileproto; const char *filename; int ring_ref; @@ -701,6 +702,7 @@ static int blk_init(struct XenDevice *xendev) { struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev); int info = 0; + char *directiosafe = NULL; /* read xenstore entries */ if (blkdev->params == NULL) { @@ -733,6 +735,8 @@ static int blk_init(struct XenDevice *xendev) if (blkdev->devtype == NULL) { blkdev->devtype = xenstore_read_be_str(&blkdev->xendev, "device-type"); } + directiosafe = xenstore_read_be_str(&blkdev->xendev, "direct-io-safe"); + blkdev->directiosafe = (directiosafe && atoi(directiosafe)); /* do we have all we need? */ if (blkdev->params == NULL || @@ -760,6 +764,8 @@ static int blk_init(struct XenDevice *xendev) xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1); xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1); xenstore_write_be_int(&blkdev->xendev, "info", info); + + g_free(directiosafe); return 0; out_error: @@ -773,6 +779,8 @@ out_error: blkdev->dev = NULL; g_free(blkdev->devtype); blkdev->devtype = NULL; + g_free(directiosafe); + blkdev->directiosafe = false; return -1; } @@ -783,7 +791,11 @@ static int blk_connect(struct XenDevice *xendev) bool readonly = true; /* read-only ? */ - qflags = BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO; + if (blkdev->directiosafe) { + qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO; + } else { + qflags = BDRV_O_CACHE_WB; + } if (strcmp(blkdev->mode, "w") == 0) { qflags |= BDRV_O_RDWR; readonly = false;
Paolo Bonzini
2013-Jun-28 07:56 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
Il 27/06/2013 20:16, Stefano Stabellini ha scritto:> Support backend option "direct-io-safe". This is documented as > follows in the Xen backend specification: > > * direct-io-safe > * Values: 0/1 (boolean) > * Default Value: 0 > * > * The underlying storage is not affected by the direct IO memory > * lifetime bug. See: > * http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html > * > * Therefore this option gives the backend permission to use > * O_DIRECT, notwithstanding that bug. > * > * That is, if this option is enabled, use of O_DIRECT is safe, > * in circumstances where we would normally have avoided it as a > * workaround for that bug. This option is not relevant for all > * backends, and even not necessarily supported for those for > * which it is relevant. A backend which knows that it is not > * affected by the bug can ignore this option. > * > * This option doesn''t require a backend to use O_DIRECT, so it > * should not be used to try to control the caching behaviour. > > Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the > default flags passed to the qemu block layer. > > The original proposal for a "cache" backend option has been dropped > because it was believed too wide, especially considering that at the > moment the backend doesn''t have a way to tell the toolstack that it is > capable of supporting it.Given how rusty my xenstore-fu is, I''ll ask the obvious: the frontend cannot write to it, can it? Paolo
Alex Bligh
2013-Jun-28 08:48 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
Stefano, --On 27 June 2013 19:16:30 +0100 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:> * Therefore this option gives the backend permission to use > * O_DIRECT, notwithstanding that bug.Looks useful. Are you planning to do this for both emulated and pv disks? -- Alex Bligh
Ian Jackson
2013-Jun-28 10:44 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
Alex Bligh writes ("Re: [Qemu-devel] [PATCH v2] xen_disk: support "direct-io-safe" backend option"):> Stefano, > --On 27 June 2013 19:16:30 +0100 Stefano Stabellini > <stefano.stabellini@eu.citrix.com> wrote: > > > * Therefore this option gives the backend permission to use > > * O_DIRECT, notwithstanding that bug. > > Looks useful. Are you planning to do this for both emulated and pv > disks?Emulated disks don''t have the same problem because they don''t try to use O_DIRECT on pages shared with the guest via the Xen grant table mechanism. Ian.
Ian Jackson
2013-Jun-28 10:54 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
Paolo Bonzini writes ("Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option"):> Il 27/06/2013 20:16, Stefano Stabellini ha scritto: > > The original proposal for a "cache" backend option has been dropped > > because it was believed too wide, especially considering that at the > > moment the backend doesn''t have a way to tell the toolstack that it is > > capable of supporting it. > > Given how rusty my xenstore-fu is, I''ll ask the obvious: the frontend > cannot write to it, can it?No, the backend directory is writeable only by the toolstack and driver domains. Ian.
Stefano Stabellini
2013-Jun-28 10:56 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
On Fri, 28 Jun 2013, Alex Bligh wrote:> Stefano, > > --On 27 June 2013 19:16:30 +0100 Stefano Stabellini > <stefano.stabellini@eu.citrix.com> wrote: > > > * Therefore this option gives the backend permission to use > > * O_DIRECT, notwithstanding that bug. > > Looks useful. Are you planning to do this for both emulated and pv > disks?This is PV only, at least for the moment: emulated disks always use writeback caching. From the performance point of view, making this change for IDE disks is not very important (because IDE is slow anyway).
Stefano Stabellini
2013-Jun-28 10:57 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
On Fri, 28 Jun 2013, Paolo Bonzini wrote:> Il 27/06/2013 20:16, Stefano Stabellini ha scritto: > > Support backend option "direct-io-safe". This is documented as > > follows in the Xen backend specification: > > > > * direct-io-safe > > * Values: 0/1 (boolean) > > * Default Value: 0 > > * > > * The underlying storage is not affected by the direct IO memory > > * lifetime bug. See: > > * http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html > > * > > * Therefore this option gives the backend permission to use > > * O_DIRECT, notwithstanding that bug. > > * > > * That is, if this option is enabled, use of O_DIRECT is safe, > > * in circumstances where we would normally have avoided it as a > > * workaround for that bug. This option is not relevant for all > > * backends, and even not necessarily supported for those for > > * which it is relevant. A backend which knows that it is not > > * affected by the bug can ignore this option. > > * > > * This option doesn''t require a backend to use O_DIRECT, so it > > * should not be used to try to control the caching behaviour. > > > > Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the > > default flags passed to the qemu block layer. > > > > The original proposal for a "cache" backend option has been dropped > > because it was believed too wide, especially considering that at the > > moment the backend doesn''t have a way to tell the toolstack that it is > > capable of supporting it. > > Given how rusty my xenstore-fu is, I''ll ask the obvious: the frontend > cannot write to it, can it?Nope, this option lives under the backend path, that is read-only for the frontend
Stefano Stabellini
2013-Jun-28 16:16 UTC
Re: [Qemu-devel] [PATCH v2] xen_disk: support "direct-io-safe" backend option
On Thu, 27 Jun 2013, Stefano Stabellini wrote:> Support backend option "direct-io-safe". This is documented as > follows in the Xen backend specification: > > * direct-io-safe > * Values: 0/1 (boolean) > * Default Value: 0 > * > * The underlying storage is not affected by the direct IO memory > * lifetime bug. See: > * http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html > * > * Therefore this option gives the backend permission to use > * O_DIRECT, notwithstanding that bug. > * > * That is, if this option is enabled, use of O_DIRECT is safe, > * in circumstances where we would normally have avoided it as a > * workaround for that bug. This option is not relevant for all > * backends, and even not necessarily supported for those for > * which it is relevant. A backend which knows that it is not > * affected by the bug can ignore this option. > * > * This option doesn''t require a backend to use O_DIRECT, so it > * should not be used to try to control the caching behaviour. > > Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the > default flags passed to the qemu block layer. > > The original proposal for a "cache" backend option has been dropped > because it was believed too wide, especially considering that at the > moment the backend doesn''t have a way to tell the toolstack that it is > capable of supporting it. > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>George, should I go ahead and commit to the qemu-xen tree?> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c > index 247f32f..727f433 100644 > --- a/hw/block/xen_disk.c > +++ b/hw/block/xen_disk.c > @@ -93,6 +93,7 @@ struct XenBlkDev { > char *type; > char *dev; > char *devtype; > + bool directiosafe; > const char *fileproto; > const char *filename; > int ring_ref; > @@ -701,6 +702,7 @@ static int blk_init(struct XenDevice *xendev) > { > struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev); > int info = 0; > + char *directiosafe = NULL; > > /* read xenstore entries */ > if (blkdev->params == NULL) { > @@ -733,6 +735,8 @@ static int blk_init(struct XenDevice *xendev) > if (blkdev->devtype == NULL) { > blkdev->devtype = xenstore_read_be_str(&blkdev->xendev, "device-type"); > } > + directiosafe = xenstore_read_be_str(&blkdev->xendev, "direct-io-safe"); > + blkdev->directiosafe = (directiosafe && atoi(directiosafe)); > > /* do we have all we need? */ > if (blkdev->params == NULL || > @@ -760,6 +764,8 @@ static int blk_init(struct XenDevice *xendev) > xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1); > xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1); > xenstore_write_be_int(&blkdev->xendev, "info", info); > + > + g_free(directiosafe); > return 0; > > out_error: > @@ -773,6 +779,8 @@ out_error: > blkdev->dev = NULL; > g_free(blkdev->devtype); > blkdev->devtype = NULL; > + g_free(directiosafe); > + blkdev->directiosafe = false; > return -1; > } > > @@ -783,7 +791,11 @@ static int blk_connect(struct XenDevice *xendev) > bool readonly = true; > > /* read-only ? */ > - qflags = BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO; > + if (blkdev->directiosafe) { > + qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO; > + } else { > + qflags = BDRV_O_CACHE_WB; > + } > if (strcmp(blkdev->mode, "w") == 0) { > qflags |= BDRV_O_RDWR; > readonly = false; >
Alex Bligh
2013-Jun-28 16:17 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
--On 28 June 2013 11:44:41 +0100 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:>> Looks useful. Are you planning to do this for both emulated and pv >> disks? > > Emulated disks don''t have the same problem because they don''t try to > use O_DIRECT on pages shared with the guest via the Xen grant table > mechanism.I should have been more specific. The original thread maintained emulated disks always had O_DIRECT turned off, despite the fact the rationale for using O_DIRECT for PV disks was that not using O_DIRECT in some circumstances might be unsafe, because it was the only way to get any decent performance out of them. I think we ran the ''no O_DIRECT might be unsafe'' argument to ground, but if the rationale for Stefano''s patch is not just speed but additional safety (for instance against the host dying and losing the page cache for file systems that have barriers switched off), then there is an argument to use it for emulated disks too. But as Stefano says: --On 28 June 2013 11:56:29 +0100 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:> This is PV only, at least for the moment: emulated disks always use > writeback caching. > From the performance point of view, making this change for IDE disks is > not very important (because IDE is slow anyway).... perhaps ''who cares''. -- Alex Bligh
Paolo Bonzini
2013-Jun-28 16:26 UTC
Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option
Il 28/06/2013 18:17, Alex Bligh ha scritto:> > But as Stefano says: > > --On 28 June 2013 11:56:29 +0100 Stefano Stabellini > <stefano.stabellini@eu.citrix.com> wrote: > >> This is PV only, at least for the moment: emulated disks always use >> writeback caching. >> From the performance point of view, making this change for IDE disks is >> not very important (because IDE is slow anyway). > > ... perhaps ''who cares''.There are plenty of emulated devices that are not that slow, though perhaps Xen management does not support them: AHCI, MegaSAS, VMware pvscsi, and of course virtio-{blk,scsi}... Paolo
George Dunlap
2013-Jun-28 16:51 UTC
Re: [Qemu-devel] [PATCH v2] xen_disk: support "direct-io-safe" backend option
On 28/06/13 17:16, Stefano Stabellini wrote:> On Thu, 27 Jun 2013, Stefano Stabellini wrote: >> Support backend option "direct-io-safe". This is documented as >> follows in the Xen backend specification: >> >> * direct-io-safe >> * Values: 0/1 (boolean) >> * Default Value: 0 >> * >> * The underlying storage is not affected by the direct IO memory >> * lifetime bug. See: >> * http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html >> * >> * Therefore this option gives the backend permission to use >> * O_DIRECT, notwithstanding that bug. >> * >> * That is, if this option is enabled, use of O_DIRECT is safe, >> * in circumstances where we would normally have avoided it as a >> * workaround for that bug. This option is not relevant for all >> * backends, and even not necessarily supported for those for >> * which it is relevant. A backend which knows that it is not >> * affected by the bug can ignore this option. >> * >> * This option doesn''t require a backend to use O_DIRECT, so it >> * should not be used to try to control the caching behaviour. >> >> Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the >> default flags passed to the qemu block layer. >> >> The original proposal for a "cache" backend option has been dropped >> because it was believed too wide, especially considering that at the >> moment the backend doesn''t have a way to tell the toolstack that it is >> capable of supporting it. >> >> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> >> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > George, should I go ahead and commit to the qemu-xen tree?Yes, I think it''s pretty important to have the override in there: Acked-by: George Dunlap <george.dunlap@eu.citrix.com>