Richard W.M. Jones
2021-Jan-26 15:03 UTC
[Libguestfs] [PATCH 4/6] v2v: rhv-upload-plugin: Support multiple connections
On Tue, Jan 26, 2021 at 08:51:59AM -0600, Eric Blake wrote:
> However, that's for the general case (when multiple writers are
> sending requests that can overlap one another, and when you have to
> worry about read consistency).  But when copying an image, you have a
> special case: each writer is only touching distinct subsets of the
> overall image, and you are not reading from the image.  As long as
> those subsets do not overlap, and you do not trigger
> read-modify-write actions, you really don't care whether flushes are
> consistent between writers, because there are no consistency issues
> in the first place.
>
> So we _might_ be able to optimize nbdcopy for the case of parallel
> writes even when the NBD server does not advertise CAN_MULTI_CONN;
> the writers can either use FLAG_FUA for every write, or _each_ write
> without flushes and perform a final NBD_CMD_FLUSH; as long as EACH
> writer performs a final flush before disconnecting (rather than
> trying to optimize to just one writer flushing), you will be
> guaranteed that the entire copied image is now consistent, even
> though you could not guarantee flush consistency during the parallel
> writes.

It happens that nbdcopy does call nbd_flush on each handle (a case of
being "right but for the wrong reasons").  However, we don't use
multi-conn unless the server advertises it, and in light of what you
say above perhaps we should.

> > https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
> >
> >   "bit 8, NBD_FLAG_CAN_MULTI_CONN: Indicates that the server
> >   operates entirely without cache, or that the cache it uses is
> >   shared among all connections to the given device.  In particular,
> >   if this flag is present, then the effects of NBD_CMD_FLUSH and
> >   NBD_CMD_FLAG_FUA MUST be visible across all connections when the
> >   server sends its reply to that command to the client.  In the
> >   absence of this flag, clients SHOULD NOT multiplex their commands
> >   over more than one connection to the export."

Although the text only mentions flush & FUA, I wonder if there's
another case where multi-conn should not be advertised or used: when
the block or sector size is larger than the size of writes, so writes
are being turned into read-modify-write cycles.  Multiple adjacent
(even non-overlapping) writes could then interfere with each other.

This is vanishingly unlikely to affect nbdcopy, which uses a huge
block size, but would it be a problem in the NBD protocol itself?

Rich.

--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler.  Compile Windows programs, test, and
build Windows installers.  Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
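[A minimal libnbd sketch of the per-connection flush pattern discussed
above: each writer connects separately, writes only its own
non-overlapping region, and flushes on its own handle before
disconnecting.  The URI, region sizes, buffer contents and serial
execution are placeholders for illustration; this is not nbdcopy's
actual code.]

/* Sketch: two NBD connections to the same export, each writing a
 * distinct half of the image and flushing before disconnecting.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <libnbd.h>

static void
copy_region (const char *uri, uint64_t offset, uint64_t length)
{
  struct nbd_handle *nbd = nbd_create ();
  char buf[65536];                      /* placeholder request size */

  if (nbd == NULL || nbd_connect_uri (nbd, uri) == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

  memset (buf, 0, sizeof buf);          /* stand-in for real source data */
  for (uint64_t o = offset; o < offset + length; o += sizeof buf) {
    if (nbd_pwrite (nbd, buf, sizeof buf, o, 0) == -1) {
      fprintf (stderr, "%s\n", nbd_get_error ());
      exit (EXIT_FAILURE);
    }
  }

  /* Each writer flushes its own handle before disconnecting, so the
   * copied image is stable even without NBD_FLAG_CAN_MULTI_CONN.
   */
  if (nbd_flush (nbd, 0) == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }
  nbd_shutdown (nbd, 0);
  nbd_close (nbd);
}

int
main (void)
{
  const char *uri = "nbd://localhost/export";   /* placeholder URI */
  const uint64_t half = 1024 * 1024;            /* placeholder size */

  /* In real code the two writers would run in parallel threads; they
   * run serially here only to keep the sketch short.
   */
  copy_region (uri, 0, half);
  copy_region (uri, half, half);
  return 0;
}

[Run in parallel, the same pattern still yields a consistent image
without relying on NBD_FLAG_CAN_MULTI_CONN, because no two handles
ever touch the same bytes and every handle flushes what it wrote.]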
Eric Blake
2021-Jan-26 15:20 UTC
[Libguestfs] [PATCH 4/6] v2v: rhv-upload-plugin: Support multiple connections
On 1/26/21 9:03 AM, Richard W.M. Jones wrote:
>>> https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
>>>
>>>   "bit 8, NBD_FLAG_CAN_MULTI_CONN: Indicates that the server
>>>   operates entirely without cache, or that the cache it uses is
>>>   shared among all connections to the given device.  In particular,
>>>   if this flag is present, then the effects of NBD_CMD_FLUSH and
>>>   NBD_CMD_FLAG_FUA MUST be visible across all connections when the
>>>   server sends its reply to that command to the client.  In the
>>>   absence of this flag, clients SHOULD NOT multiplex their commands
>>>   over more than one connection to the export."
>
> Although the text only mentions flush & FUA, I wonder if there's
> another case where multi-conn should not be advertised or used: when
> the block or sector size is larger than the size of writes, so
> writes are being turned into read-modify-write cycles.  Multiple
> adjacent (even non-overlapping) writes could then interfere with
> each other.

That's the complication that explains why qemu-nbd does not advertise
it on writeable connections.  For proper cross-thread consistency when
the bit is set, and assuming a situation with 2 writer clients and 1
reader client, if the two writers both issue an overlapping write
request with NBD_CMD_FLAG_FUA or followed by NBD_CMD_FLUSH, then the
reader must see exactly one of five states: the pre-write state, the
state after just writer 1, the state after just writer 2, the state
after both writers with writer 1 completing before writer 2 starts, or
the state after both writers with writer 2 completing before writer 1
starts.  Any other result would indicate write shredding (where the
two writers were not properly interlocked, and therefore the second
writer partially undoes some of the effects of the first).

> This is vanishingly unlikely to affect nbdcopy, which uses a huge
> block size, but would it be a problem in the NBD protocol itself?

I'm not convinced it is a hole in the protocol, so much as a reason
that CAN_MULTI_CONN was added in the first place (if you don't
advertise the bit, clients can't assume consistency at the server, and
so have to do their own locking or otherwise avoid overlapping writes
client-side to ensure consistency; if you do advertise the bit,
clients can take shortcuts).  But maybe you do have a point that when
a server advertises block sizes, a client should pay attention to the
recommended block size, and assume that anything smaller than that may
trigger read-modify-write which might not be safe even if
CAN_MULTI_CONN was advertised.

Should we ask for clarification on the NBD mailing list?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
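[A hypothetical client-side check along the lines suggested above,
sketched with the libnbd API: only spread writes across multiple
connections when the server advertises multi-conn and the request size
is a multiple of the server's preferred block size, so no write
degenerates into a server-side read-modify-write.  The URI and request
size are placeholders, and nothing in the NBD spec currently mandates
this heuristic.]

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <libnbd.h>

int
main (void)
{
  struct nbd_handle *nbd = nbd_create ();
  int64_t preferred;
  int multi_conn;
  uint64_t request_size = 64 * 1024;   /* hypothetical copy request size */

  if (nbd == NULL ||
      nbd_connect_uri (nbd, "nbd://localhost/export") == -1) {  /* placeholder */
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

  /* Does the server promise cross-connection flush consistency? */
  multi_conn = nbd_can_multi_conn (nbd);

  /* Server's preferred block size, or <= 0 if not advertised. */
  preferred = nbd_get_block_size (nbd, LIBNBD_SIZE_PREFERRED);

  if (multi_conn > 0 &&
      preferred > 0 && request_size % (uint64_t) preferred == 0)
    printf ("parallel connections look safe for %llu-byte requests\n",
            (unsigned long long) request_size);
  else
    printf ("fall back to a single connection\n");

  nbd_shutdown (nbd, 0);
  nbd_close (nbd);
  return 0;
}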