Richard W.M. Jones
2022-May-21 14:33 UTC
[Libguestfs] nbdkit blocksize filter, read-modify-write, and concurrency
On Sat, May 21, 2022 at 01:21:11PM +0100, Nikolaus Rath wrote:
> Hi,
>
> How does the blocksize filter take into account writes that end up
> overlapping due to read-modify-write cycles?
>
> Specifically, suppose there are two non-overlapping writes handled
> by two different threads, that, due to blocksize requirements,
> overlap when expanded. I think there is a risk that one thread may
> partially undo the work of the other here.
>
> Looking at the code, it seems that writes of unaligned heads and
> tails are protected with a global lock, but writes of aligned data
> can occur concurrently.

I agree.

Assuming the underlying plugin is NBDKIT_THREAD_MODEL_PARALLEL and no
other filters impose thread model limits, the blocksize filter does
not limit the thread model, so the thread model of nbdkit would also
be NBDKIT_THREAD_MODEL_PARALLEL.

That means that two writes either on different connections or
pipelined on the same connection could happen at the same time.
'blocksize_pwrite' would be called concurrently for the two requests.

> However, does this not miss the case where there is one unaligned
> write that overlaps with an aligned one?
>
> For example, with blocksize 10, we could have:
>
> Thread 1: receives write request for offset=0, size=10
> Thread 2: receives write request for offset=4, size=16
> Thread 1: acquires lock, reads bytes 0-4
> Thread 2: does aligned write (no locking needed), writes bytes 0-10
> Thread 1: writes bytes 0-10, overwriting data from Thread 2

I believe this analysis is correct. (CC'd to Eric who knows a lot
more about this.)

However I don't think it's a bug. If a client doesn't want writes to
squash each other, then it shouldn't send overlapping requests. I bet
the same thing happens with an SSD.

NBD_CMD_FLAG_FUA is provided for clients that wish to ensure that a
write has been committed before sending another request.

Do you have an example of a client which sends overlapping requests
and depends on particular behaviour of the server? You may be able to
get it to work by using nbdkit-noparallel-filter which can be used to
serialize nbdkit.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
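
The interleaving in the quoted example can be replayed deterministically
without threads. What follows is only a sketch under stated assumptions,
not the blocksize filter's code: the "disk" is a 20-byte in-memory buffer,
all function names are made up for illustration, and the read-modify-write
is assigned to the request at offset=4, since that is the unaligned one.
It compares the two possible serial outcomes with the racy interleaving.

```python
BLOCK = 10  # block size used in the example from the thread


def read_block(disk, offset):
    """First half of a read-modify-write: fetch the block containing offset."""
    start = (offset // BLOCK) * BLOCK
    return start, bytearray(disk[start:start + BLOCK])


def merge_and_write(disk, start, block, offset, data):
    """Second half: patch the new bytes into the earlier copy, write it back."""
    rel = offset - start
    block[rel:rel + len(data)] = data
    disk[start:start + BLOCK] = block


def aligned_write(disk, offset, data):
    """Whole-block write; it needs no read, so it takes no lock in this model."""
    assert offset % BLOCK == 0 and len(data) % BLOCK == 0
    disk[offset:offset + len(data)] = data


def request_a(disk):
    """offset=4, size=16: unaligned head (bytes 4..9) plus one aligned block."""
    start, block = read_block(disk, 4)
    merge_and_write(disk, start, block, 4, b"A" * 6)
    aligned_write(disk, 10, b"A" * 10)


def request_b(disk):
    """offset=0, size=10: exactly one aligned block."""
    aligned_write(disk, 0, b"B" * 10)


def serial(first, second):
    """Run the two requests one after the other on a fresh disk."""
    disk = bytearray(b"." * 20)
    first(disk)
    second(disk)
    return bytes(disk)


def racy():
    """Request B lands between the read and the write-back of request A."""
    disk = bytearray(b"." * 20)
    start, stale = read_block(disk, 4)                # A reads block 0
    request_b(disk)                                   # B overwrites block 0
    merge_and_write(disk, start, stale, 4, b"A" * 6)  # A writes stale bytes 0..3
    aligned_write(disk, 10, b"A" * 10)                # A's aligned block
    return bytes(disk)


print(serial(request_a, request_b))  # b'BBBBBBBBBBAAAAAAAAAA'
print(serial(request_b, request_a))  # b'BBBBAAAAAAAAAAAAAAAA'
print(racy())                        # b'....AAAAAAAAAAAAAAAA'
```

In this model the racy result matches neither serial order: bytes 0..3
were only ever written by request B, yet they end up holding the old
contents, because request A wrote back the copy it had read earlier.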
Nikolaus Rath
2022-May-21 16:37 UTC
[Libguestfs] nbdkit blocksize filter, read-modify-write, and concurrency
On May 21 2022, "Richard W.M. Jones" <rjones at redhat.com> wrote:
> On Sat, May 21, 2022 at 01:21:11PM +0100, Nikolaus Rath wrote:
>> Hi,
>>
>> How does the blocksize filter take into account writes that end up
>> overlapping due to read-modify-write cycles?
>>
>> Specifically, suppose there are two non-overlapping writes handled
>> by two different threads, that, due to blocksize requirements,
>> overlap when expanded. I think there is a risk that one thread may
>> partially undo the work of the other here.
>>
>> Looking at the code, it seems that writes of unaligned heads and
>> tails are protected with a global lock, but writes of aligned data
>> can occur concurrently.
>
> I agree.
>
> Assuming the underlying plugin is NBDKIT_THREAD_MODEL_PARALLEL and no
> other filters impose thread model limits, the blocksize filter does
> not limit the thread model, so the thread model of nbdkit would also
> be NBDKIT_THREAD_MODEL_PARALLEL.
>
> That means that two writes either on different connections or
> pipelined on the same connection could happen at the same time.
> 'blocksize_pwrite' would be called concurrently for the two requests.
>
>> However, does this not miss the case where there is one unaligned
>> write that overlaps with an aligned one?
>>
>> For example, with blocksize 10, we could have:
>>
>> Thread 1: receives write request for offset=0, size=10
>> Thread 2: receives write request for offset=4, size=16
>> Thread 1: acquires lock, reads bytes 0-4
>> Thread 2: does aligned write (no locking needed), writes bytes 0-10
>> Thread 1: writes bytes 0-10, overwriting data from Thread 2
>
> I believe this analysis is correct. (CC'd to Eric who knows a lot
> more about this.)
>
> However I don't think it's a bug. If a client doesn't want writes to
> squash each other, then it shouldn't send overlapping requests. I bet
> the same thing happens with an SSD.

But the requests are not overlapping from the client point of view.
They only become overlapping when the server applies its
read-modify-write operation to align them to the blocksize.

I think you elsewhere said that the blocksize reported by the NBD
server is only a preferred blocksize, so I'd be surprised if not
following this "preference" results in data corruption.

> NBD_CMD_FLAG_FUA is provided for clients that wish to ensure that a
> write has been committed before sending another request.
>
> Do you have an example of a client which sends overlapping requests
> and depends on particular behaviour of the server? You may be able to
> get it to work by using nbdkit-noparallel-filter which can be used to
> serialize nbdkit.

I'm working with the kernel's NBD client, and it would explain all the
mysterious data corruption issues that I've seen with the S3 plugin.
But I have not yet confirmed definitely that this is the root cause.

For now, I'll avoid the blocksize filter and instead do the
read-modify-write in the plugin with proper locking. If that fixes it,
then I think we can conclude that the kernel is sending such requests
(but, as I said above, I would not consider them overlapping nor would
I consider this a bug).

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

"Time flies like an arrow, fruit flies like a Banana."
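
One way the "read-modify-write in the plugin with proper locking" idea
could be structured is sketched below. This is an assumption about a
possible design, not the S3 plugin's actual code: LockedBlockStore and
its methods are hypothetical names, and the in-memory bytearray stands in
for a backend that only accepts whole-block writes. The point it tries to
show is that every write, aligned or not, takes the locks of all blocks it
touches, so an aligned write can no longer slip in between the read and
the write-back of an unaligned one.

```python
import threading


class LockedBlockStore:
    """Hypothetical store that serializes writes per block (illustration only)."""

    def __init__(self, size, block_size):
        assert size % block_size == 0
        self.block_size = block_size
        self.disk = bytearray(size)  # stand-in for the real backend
        self.locks = [threading.Lock() for _ in range(size // block_size)]

    def _read_block(self, index):
        start = index * self.block_size
        return bytearray(self.disk[start:start + self.block_size])

    def _write_block(self, index, block):
        start = index * self.block_size
        self.disk[start:start + self.block_size] = block

    def pwrite(self, offset, data):
        bs = self.block_size
        if not data:
            return
        assert 0 <= offset and offset + len(data) <= len(self.disk)
        first = offset // bs
        last = (offset + len(data) - 1) // bs
        held = self.locks[first:last + 1]
        # Always lock in ascending block order so two writers cannot deadlock.
        for lock in held:
            lock.acquire()
        try:
            done = 0
            while done < len(data):
                index, rel = divmod(offset + done, bs)
                n = min(bs - rel, len(data) - done)
                chunk = data[done:done + n]
                if n == bs:
                    block = bytearray(chunk)          # full block, no read needed
                else:
                    block = self._read_block(index)   # partial block: read ...
                    block[rel:rel + n] = chunk        # ... modify ...
                self._write_block(index, block)       # ... write
                done += n
        finally:
            for lock in reversed(held):
                lock.release()


store = LockedBlockStore(20, 10)
store.pwrite(0, b"." * 20)
a = threading.Thread(target=store.pwrite, args=(4, b"A" * 16))
b = threading.Thread(target=store.pwrite, args=(0, b"B" * 10))
a.start(); b.start(); a.join(); b.join()
print(bytes(store.disk))  # one of the two serial outcomes, depending on timing
```

Per-block locks keep writes to unrelated blocks concurrent; a single
global lock around every write would be simpler, at the cost of
serializing all of them.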