Shaun McDowell
2018-Jan-19 17:41 UTC
Re: [Libguestfs] [nbdkit PATCH v2 13/13] RFC: plugins: Add callbacks for FUA semantics
Our cbdkit was branched from 1.1.12 with some patches from 1.1.13 added as well. The project has morphed significantly enough that a direct diff or merge between the two would not be feasible. Even the structure of the project directory and build has been changed to be in line with our other internal projects.

I have uploaded the entire cbdkit source to our github at
https://github.com/dev-cloudbd/cbdkit

The relevant files are:

  include/cbdkit-plugin.h
  src/connections.c
  src/plugins.c

Specifically, the connections.c functions recv_request and send_reply, and the plugins.c functions plugin_pread, plugin_pwrite, cbdkit_async_reply, cbdkit_async_reply_read, and cbdkit_async_reply_error.

On Fri, Jan 19, 2018 at 12:05 PM, Eric Blake <eblake@redhat.com> wrote:
> On 01/19/2018 10:56 AM, Shaun McDowell wrote:
>
> > Limitation: The kernel will (with today's default settings) typically be
> > willing to send up to 128 requests of 128kB size to the driver in
> > parallel. We wanted to support 128 parallel read operations on different
> > areas of the disk without requiring 128 separate threads and connections
> > for the driver. Right now in nbdkit that is impossible. The main loop in
> > connection.c will pull an nbd request off the socket and block until that
> > read request is complete before sending a response and getting the next
> > request, blocking other requests on the socket unless running X
> > connections/threads in parallel.
>
> What version of nbdkit are you using? We recently added parallel reads in
> 1.1.17 (although some minor fixes went in later; current version is
> 1.1.25) that should allow you to have a single socket serving multiple
> requests in parallel, in response to your setting of nbdkit's --threads
> option, and if your plugin is truly parallel (nbdkit now ships both a
> 'file' and an 'nbd' plugin that are truly parallel).
> > Change: We introduced an additional set of functions to the nbdkit_plugin
> > struct that supports asynchronous handling of the requests, and a few
> > helper functions for the plugin to use to respond when it has finished
> > the request. This is very similar to the fuse filesystem low level api
> > (async supported) vs the high level fuse fs api (sync only). The design
> > goal here is that a single connection/thread on nbdkit can support as
> > many requests in parallel as the plugin allows. The nbdkit side pulls the
> > request off the socket and, if the async function pointer is non-null,
> > it will wrap the request in an op struct and use the async plugin call
> > for read/write/etc, capturing any allocated buffer and some op details
> > into the op pointer. The plugin async_* will start the op and return to
> > nbdkit while the plugin works on it in the background. Nbdkit will then
> > go back to the socket and begin the next request. Our plugin uses 1
> > connection/nbdkit thread and 2-4 threads internally with boost asio over
> > sockets to service the requests to cloud. We are able to achieve ~1GB/s
> > (yes, bytes) read/write performance to aws s3 from an ec2 node with 10
> > gigabit networking on < 100MB of memory in the driver with this approach.
>
> Definitely post patches to the list! My work to add parallel support
> via --threads still spawns multiple threads (the plugin is operating
> concurrently on multiple threads) while yours is a different approach of
> breaking things into smaller stages that piece together, possibly with
> fewer threads.
>
> > Here are some of what our function prototypes look like that support an
> > asynchronous nbdkit model:
> >
> > #define CBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS 2
> > #define CBDKIT_THREAD_MODEL_PARALLEL 3
> > #define CBDKIT_THREAD_MODEL_ASYNC 4
> >
> > struct cbdkit_plugin {
> >   ...
> >   int (*pread) (void *handle, void *buf, uint32_t count, uint64_t offset);
> >   int (*pwrite) (void *handle, const void *buf, uint32_t count,
> >                  uint64_t offset);
> >   int (*flush) (void *handle);
> >   int (*trim) (void *handle, uint32_t count, uint64_t offset);
> >   int (*zero) (void *handle, uint32_t count, uint64_t offset, int may_trim);
> >
> >   int errno_is_preserved;
> >
> >   void (*async_pread) (void *op, void *handle, void *buf, uint32_t count,
> >                        uint64_t offset);
> >   void (*async_pwrite) (void *op, void *handle, const void *buf,
> >                         uint32_t count, uint64_t offset, int fua);
> >   void (*async_flush) (void *op, void *handle);
> >   void (*async_trim) (void *op, void *handle, uint32_t count,
> >                       uint64_t offset, int fua);
> >   void (*async_zero) (void *op, void *handle, uint32_t count,
> >                       uint64_t offset, int may_trim, int fua);
> >   ...
> > };
> >
> > Additionally there are a few helper functions for the plugin to use to
> > respond back to nbdkit when the job is eventually finished. The plugin
> > contract when using the async functions is that every async function
> > guarantees it will call an appropriate async_reply function.
> >
> > /* call for completion of a successful async_pwrite, async_flush,
> >  * async_trim, or async_zero */
> > extern CBDKIT_CXX_LANG_C int cbdkit_async_reply (void *op);
> > /* call for completion of a successful async_pread */
> > extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_read (void *op);
> > /* call for completion of any async operation with an error */
> > extern CBDKIT_CXX_LANG_C int cbdkit_async_reply_error (void *op,
> >                                                        uint32_t error);
> >
> > If there is any interest in supporting async ops in the next api version
> > I am able to share the entire modified nbdkit (cbdkit) source that we use
> > that supports this async op framework, fua, as well as some buffer
> > pooling.
>
> Yes, please post patches.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
Richard W.M. Jones
2018-Jan-21 22:16 UTC
Re: [Libguestfs] [nbdkit PATCH v2 13/13] RFC: plugins: Add callbacks for FUA semantics
On Fri, Jan 19, 2018 at 12:41:01PM -0500, Shaun McDowell wrote:
[...]

Thanks for the other email, I thought it was very interesting and I'm
glad that people are looking at the performance of nbdkit seriously.

> Our cbdkit was branched from 1.1.12 with some patches from 1.1.13 added as
> well. The project has morphed significantly enough that a direct diff or
> merging between the two would not be feasible. Even the structure of the
> project directory and build has been changed to be in line with our other
> internal projects.
>
> I have uploaded the entire cbdkit source to our github at
> https://github.com/dev-cloudbd/cbdkit
[...]

As you say, the structure is quite a lot different, making it difficult
for me to use any of this work at the moment. I do have a couple of
questions though ...

- Is a multithreaded approach (as Eric has now implemented) completely
  out of the question? I'm guessing the problem will be with memory
  usage for all of those thread stacks. You mentioned memory usage of
  100MB and maybe that's important for your cloud use case?

- Are you going to try to pull any changes from upstream nbdkit, or is
  the fork now too large to try?

I think if you wanted to get any of this upstream [it wasn't really
clear if you do, and of course licensing-wise it's entirely optional
for you to release any changes at all], then maybe see if there are
simple patches that could go first.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
Shaun McDowell
2018-Jan-21 23:46 UTC
Re: [Libguestfs] [nbdkit PATCH v2 13/13] RFC: plugins: Add callbacks for FUA semantics
128 threads is out of the question for us, primarily due to memory concerns
when running multiple disks off one server, and because of the nature of our
ops (high latency, low cpu). Our plugin forwards reads/writes as requests to
cloud storage and uses asio to service our socket ops. Our plugin ops are
primarily network-throughput bound or op-latency bound, but not cpu bound.
128 threads per disk would be mostly sitting idle, and when we run 8 disks
on a server we don't want 1000 threads around to have 1000 cloud reads going
in parallel when 16 threads would have been more than enough for 8 disks.

I have cloned nbdkit from upstream (yesterday) and I am attempting to redo
the critical changes on top of the latest nbdkit code; I will post a patch
when something is workable. This may take me a few weeks as I am only really
able to do this work on nights and weekends right now, but we do want to
contribute this upstream if possible so others may use a similar async
approach to plugin requests rather than the many-threads approach.

The main changes I'm trying to make are:

1) break the recv_request_send_reply function into two distinct steps
   (recv_request and send_reply)
2) create an alternative plugin struct (lowlevel? name up for suggestions;
   fuse uses fuse_lowlevel.h) that requires the plugin functions to call
   the reply helper funcs (the nbdkit internals don't send replies for this
   plugin unless an error is returned starting the op)
3) wrap the existing plugins.c function calls, when not a lowlevel plugin,
   to call the new send_reply func, as we are not calling that in the
   recv_request func unless an error is hit before the plugin level
4) document the new lowlevel plugin requirements

- Shaun

On Sun, Jan 21, 2018 at 5:16 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> On Fri, Jan 19, 2018 at 12:41:01PM -0500, Shaun McDowell wrote:
> [...]
>
> Thanks for the other email, I thought it was very interesting and I'm
> glad that people are looking at the performance of nbdkit seriously.
> > Our cbdkit was branched from 1.1.12 with some patches from 1.1.13 added
> > as well. The project has morphed significantly enough that a direct diff
> > or merging between the two would not be feasible. Even the structure of
> > the project directory and build has been changed to be in line with our
> > other internal projects.
> >
> > I have uploaded the entire cbdkit source to our github at
> > https://github.com/dev-cloudbd/cbdkit
> [...]
>
> As you say, the structure is quite a lot different, making it difficult
> for me to use any of this work at the moment. I do have a couple of
> questions though ...
>
> - Is a multithreaded approach (as Eric has now implemented) completely
>   out of the question? I'm guessing the problem will be with memory
>   usage for all of those thread stacks. You mentioned memory usage of
>   100MB and maybe that's important for your cloud use case?
>
> - Are you going to try to pull any changes from upstream nbdkit, or is
>   the fork now too large to try?
>
> I think if you wanted to get any of this upstream [it wasn't really
> clear if you do, and of course licensing-wise it's entirely optional
> for you to release any changes at all], then maybe see if there are
> simple patches that could go first.
>
> Rich.
>
> --
> Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
> Read my programming and virtualization blog: http://rwmj.wordpress.com
> Fedora Windows cross-compiler. Compile Windows programs, test, and
> build Windows installers. Over 100 libraries supported.
> http://fedoraproject.org/wiki/MinGW