Richard W.M. Jones
2020-Aug-25 19:21 UTC
Re: [Libguestfs] [nbdkit PATCH 3/5] api: Add nbdkit_string_intern helper
On Tue, Aug 25, 2020 at 10:46:57AM -0500, Eric Blake wrote:> + const char *nbdkit_string_intern (const char *str); > + > +Returns a copy of str, so that the caller may reclaim str and use the > +copy in its place. If the copy is created outside the scope of a > +connection (such as during C<.load>), the lifetime of the copy will > +last at least through C<.unload>; if it was created within a > +connection (such as during C<.preconnect>, C<.default_export>, or > +C<.open>), the lifetime will last at least through C<.close>.Interesting - this isn't exactly how I imagined this would work. Would it make sense to intern strings for the lifetime of the server no matter where they are allocated from? Another suggestion would be for the function to take a flag indicating preferred lifetime, but that presents additional complications for the caller. Also I imagined that this function would be a true (LISP-style) INTERN, in that it would return a single pointer if called twice with the two equal strings. But I see from the implementation later on that this isn't the case.> - keys[optind] = strndup (argv[optind], n); > - if (keys[optind] == NULL) { > - perror ("strndup"); > + CLEANUP_FREE char *key = strndup (argv[optind], n); > + const char *safekey = nbdkit_string_intern (key); > + if (safekey == NULL) > exit (EXIT_FAILURE);It seems it might also be nice to have a "strndup" version of the intern function. I'm not completely sure which of these behaviours is better. Ideally back when I started writing nbdkit I would have used a pool allocator, but that's all water under the bridge now. So none of the above indicates I have a preference for a particular implementation, just ideas for further discussion. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Eric Blake
2020-Aug-25 20:32 UTC
Re: [Libguestfs] [nbdkit PATCH 3/5] api: Add nbdkit_string_intern helper
On 8/25/20 2:21 PM, Richard W.M. Jones wrote:> On Tue, Aug 25, 2020 at 10:46:57AM -0500, Eric Blake wrote: >> + const char *nbdkit_string_intern (const char *str); >> + >> +Returns a copy of str, so that the caller may reclaim str and use the >> +copy in its place. If the copy is created outside the scope of a >> +connection (such as during C<.load>), the lifetime of the copy will >> +last at least through C<.unload>; if it was created within a >> +connection (such as during C<.preconnect>, C<.default_export>, or >> +C<.open>), the lifetime will last at least through C<.close>. > > Interesting - this isn't exactly how I imagined this would work. > Would it make sense to intern strings for the lifetime of the server > no matter where they are allocated from?If you allocate 100 bytes for every connection, but no longer reference the memory after .close, then intern'ing for the lifetime of the server will add up to a megabyte of usage after 10,000 guests even if most of those clients are no longer connected; while intern'ing just for the lifetime of the connection does not. It's actually pretty decent heuristics (and thus I didn't give the API a knob) - if you are allocating a string in response to .load, .config, or other callback outside of a connection context, you probably want that string to last until .unload; if you are allocating a string in response to .list_exports, .default_export, .open, or anything else in response to a given client connection, you probably don't need that string any longer than the .close of that same client, except maybe for the first client (but even then, you could use .get_ready to do that sort of global setup without waiting for the first connection).> > Another suggestion would be for the function to take a flag indicating > preferred lifetime, but that presents additional complications for the > caller.Yeah, I did think about a flag, and decided it wasn't worth it. When called outside the context of a connection, the API would fail if you request in-connection lifetime; and when called inside the context of a connection, requesting an out-of-connection lifetime gets us right back to that appearance of a memory leak if it is repeated for every connection (even though we aren't actually leaking anything, and valgrind is happy, the constant growth of memory usage as more clients come and go is annoying); and if it is not repeated for every connection, it should be just as easy to do the initialization during .get_ready instead of needing a knob.> > Also I imagined that this function would be a true (LISP-style) > INTERN, in that it would return a single pointer if called twice with > the two equal strings. But I see from the implementation later on > that this isn't the case.It isn't now, but the contract doesn't prevent it from doing so in the future. Callers are only promised that the lifetime is at least as long as .close, but nothing would stop us from making it longer if that made our implementation use memory more compactly. But right now, I don't even bother hashing or searching for duplicates, but always append at the end. Performance wise, it may mean a bit more memory usage and/or cleanup time, but so far there were few enough uses that the main benefit (of a lifetime guaranteed by nbdkit, with automatic cleanup without bookkeeping in the plugin) seemed to outweigh the need for more complexity.> >> - keys[optind] = strndup (argv[optind], n); >> - if (keys[optind] == NULL) { >> - perror ("strndup"); >> + CLEANUP_FREE char *key = strndup (argv[optind], n); >> + const char *safekey = nbdkit_string_intern (key); >> + if (safekey == NULL) >> exit (EXIT_FAILURE); > > It seems it might also be nice to have a "strndup" version of the > intern function.Or even a memdup. Yeah, those are possibilities as well. Doing all three (strdup, strndup, memdup) up front isn't hard. If API proliferation is a problem, you can combine strdup and strndup by accepting -1 as a length to mean stop at NUL termination; but I'm less worried about API proliferation and more about ease-of-use.> > I'm not completely sure which of these behaviours is better. Ideally > back when I started writing nbdkit I would have used a pool allocator, > but that's all water under the bridge now. So none of the above > indicates I have a preference for a particular implementation, just > ideas for further discussion.The nice part of the API as proposed is that a pool allocator for intern'd strings is still viable! Well, we have to be careful that a realloc doesn't invalidate prior return values, but nothing is requiring us to do a fresh malloc() per intern, rather than just carving out a slice of a larger pool when room remains and only malloc'ing large chunks (maybe 64k) when room is not available; where cleanup at the end of a connection or server lifetime only has to free the chunks instead of individual intern'd strings within those chunks. In contrast, if we had instead proposed an API where the user transfers ownership of a malloc()d string into the server, and the server is now responsible for calling free(), is much more constrained (we can't change that interface without breaking existing clients). Thus, avoiding a hard dependence on malloc() in our ABI is a good thing. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Richard W.M. Jones
2020-Aug-26 09:14 UTC
Re: [Libguestfs] [nbdkit PATCH 3/5] api: Add nbdkit_string_intern helper
OK, ACK in that case. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Eric Blake
2020-Aug-26 13:33 UTC
Re: [Libguestfs] [nbdkit PATCH 3/5] api: Add nbdkit_string_intern helper
On 8/25/20 3:32 PM, Eric Blake wrote:>>> - keys[optind] = strndup (argv[optind], n); >>> - if (keys[optind] == NULL) { >>> - perror ("strndup"); >>> + CLEANUP_FREE char *key = strndup (argv[optind], n); >>> + const char *safekey = nbdkit_string_intern (key); >>> + if (safekey == NULL) >>> exit (EXIT_FAILURE); >> >> It seems it might also be nice to have a "strndup" version of the >> intern function. > > Or even a memdup. Yeah, those are possibilities as well. Doing all > three (strdup, strndup, memdup) up front isn't hard. If API > proliferation is a problem, you can combine strdup and strndup by > accepting -1 as a length to mean stop at NUL termination; but I'm less > worried about API proliferation and more about ease-of-use.Time for some name bike-shedding. How about: nbdkit_strdup_intern (const char *) nbdkit_strndup_intern (const char *s, size_t) and if we need something that permits embedded NUL (right now I don't have such a use) we could add nbdkit_memdup_intern (const void *s, size_t) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Possibly Parallel Threads
- Re: [nbdkit PATCH 3/5] api: Add nbdkit_string_intern helper
- Re: [nbdkit PATCH 3/5] api: Add nbdkit_string_intern helper
- Re: [nbdkit PATCH 3/5] api: Add nbdkit_string_intern helper
- [PATCH nbdkit] server/public.c: Uninline nbdkit_strdup_intern to avoid compiler warning.
- Re: [PATCH nbdkit] server/public.c: Uninline nbdkit_strdup_intern to avoid compiler warning.