Eric Blake
2023-Apr-27 15:35 UTC
[Libguestfs] [PATCH nbdkit 2/2] allocators: sparse: Split the highly contended mutex
On Thu, Apr 27, 2023 at 03:19:12PM +0100, Richard W.M. Jones wrote:
> We've long known that nbdkit-memory-plugin with the default sparse
> array allocator suffers because a global lock must be taken whenever
> any read or write operation is performed. This commit aims to safely
> improve performance by converting the lock into a read-write lock.

I searched for other approaches, and found this to be an interesting
read:

https://stackoverflow.com/questions/2407558/pthreads-reader-writer-locks-upgrading-read-lock-to-write-lock

One of the suggestions there is using fcntl() locking (which does
support upgrade attempts) instead of pthread_rwlock*. Might be
interesting to code up and compare an approach along those lines.

> +++ b/common/allocators/sparse.c
> @@ -129,11 +129,20 @@ DEFINE_VECTOR_TYPE (l1_dir, struct l1_entry);
>  struct sparse_array {
>    struct allocator a;           /* Must come first. */
>
> -  /* This lock is highly contended.  When hammering the memory plugin
> -   * with 8 fio threads, about 30% of the total system time was taken
> -   * just waiting for this lock.  Fixing this is quite hard.
> +  /* The shared (read) lock must be held if you just want to access
> +   * the data without modifying any of the L1/L2 metadata or
> +   * allocating or freeing any page.
> +   *
> +   * To modify the L1/L2 metadata including allocating or freeing any
> +   * page you must hold the exclusive (write) lock.
> +   *
> +   * Because POSIX rwlocks are not upgradable this presents a problem.
> +   * We solve it below by speculatively performing the request while
> +   * holding the shared lock, but if we run into an operation that
> +   * needs to update the metadata, we restart the entire request
> +   * holding the exclusive lock.
>    */
> -  pthread_mutex_t lock;
> +  pthread_rwlock_t lock;

But as written, this does look like a nice division of labor that
reduces serialization in the common case.
And yes, it does look like a POSIX rwlock is a bit heavier-weight than
a bare mutex, which explains the slowdown on the read-heavy workload
even while improving the write-heavy workload.

Reviewed-by: Eric Blake <eblake at redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Richard W.M. Jones
2023-Apr-28 11:25 UTC
[Libguestfs] [PATCH nbdkit 2/2] allocators: sparse: Split the highly contended mutex
On Thu, Apr 27, 2023 at 10:35:30AM -0500, Eric Blake wrote:
> Reviewed-by: Eric Blake <eblake at redhat.com>

Thanks, I pushed the two patches as e397f0643..9b5b93a07

About using fcntl locks: it seems troublesome because we'd then need
to have an actual file to lock (even one in /dev/shm). That makes the
implementation significantly more complicated in unexpected ways. I'm
thinking in particular of error handling, as we'd need to handle file
errors rather than basically aborting on pthread errors as we do now.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v