When I hotunplug a busy virtio-rng device or try to access
hwrng attributes in non-smp guest, it gets stuck.

My hotplug tests:

| test 0:
|   hotunplug rng device from qemu monitor
|
| test 1:
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   hotunplug rng device from qemu monitor
|
| test 2:
|   guest) # dd if=/dev/random of=/dev/null &
|   hotunplug rng device from qemu monitor
|
| test 4:
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   cat /sys/devices/virtual/misc/hw_random/rng_*
|
| test 5:
|   guest) # dd if=/dev/hwrng of=/dev/null
|   cancel dd process after 10 seconds
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   hotunplug rng device from qemu monitor
|
| test 6:
|   use a fifo as rng backend, execute test 0 ~ 5 with no input of fifo

V4: update patch 4 to fix corrupt, decrease last reference for triggering
    the cleanup, fix unregister race pointed by Herbert
V3: initialize kref to 1
V2: added patch 2 to fix a deadlock, update current patch 3 to fix reference
    counting issue

Amos Kong (1):
  hw_random: move some code out mutex_lock for avoiding underlying deadlock

Rusty Russell (5):
  hw_random: place mutex around read functions and buffers.
  hw_random: use reference counts on each struct hwrng.
  hw_random: fix unregister race.
  hw_random: don't double-check old_rng.
  hw_random: don't init list element we're about to add to list.

 drivers/char/hw_random/core.c | 176 ++++++++++++++++++++++++++++++------------
 include/linux/hw_random.h     |   3 +
 2 files changed, 129 insertions(+), 50 deletions(-)

-- 
1.9.3
Amos Kong
2014-Nov-03 15:56 UTC
[PATCH v4 1/6] hw_random: place mutex around read functions and buffers.
From: Rusty Russell <rusty at rustcorp.com.au>

There's currently a big lock around everything, and it means that we
can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current)
while the rng is reading.  This is a real problem when the rng is slow,
or blocked (eg. virtio_rng with qemu's default /dev/random backend)

This doesn't help (it leaves the current lock untouched), just adds a
lock to protect the read function and the static buffers, in preparation
for transition.

Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..b1b6042 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -53,7 +53,10 @@
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
 static LIST_HEAD(rng_list);
+/* Protects rng_list and current_rng */
 static DEFINE_MUTEX(rng_mutex);
+/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
+static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
 static unsigned short current_quality;
@@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng)
 	unsigned char bytes[16];
 	int bytes_read;
 
+	mutex_lock(&reading_mutex);
 	bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
+	mutex_unlock(&reading_mutex);
 	if (bytes_read > 0)
 		add_device_randomness(bytes, bytes_read);
 }
@@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size,
 			int wait) {
 	int present;
 
+	BUG_ON(!mutex_is_locked(&reading_mutex));
 	if (rng->read)
 		return rng->read(rng, (void *)buffer, size, wait);
 
@@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 			goto out_unlock;
 		}
 
+		mutex_lock(&reading_mutex);
 		if (!data_avail) {
 			bytes_read = rng_get_data(current_rng, rng_buffer,
 				rng_buffer_size(),
 				!(filp->f_flags & O_NONBLOCK));
 			if (bytes_read < 0) {
 				err = bytes_read;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 			data_avail = bytes_read;
 		}
@@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 		if (!data_avail) {
 			if (filp->f_flags & O_NONBLOCK) {
 				err = -EAGAIN;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 		} else {
 			len = data_avail;
@@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 			if (copy_to_user(buf + ret, rng_buffer + data_avail,
 								len)) {
 				err = -EFAULT;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 
 			size -= len;
@@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 		}
 
 		mutex_unlock(&rng_mutex);
+		mutex_unlock(&reading_mutex);
 
 		if (need_resched())
 			schedule_timeout_interruptible(1);
@@ -208,6 +216,9 @@ out:
 out_unlock:
 	mutex_unlock(&rng_mutex);
 	goto out;
+out_unlock_reading:
+	mutex_unlock(&reading_mutex);
+	goto out_unlock;
 }
 
 
@@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused)
 	while (!kthread_should_stop()) {
 		if (!current_rng)
 			break;
+		mutex_lock(&reading_mutex);
 		rc = rng_get_data(current_rng, rng_fillbuf,
 				  rng_buffer_size(), 1);
+		mutex_unlock(&reading_mutex);
 		if (rc <= 0) {
 			pr_warn("hwrng: no data available\n");
 			msleep_interruptible(10000);
 			continue;
 		}
+		/* Outside lock, sure, but y'know: randomness. */
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   rc * current_quality * 8 >> 10);
 	}
-- 
1.9.3
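The error paths in this patch take an outer lock, then an inner lock, and unwind through stacked goto labels. A minimal userspace sketch of that shape follows, assuming pthread mutexes in place of kernel mutexes. The names `rng_lock`, `reading_lock`, `backend_read`, `sketch_read`, `locks_are_free` and `demo_out` are hypothetical and do not appear in the patch. Unlike the patch, the sketch releases the locks in reverse order on every path, including success.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for rng_mutex and reading_mutex. */
static pthread_mutex_t rng_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t reading_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned char shared_buf[32];	/* protected by reading_lock */
static int have_device = 1;		/* protected by rng_lock */

/* Caller-side buffer for the usage note; hypothetical. */
unsigned char demo_out[8];

/* Hypothetical backend: fills buf with a fixed byte, or fails on request. */
static int backend_read(unsigned char *buf, size_t len, int fail)
{
	if (fail)
		return -EIO;
	memset(buf, 0xA5, len);
	return (int)len;
}

/* Returns the number of bytes copied to dst, or a negative errno value. */
int sketch_read(unsigned char *dst, size_t len, int fail)
{
	int ret;

	assert(dst != NULL);

	pthread_mutex_lock(&rng_lock);
	if (!have_device) {
		ret = -ENODEV;
		goto out_unlock;	/* only the outer lock is held */
	}

	pthread_mutex_lock(&reading_lock);
	if (len > sizeof(shared_buf))
		len = sizeof(shared_buf);
	ret = backend_read(shared_buf, len, fail);
	if (ret < 0)
		goto out_unlock_reading;	/* both locks are held */
	memcpy(dst, shared_buf, (size_t)ret);

out_unlock_reading:
	pthread_mutex_unlock(&reading_lock);
out_unlock:
	pthread_mutex_unlock(&rng_lock);
	return ret;
}

/* Returns 1 when neither lock is held, 0 otherwise. */
int locks_are_free(void)
{
	if (pthread_mutex_trylock(&rng_lock) != 0)
		return 0;
	if (pthread_mutex_trylock(&reading_lock) != 0) {
		pthread_mutex_unlock(&rng_lock);
		return 0;
	}
	pthread_mutex_unlock(&reading_lock);
	pthread_mutex_unlock(&rng_lock);
	return 1;
}
```

Calling `sketch_read(demo_out, sizeof(demo_out), 1)` exercises the failure path. `locks_are_free()` should then report that both locks were released, which is the property the new `out_unlock_reading` label is there to preserve.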
Amos Kong
2014-Nov-03 15:56 UTC
[PATCH v4 2/6] hw_random: move some code out mutex_lock for avoiding underlying deadlock
In next patch, we use reference counting for each struct hwrng,
changing reference count also needs to take mutex_lock. Before
releasing the lock, if we try to stop a kthread that waits to
take the lock to reduce the referencing count, deadlock will
occur.

Signed-off-by: Amos Kong <akong at redhat.com>
---
 drivers/char/hw_random/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b1b6042..a0905c8 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -474,12 +474,12 @@ void hwrng_unregister(struct hwrng *rng)
 		}
 	}
 	if (list_empty(&rng_list)) {
+		mutex_unlock(&rng_mutex);
 		unregister_miscdev();
 		if (hwrng_fill)
 			kthread_stop(hwrng_fill);
-	}
-
-	mutex_unlock(&rng_mutex);
+	} else
+		mutex_unlock(&rng_mutex);
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
 
-- 
1.9.3
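The deadlock described above can be illustrated in userspace, as an analogy only: a worker thread needs a lock on every pass, so whoever stops it must release that lock before waiting for it to exit. The sketch assumes pthreads, with `pthread_join` standing in for `kthread_stop`. The names `fill_worker`, `stop_requested`, `passes` and `stop_worker_without_deadlock` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t rng_lock = PTHREAD_MUTEX_INITIALIZER;
static bool stop_requested;	/* protected by rng_lock */
static long passes;		/* protected by rng_lock */

/*
 * Hypothetical worker: it must take rng_lock on every pass, so it can
 * only notice the stop request once the lock is available to it.
 */
static void *fill_worker(void *unused)
{
	bool stop;

	(void)unused;
	do {
		pthread_mutex_lock(&rng_lock);
		stop = stop_requested;
		if (!stop)
			passes++;
		pthread_mutex_unlock(&rng_lock);
		sched_yield();
	} while (!stop);
	return NULL;
}

/* Returns 0 once the worker has been stopped and joined, -1 on failure. */
int stop_worker_without_deadlock(void)
{
	pthread_t tid;
	int rc;

	if (pthread_create(&tid, NULL, fill_worker, NULL) != 0)
		return -1;

	pthread_mutex_lock(&rng_lock);
	stop_requested = true;
	/*
	 * Release the lock before waiting for the worker. Joining while
	 * still holding rng_lock would hang, because the worker blocks in
	 * pthread_mutex_lock() and never sees stop_requested.
	 */
	pthread_mutex_unlock(&rng_lock);

	if (pthread_join(tid, NULL) != 0)
		return -1;

	/* The worker has exited, so the lock must be free again. */
	rc = pthread_mutex_trylock(&rng_lock);
	assert(rc == 0);
	if (rc == 0)
		pthread_mutex_unlock(&rng_lock);
	return 0;
}
```

Moving the `pthread_mutex_unlock` call to after `pthread_join` reproduces the hang, which mirrors why the patch moves `mutex_unlock(&rng_mutex)` ahead of `kthread_stop(hwrng_fill)`.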
Amos Kong
2014-Nov-03 15:56 UTC
[PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.
From: Rusty Russell <rusty at rustcorp.com.au>

current_rng holds one reference, and we bump it every time we want
to do a read from it.

This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.

Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.

This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.

v4: decrease last reference for triggering the cleanup
v3: initialize kref (thanks Amos Kong)
v2: fix missing put_rng() on exit path (thanks Amos Kong)

Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
Signed-off-by: Amos Kong <akong at redhat.com>
---
 drivers/char/hw_random/core.c | 142 +++++++++++++++++++++++++++++-------------
 include/linux/hw_random.h     |   2 +
 2 files changed, 101 insertions(+), 43 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index a0905c8..27ad6b4 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -42,6 +42,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/random.h>
+#include <linux/err.h>
 #include <asm/uaccess.h>
 
 
@@ -91,6 +92,65 @@ static void add_early_randomness(struct hwrng *rng)
 		add_device_randomness(bytes, bytes_read);
 }
 
+static inline void cleanup_rng(struct kref *kref)
+{
+	struct hwrng *rng = container_of(kref, struct hwrng, ref);
+
+	if (rng->cleanup)
+		rng->cleanup(rng);
+}
+
+static void set_current_rng(struct hwrng *rng)
+{
+	BUG_ON(!mutex_is_locked(&rng_mutex));
+	kref_get(&rng->ref);
+	current_rng = rng;
+}
+
+static void drop_current_rng(void)
+{
+	struct hwrng *rng = current_rng;
+
+	BUG_ON(!mutex_is_locked(&rng_mutex));
+	if (!current_rng)
+		return;
+
+	/* release current_rng reference */
+	kref_put(&current_rng->ref, cleanup_rng);
+	current_rng = NULL;
+
+	/* decrease last reference for triggering the cleanup */
+	kref_put(&rng->ref, cleanup_rng);
+}
+
+/* Returns ERR_PTR(), NULL or refcounted hwrng */
+static struct hwrng *get_current_rng(void)
+{
+	struct hwrng *rng;
+
+	if (mutex_lock_interruptible(&rng_mutex))
+		return ERR_PTR(-ERESTARTSYS);
+
+	rng = current_rng;
+	if (rng)
+		kref_get(&rng->ref);
+
+	mutex_unlock(&rng_mutex);
+	return rng;
+}
+
+static void put_rng(struct hwrng *rng)
+{
+	/*
+	 * Hold rng_mutex here so we serialize in case they set_current_rng
+	 * on rng again immediately.
+	 */
+	mutex_lock(&rng_mutex);
+	if (rng)
+		kref_put(&rng->ref, cleanup_rng);
+	mutex_unlock(&rng_mutex);
+}
+
 static inline int hwrng_init(struct hwrng *rng)
 {
 	if (rng->init) {
@@ -110,13 +170,9 @@ static inline int hwrng_init(struct hwrng *rng)
 	if (current_quality > 0 && !hwrng_fill)
 		start_khwrngd();
 
-	return 0;
-}
+	kref_init(&rng->ref);
 
-static inline void hwrng_cleanup(struct hwrng *rng)
-{
-	if (rng && rng->cleanup)
-		rng->cleanup(rng);
+	return 0;
 }
 
 static int rng_dev_open(struct inode *inode, struct file *filp)
@@ -154,21 +210,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 	ssize_t ret = 0;
 	int err = 0;
 	int bytes_read, len;
+	struct hwrng *rng;
 
 	while (size) {
-		if (mutex_lock_interruptible(&rng_mutex)) {
-			err = -ERESTARTSYS;
+		rng = get_current_rng();
+		if (IS_ERR(rng)) {
+			err = PTR_ERR(rng);
 			goto out;
 		}
-
-		if (!current_rng) {
+		if (!rng) {
 			err = -ENODEV;
-			goto out_unlock;
+			goto out;
 		}
 
 		mutex_lock(&reading_mutex);
 		if (!data_avail) {
-			bytes_read = rng_get_data(current_rng, rng_buffer,
+			bytes_read = rng_get_data(rng, rng_buffer,
 				rng_buffer_size(),
 				!(filp->f_flags & O_NONBLOCK));
 			if (bytes_read < 0) {
@@ -200,8 +257,8 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 			ret += len;
 		}
 
-		mutex_unlock(&rng_mutex);
 		mutex_unlock(&reading_mutex);
+		put_rng(rng);
 
 		if (need_resched())
 			schedule_timeout_interruptible(1);
@@ -213,12 +270,11 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 	}
 out:
 	return ret ? : err;
-out_unlock:
-	mutex_unlock(&rng_mutex);
-	goto out;
+
 out_unlock_reading:
 	mutex_unlock(&reading_mutex);
-	goto out_unlock;
+	put_rng(rng);
+	goto out;
 }
 
 
@@ -257,8 +313,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
 			err = hwrng_init(rng);
 			if (err)
 				break;
-			hwrng_cleanup(current_rng);
-			current_rng = rng;
+			drop_current_rng();
+			set_current_rng(rng);
 			err = 0;
 			break;
 		}
@@ -272,17 +328,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
 				       struct device_attribute *attr,
 				       char *buf)
 {
-	int err;
 	ssize_t ret;
-	const char *name = "none";
+	struct hwrng *rng;
 
-	err = mutex_lock_interruptible(&rng_mutex);
-	if (err)
-		return -ERESTARTSYS;
-	if (current_rng)
-		name = current_rng->name;
-	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
-	mutex_unlock(&rng_mutex);
+	rng = get_current_rng();
+	if (IS_ERR(rng))
+		return PTR_ERR(rng);
+
+	ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none");
+	put_rng(rng);
 
 	return ret;
 }
@@ -357,12 +411,16 @@ static int hwrng_fillfn(void *unused)
 	long rc;
 
 	while (!kthread_should_stop()) {
-		if (!current_rng)
+		struct hwrng *rng;
+
+		rng = get_current_rng();
+		if (IS_ERR(rng) || !rng)
 			break;
 		mutex_lock(&reading_mutex);
-		rc = rng_get_data(current_rng, rng_fillbuf,
+		rc = rng_get_data(rng, rng_fillbuf,
 				  rng_buffer_size(), 1);
 		mutex_unlock(&reading_mutex);
+		put_rng(rng);
 		if (rc <= 0) {
 			pr_warn("hwrng: no data available\n");
 			msleep_interruptible(10000);
@@ -423,14 +481,13 @@ int hwrng_register(struct hwrng *rng)
 		err = hwrng_init(rng);
 		if (err)
 			goto out_unlock;
-		current_rng = rng;
+		set_current_rng(rng);
 	}
 	err = 0;
 	if (!old_rng) {
 		err = register_miscdev();
 		if (err) {
-			hwrng_cleanup(rng);
-			current_rng = NULL;
+			drop_current_rng();
 			goto out_unlock;
 		}
 	}
@@ -457,22 +514,21 @@ EXPORT_SYMBOL_GPL(hwrng_register);
 
 void hwrng_unregister(struct hwrng *rng)
 {
-	int err;
-
 	mutex_lock(&rng_mutex);
 
 	list_del(&rng->list);
 	if (current_rng == rng) {
-		hwrng_cleanup(rng);
-		if (list_empty(&rng_list)) {
-			current_rng = NULL;
-		} else {
-			current_rng = list_entry(rng_list.prev, struct hwrng, list);
-			err = hwrng_init(current_rng);
-			if (err)
-				current_rng = NULL;
+		drop_current_rng();
+		if (!list_empty(&rng_list)) {
+			struct hwrng *tail;
+
+			tail = list_entry(rng_list.prev, struct hwrng, list);
+
+			if (hwrng_init(tail) == 0)
+				set_current_rng(tail);
 		}
 	}
+
 	if (list_empty(&rng_list)) {
 		mutex_unlock(&rng_mutex);
 		unregister_miscdev();
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index 914bb08..c212e71 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/kref.h>
 
 /**
  * struct hwrng - Hardware Random Number Generator driver
@@ -44,6 +45,7 @@ struct hwrng {
 
 	/* internal. */
 	struct list_head list;
+	struct kref ref;
 };
 
 /** Register a new Hardware Random Number Generator driver. */
-- 
1.9.3
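The lifetime rule this patch introduces ("current" holds one reference, each reader takes another for the duration of its read) can be modelled with a plain counter. The following is a simplified userspace sketch, not the kernel's kref: it assumes every get and put happens under a single lock that is not shown, and it starts the count at zero, which differs from the posted patch (that one calls `kref_init`). All names (`fake_rng`, `ref_get`, `ref_put`, `set_current`, `drop_current`, `get_current`, `reader_outlives_drop`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical device record. refs is a plain int because this sketch
 * assumes all get/put calls are serialized by one lock (not shown).
 */
struct fake_rng {
	const char *name;
	int refs;
	int cleanups;	/* how many times the cleanup hook ran */
};

static struct fake_rng *current_dev;

static void ref_get(struct fake_rng *r)
{
	r->refs++;
}

static void ref_put(struct fake_rng *r)
{
	assert(r->refs > 0);
	if (--r->refs == 0)
		r->cleanups++;	/* stands in for rng->cleanup(rng) */
}

static void set_current(struct fake_rng *r)
{
	ref_get(r);
	current_dev = r;
}

static void drop_current(void)
{
	if (!current_dev)
		return;
	ref_put(current_dev);
	current_dev = NULL;
}

/* Reader side: returns NULL or a device with one extra reference. */
static struct fake_rng *get_current(void)
{
	struct fake_rng *r = current_dev;

	if (r)
		ref_get(r);
	return r;
}

/*
 * Returns (cleanups seen while a reader still held the device) * 10
 * plus (cleanups seen after the reader let go).
 */
int reader_outlives_drop(void)
{
	struct fake_rng dev = { "fake0", 0, 0 };
	struct fake_rng *held;
	int during, after;

	set_current(&dev);	/* refs: 1 */
	held = get_current();	/* refs: 2, a read is in flight */
	drop_current();		/* refs: 1, device is no longer current */
	during = dev.cleanups;	/* cleanup has not run yet */
	ref_put(held);		/* refs: 0, cleanup runs now */
	after = dev.cleanups;

	return during * 10 + after;
}
```

In this model `reader_outlives_drop()` returns 1: cleanup is deferred until the in-flight reader drops its reference, and it runs exactly once.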
From: Rusty Russell <rusty at rustcorp.com.au>

The previous patch added one potential problem: we can still be
reading from a hwrng when it's unregistered.  Add a wait for zero
in the hwrng_unregister path.

v4: add cleanup_done flag to insure that cleanup is done

Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
Signed-off-by: Amos Kong <akong at redhat.com>
---
 drivers/char/hw_random/core.c | 8 ++++++++
 include/linux/hw_random.h     | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 27ad6b4..c31bf91 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
 static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
+static DECLARE_WAIT_QUEUE_HEAD(rng_done);
 static unsigned short current_quality;
 static unsigned short default_quality; /* = 0; default to "off" */
 
@@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
 
 	if (rng->cleanup)
 		rng->cleanup(rng);
+	rng->cleanup_done = true;
+	wake_up_all(&rng_done);
 }
 
 static void set_current_rng(struct hwrng *rng)
@@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng)
 			kthread_stop(hwrng_fill);
 	} else
 		mutex_unlock(&rng_mutex);
+
+	/* Just in case rng is reading right now, wait. */
+	wait_event(rng_done, rng->cleanup_done &&
+		   atomic_read(&rng->ref.refcount) == 0);
+
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
 
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index c212e71..7832e50 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -46,6 +46,7 @@ struct hwrng {
 	/* internal. */
 	struct list_head list;
 	struct kref ref;
+	bool cleanup_done;
 };
 
 /** Register a new Hardware Random Number Generator driver. */
-- 
1.9.3
From: Rusty Russell <rusty at rustcorp.com.au>

Interesting anti-pattern.

Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index c31bf91..fc5de7d 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -480,14 +480,13 @@ int hwrng_register(struct hwrng *rng)
 	}
 
 	old_rng = current_rng;
+	err = 0;
 	if (!old_rng) {
 		err = hwrng_init(rng);
 		if (err)
 			goto out_unlock;
 		set_current_rng(rng);
-	}
-	err = 0;
-	if (!old_rng) {
+
 		err = register_miscdev();
 		if (err) {
 			drop_current_rng();
-- 
1.9.3
Amos Kong
2014-Nov-03 15:56 UTC
[PATCH v4 6/6] hw_random: don't init list element we're about to add to list.
From: Rusty Russell <rusty at rustcorp.com.au>

Another interesting anti-pattern.

Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index fc5de7d..b2cc8a1 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -493,7 +493,6 @@ int hwrng_register(struct hwrng *rng)
 			goto out_unlock;
 		}
 	}
-	INIT_LIST_HEAD(&rng->list);
 	list_add_tail(&rng->list, &rng_list);
 
 	if (old_rng && !rng->init) {
-- 
1.9.3
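The commit message does not spell out why the initialisation is redundant. A likely reading is that a tail insert overwrites both links of the new element, so whatever they held before is irrelevant. The sketch below models that with a minimal circular doubly linked list in the style of the kernel's `list_head`. `struct node`, `list_init`, `list_append` and `append_ignores_old_links` are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
	struct node *next, *prev;
};

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_append(struct node *entry, struct node *head)
{
	struct node *last;

	assert(entry != NULL && head != NULL);
	last = head->prev;

	entry->next = head;	/* both links of entry are overwritten here, */
	entry->prev = last;	/* whatever they held before */
	last->next = entry;
	head->prev = entry;
}

/* Returns 1 if an entry with stale links ends up correctly linked. */
int append_ignores_old_links(void)
{
	struct node head, stale_target, entry;

	list_init(&head);
	entry.next = &stale_target;	/* deliberately bogus starting values */
	entry.prev = &stale_target;

	list_append(&entry, &head);

	return entry.next == &head && entry.prev == &head &&
	       head.next == &entry && head.prev == &entry;
}
```

Only the list head needs initialising before use. An element that is about to be appended does not.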
On Mon, Nov 03, 2014 at 11:56:24PM +0800, Amos Kong wrote:
>
> @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
>  
>  	if (rng->cleanup)
>  		rng->cleanup(rng);

You need a compiler barrier here to prevent reordering.

> +	rng->cleanup_done = true;

Thanks,
-- 
Email: Herbert Xu <herbert at gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Rusty Russell
2014-Nov-12 03:41 UTC
[PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.
Amos Kong <akong at redhat.com> writes:
> From: Rusty Russell <rusty at rustcorp.com.au>
>
> current_rng holds one reference, and we bump it every time we want
> to do a read from it.
>
> This means we only hold the rng_mutex to grab or drop a reference,
> so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
> block on read of /dev/hwrng.
>
> Using a kref is overkill (we're always under the rng_mutex), but
> a standard pattern.
>
> This also solves the problem that the hwrng_fillfn thread was
> accessing current_rng without a lock, which could change (eg. to NULL)
> underneath it.
>
> v4: decrease last reference for triggering the cleanup

This doesn't make any sense:

> +static void drop_current_rng(void)
> +{
> +	struct hwrng *rng = current_rng;
> +
> +	BUG_ON(!mutex_is_locked(&rng_mutex));
> +	if (!current_rng)
> +		return;
> +
> +	/* release current_rng reference */
> +	kref_put(&current_rng->ref, cleanup_rng);
> +	current_rng = NULL;
> +
> +	/* decrease last reference for triggering the cleanup */
> +	kref_put(&rng->ref, cleanup_rng);
> +}

Why would it drop the refcount twice?  This doesn't make sense.

Hmm, because you added kref_init, which initializes the reference count
to 1, you created this bug.

Leave out the kref_init, and let it naturally be 0 (until, and if, it
becomes current_rng).  Add a comment if you want.

Thanks,
Rusty.
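The arithmetic behind this objection can be checked with a plain-integer model. This is a toy counter under stated assumptions, not the kernel's kref, and the names `counted`, `get`, `put` and `puts_until_cleanup` are hypothetical. It counts how many puts are needed before cleanup fires, depending on the value the count starts from.

```c
#include <assert.h>

/* Toy reference counter; cleanups records how often cleanup fired. */
struct counted {
	int refs;
	int cleanups;
};

static void get(struct counted *c)
{
	c->refs++;
}

static void put(struct counted *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->cleanups++;
}

/*
 * Number of put() calls needed after the object becomes "current"
 * before cleanup runs, given its initial count (expected to be >= 0).
 */
int puts_until_cleanup(int initial_refs)
{
	struct counted c = { initial_refs, 0 };
	int n_puts = 0;

	get(&c);	/* becoming current takes one reference */
	while (c.cleanups == 0) {
		put(&c);
		n_puts++;
	}
	return n_puts;
}
```

With an initial count of 1 the model needs two puts, which matches the double `kref_put` in the quoted `drop_current_rng()`. With an initial count of 0 a single put is enough.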
Amos Kong <akong at redhat.com> writes:
> From: Rusty Russell <rusty at rustcorp.com.au>
>
> The previous patch added one potential problem: we can still be
> reading from a hwrng when it's unregistered.  Add a wait for zero
> in the hwrng_unregister path.
>
> v4: add cleanup_done flag to insure that cleanup is done

That's a bit weird.  The usual pattern would be to hold a reference
until we're actually finished, but this reference is a bit weird.

We hold the mutex across cleanup, so we could grab that but we have to
take care sleeping inside wait_event, otherwise Peter will have to fix
my code again :)

AFAICT the wake_woken() stuff isn't merged yet, so your patch will
have to do for now.

> @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
>  
>  	if (rng->cleanup)
>  		rng->cleanup(rng);
> +	rng->cleanup_done = true;
> +	wake_up_all(&rng_done);
>  }
>  
>  static void set_current_rng(struct hwrng *rng)
> @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng)
>  			kthread_stop(hwrng_fill);
>  	} else
>  		mutex_unlock(&rng_mutex);
> +
> +	/* Just in case rng is reading right now, wait. */
> +	wait_event(rng_done, rng->cleanup_done &&
> +		   atomic_read(&rng->ref.refcount) == 0);
> +

The atomic_read() isn't necessary here.

However, you should probably init cleanup_done in hwrng_register().
(Probably noone does unregister then register, but let's be clear).

Thanks,
Rusty.

>  }
>  EXPORT_SYMBOL_GPL(hwrng_unregister);
>  
> diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
> index c212e71..7832e50 100644
> --- a/include/linux/hw_random.h
> +++ b/include/linux/hw_random.h
> @@ -46,6 +46,7 @@ struct hwrng {
>  	/* internal. */
>  	struct list_head list;
>  	struct kref ref;
> +	bool cleanup_done;
>  };
>  
>  /** Register a new Hardware Random Number Generator driver. */
> -- 
> 1.9.3
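The wait-for-cleanup idea under discussion can be sketched in userspace, again only as an analogy. It assumes a pthread condition variable in place of the kernel wait queue, and a flag that is read and written under one mutex, so the sketch needs no separate barrier. The names `fake_rng`, `put_ref`, `reader` and `unregister_waits_for_reader` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;

struct fake_rng {
	int refs;		/* protected by lock */
	bool cleanup_done;	/* protected by lock */
};

static void put_ref(struct fake_rng *r)
{
	pthread_mutex_lock(&lock);
	assert(r->refs > 0);
	if (--r->refs == 0) {
		/* The device cleanup hook would run here. */
		r->cleanup_done = true;
		pthread_cond_broadcast(&done_cond);
	}
	pthread_mutex_unlock(&lock);
}

/* Hypothetical reader that still holds a reference when unregister starts. */
static void *reader(void *arg)
{
	put_ref(arg);
	return NULL;
}

/* Returns the reference count seen after the wait, or -1 on setup failure. */
int unregister_waits_for_reader(void)
{
	/* One reference for "current", one held by the reader. */
	struct fake_rng dev = { 2, false };
	pthread_t tid;
	int refs;

	if (pthread_create(&tid, NULL, reader, &dev) != 0)
		return -1;

	put_ref(&dev);	/* drop the "current" reference */

	pthread_mutex_lock(&lock);
	while (!dev.cleanup_done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&done_cond, &lock);
	refs = dev.refs;
	pthread_mutex_unlock(&lock);

	pthread_join(tid, NULL);
	return refs;
}
```

In this model the flag alone is enough for the waiter: `cleanup_done` is only set once the count has reached zero, so checking the count as well adds nothing. That is in line with the remark above that the `atomic_read()` isn't necessary.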