Rusty Russell
2014-Sep-18  02:43 UTC
[PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
Amos Kong <akong at redhat.com> writes:> I started a QEMU (non-smp) guest with one virtio-rng device, and read > random data from /dev/hwrng by dd: > > # dd if=/dev/hwrng of=/dev/null & > > In the same time, if I check hwrng attributes from sysfs by cat: > > # cat /sys/class/misc/hw_random/rng_* > > The cat process always gets stuck with slow backend (5 k/s), if we > use a quick backend (1.2 M/s), the cat process will cost 1 to 2 > minutes. The stuck doesn't exist for smp guest. > > Reading syscall enters kernel and call rng_dev_read(), it's user > context. We used need_resched() to check if other tasks need to > be run, but it almost always return false, and re-hold the mutex > lock. The attributes accessing process always fails to hold the > lock, so the cat gets stuck. > > User context doesn't allow other user contexts run on that CPU, > unless the kernel code sleeps for some reason. This is why the > need_reshed() always return false here. > > This patch removed need_resched() and always schedule other tasks > then other tasks can have chance to hold the lock and execute > protected code.OK, this is going to be a rant. Your explanation doesn't make sense at all. Worse, your solution breaks the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite it.". But worst of all, this detailed explanation might have convinced me you understood the problem better than I did, and applied your patch. I did some tests. For me, as expected, the process spends its time inside the virtio rng read function, holding the mutex and thus blocking sysfs access; it's not a failure of this code at all. Your schedule_timeout() "fix" probably just helps by letting the host refresh entropy, so we spend less time waiting in the read fn. I will post a series, which unfortunately is only lightly tested, then I'm going to have some beer to begin my holiday. That may help me forget my disappointment at seeing respected fellow developers monkey-patching random code they don't understand. Grrr.... Rusty.> Signed-off-by: Amos Kong <akong at redhat.com> > --- > drivers/char/hw_random/core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index c591d7e..263a370 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > mutex_unlock(&rng_mutex); > > - if (need_resched()) > - schedule_timeout_interruptible(1); > + schedule_timeout_interruptible(1); > > if (signal_pending(current)) { > err = -ERESTARTSYS; > -- > 1.9.3
Rusty Russell
2014-Sep-18  02:48 UTC
[PATCH 1/5] hw_random: place mutex around read functions and buffers.
There's currently a big lock around everything, and it means that we
can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current)
while the rng is reading.  This is a real problem when the rng is slow,
or blocked (eg. virtio_rng with qemu's default /dev/random backend)
This doesn't help (it leaves the current lock untouched), just adds a
lock to protect the read function and the static buffers, in preparation
for transition.
Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25c8d49..b1b6042ad85c 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -53,7 +53,10 @@
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
 static LIST_HEAD(rng_list);
+/* Protects rng_list and current_rng */
 static DEFINE_MUTEX(rng_mutex);
+/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
+static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
 static unsigned short current_quality;
@@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng)
 	unsigned char bytes[16];
 	int bytes_read;
 
+	mutex_lock(&reading_mutex);
 	bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
+	mutex_unlock(&reading_mutex);
 	if (bytes_read > 0)
 		add_device_randomness(bytes, bytes_read);
 }
@@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8
*buffer, size_t size,
 			int wait) {
 	int present;
 
+	BUG_ON(!mutex_is_locked(&reading_mutex));
 	if (rng->read)
 		return rng->read(rng, (void *)buffer, size, wait);
 
@@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 			goto out_unlock;
 		}
 
+		mutex_lock(&reading_mutex);
 		if (!data_avail) {
 			bytes_read = rng_get_data(current_rng, rng_buffer,
 				rng_buffer_size(),
 				!(filp->f_flags & O_NONBLOCK));
 			if (bytes_read < 0) {
 				err = bytes_read;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 			data_avail = bytes_read;
 		}
@@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 		if (!data_avail) {
 			if (filp->f_flags & O_NONBLOCK) {
 				err = -EAGAIN;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 		} else {
 			len = data_avail;
@@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 			if (copy_to_user(buf + ret, rng_buffer + data_avail,
 								len)) {
 				err = -EFAULT;
-				goto out_unlock;
+				goto out_unlock_reading;
 			}
 
 			size -= len;
@@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 		}
 
 		mutex_unlock(&rng_mutex);
+		mutex_unlock(&reading_mutex);
 
 		if (need_resched())
 			schedule_timeout_interruptible(1);
@@ -208,6 +216,9 @@ out:
 out_unlock:
 	mutex_unlock(&rng_mutex);
 	goto out;
+out_unlock_reading:
+	mutex_unlock(&reading_mutex);
+	goto out_unlock;
 }
 
 
@@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused)
 	while (!kthread_should_stop()) {
 		if (!current_rng)
 			break;
+		mutex_lock(&reading_mutex);
 		rc = rng_get_data(current_rng, rng_fillbuf,
 				  rng_buffer_size(), 1);
+		mutex_unlock(&reading_mutex);
 		if (rc <= 0) {
 			pr_warn("hwrng: no data available\n");
 			msleep_interruptible(10000);
 			continue;
 		}
+		/* Outside lock, sure, but y'know: randomness. */
 		add_hwgenerator_randomness((void *)rng_fillbuf, rc,
 					   rc * current_quality * 8 >> 10);
 	}
-- 
1.9.1
Rusty Russell
2014-Sep-18  02:48 UTC
[PATCH 2/5] hw_random: use reference counts on each struct hwrng.
current_rng holds one reference, and we bump it every time we want
to do a read from it.
This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.
Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.
This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.
Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 135 ++++++++++++++++++++++++++++--------------
 include/linux/hw_random.h     |   2 +
 2 files changed, 94 insertions(+), 43 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b1b6042ad85c..dc9092a1075d 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -42,6 +42,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/random.h>
+#include <linux/err.h>
 #include <asm/uaccess.h>
 
 
@@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng)
 		add_device_randomness(bytes, bytes_read);
 }
 
+static inline void cleanup_rng(struct kref *kref)
+{
+	struct hwrng *rng = container_of(kref, struct hwrng, ref);
+
+	if (rng->cleanup)
+		rng->cleanup(rng);
+}
+
+static void set_current_rng(struct hwrng *rng)
+{
+	BUG_ON(!mutex_is_locked(&rng_mutex));
+	kref_get(&rng->ref);
+	current_rng = rng;
+}
+
+static void drop_current_rng(void)
+{
+	BUG_ON(!mutex_is_locked(&rng_mutex));
+	if (!current_rng)
+		return;
+
+	kref_put(¤t_rng->ref, cleanup_rng);
+	current_rng = NULL;
+}
+
+/* Returns ERR_PTR(), NULL or refcounted hwrng */
+static struct hwrng *get_current_rng(void)
+{
+	struct hwrng *rng;
+
+	if (mutex_lock_interruptible(&rng_mutex))
+		return ERR_PTR(-ERESTARTSYS);
+
+	rng = current_rng;
+	if (rng)
+		kref_get(&rng->ref);
+
+	mutex_unlock(&rng_mutex);
+	return rng;
+}
+
+static void put_rng(struct hwrng *rng)
+{
+	/*
+	 * Hold rng_mutex here so we serialize in case they set_current_rng
+	 * on rng again immediately.
+	 */
+	mutex_lock(&rng_mutex);
+	if (rng)
+		kref_put(&rng->ref, cleanup_rng);
+	mutex_unlock(&rng_mutex);
+}
+
 static inline int hwrng_init(struct hwrng *rng)
 {
 	if (rng->init) {
@@ -113,12 +167,6 @@ static inline int hwrng_init(struct hwrng *rng)
 	return 0;
 }
 
-static inline void hwrng_cleanup(struct hwrng *rng)
-{
-	if (rng && rng->cleanup)
-		rng->cleanup(rng);
-}
-
 static int rng_dev_open(struct inode *inode, struct file *filp)
 {
 	/* enforce read-only access to this chrdev */
@@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 	ssize_t ret = 0;
 	int err = 0;
 	int bytes_read, len;
+	struct hwrng *rng;
 
 	while (size) {
-		if (mutex_lock_interruptible(&rng_mutex)) {
-			err = -ERESTARTSYS;
+		rng = get_current_rng();
+		if (IS_ERR(rng)) {
+			err = PTR_ERR(rng);
 			goto out;
 		}
-
-		if (!current_rng) {
+		if (!rng) {
 			err = -ENODEV;
-			goto out_unlock;
+			goto out;
 		}
 
 		mutex_lock(&reading_mutex);
 		if (!data_avail) {
-			bytes_read = rng_get_data(current_rng, rng_buffer,
+			bytes_read = rng_get_data(rng, rng_buffer,
 				rng_buffer_size(),
 				!(filp->f_flags & O_NONBLOCK));
 			if (bytes_read < 0) {
@@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 			ret += len;
 		}
 
-		mutex_unlock(&rng_mutex);
 		mutex_unlock(&reading_mutex);
 
 		if (need_resched())
@@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 			err = -ERESTARTSYS;
 			goto out;
 		}
+
+		put_rng(rng);
 	}
 out:
 	return ret ? : err;
-out_unlock:
-	mutex_unlock(&rng_mutex);
-	goto out;
+
 out_unlock_reading:
 	mutex_unlock(&reading_mutex);
-	goto out_unlock;
+	put_rng(rng);
+	goto out;
 }
 
 
@@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
 			err = hwrng_init(rng);
 			if (err)
 				break;
-			hwrng_cleanup(current_rng);
-			current_rng = rng;
+			drop_current_rng();
+			set_current_rng(rng);
 			err = 0;
 			break;
 		}
@@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
 				       struct device_attribute *attr,
 				       char *buf)
 {
-	int err;
 	ssize_t ret;
-	const char *name = "none";
+	struct hwrng *rng;
 
-	err = mutex_lock_interruptible(&rng_mutex);
-	if (err)
-		return -ERESTARTSYS;
-	if (current_rng)
-		name = current_rng->name;
-	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
-	mutex_unlock(&rng_mutex);
+	rng = get_current_rng();
+	if (IS_ERR(rng))
+		return PTR_ERR(rng);
+
+	ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name :
"none");
+	put_rng(rng);
 
 	return ret;
 }
@@ -357,12 +404,16 @@ static int hwrng_fillfn(void *unused)
 	long rc;
 
 	while (!kthread_should_stop()) {
-		if (!current_rng)
+		struct hwrng *rng;
+
+		rng = get_current_rng();
+		if (IS_ERR(rng) || !rng)
 			break;
 		mutex_lock(&reading_mutex);
-		rc = rng_get_data(current_rng, rng_fillbuf,
+		rc = rng_get_data(rng, rng_fillbuf,
 				  rng_buffer_size(), 1);
 		mutex_unlock(&reading_mutex);
+		put_rng(rng);
 		if (rc <= 0) {
 			pr_warn("hwrng: no data available\n");
 			msleep_interruptible(10000);
@@ -423,14 +474,13 @@ int hwrng_register(struct hwrng *rng)
 		err = hwrng_init(rng);
 		if (err)
 			goto out_unlock;
-		current_rng = rng;
+		set_current_rng(rng);
 	}
 	err = 0;
 	if (!old_rng) {
 		err = register_miscdev();
 		if (err) {
-			hwrng_cleanup(rng);
-			current_rng = NULL;
+			drop_current_rng();
 			goto out_unlock;
 		}
 	}
@@ -457,22 +507,21 @@ EXPORT_SYMBOL_GPL(hwrng_register);
 
 void hwrng_unregister(struct hwrng *rng)
 {
-	int err;
-
 	mutex_lock(&rng_mutex);
 
 	list_del(&rng->list);
 	if (current_rng == rng) {
-		hwrng_cleanup(rng);
-		if (list_empty(&rng_list)) {
-			current_rng = NULL;
-		} else {
-			current_rng = list_entry(rng_list.prev, struct hwrng, list);
-			err = hwrng_init(current_rng);
-			if (err)
-				current_rng = NULL;
+		drop_current_rng();
+		if (!list_empty(&rng_list)) {
+			struct hwrng *tail;
+
+			tail = list_entry(rng_list.prev, struct hwrng, list);
+
+			if (hwrng_init(tail) == 0)
+				set_current_rng(tail);
 		}
 	}
+
 	if (list_empty(&rng_list)) {
 		unregister_miscdev();
 		if (hwrng_fill)
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index 914bb08cd738..c212e71ea886 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -14,6 +14,7 @@
 
 #include <linux/types.h>
 #include <linux/list.h>
+#include <linux/kref.h>
 
 /**
  * struct hwrng - Hardware Random Number Generator driver
@@ -44,6 +45,7 @@ struct hwrng {
 
 	/* internal. */
 	struct list_head list;
+	struct kref ref;
 };
 
 /** Register a new Hardware Random Number Generator driver. */
-- 
1.9.1
The previous patch added one potential problem: we can still be reading from a hwrng when it's unregistered. Add a wait for zero in the hwrng_unregister path. Signed-off-by: Rusty Russell <rusty at rustcorp.com.au> --- drivers/char/hw_random/core.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index dc9092a1075d..b4a21e9521cf 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex); static DEFINE_MUTEX(reading_mutex); static int data_avail; static u8 *rng_buffer, *rng_fillbuf; +static DECLARE_WAIT_QUEUE_HEAD(rng_done); static unsigned short current_quality; static unsigned short default_quality; /* = 0; default to "off" */ @@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref) if (rng->cleanup) rng->cleanup(rng); + wake_up_all(&rng_done); } static void set_current_rng(struct hwrng *rng) @@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng) } mutex_unlock(&rng_mutex); + + /* Just in case rng is reading right now, wait. */ + wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0); } EXPORT_SYMBOL_GPL(hwrng_unregister); -- 1.9.1
Interesting anti-pattern.
Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b4a21e9521cf..6a34feca6b43 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -472,14 +472,13 @@ int hwrng_register(struct hwrng *rng)
 	}
 
 	old_rng = current_rng;
+	err = 0;
 	if (!old_rng) {
 		err = hwrng_init(rng);
 		if (err)
 			goto out_unlock;
 		set_current_rng(rng);
-	}
-	err = 0;
-	if (!old_rng) {
+
 		err = register_miscdev();
 		if (err) {
 			drop_current_rng();
-- 
1.9.1
Rusty Russell
2014-Sep-18  02:48 UTC
[PATCH 5/5] hw_random: don't init list element we're about to add to list.
Another interesting anti-pattern.
Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
---
 drivers/char/hw_random/core.c | 1 -
 1 file changed, 1 deletion(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 6a34feca6b43..96fa06716e95 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -485,7 +485,6 @@ int hwrng_register(struct hwrng *rng)
 			goto out_unlock;
 		}
 	}
-	INIT_LIST_HEAD(&rng->list);
 	list_add_tail(&rng->list, &rng_list);
 
 	if (old_rng && !rng->init) {
-- 
1.9.1
Amos Kong
2014-Sep-18  12:47 UTC
[PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes
On Thu, Sep 18, 2014 at 12:13:08PM +0930, Rusty Russell wrote:> Amos Kong <akong at redhat.com> writes: > > > I started a QEMU (non-smp) guest with one virtio-rng device, and read > > random data from /dev/hwrng by dd: > > > > # dd if=/dev/hwrng of=/dev/null & > > > > In the same time, if I check hwrng attributes from sysfs by cat: > > > > # cat /sys/class/misc/hw_random/rng_* > > > > The cat process always gets stuck with slow backend (5 k/s), if we > > use a quick backend (1.2 M/s), the cat process will cost 1 to 2 > > minutes. The stuck doesn't exist for smp guest. > > > > Reading syscall enters kernel and call rng_dev_read(), it's user > > context. We used need_resched() to check if other tasks need to > > be run, but it almost always return false, and re-hold the mutex > > lock. The attributes accessing process always fails to hold the > > lock, so the cat gets stuck. > > > > User context doesn't allow other user contexts run on that CPU, > > unless the kernel code sleeps for some reason. This is why the > > need_reshed() always return false here. > > > > This patch removed need_resched() and always schedule other tasks > > then other tasks can have chance to hold the lock and execute > > protected code.Hi Rusty,> OK, this is going to be a rant. > > Your explanation doesn't make sense at all. Worse, your solution breaks > the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite > it.". > > But worst of all, this detailed explanation might have convinced me you > understood the problem better than I did, and applied your patch.I'm sorry about the misleading.> I did some tests. For me, as expected, the process spends its time > inside the virtio rng read function, holding the mutex and thus blocking > sysfs access; it's not a failure of this code at all.Got it now. The catting hang bug was found when I try to fix unhotplug issue, the unhotplug issue can't be reproduced if I try to debug by gdb or printk. So I forgot to debug cat hang ... but spend time to misunderstand schedle code :(> Your schedule_timeout() "fix" probably just helps by letting the host > refresh entropy, so we spend less time waiting in the read fn. > > I will post a series, which unfortunately is only lightly tested, then > I'm going to have some beer to begin my holiday. That may help me > forget my disappointment at seeing respected fellow developers > monkey-patching random code they don't understand.I just posted a V2 with two additional fixes, hotunplugging works well now :)> Grrr....Enjoy your holiday! Amos> Rusty. > > > Signed-off-by: Amos Kong <akong at redhat.com> > > --- > > drivers/char/hw_random/core.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > index c591d7e..263a370 100644 > > --- a/drivers/char/hw_random/core.c > > +++ b/drivers/char/hw_random/core.c > > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > > > mutex_unlock(&rng_mutex); > > > > - if (need_resched()) > > - schedule_timeout_interruptible(1); > > + schedule_timeout_interruptible(1); > > > > if (signal_pending(current)) { > > err = -ERESTARTSYS; > > -- > > 1.9.3-- Amos.
Possibly Parallel Threads
- [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.
- [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.
- [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.
- [PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.
- [PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.