thr3ads.net - Linux Virtualization - [PATCH v2 0/3] fix stuck in accessing hwrng attributes [Sep 2014]

If this information is useful, please help other people find it:
Share via:

Amos Kong

2014-Sep-15 16:02 UTC

[PATCH v2 0/3] fix stuck in accessing hwrng attributes

If we read hwrng by long-running dd process, it takes too much cpu
time and almost hold the mutex lock. When we check hwrng attributes
from sysfs by cat, it gets stuck in waiting the lock releaseing.
The problem can only be reproduced with non-smp guest with slow backend.

This patchset resolves the issue by changing rng_dev_read() to always
schedule 10 jiffies after release mutex lock, then cat process can
have chance to get the lock and execute protected code without stuck.

Thanks.

V2: update commitlog to describe PATCH 2, split second patch.

Amos Kong (3):
  virtio-rng cleanup: move some code out of mutex protection
  hw_random: fix stuck in catting hwrng attributes
  hw_random: increase schedule timeout in rng_dev_read()

 drivers/char/hw_random/core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
1.9.3

Amos Kong

2014-Sep-15 16:02 UTC

head link

[PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection

It doesn't save too much cpu time as expected, just a cleanup.

Signed-off-by: Amos Kong <akong at redhat.com>
---
 drivers/char/hw_random/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..c591d7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
 		return -ERESTARTSYS;
 	if (current_rng)
 		name = current_rng->name;
-	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
 	mutex_unlock(&rng_mutex);
+	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
 
 	return ret;
 }
@@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device
*dev,
 	ssize_t ret = 0;
 	struct hwrng *rng;
 
+	buf[0] = '\0';
 	err = mutex_lock_interruptible(&rng_mutex);
 	if (err)
 		return -ERESTARTSYS;
-	buf[0] = '\0';
 	list_for_each_entry(rng, &rng_list, list) {
 		strncat(buf, rng->name, PAGE_SIZE - ret - 1);
 		ret += strlen(rng->name);
 		strncat(buf, " ", PAGE_SIZE - ret - 1);
 		ret++;
 	}
+	mutex_unlock(&rng_mutex);
 	strncat(buf, "\n", PAGE_SIZE - ret - 1);
 	ret++;
-	mutex_unlock(&rng_mutex);
 
 	return ret;
 }
-- 
1.9.3

Amos Kong

2014-Sep-15 16:02 UTC

head link

[PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes

I started a QEMU (non-smp) guest with one virtio-rng device, and read
random data from /dev/hwrng by dd:

 # dd if=/dev/hwrng of=/dev/null &

In the same time, if I check hwrng attributes from sysfs by cat:

 # cat /sys/class/misc/hw_random/rng_*

The cat process always gets stuck with slow backend (5 k/s), if we
use a quick backend (1.2 M/s), the cat process will cost 1 to 2
minutes. The stuck doesn't exist for smp guest.

Reading syscall enters kernel and call rng_dev_read(), it's user
context. We used need_resched() to check if other tasks need to
be run, but it almost always return false, and re-hold the mutex
lock. The attributes accessing process always fails to hold the
lock, so the cat gets stuck.

User context doesn't allow other user contexts run on that CPU,
unless the kernel code sleeps for some reason. This is why the
need_reshed() always return false here.

This patch removed need_resched() and always schedule other tasks
then other tasks can have chance to hold the lock and execute
protected code.

Signed-off-by: Amos Kong <akong at redhat.com>
---
 drivers/char/hw_random/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index c591d7e..263a370 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 
 		mutex_unlock(&rng_mutex);
 
-		if (need_resched())
-			schedule_timeout_interruptible(1);
+		schedule_timeout_interruptible(1);
 
 		if (signal_pending(current)) {
 			err = -ERESTARTSYS;
-- 
1.9.3

Amos Kong

2014-Sep-15 16:02 UTC

head link

[PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()

This patch increases the schedule timeout to 10 jiffies, it's more
appropriate, then other takes can easy to hold the mutex lock.

Signed-off-by: Amos Kong <akong at redhat.com>
---
 drivers/char/hw_random/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 263a370..b5d1b6f 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user
*buf,
 
 		mutex_unlock(&rng_mutex);
 
-		schedule_timeout_interruptible(1);
+		schedule_timeout_interruptible(10);
 
 		if (signal_pending(current)) {
 			err = -ERESTARTSYS;
-- 
1.9.3

Michael Büsch

2014-Sep-15 16:13 UTC

head link

[PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection

On Tue, 16 Sep 2014 00:02:27 +0800
Amos Kong <akong at redhat.com> wrote:
> It doesn't save too much cpu time as expected, just a cleanup.
> 
> Signed-off-by: Amos Kong <akong at redhat.com>
> ---
>  drivers/char/hw_random/core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index aa30a25..c591d7e 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device
*dev,
>  		return -ERESTARTSYS;
>  	if (current_rng)
>  		name = current_rng->name;
> -	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
>  	mutex_unlock(&rng_mutex);
> +	ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
I'm not sure this is safe.
Name is just a pointer.
What if the hwrng gets unregistered after unlock and just before the snprintf?
>  	return ret;
>  }
> @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct
device *dev,
>  	ssize_t ret = 0;
>  	struct hwrng *rng;
>  
> +	buf[0] = '\0';
>  	err = mutex_lock_interruptible(&rng_mutex);
>  	if (err)
>  		return -ERESTARTSYS;
> -	buf[0] = '\0';
>  	list_for_each_entry(rng, &rng_list, list) {
>  		strncat(buf, rng->name, PAGE_SIZE - ret - 1);
>  		ret += strlen(rng->name);
>  		strncat(buf, " ", PAGE_SIZE - ret - 1);
>  		ret++;
>  	}
> +	mutex_unlock(&rng_mutex);
>  	strncat(buf, "\n", PAGE_SIZE - ret - 1);
>  	ret++;
> -	mutex_unlock(&rng_mutex);
>  
>  	return ret;
>  }
This looks ok.

-- 
Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20140915/03e897f8/attachment.sig>

Michael Büsch

2014-Sep-15 16:13 UTC

head link

[PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()

On Tue, 16 Sep 2014 00:02:29 +0800
Amos Kong <akong at redhat.com> wrote:
> This patch increases the schedule timeout to 10 jiffies, it's more
> appropriate, then other takes can easy to hold the mutex lock.
> 
> Signed-off-by: Amos Kong <akong at redhat.com>
> ---
>  drivers/char/hw_random/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 263a370..b5d1b6f 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char
__user *buf,
>  
>  		mutex_unlock(&rng_mutex);
>  
> -		schedule_timeout_interruptible(1);
> +		schedule_timeout_interruptible(10);
>  
>  		if (signal_pending(current)) {
>  			err = -ERESTARTSYS;
Does a schedule of 1 ms or 10 ms decrease the throughput?
I think we need some benchmarks.

-- 
Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20140915/bdcf57a0/attachment-0001.sig>

Herbert Xu

2014-Sep-17 09:30 UTC

head link

[PATCH v2 0/3] fix stuck in accessing hwrng attributes

On Tue, Sep 16, 2014 at 12:02:26AM +0800, Amos Kong
wrote:> If we read hwrng by long-running dd process, it takes too much cpu
> time and almost hold the mutex lock. When we check hwrng attributes
> from sysfs by cat, it gets stuck in waiting the lock releaseing.
> The problem can only be reproduced with non-smp guest with slow backend.
> 
> This patchset resolves the issue by changing rng_dev_read() to always
> schedule 10 jiffies after release mutex lock, then cat process can
> have chance to get the lock and execute protected code without stuck.
Sorry I'm not going to accept your fix which simply papers over
the problem.

Please bite the bullet and convert this over to RCU.

Cheers,
-- 
Email: Herbert Xu <herbert at gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Rusty Russell

2014-Sep-18 02:43 UTC

head link

[PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes

Amos Kong <akong at redhat.com> writes:
> I started a QEMU (non-smp) guest with one virtio-rng device, and read
> random data from /dev/hwrng by dd:
>
>  # dd if=/dev/hwrng of=/dev/null &
>
> In the same time, if I check hwrng attributes from sysfs by cat:
>
>  # cat /sys/class/misc/hw_random/rng_*
>
> The cat process always gets stuck with slow backend (5 k/s), if we
> use a quick backend (1.2 M/s), the cat process will cost 1 to 2
> minutes. The stuck doesn't exist for smp guest.
>
> Reading syscall enters kernel and call rng_dev_read(), it's user
> context. We used need_resched() to check if other tasks need to
> be run, but it almost always return false, and re-hold the mutex
> lock. The attributes accessing process always fails to hold the
> lock, so the cat gets stuck.
>
> User context doesn't allow other user contexts run on that CPU,
> unless the kernel code sleeps for some reason. This is why the
> need_reshed() always return false here.
>
> This patch removed need_resched() and always schedule other tasks
> then other tasks can have chance to hold the lock and execute
> protected code.
OK, this is going to be a rant.

Your explanation doesn't make sense at all.  Worse, your solution breaks
the advice of Kernighan & Plaugher: "Don't patch bad code - rewrite
it.".

But worst of all, this detailed explanation might have convinced me you
understood the problem better than I did, and applied your patch.

I did some tests.  For me, as expected, the process spends its time
inside the virtio rng read function, holding the mutex and thus blocking
sysfs access; it's not a failure of this code at all.

Your schedule_timeout() "fix" probably just helps by letting the host
refresh entropy, so we spend less time waiting in the read fn.

I will post a series, which unfortunately is only lightly tested, then
I'm going to have some beer to begin my holiday.  That may help me
forget my disappointment at seeing respected fellow developers
monkey-patching random code they don't understand.

Grrr....
Rusty.
> Signed-off-by: Amos Kong <akong at redhat.com>
> ---
>  drivers/char/hw_random/core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index c591d7e..263a370 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char
__user *buf,
>  
>  		mutex_unlock(&rng_mutex);
>  
> -		if (need_resched())
> -			schedule_timeout_interruptible(1);
> +		schedule_timeout_interruptible(1);
>  
>  		if (signal_pending(current)) {
>  			err = -ERESTARTSYS;
> -- 
> 1.9.3

Maybe Matching Threads

Search for more seemingly similar threads

Linux Virtualization - Sep 2014 - [PATCH v2 0/3] fix stuck in accessing hwrng attributes

[PATCH v2 0/3] fix stuck in accessing hwrng attributes

[PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection

[PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes

[PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()

[PATCH v2 1/3] virtio-rng cleanup: move some code out of mutex protection

[PATCH v2 3/3] hw_random: increase schedule timeout in rng_dev_read()

[PATCH v2 0/3] fix stuck in accessing hwrng attributes

[PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes

Maybe Matching Threads