If we read hwrng by long-running dd process, it takes too much cpu time. When we check hwrng attributes from sysfs by cat, it gets stuck. The problem can only be reproduced with non-smp guest with slow backend. This patchset changed hwrng core to always delay 10 jiffies, cat process have chance to execute protected code, the problem is resolved. Thanks. Amos Kong (2): virtio-rng cleanup: move some code out of mutex protection virtio-rng: fix stuck in catting hwrng attributes drivers/char/hw_random/core.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) -- 1.9.3
Amos Kong
2014-Sep-10 09:07 UTC
[PATCH 1/2] virtio-rng cleanup: move some code out of mutex protection
It doesn't save too much cpu time as expected, just a cleanup. Signed-off-by: Amos Kong <akong at redhat.com> --- drivers/char/hw_random/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25..c591d7e 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, return -ERESTARTSYS; if (current_rng) name = current_rng->name; - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); mutex_unlock(&rng_mutex); + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); return ret; } @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, ssize_t ret = 0; struct hwrng *rng; + buf[0] = '\0'; err = mutex_lock_interruptible(&rng_mutex); if (err) return -ERESTARTSYS; - buf[0] = '\0'; list_for_each_entry(rng, &rng_list, list) { strncat(buf, rng->name, PAGE_SIZE - ret - 1); ret += strlen(rng->name); strncat(buf, " ", PAGE_SIZE - ret - 1); ret++; } + mutex_unlock(&rng_mutex); strncat(buf, "\n", PAGE_SIZE - ret - 1); ret++; - mutex_unlock(&rng_mutex); return ret; } -- 1.9.3
Amos Kong
2014-Sep-10 09:07 UTC
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
When I check hwrng attributes in sysfs, cat process always gets stuck if guest has only 1 vcpu and uses a slow rng backend. Currently we check if there is any tasks waiting to be run on current cpu in rng_dev_read() by need_resched(). But need_resched() doesn't work because rng_dev_read() is executing in user context. This patch removed need_resched() and increase delay to 10 jiffies, then other tasks can have chance to execute protected code. Delaying 1 jiffy also works, but 10 jiffies is safer. Signed-off-by: Amos Kong <akong at redhat.com> --- drivers/char/hw_random/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index c591d7e..b5d1b6f 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(&rng_mutex); - if (need_resched()) - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(10); if (signal_pending(current)) { err = -ERESTARTSYS; -- 1.9.3
Amit Shah
2014-Sep-11 06:07 UTC
[PATCH 1/2] virtio-rng cleanup: move some code out of mutex protection
On (Wed) 10 Sep 2014 [17:07:06], Amos Kong wrote:> It doesn't save too much cpu time as expected, just a cleanup.Frankly I won't bother with this. It doesn't completely remove all copying from the mutex, so it's not worthwhile.> Signed-off-by: Amos Kong <akong at redhat.com> > --- > drivers/char/hw_random/core.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index aa30a25..c591d7e 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, > return -ERESTARTSYS; > if (current_rng) > name = current_rng->name; > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > mutex_unlock(&rng_mutex); > + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > return ret; > } > @@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, > ssize_t ret = 0; > struct hwrng *rng; > > + buf[0] = '\0'; > err = mutex_lock_interruptible(&rng_mutex); > if (err) > return -ERESTARTSYS; > - buf[0] = '\0'; > list_for_each_entry(rng, &rng_list, list) { > strncat(buf, rng->name, PAGE_SIZE - ret - 1); > ret += strlen(rng->name); > strncat(buf, " ", PAGE_SIZE - ret - 1); > ret++; > } > + mutex_unlock(&rng_mutex); > strncat(buf, "\n", PAGE_SIZE - ret - 1); > ret++; > - mutex_unlock(&rng_mutex); > > return ret; > } > -- > 1.9.3 >Amit
Amit Shah
2014-Sep-11 06:08 UTC
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
On (Wed) 10 Sep 2014 [17:07:07], Amos Kong wrote:> When I check hwrng attributes in sysfs, cat process always gets > stuck if guest has only 1 vcpu and uses a slow rng backend. > > Currently we check if there is any tasks waiting to be run on > current cpu in rng_dev_read() by need_resched(). But need_resched() > doesn't work because rng_dev_read() is executing in user context. > > This patch removed need_resched() and increase delay to 10 jiffies, > then other tasks can have chance to execute protected code. > Delaying 1 jiffy also works, but 10 jiffies is safer.I'd prefer two patches for this one: one to remove the need_resched() check, and the other to increase the timeout. Anyway, Reviewed-by: Amit Shah <amit.shah at redhat.com>> > Signed-off-by: Amos Kong <akong at redhat.com> > --- > drivers/char/hw_random/core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index c591d7e..b5d1b6f 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > mutex_unlock(&rng_mutex); > > - if (need_resched()) > - schedule_timeout_interruptible(1); > + schedule_timeout_interruptible(10); > > if (signal_pending(current)) { > err = -ERESTARTSYS; > -- > 1.9.3 >Amit
Rusty Russell
2014-Sep-11 11:38 UTC
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
Amos Kong <akong at redhat.com> writes:> When I check hwrng attributes in sysfs, cat process always gets > stuck if guest has only 1 vcpu and uses a slow rng backend. > > Currently we check if there is any tasks waiting to be run on > current cpu in rng_dev_read() by need_resched(). But need_resched() > doesn't work because rng_dev_read() is executing in user context.I don't understand this explanation? I'd expect the sysfs process to be woken by the mutex_unlock(). If we're really high priority (vs. the sysfs process) then I can see why we'd need schedule_timeout_interruptible() instead of just schedule(), and in that case, need_resched() would be false too. You could argue that's intended behaviour, but I can't see how it happens in the normal case anyway. What am I missing? Thanks, Rusty.> This patch removed need_resched() and increase delay to 10 jiffies, > then other tasks can have chance to execute protected code. > Delaying 1 jiffy also works, but 10 jiffies is safer. > > Signed-off-by: Amos Kong <akong at redhat.com> > --- > drivers/char/hw_random/core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index c591d7e..b5d1b6f 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > mutex_unlock(&rng_mutex); > > - if (need_resched()) > - schedule_timeout_interruptible(1); > + schedule_timeout_interruptible(10); > > if (signal_pending(current)) { > err = -ERESTARTSYS; > -- > 1.9.3
Possibly Parallel Threads
- [PATCH 0/2] fix stuck in catting hwrng attributes
- [PATCH v2 0/3] fix stuck in accessing hwrng attributes
- [PATCH v2 0/3] fix stuck in accessing hwrng attributes
- [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
- [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes