Amos Kong
2014-Sep-14 01:12 UTC
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
On Sun, Sep 14, 2014 at 01:12:58AM +0800, Amos Kong wrote:> On Thu, Sep 11, 2014 at 09:08:03PM +0930, Rusty Russell wrote: > > Amos Kong <akong at redhat.com> writes: > > > When I check hwrng attributes in sysfs, cat process always gets > > > stuck if guest has only 1 vcpu and uses a slow rng backend. > > > > > > Currently we check if there is any tasks waiting to be run on > > > current cpu in rng_dev_read() by need_resched(). But need_resched() > > > doesn't work because rng_dev_read() is executing in user context. > > > > I don't understand this explanation? I'd expect the sysfs process to be > > woken by the mutex_unlock(). > > But actually sysfs process's not woken always, this is they the > process gets stuck.%s/they/why/ Hi Rusty, Reference: http://www.linuxgrill.com/anonymous/fire/netfilter/kernel-hacking-HOWTO-2.html read() syscall of /dev/hwrng will enter into kernel, the read operation is rng_dev_read(), it's userspace context (not interrupt context). Userspace context doesn't allow other user contexts run on that CPU, unless the kernel code sleeps for some reason. In this case, the need_resched() doesn't work. My solution is removing need_resched() and use an appropriate delay by schedule_timeout_interruptible(10). Thanks, Amos> > If we're really high priority (vs. the sysfs process) then I can see why > > we'd need schedule_timeout_interruptible() instead of just schedule(), > > and in that case, need_resched() would be false too. > > > > You could argue that's intended behaviour, but I can't see how it > > happens in the normal case anyway. > > > > What am I missing?> > Thanks, > > Rusty. > > > > > This patch removed need_resched() and increase delay to 10 jiffies, > > > then other tasks can have chance to execute protected code. > > > Delaying 1 jiffy also works, but 10 jiffies is safer. > > > > > > Signed-off-by: Amos Kong <akong at redhat.com> > > > --- > > > drivers/char/hw_random/core.c | 3 +-- > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > > index c591d7e..b5d1b6f 100644 > > > --- a/drivers/char/hw_random/core.c > > > +++ b/drivers/char/hw_random/core.c > > > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > > > > > mutex_unlock(&rng_mutex); > > > > > > - if (need_resched()) > > > - schedule_timeout_interruptible(1); > > > + schedule_timeout_interruptible(10); > > > > > > if (signal_pending(current)) { > > > err = -ERESTARTSYS; > > > --
Amos Kong
2014-Sep-14 02:25 UTC
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
On Sun, Sep 14, 2014 at 09:12:08AM +0800, Amos Kong wrote: ...> > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > > > index c591d7e..b5d1b6f 100644 > > > > --- a/drivers/char/hw_random/core.c > > > > +++ b/drivers/char/hw_random/core.c > > > > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > > > > > > > mutex_unlock(&rng_mutex); > > > > > > > > - if (need_resched()) > > > > - schedule_timeout_interruptible(1); > > > > + schedule_timeout_interruptible(10);Problem only occurred in non-smp guest, we can improve it to: if(!is_smp()) schedule_timeout_interruptible(10); is_smp() is only available for arm arch, we need a general one.> > > > > > > > if (signal_pending(current)) { > > > > err = -ERESTARTSYS; > > > > ---- Amos.
Radim Krčmář
2014-Sep-15 16:48 UTC
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
2014-09-14 10:25+0800, Amos Kong:> On Sun, Sep 14, 2014 at 09:12:08AM +0800, Amos Kong wrote: > > ... > > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > > > > index c591d7e..b5d1b6f 100644 > > > > > --- a/drivers/char/hw_random/core.c > > > > > +++ b/drivers/char/hw_random/core.c > > > > > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, > > > > > > > > > > mutex_unlock(&rng_mutex); > > > > > > > > > > - if (need_resched()) > > > > > - schedule_timeout_interruptible(1); > > > > > + schedule_timeout_interruptible(10);If cond_resched() does not work, it is a bug elsewehere.> Problem only occurred in non-smp guest, we can improve it to: > > if(!is_smp()) > schedule_timeout_interruptible(10); > > is_smp() is only available for arm arch, we need a general one.(It is num_online_cpus() > 1.)
Rusty Russell
2014-Sep-16 15:35 UTC
[PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
Amos Kong <akong at redhat.com> writes:> On Sun, Sep 14, 2014 at 01:12:58AM +0800, Amos Kong wrote: >> On Thu, Sep 11, 2014 at 09:08:03PM +0930, Rusty Russell wrote: >> > Amos Kong <akong at redhat.com> writes: >> > > When I check hwrng attributes in sysfs, cat process always gets >> > > stuck if guest has only 1 vcpu and uses a slow rng backend. >> > > >> > > Currently we check if there is any tasks waiting to be run on >> > > current cpu in rng_dev_read() by need_resched(). But need_resched() >> > > doesn't work because rng_dev_read() is executing in user context. >> > >> > I don't understand this explanation? I'd expect the sysfs process to be >> > woken by the mutex_unlock(). >> >> But actually sysfs process's not woken always, this is they the >> process gets stuck. > > %s/they/why/ > > Hi Rusty, > > > Reference: > http://www.linuxgrill.com/anonymous/fire/netfilter/kernel-hacking-HOWTO-2.htmlSure, that was true when I wrote it, and is still true when preempt is off.> read() syscall of /dev/hwrng will enter into kernel, the read operation is > rng_dev_read(), it's userspace context (not interrupt context). > > Userspace context doesn't allow other user contexts run on that CPU, > unless the kernel code sleeps for some reason.This is true assuming preempt is off, yes.> In this case, the need_resched() doesn't work.This is exactly what need_resched() is for: it should return true if there's another process of sufficient priority waiting to be run. It implies that schedule() would run it. git blame doesn't offer any enlightenment here, as to why we use schedule_timeout_interruptible() at all. I would expect mutex_unlock() to wake the other reader. The code certainly seems to, so it should now be runnable and need_resched() should return true. I suspect something else is happening which makes this "work". Cheers, Rusty.
Possibly Parallel Threads
- [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
- [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
- [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
- [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes
- [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes