Stephen Hemminger
2004-Jun-17 22:56 UTC
[PATCH] (3/4) delay scheduler race with device stopped
The delay scheduler dequeue routine has some code cut&pasted from the TBF scheduler that caused a race with E1000 when ring got full. It looks like net schedulers should never be calling netif_queue_stopped because the queue may get unstopped by interrrupt or receive soft irq (NAPI) which races with the dequeue in the transmit scheduler. Also, if requeuing the packet fails, it is probably because the queue became full by a racing enqueue. So the right thing to do is to go back and try again. Same patch should apply to both 2.6 and 2.4 Signed-off-by: Stephen Hemminger <shemminger@osdl.org> diff -Nru a/net/sched/sch_delay.c b/net/sched/sch_delay.c --- a/net/sched/sch_delay.c 2004-06-17 15:21:49 -07:00 +++ b/net/sched/sch_delay.c 2004-06-17 15:21:49 -07:00 @@ -111,7 +111,7 @@ if (skb) { struct dly_skb_cb *cb = (struct dly_skb_cb *)skb->cb; psched_time_t now; - long diff; + long diff, delay; PSCHED_GET_TIME(now); diff = q->latency - PSCHED_TDIFF(now, cb->queuetime); @@ -128,13 +128,10 @@ goto retry; } - if (!netif_queue_stopped(sch->dev)) { - long delay = PSCHED_US2JIFFIE(diff); - if (delay <= 0) - delay = 1; - mod_timer(&q->timer, jiffies+delay); - } - + delay = PSCHED_US2JIFFIE(diff); + if (delay <= 0) + delay = 1; + mod_timer(&q->timer, jiffies+delay); sch->flags |= TCQ_F_THROTTLED; } return NULL;
There is a race between the device and scheduler if the scheduler looks at netif_queue_stopped. What can happen is that the device decides it is ready, just after the stopped check, and the scheduler decides it is throttled. The simple way is to just have the scheduler always dequeue and leave the flow control up to the driver and the core scheduling loop. Here is an untested fix for TBF scheduler based on the tested changes to the delay scheduler. diff -Nru a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c --- a/net/sched/sch_tbf.c 2004-06-17 15:56:47 -07:00 +++ b/net/sched/sch_tbf.c 2004-06-17 15:56:47 -07:00 @@ -201,7 +201,7 @@ if (skb) { psched_time_t now; - long toks; + long toks, delay; long ptoks = 0; unsigned int len = skb->len; @@ -229,14 +229,11 @@ return skb; } - if (!netif_queue_stopped(sch->dev)) { - long delay = PSCHED_US2JIFFIE(max_t(long, -toks, -ptoks)); + delay = PSCHED_US2JIFFIE(max_t(long, -toks, -ptoks)); + if (delay == 0) + delay = 1; - if (delay == 0) - delay = 1; - - mod_timer(&q->wd_timer, jiffies+delay); - } + mod_timer(&q->wd_timer, jiffies+delay); /* Maybe we have a shorter packet in the queue, which can be sent now. It sounds cool,
On Thu, 17 Jun 2004 16:09:30 -0700 Stephen Hemminger <shemminger@osdl.org> wrote:> There is a race between the device and scheduler if the scheduler looks > at netif_queue_stopped. What can happen is that the device decides it is ready, > just after the stopped check, and the scheduler decides it is throttled. > > The simple way is to just have the scheduler always dequeue and leave the flow > control up to the driver and the core scheduling loop.CBQ, HFSC, etc. etc. have this same logic too. I agree, the flow control should be handled at the top level by qdisc_restart(), and it does by invoking ->requeue() if the xmit fails at the device level or it cannot obtain the necessary locks. I''m going to make this fix across the board.