thr3ads.net - LARTC - [PATCH] (3/4) delay scheduler race with device stopped [Jun 2004]

If this information is useful, please help other people find it:
Share via:

Stephen Hemminger

2004-Jun-17 22:56 UTC

[PATCH] (3/4) delay scheduler race with device stopped

The delay scheduler dequeue routine has some code cut&pasted from the TBF
scheduler
that caused a race with E1000 when ring got full.

It looks like net schedulers should never be calling netif_queue_stopped because
the queue may get unstopped by interrrupt or receive soft irq (NAPI) which races
with the dequeue in the transmit scheduler. 

Also, if requeuing the packet fails, it is probably because the queue became
full
by a racing enqueue.  So the right thing to do is to go back and try again.

Same patch should apply to both 2.6 and 2.4

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>

diff -Nru a/net/sched/sch_delay.c b/net/sched/sch_delay.c
--- a/net/sched/sch_delay.c	2004-06-17 15:21:49 -07:00
+++ b/net/sched/sch_delay.c	2004-06-17 15:21:49 -07:00
@@ -111,7 +111,7 @@
 	if (skb) {
 		struct dly_skb_cb *cb = (struct dly_skb_cb *)skb->cb;
 		psched_time_t now;
-		long diff;
+		long diff, delay;
 
 		PSCHED_GET_TIME(now);
 		diff = q->latency - PSCHED_TDIFF(now, cb->queuetime);
@@ -128,13 +128,10 @@
 			goto retry;
 		}
 
-		if (!netif_queue_stopped(sch->dev)) {
-			long delay = PSCHED_US2JIFFIE(diff);
-			if (delay <= 0)
-				delay = 1;
-			mod_timer(&q->timer, jiffies+delay);
-		}
-
+		delay = PSCHED_US2JIFFIE(diff);
+		if (delay <= 0) 
+			delay = 1;
+		mod_timer(&q->timer, jiffies+delay);
 		sch->flags |= TCQ_F_THROTTLED;
 	}
 	return NULL;

Stephen Hemminger

2004-Jun-17 23:09 UTC

head link

[RFT] netif_queue_stopped in TBF scheduler

There is a race between the device and scheduler if the scheduler looks
at netif_queue_stopped.  What can happen is that the device decides it is ready,
just after the stopped check, and the scheduler decides it is throttled.

The simple way is to just have the scheduler always dequeue and leave the flow
control up to the driver and the core scheduling loop.

Here is an untested fix for TBF scheduler based on the tested changes to the
delay scheduler. 


diff -Nru a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
--- a/net/sched/sch_tbf.c	2004-06-17 15:56:47 -07:00
+++ b/net/sched/sch_tbf.c	2004-06-17 15:56:47 -07:00
@@ -201,7 +201,7 @@
 
 	if (skb) {
 		psched_time_t now;
-		long toks;
+		long toks, delay;
 		long ptoks = 0;
 		unsigned int len = skb->len;
 
@@ -229,14 +229,11 @@
 			return skb;
 		}
 
-		if (!netif_queue_stopped(sch->dev)) {
-			long delay = PSCHED_US2JIFFIE(max_t(long, -toks, -ptoks));
+		delay = PSCHED_US2JIFFIE(max_t(long, -toks, -ptoks));
+		if (delay == 0)
+			delay = 1;
 
-			if (delay == 0)
-				delay = 1;
-
-			mod_timer(&q->wd_timer, jiffies+delay);
-		}
+		mod_timer(&q->wd_timer, jiffies+delay);
 
 		/* Maybe we have a shorter packet in the queue,
 		   which can be sent now. It sounds cool,

David S. Miller

2004-Jun-18 20:51 UTC

head link

Re: [RFT] netif_queue_stopped in TBF scheduler

On Thu, 17 Jun 2004 16:09:30 -0700
Stephen Hemminger <shemminger@osdl.org> wrote:
> There is a race between the device and scheduler if the scheduler looks
> at netif_queue_stopped.  What can happen is that the device decides it is
ready,
> just after the stopped check, and the scheduler decides it is throttled.
> 
> The simple way is to just have the scheduler always dequeue and leave the
flow
> control up to the driver and the core scheduling loop.
CBQ, HFSC, etc. etc. have this same logic too.

I agree, the flow control should be handled at the top level by
qdisc_restart(), and it does by invoking ->requeue() if the xmit
fails at the device level or it cannot obtain the necessary locks.

I''m going to make this fix across the board.

Possibly Parallel Threads

Search for more reasonably related threads

LARTC - Jun 2004 - [PATCH] (3/4) delay scheduler race with device stopped

[PATCH] (3/4) delay scheduler race with device stopped

[RFT] netif_queue_stopped in TBF scheduler

Re: [RFT] netif_queue_stopped in TBF scheduler

Possibly Parallel Threads