Yi, Shunli
2012-Nov-30 08:09 UTC
[patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
Netback driver use " time_after_eq()" to check the jiffies wrapping, while this function was only called when the credit is running out. So, if the jiffies wrapped and the credit isn''t run out in first half jiffies circle, the time_after_eq() cannot check the wrapping any more. This will cause the credit_timerout.expires is set to dozens of days in future. The netback will stop receiving data from netfront. For example: Jiffies initialized to 0xffffff-(300*HZ), and the credit_timeout.expires was initialized to 0xffffff00, After dozens of days, when the jiffies grow to upper than 0x80000000, and the time_after_eq() will cannot check for the wrapping. --- drivers/xen/netback/netback.c.org 2012-11-30 15:48:13.109039998 -0500 +++ drivers/xen/netback/netback.c 2012-11-30 15:48:55.212072898 -0500 @@ -1272,6 +1272,10 @@ static void net_tx_action(unsigned long rmb(); /* Ensure that we see the request before we copy it. */ memcpy(&txreq, RING_GET_REQUEST(&netif->tx, i), sizeof(txreq)); + /* Check for the jiffies wrapping */ + if (time_after_eq(jiffies, netif->credit_timeout.expires)) + netif->credit_timeout.expires = jiffies; + /* Credit-based scheduling. */ if (txreq.size > netif->remaining_credit) { unsigned long now = jiffies; Protected by Websense Hosted Email Security -- www.websense.com _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2012-Nov-30 08:35 UTC
Re: [patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
On Fri, 2012-11-30 at 08:09 +0000, Yi, Shunli wrote:> Netback driver use " time_after_eq()" to check the jiffies wrapping, > while this function was only called when the credit is running out. > So, if the jiffies wrapped and the credit isn''t run out in first half > jiffies circle, the time_after_eq() cannot check the wrapping any > more.Which tree is this against? It doesn''t appear to be mainline Linux, which is all I am really interested in these days. Also your patch is missing a Signed-off-by and is whitespace damaged. Please read Documentation/SubmittingPatches and Documentation/SubmitChecklist.> This will cause the credit_timerout.expires is set to dozens of days > in future. > > The netback will stop receiving data from netfront. > > For example: > Jiffies initialized to 0xffffff-(300*HZ), and the > credit_timeout.expires was initialized to 0xffffff00, > After dozens of days, when the jiffies grow to upper than 0x80000000, > and the time_after_eq() will cannot check for the wrapping. > > > > --- drivers/xen/netback/netback.c.org 2012-11-30 15:48:13.109039998 -0500 > +++ drivers/xen/netback/netback.c 2012-11-30 15:48:55.212072898 -0500 > @@ -1272,6 +1272,10 @@ static void net_tx_action(unsigned long > rmb(); /* Ensure that we see the request before we copy it. */ > memcpy(&txreq, RING_GET_REQUEST(&netif->tx, i), sizeof(txreq)); > > + /* Check for the jiffies wrapping */ > + if (time_after_eq(jiffies, netif->credit_timeout.expires)) > + netif->credit_timeout.expires = jiffies;Do you not need to remove the similar check from the following block? I''m also surprised that block doesn''t need further adjustment since it uses netif->credit_timeout.expires as its input and so this change will change its behaviour. Perhaps this new addition needs to be in an else block after if (txreq.size > netif->remaining_credit) ? Why is there no need to call tx_add_credit in this case? What about needing to check for a pending timer?> + > /* Credit-based scheduling. */ > if (txreq.size > netif->remaining_credit) { > unsigned long now = jiffies; > > > > > > Protected by Websense Hosted Email Security -- www.websense.com
Jan Beulich
2012-Nov-30 09:25 UTC
Re: [patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
>>> On 30.11.12 at 09:35, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Fri, 2012-11-30 at 08:09 +0000, Yi, Shunli wrote: >> Netback driver use " time_after_eq()" to check the jiffies wrapping, >> while this function was only called when the credit is running out. >> So, if the jiffies wrapped and the credit isn''t run out in first half >> jiffies circle, the time_after_eq() cannot check the wrapping any >> more. > > Which tree is this against? It doesn''t appear to be mainline Linux, > which is all I am really interested in these days. > > Also your patch is missing a Signed-off-by and is whitespace damaged. > Please read Documentation/SubmittingPatches and > Documentation/SubmitChecklist. > >> This will cause the credit_timerout.expires is set to dozens of days >> in future. >> >> The netback will stop receiving data from netfront. >> >> For example: >> Jiffies initialized to 0xffffff-(300*HZ), and the >> credit_timeout.expires was initialized to 0xffffff00, >> After dozens of days, when the jiffies grow to upper than 0x80000000, >> and the time_after_eq() will cannot check for the wrapping. >> >> >> >> --- drivers/xen/netback/netback.c.org 2012-11-30 15:48:13.109039998 -0500 >> +++ drivers/xen/netback/netback.c 2012-11-30 15:48:55.212072898 -0500 >> @@ -1272,6 +1272,10 @@ static void net_tx_action(unsigned long >> rmb(); /* Ensure that we see the request before we copy it. */ >> memcpy(&txreq, RING_GET_REQUEST(&netif->tx, i), sizeof(txreq)); >> >> + /* Check for the jiffies wrapping */ >> + if (time_after_eq(jiffies, netif->credit_timeout.expires)) >> + netif->credit_timeout.expires = jiffies; > > Do you not need to remove the similar check from the following block?I don''t think so, but I also can''t see how that adjustment would help in the first place: If it gets executed after a very long period of no traffic, it would itself not be able to reliably tell whether the clock wrapped. That said, I agree that the code as is appears to have a problem (with 32-bit jiffies at least), but I can''t see how to easily deal with it. Jan> I''m also surprised that block doesn''t need further adjustment since it > uses netif->credit_timeout.expires as its input and so this change will > change its behaviour. > > Perhaps this new addition needs to be in an else block after if > (txreq.size > netif->remaining_credit) ? > > Why is there no need to call tx_add_credit in this case? What about > needing to check for a pending timer? > >> + >> /* Credit-based scheduling. */ >> if (txreq.size > netif->remaining_credit) { >> unsigned long now = jiffies; >> >> >> >> >> >> Protected by Websense Hosted Email Security -- www.websense.com > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Ian Campbell
2012-Nov-30 09:44 UTC
Re: [patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
On Fri, 2012-11-30 at 09:25 +0000, Jan Beulich wrote:> >>> On 30.11.12 at 09:35, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Fri, 2012-11-30 at 08:09 +0000, Yi, Shunli wrote: > >> Netback driver use " time_after_eq()" to check the jiffies wrapping, > >> while this function was only called when the credit is running out. > >> So, if the jiffies wrapped and the credit isn''t run out in first half > >> jiffies circle, the time_after_eq() cannot check the wrapping any > >> more. > > > > Which tree is this against? It doesn''t appear to be mainline Linux, > > which is all I am really interested in these days. > > > > Also your patch is missing a Signed-off-by and is whitespace damaged. > > Please read Documentation/SubmittingPatches and > > Documentation/SubmitChecklist. > > > >> This will cause the credit_timerout.expires is set to dozens of days > >> in future. > >> > >> The netback will stop receiving data from netfront. > >> > >> For example: > >> Jiffies initialized to 0xffffff-(300*HZ), and the > >> credit_timeout.expires was initialized to 0xffffff00, > >> After dozens of days, when the jiffies grow to upper than 0x80000000, > >> and the time_after_eq() will cannot check for the wrapping. > >> > >> > >> > >> --- drivers/xen/netback/netback.c.org 2012-11-30 15:48:13.109039998 -0500 > >> +++ drivers/xen/netback/netback.c 2012-11-30 15:48:55.212072898 -0500 > >> @@ -1272,6 +1272,10 @@ static void net_tx_action(unsigned long > >> rmb(); /* Ensure that we see the request before we copy it. */ > >> memcpy(&txreq, RING_GET_REQUEST(&netif->tx, i), sizeof(txreq)); > >> > >> + /* Check for the jiffies wrapping */ > >> + if (time_after_eq(jiffies, netif->credit_timeout.expires)) > >> + netif->credit_timeout.expires = jiffies; > > > > Do you not need to remove the similar check from the following block? > > I don''t think so, but I also can''t see how that adjustment would > help in the first place: If it gets executed after a very long period > of no traffic, it would itself not be able to reliably tell whether the > clock wrapped.Hrm, yes, This change would help in the case of a dribble of traffic which never hits the limit, but not in the case of no timer at all.> That said, I agree that the code as is appears to have a problem > (with 32-bit jiffies at least), but I can''t see how to easily deal with > it.Would it help to always have the pending timer armed, for either the next tick if credit needs replenishing or for, say MAX_JIFFIES/4 as a backstop to avoid wrapping issues? Ian.
Jan Beulich
2012-Nov-30 10:02 UTC
Re: [patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
>>> On 30.11.12 at 10:44, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Fri, 2012-11-30 at 09:25 +0000, Jan Beulich wrote: >> >>> On 30.11.12 at 09:35, Ian Campbell <Ian.Campbell@citrix.com> wrote: >> > On Fri, 2012-11-30 at 08:09 +0000, Yi, Shunli wrote: >> >> Netback driver use " time_after_eq()" to check the jiffies wrapping, >> >> while this function was only called when the credit is running out. >> >> So, if the jiffies wrapped and the credit isn''t run out in first half >> >> jiffies circle, the time_after_eq() cannot check the wrapping any >> >> more. >> > >> > Which tree is this against? It doesn''t appear to be mainline Linux, >> > which is all I am really interested in these days. >> > >> > Also your patch is missing a Signed-off-by and is whitespace damaged. >> > Please read Documentation/SubmittingPatches and >> > Documentation/SubmitChecklist. >> > >> >> This will cause the credit_timerout.expires is set to dozens of days >> >> in future. >> >> >> >> The netback will stop receiving data from netfront. >> >> >> >> For example: >> >> Jiffies initialized to 0xffffff-(300*HZ), and the >> >> credit_timeout.expires was initialized to 0xffffff00, >> >> After dozens of days, when the jiffies grow to upper than 0x80000000, >> >> and the time_after_eq() will cannot check for the wrapping. >> >> >> >> >> >> >> >> --- drivers/xen/netback/netback.c.org 2012-11-30 15:48:13.109039998 -0500 >> >> +++ drivers/xen/netback/netback.c 2012-11-30 15:48:55.212072898 -0500 >> >> @@ -1272,6 +1272,10 @@ static void net_tx_action(unsigned long >> >> rmb(); /* Ensure that we see the request before we copy it. */ >> >> memcpy(&txreq, RING_GET_REQUEST(&netif->tx, i), sizeof(txreq)); >> >> >> >> + /* Check for the jiffies wrapping */ >> >> + if (time_after_eq(jiffies, netif->credit_timeout.expires)) >> >> + netif->credit_timeout.expires = jiffies; >> > >> > Do you not need to remove the similar check from the following block? >> >> I don''t think so, but I also can''t see how that adjustment would >> help in the first place: If it gets executed after a very long period >> of no traffic, it would itself not be able to reliably tell whether the >> clock wrapped. > > Hrm, yes, This change would help in the case of a dribble of traffic > which never hits the limit, but not in the case of no timer at all. > >> That said, I agree that the code as is appears to have a problem >> (with 32-bit jiffies at least), but I can''t see how to easily deal with >> it. > > Would it help to always have the pending timer armed, for either the > next tick if credit needs replenishing or for, say MAX_JIFFIES/4 as a > backstop to avoid wrapping issues?If that can be made work cleanly, that would probably be the easiest solution. But I don''t see MAX_JIFFIES being defined anywhere, and I''m unsure ULONG_MAX/4 would be well received as a timeout on 64-bit systems. Jan
Ian Campbell
2012-Nov-30 10:15 UTC
Re: [patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
On Fri, 2012-11-30 at 10:02 +0000, Jan Beulich wrote:> >>> On 30.11.12 at 10:44, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Fri, 2012-11-30 at 09:25 +0000, Jan Beulich wrote: > >> >>> On 30.11.12 at 09:35, Ian Campbell <Ian.Campbell@citrix.com> wrote: > >> > On Fri, 2012-11-30 at 08:09 +0000, Yi, Shunli wrote: > >> >> Netback driver use " time_after_eq()" to check the jiffies wrapping, > >> >> while this function was only called when the credit is running out. > >> >> So, if the jiffies wrapped and the credit isn''t run out in first half > >> >> jiffies circle, the time_after_eq() cannot check the wrapping any > >> >> more. > >> > > >> > Which tree is this against? It doesn''t appear to be mainline Linux, > >> > which is all I am really interested in these days. > >> > > >> > Also your patch is missing a Signed-off-by and is whitespace damaged. > >> > Please read Documentation/SubmittingPatches and > >> > Documentation/SubmitChecklist. > >> > > >> >> This will cause the credit_timerout.expires is set to dozens of days > >> >> in future. > >> >> > >> >> The netback will stop receiving data from netfront. > >> >> > >> >> For example: > >> >> Jiffies initialized to 0xffffff-(300*HZ), and the > >> >> credit_timeout.expires was initialized to 0xffffff00, > >> >> After dozens of days, when the jiffies grow to upper than 0x80000000, > >> >> and the time_after_eq() will cannot check for the wrapping. > >> >> > >> >> > >> >> > >> >> --- drivers/xen/netback/netback.c.org 2012-11-30 15:48:13.109039998 -0500 > >> >> +++ drivers/xen/netback/netback.c 2012-11-30 15:48:55.212072898 -0500 > >> >> @@ -1272,6 +1272,10 @@ static void net_tx_action(unsigned long > >> >> rmb(); /* Ensure that we see the request before we copy it. */ > >> >> memcpy(&txreq, RING_GET_REQUEST(&netif->tx, i), sizeof(txreq)); > >> >> > >> >> + /* Check for the jiffies wrapping */ > >> >> + if (time_after_eq(jiffies, netif->credit_timeout.expires)) > >> >> + netif->credit_timeout.expires = jiffies; > >> > > >> > Do you not need to remove the similar check from the following block? > >> > >> I don''t think so, but I also can''t see how that adjustment would > >> help in the first place: If it gets executed after a very long period > >> of no traffic, it would itself not be able to reliably tell whether the > >> clock wrapped. > > > > Hrm, yes, This change would help in the case of a dribble of traffic > > which never hits the limit, but not in the case of no timer at all. > > > >> That said, I agree that the code as is appears to have a problem > >> (with 32-bit jiffies at least), but I can''t see how to easily deal with > >> it. > > > > Would it help to always have the pending timer armed, for either the > > next tick if credit needs replenishing or for, say MAX_JIFFIES/4 as a > > backstop to avoid wrapping issues? > > If that can be made work cleanly, that would probably be the > easiest solution. But I don''t see MAX_JIFFIES being defined > anywhere, and I''m unsure ULONG_MAX/4 would be well received > as a timeout on 64-bit systems.I guess several trillion years *is* a tad extreme ;-) (or my maths is wrong). We could always choose a more practical backstop, like a day or even a just few minutes -- we''d expect in general to always be pushing the timeout ahead rather than hitting it. Ian.
Jan Beulich
2012-Nov-30 10:29 UTC
Re: [patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
>>> On 30.11.12 at 11:15, Ian Campbell <Ian.Campbell@citrix.com> wrote: > We could always choose a more practical backstop, like a day or even a > just few minutes -- we''d expect in general to always be pushing the > timeout ahead rather than hitting it.Or really, if cleanly doable, we could consider not enabling this when jiffies if 64 bits wide anyway. Jan
Yi, Shunli
2012-Nov-30 13:22 UTC
Re: [patch] netback: Xennet half die---netback driver didn''t detect the jiffies wrapping correctly.
> >> >> --- drivers/xen/netback/netback.c.org 2012-11-30 15:48:13.109039998 -0500 > >> >> +++ drivers/xen/netback/netback.c 2012-11-30 15:48:55.212072898 -0500 > >> >> @@ -1272,6 +1272,10 @@ static void net_tx_action(unsigned long > >> >> rmb(); /* Ensure that we see the request before we copy it. */ > >> >> memcpy(&txreq, RING_GET_REQUEST(&netif->tx, i), > >> >> sizeof(txreq)); > >> >> > >> >> + /* Check for the jiffies wrapping */ > >> >> + if (time_after_eq(jiffies, netif->credit_timeout.expires)) > >> >> + netif->credit_timeout.expires = jiffies; > >> > > >> > Do you not need to remove the similar check from the following block? > >> > >> I don''t think so, but I also can''t see how that adjustment would > >> help in the first place: If it gets executed after a very long > >> period of no traffic, it would itself not be able to reliably tell > >> whether the clock wrapped. > > > > Hrm, yes, This change would help in the case of a dribble of traffic > > which never hits the limit, but not in the case of no timer at all. > > > >> That said, I agree that the code as is appears to have a problem > >> (with 32-bit jiffies at least), but I can''t see how to easily deal > >> with it. > > > > Would it help to always have the pending timer armed, for either the > > next tick if credit needs replenishing or for, say MAX_JIFFIES/4 as > > a backstop to avoid wrapping issues? > > If that can be made work cleanly, that would probably be the easiest > solution. But I don''t see MAX_JIFFIES being defined anywhere, and I''m > unsure ULONG_MAX/4 would be well received as a timeout on 64-bit > systems.Thanks guys for your input, I''m really a newbie in kernel coding. The patch base on following thoughs: - If we enabled the credit mechanism, the next_credit should always in future of jiffies. - So, the timer should not be in pending(the code will be run, we just carry the timer with jiffies) or the timer is set to a time in future ( the code will not run ) I''m not sure my understand the credit mechanism right, but we can always calculate the next_credit base on jiffies instead of the old timer value: unsigned long next_credit netif->credit_timeout.expires + msecs_to_jiffies(netif->credit_usec / 1000);> I guess several trillion years *is* a tad extreme ;-) (or my maths is wrong).>We could always choose a more practical backstop, like a day or even a just few minutes -- we''d expect in general to always be pushing the timeout ahead rather than hitting it.>IanTo report this as spam, please forward to spam@websense.com. Thank you. Protected by Websense Hosted Email Security -- www.websense.com