jianhai luan
2013-Oct-12 08:53 UTC
DomU''s network interface will hung when Dom0 running 32bit
Hi Ian, I meet the DomU''s network interface hung issue recently, and have been working on the issue from that time. I find that DomU''s network interface, which send lesser package, will hung if Dom0 running 32bit and DomU''s up-time is very long. I think that one jiffies overflow bug exist in the function tx_credit_exceeded(). I know the inline function time_after_eq(a,b) will process jiffies overflow, but the function have one limit a should little that (b + MAX_SIGNAL_LONG). If a large than the value, time_after_eq will return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit machine. If DomU''s network interface send lesser package (<0.5k/s if jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, next_credit) will failure (should be true). So one timer which will not be trigger in short time, and later process will be aborted when timer_pending(&vif->credit_timeout) is true. The result will be DomU''s network interface will be hung in long time (> 40days). Please think about the below scenario: Condition: Dom0 running 32-bit and HZ = 1000 vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit = 0xffffffff, vif->credit_usec=0 jiffies=0 vif receive lesser package (DomU send lesser package). If the value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 hours. jiffies will large than 0x7ffffff. we guess jiffies = 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and one time which expire is 0xfffffff will be pended into system. So the interface will hung until jiffies recount 0xffffffff (that will need very long time). If some error exist in above explain, please help me point it out. Thanks, Jason
Wei Liu
2013-Oct-14 11:19 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:> Hi Ian, > I meet the DomU''s network interface hung issue recently, and have > been working on the issue from that time. I find that DomU''s network > interface, which send lesser package, will hung if Dom0 running > 32bit and DomU''s up-time is very long. I think that one jiffies > overflow bug exist in the function tx_credit_exceeded(). > I know the inline function time_after_eq(a,b) will process jiffies > overflow, but the function have one limit a should little that (b + > MAX_SIGNAL_LONG). If a large than the value, time_after_eq will > return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit > machine. > If DomU''s network interface send lesser package (<0.5k/s if > jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out > (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, > next_credit) will failure (should be true). So one timer which will > not be trigger in short time, and later process will be aborted when > timer_pending(&vif->credit_timeout) is true. The result will be > DomU''s network interface will be hung in long time (> 40days). > Please think about the below scenario: > Condition: > Dom0 running 32-bit and HZ = 1000 > vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit > = 0xffffffff, vif->credit_usec=0 jiffies=0 > vif receive lesser package (DomU send lesser package). If the > value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 > hours. jiffies will large than 0x7ffffff. we guess jiffies > 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and > one time which expire is 0xfffffff will be pended into system. So > the interface will hung until jiffies recount 0xffffffff (that will > need very long time).If I''m not mistaken you meant time_after_eq(now, next_credit) in netback. How does next_credit become 0xffffffff? Wei.> > If some error exist in above explain, please help me point it out. > > Thanks, > Jason
jianhai luan
2013-Oct-15 02:44 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-14 19:19, Wei Liu wrote:> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote: >> Hi Ian, >> I meet the DomU''s network interface hung issue recently, and have >> been working on the issue from that time. I find that DomU''s network >> interface, which send lesser package, will hung if Dom0 running >> 32bit and DomU''s up-time is very long. I think that one jiffies >> overflow bug exist in the function tx_credit_exceeded(). >> I know the inline function time_after_eq(a,b) will process jiffies >> overflow, but the function have one limit a should little that (b + >> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will >> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit >> machine. >> If DomU''s network interface send lesser package (<0.5k/s if >> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out >> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, >> next_credit) will failure (should be true). So one timer which will >> not be trigger in short time, and later process will be aborted when >> timer_pending(&vif->credit_timeout) is true. The result will be >> DomU''s network interface will be hung in long time (> 40days). >> Please think about the below scenario: >> Condition: >> Dom0 running 32-bit and HZ = 1000 >> vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit >> = 0xffffffff, vif->credit_usec=0 jiffies=0 >> vif receive lesser package (DomU send lesser package). If the >> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 >> hours. jiffies will large than 0x7ffffff. we guess jiffies >> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and >> one time which expire is 0xfffffff will be pended into system. So >> the interface will hung until jiffies recount 0xffffffff (that will >> need very long time). > If I''m not mistaken you meant time_after_eq(now, next_credit) in > netback. How does next_credit become 0xffffffff?I only assume the value is 0xfffffff, and the value of next_credit isn''t point. If the delta between now and next_credit larger than ULONG_MAX, time_after_eq will do wrong judge.> > Wei. > >> If some error exist in above explain, please help me point it out. >> >> Thanks, >> Jason
Ian Campbell
2013-Oct-15 08:43 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:> On 2013-10-14 19:19, Wei Liu wrote: > > On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote: > >> Hi Ian, > >> I meet the DomU''s network interface hung issue recently, and have > >> been working on the issue from that time. I find that DomU''s network > >> interface, which send lesser package, will hung if Dom0 running > >> 32bit and DomU''s up-time is very long. I think that one jiffies > >> overflow bug exist in the function tx_credit_exceeded(). > >> I know the inline function time_after_eq(a,b) will process jiffies > >> overflow, but the function have one limit a should little that (b + > >> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will > >> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit > >> machine. > >> If DomU''s network interface send lesser package (<0.5k/s if > >> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out > >> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, > >> next_credit) will failure (should be true). So one timer which will > >> not be trigger in short time, and later process will be aborted when > >> timer_pending(&vif->credit_timeout) is true. The result will be > >> DomU''s network interface will be hung in long time (> 40days). > >> Please think about the below scenario: > >> Condition: > >> Dom0 running 32-bit and HZ = 1000 > >> vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit > >> = 0xffffffff, vif->credit_usec=0 jiffies=0 > >> vif receive lesser package (DomU send lesser package). If the > >> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 > >> hours. jiffies will large than 0x7ffffff. we guess jiffies > >> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and > >> one time which expire is 0xfffffff will be pended into system. So > >> the interface will hung until jiffies recount 0xffffffff (that will > >> need very long time). > > If I''m not mistaken you meant time_after_eq(now, next_credit) in > > netback. How does next_credit become 0xffffffff? > > I only assume the value is 0xfffffff, and the value of next_credit > isn''t point. If the delta between now and next_credit larger than > ULONG_MAX, time_after_eq will do wrong judge.So it sounds like we need a timer which is independent of the traffic being sent to keep credit_timeout.expires rolling over. Can you propose a patch? Ian.> > > > Wei. > > > >> If some error exist in above explain, please help me point it out. > >> > >> Thanks, > >> Jason >
jianhai luan
2013-Oct-15 09:34 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-15 16:43, Ian Campbell wrote:> On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote: >> On 2013-10-14 19:19, Wei Liu wrote: >>> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote: >>>> Hi Ian, >>>> I meet the DomU''s network interface hung issue recently, and have >>>> been working on the issue from that time. I find that DomU''s network >>>> interface, which send lesser package, will hung if Dom0 running >>>> 32bit and DomU''s up-time is very long. I think that one jiffies >>>> overflow bug exist in the function tx_credit_exceeded(). >>>> I know the inline function time_after_eq(a,b) will process jiffies >>>> overflow, but the function have one limit a should little that (b + >>>> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will >>>> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit >>>> machine. >>>> If DomU''s network interface send lesser package (<0.5k/s if >>>> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out >>>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, >>>> next_credit) will failure (should be true). So one timer which will >>>> not be trigger in short time, and later process will be aborted when >>>> timer_pending(&vif->credit_timeout) is true. The result will be >>>> DomU''s network interface will be hung in long time (> 40days). >>>> Please think about the below scenario: >>>> Condition: >>>> Dom0 running 32-bit and HZ = 1000 >>>> vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit >>>> = 0xffffffff, vif->credit_usec=0 jiffies=0 >>>> vif receive lesser package (DomU send lesser package). If the >>>> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 >>>> hours. jiffies will large than 0x7ffffff. we guess jiffies >>>> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and >>>> one time which expire is 0xfffffff will be pended into system. So >>>> the interface will hung until jiffies recount 0xffffffff (that will >>>> need very long time). >>> If I''m not mistaken you meant time_after_eq(now, next_credit) in >>> netback. How does next_credit become 0xffffffff? >> I only assume the value is 0xfffffff, and the value of next_credit >> isn''t point. If the delta between now and next_credit larger than >> ULONG_MAX, time_after_eq will do wrong judge. > So it sounds like we need a timer which is independent of the traffic > being sent to keep credit_timeout.expires rolling over. > > Can you propose a patch?Because credit_timeout.expire always after jiffies, i judge the value over the range of time_after_eq() by time_before(now, vif->credit_timeout.expires). please check the patch.> > Ian. > >>> Wei. >>> >>>> If some error exist in above explain, please help me point it out. >>>> >>>> Thanks, >>>> Jason >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Wei Liu
2013-Oct-15 10:06 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Tue, Oct 15, 2013 at 05:34:57PM +0800, jianhai luan wrote:> > On 2013-10-15 16:43, Ian Campbell wrote: > >On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote: > >>On 2013-10-14 19:19, Wei Liu wrote: > >>>On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote: > >>>>Hi Ian, > >>>> I meet the DomU''s network interface hung issue recently, and have > >>>>been working on the issue from that time. I find that DomU''s network > >>>>interface, which send lesser package, will hung if Dom0 running > >>>>32bit and DomU''s up-time is very long. I think that one jiffies > >>>>overflow bug exist in the function tx_credit_exceeded(). > >>>> I know the inline function time_after_eq(a,b) will process jiffies > >>>>overflow, but the function have one limit a should little that (b + > >>>>MAX_SIGNAL_LONG). If a large than the value, time_after_eq will > >>>>return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit > >>>>machine. > >>>> If DomU''s network interface send lesser package (<0.5k/s if > >>>>jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out > >>>>(credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, > >>>>next_credit) will failure (should be true). So one timer which will > >>>>not be trigger in short time, and later process will be aborted when > >>>>timer_pending(&vif->credit_timeout) is true. The result will be > >>>>DomU''s network interface will be hung in long time (> 40days). > >>>> Please think about the below scenario: > >>>> Condition: > >>>> Dom0 running 32-bit and HZ = 1000 > >>>> vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit > >>>>= 0xffffffff, vif->credit_usec=0 jiffies=0 > >>>> vif receive lesser package (DomU send lesser package). If the > >>>>value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 > >>>>hours. jiffies will large than 0x7ffffff. we guess jiffies > >>>>0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and > >>>>one time which expire is 0xfffffff will be pended into system. So > >>>>the interface will hung until jiffies recount 0xffffffff (that will > >>>>need very long time). > >>>If I''m not mistaken you meant time_after_eq(now, next_credit) in > >>>netback. How does next_credit become 0xffffffff? > >>I only assume the value is 0xfffffff, and the value of next_credit > >>isn''t point. If the delta between now and next_credit larger than > >>ULONG_MAX, time_after_eq will do wrong judge. > >So it sounds like we need a timer which is independent of the traffic > >being sent to keep credit_timeout.expires rolling over. > > > >Can you propose a patch? > > Because credit_timeout.expire always after jiffies, i judge the > value over the range of time_after_eq() by time_before(now, > vif->credit_timeout.expires). please check the patch.I don''t think this really fix the issue for you. You still have chance that now wraps around and falls between expires and next_credit. In that case it''s stalled again. Wei.
jianhai luan
2013-Oct-15 11:26 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-15 18:06, Wei Liu wrote:> On Tue, Oct 15, 2013 at 05:34:57PM +0800, jianhai luan wrote: >> On 2013-10-15 16:43, Ian Campbell wrote: >>> On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote: >>>> On 2013-10-14 19:19, Wei Liu wrote: >>>>> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote: >>>>>> Hi Ian, >>>>>> I meet the DomU''s network interface hung issue recently, and have >>>>>> been working on the issue from that time. I find that DomU''s network >>>>>> interface, which send lesser package, will hung if Dom0 running >>>>>> 32bit and DomU''s up-time is very long. I think that one jiffies >>>>>> overflow bug exist in the function tx_credit_exceeded(). >>>>>> I know the inline function time_after_eq(a,b) will process jiffies >>>>>> overflow, but the function have one limit a should little that (b + >>>>>> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will >>>>>> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit >>>>>> machine. >>>>>> If DomU''s network interface send lesser package (<0.5k/s if >>>>>> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out >>>>>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, >>>>>> next_credit) will failure (should be true). So one timer which will >>>>>> not be trigger in short time, and later process will be aborted when >>>>>> timer_pending(&vif->credit_timeout) is true. The result will be >>>>>> DomU''s network interface will be hung in long time (> 40days). >>>>>> Please think about the below scenario: >>>>>> Condition: >>>>>> Dom0 running 32-bit and HZ = 1000 >>>>>> vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit >>>>>> = 0xffffffff, vif->credit_usec=0 jiffies=0 >>>>>> vif receive lesser package (DomU send lesser package). If the >>>>>> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 >>>>>> hours. jiffies will large than 0x7ffffff. we guess jiffies >>>>>> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and >>>>>> one time which expire is 0xfffffff will be pended into system. So >>>>>> the interface will hung until jiffies recount 0xffffffff (that will >>>>>> need very long time). >>>>> If I''m not mistaken you meant time_after_eq(now, next_credit) in >>>>> netback. How does next_credit become 0xffffffff? >>>> I only assume the value is 0xfffffff, and the value of next_credit >>>> isn''t point. If the delta between now and next_credit larger than >>>> ULONG_MAX, time_after_eq will do wrong judge. >>> So it sounds like we need a timer which is independent of the traffic >>> being sent to keep credit_timeout.expires rolling over. >>> >>> Can you propose a patch? >> Because credit_timeout.expire always after jiffies, i judge the >> value over the range of time_after_eq() by time_before(now, >> vif->credit_timeout.expires). please check the patch. > I don''t think this really fix the issue for you. You still have chance > that now wraps around and falls between expires and next_credit. In that > case it''s stalled again.if time_before(now, vif->credit_timeout.expires) is true, time wrap and do operation. Otherwise time_before(now, vif->credit_timeout.expires) isn''t true, now - vif->credit_timeout.expires should be letter than ULONG_MAX/2. Because next_credit large than vif->credit_timeout.expires (next_crdit = vif->credit_timeout.expires + msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and next_credit should be in range of time_after_eq(). So time_after_eq() do correctly judge. Jason> > Wei.
Wei Liu
2013-Oct-15 12:58 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: [...]> >>>Can you propose a patch? > >>Because credit_timeout.expire always after jiffies, i judge the > >>value over the range of time_after_eq() by time_before(now, > >>vif->credit_timeout.expires). please check the patch. > >I don''t think this really fix the issue for you. You still have chance > >that now wraps around and falls between expires and next_credit. In that > >case it''s stalled again. > > if time_before(now, vif->credit_timeout.expires) is true, time wrap > and do operation. Otherwise time_before(now, > vif->credit_timeout.expires) isn''t true, now - > vif->credit_timeout.expires should be letter than ULONG_MAX/2. > Because next_credit large than vif->credit_timeout.expires > (next_crdit = vif->credit_timeout.expires + > msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and > next_credit should be in range of time_after_eq(). So > time_after_eq() do correctly judge. >Not sure I understand you. Consider "now" is placed like this: expires now next_credit ----time increases this direction---> * time_after_eq(now, next_credit) -> false * time_before(now, expires) -> false Then it''s stuck again. You''re merely narrowing the window, not fixing the real problem. Wei.> Jason > > > >Wei.
jianhai luan
2013-Oct-15 14:29 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-15 20:58, Wei Liu wrote:> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: > [...] >>>>> Can you propose a patch? >>>> Because credit_timeout.expire always after jiffies, i judge the >>>> value over the range of time_after_eq() by time_before(now, >>>> vif->credit_timeout.expires). please check the patch. >>> I don''t think this really fix the issue for you. You still have chance >>> that now wraps around and falls between expires and next_credit. In that >>> case it''s stalled again. >> if time_before(now, vif->credit_timeout.expires) is true, time wrap >> and do operation. Otherwise time_before(now, >> vif->credit_timeout.expires) isn''t true, now - >> vif->credit_timeout.expires should be letter than ULONG_MAX/2. >> Because next_credit large than vif->credit_timeout.expires >> (next_crdit = vif->credit_timeout.expires + >> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and >> next_credit should be in range of time_after_eq(). So >> time_after_eq() do correctly judge. >> > Not sure I understand you. Consider "now" is placed like this: > > expires now next_credit > ----time increases this direction---> > > * time_after_eq(now, next_credit) -> false > * time_before(now, expires) -> falseIf now is placed in above environment, the result will be correct (Sending package will be not allowed until next_credit). * time_after_eq(now, next_credit) --> false will include two environment: expires now next_credit -----------time increases this direction ----> Or expires next_credit next_credit + MAX_LONG/2 now -----------time increases this direction ----> the first environment should be correct to control transmit. the second environment is our included environment. Jason> > Then it''s stuck again. You''re merely narrowing the window, not fixing > the real problem. > > Wei. > >> Jason >>> Wei.
Wei Liu
2013-Oct-15 14:49 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote:> > On 2013-10-15 20:58, Wei Liu wrote: > >On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: > >[...] > >>>>>Can you propose a patch? > >>>>Because credit_timeout.expire always after jiffies, i judge the > >>>>value over the range of time_after_eq() by time_before(now, > >>>>vif->credit_timeout.expires). please check the patch. > >>>I don''t think this really fix the issue for you. You still have chance > >>>that now wraps around and falls between expires and next_credit. In that > >>>case it''s stalled again. > >>if time_before(now, vif->credit_timeout.expires) is true, time wrap > >>and do operation. Otherwise time_before(now, > >>vif->credit_timeout.expires) isn''t true, now - > >>vif->credit_timeout.expires should be letter than ULONG_MAX/2. > >>Because next_credit large than vif->credit_timeout.expires > >>(next_crdit = vif->credit_timeout.expires + > >>msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and > >>next_credit should be in range of time_after_eq(). So > >>time_after_eq() do correctly judge. > >> > >Not sure I understand you. Consider "now" is placed like this: > > > > expires now next_credit > > ----time increases this direction---> > > > >* time_after_eq(now, next_credit) -> false > >* time_before(now, expires) -> false > > If now is placed in above environment, the result will be correct > (Sending package will be not allowed until next_credit).No, it is not necessarily correct. Keep in mind that "now" wraps around, which is the issue you try to fix. You still have a window to stall your frontend.> * time_after_eq(now, next_credit) --> false will include two environment: > expires now next_credit > -----------time increases this direction ----> > > Or > expires next_credit next_credit + MAX_LONG/2 now > -----------time increases this direction ----> > > > the first environment should be correct to control transmit. the > second environment is our included environment. > > Jason > > > >Then it''s stuck again. You''re merely narrowing the window, not fixing > >the real problem. > > > >Wei. > > > >>Jason > >>>Wei.
Ian Campbell
2013-Oct-15 14:50 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Tue, 2013-10-15 at 15:49 +0100, Wei Liu wrote:> On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote: > > > > On 2013-10-15 20:58, Wei Liu wrote: > > >On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: > > >[...] > > >>>>>Can you propose a patch? > > >>>>Because credit_timeout.expire always after jiffies, i judge the > > >>>>value over the range of time_after_eq() by time_before(now, > > >>>>vif->credit_timeout.expires). please check the patch. > > >>>I don''t think this really fix the issue for you. You still have chance > > >>>that now wraps around and falls between expires and next_credit. In that > > >>>case it''s stalled again. > > >>if time_before(now, vif->credit_timeout.expires) is true, time wrap > > >>and do operation. Otherwise time_before(now, > > >>vif->credit_timeout.expires) isn''t true, now - > > >>vif->credit_timeout.expires should be letter than ULONG_MAX/2. > > >>Because next_credit large than vif->credit_timeout.expires > > >>(next_crdit = vif->credit_timeout.expires + > > >>msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and > > >>next_credit should be in range of time_after_eq(). So > > >>time_after_eq() do correctly judge. > > >> > > >Not sure I understand you. Consider "now" is placed like this: > > > > > > expires now next_credit > > > ----time increases this direction---> > > > > > >* time_after_eq(now, next_credit) -> false > > >* time_before(now, expires) -> false > > > > If now is placed in above environment, the result will be correct > > (Sending package will be not allowed until next_credit). > > No, it is not necessarily correct. Keep in mind that "now" wraps around, > which is the issue you try to fix. You still have a window to stall your > frontend.Remember that time_after_eq is supposed to work even with wraparound occurring, so long as the two times are less than MAX_LONG/2 apart.> > > * time_after_eq(now, next_credit) --> false will include two environment: > > expires now next_credit > > -----------time increases this direction ----> > > > > Or > > expires next_credit next_credit + MAX_LONG/2 now > > -----------time increases this direction ----> > > > > > > the first environment should be correct to control transmit. the > > second environment is our included environment. > > > > Jason > > > > > >Then it''s stuck again. You''re merely narrowing the window, not fixing > > >the real problem. > > > > > >Wei. > > > > > >>Jason > > >>>Wei.
jianhai luan
2013-Oct-15 15:19 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-15 22:50, Ian Campbell wrote:> On Tue, 2013-10-15 at 15:49 +0100, Wei Liu wrote: >> On Tue, Oct 15, 2013 at 10:29:15PM +0800, jianhai luan wrote: >>> On 2013-10-15 20:58, Wei Liu wrote: >>>> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: >>>> [...] >>>>>>>> Can you propose a patch? >>>>>>> Because credit_timeout.expire always after jiffies, i judge the >>>>>>> value over the range of time_after_eq() by time_before(now, >>>>>>> vif->credit_timeout.expires). please check the patch. >>>>>> I don''t think this really fix the issue for you. You still have chance >>>>>> that now wraps around and falls between expires and next_credit. In that >>>>>> case it''s stalled again. >>>>> if time_before(now, vif->credit_timeout.expires) is true, time wrap >>>>> and do operation. Otherwise time_before(now, >>>>> vif->credit_timeout.expires) isn''t true, now - >>>>> vif->credit_timeout.expires should be letter than ULONG_MAX/2. >>>>> Because next_credit large than vif->credit_timeout.expires >>>>> (next_crdit = vif->credit_timeout.expires + >>>>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and >>>>> next_credit should be in range of time_after_eq(). So >>>>> time_after_eq() do correctly judge. >>>>> >>>> Not sure I understand you. Consider "now" is placed like this: >>>> >>>> expires now next_credit >>>> ----time increases this direction---> >>>> >>>> * time_after_eq(now, next_credit) -> false >>>> * time_before(now, expires) -> false >>> If now is placed in above environment, the result will be correct >>> (Sending package will be not allowed until next_credit). >> No, it is not necessarily correct. Keep in mind that "now" wraps around, >> which is the issue you try to fix. You still have a window to stall your >> frontend. > Remember that time_after_eq is supposed to work even with wraparound > occurring, so long as the two times are less than MAX_LONG/2 apart.Sorry for my misunderstand explanation. I mean that * time_after_eq()/time_before_eq() fix the jiffies wraparound, so please think about jiffies in line increasing. * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2), the judge will be wrong if out of the range. So please think about three kind environment - expires now next_credit --------time increases this direction ----------> - expires [next_credit now next_credit+MAX_LONG/2 --------time increase this direction -----------> - expires next_credit next_credit+MAX_LONG/2 now --------time increadse this direction ----------> The first environment should be netfront consume all credit_byte before next_credit, So we should pending one timer to calculator the new credit_byte, and don''t transmit until next_credit. the second environment should be calculator the credit_byte because netfront don''t consume all credit_byte before next_credit, and time_after_eq() do correct judge. the third environment should be calculator in time because netfront don''t consume all credit_byte until next_credit.But time_after_eq do error judge (time_after_eq(now, next_credit) is false), so the remaining_byte isn''t be increased. and I work on the third environment. You know now > next_credit+MAX_LONG/2, time_before(now, expire) should be true(time_before(now, expire) is false in first environment)> >>> * time_after_eq(now, next_credit) --> false will include two environment: >>> expires now next_credit >>> -----------time increases this direction ----> >>> >>> Or >>> expires next_credit next_credit + MAX_LONG/2 now >>> -----------time increases this direction ----> >>> >>> >>> the first environment should be correct to control transmit. the >>> second environment is our included environment. >>> >>> Jason >>>> Then it''s stuck again. You''re merely narrowing the window, not fixing >>>> the real problem. >>>> >>>> Wei. >>>> >>>>> Jason >>>>>> Wei. >
Wei Liu
2013-Oct-15 16:03 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote: [...]> >>>> > >>>>* time_after_eq(now, next_credit) -> false > >>>>* time_before(now, expires) -> false > >>>If now is placed in above environment, the result will be correct > >>>(Sending package will be not allowed until next_credit). > >>No, it is not necessarily correct. Keep in mind that "now" wraps around, > >>which is the issue you try to fix. You still have a window to stall your > >>frontend. > >Remember that time_after_eq is supposed to work even with wraparound > >occurring, so long as the two times are less than MAX_LONG/2 apart. > > Sorry for my misunderstand explanation. I mean that > * time_after_eq()/time_before_eq() fix the jiffies wraparound, so > please think about jiffies in line increasing. > * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2), > the judge will be wrong if out of the range. > > So please think about three kind environment > - expires now next_credit > --------time increases this direction ----------> > > - expires [next_credit now next_credit+MAX_LONG/2 > --------time increase this direction -----------> > > - expires next_credit next_credit+MAX_LONG/2 now > --------time increadse this direction ----------> > > The first environment should be netfront consume all credit_byte > before next_credit, So we should pending one timer to calculator the > new credit_byte, and don''t transmit until next_credit. > > the second environment should be calculator the credit_byte because > netfront don''t consume all credit_byte before next_credit, and > time_after_eq() do correct judge. > > the third environment should be calculator in time because netfront > don''t consume all credit_byte until next_credit.But time_after_eq do > error judge (time_after_eq(now, next_credit) is false), so the > remaining_byte isn''t be increased. > > and I work on the third environment. You know now > > next_credit+MAX_LONG/2, time_before(now, expire) should be > true(time_before(now, expire) is false in first environment) > >Thanks for staighten this out for me. I''m just too dumb for this, please be patient with me. :-) Could you prove that time_before(now, expire) is always true in third case? That''s where my main cencern lies. Is it because msecs_to_jiffies always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most? Wei.
jianhai luan
2013-Oct-15 16:23 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-16 0:03, Wei Liu wrote:> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote: > [...] >>>>>> * time_after_eq(now, next_credit) -> false >>>>>> * time_before(now, expires) -> false >>>>> If now is placed in above environment, the result will be correct >>>>> (Sending package will be not allowed until next_credit). >>>> No, it is not necessarily correct. Keep in mind that "now" wraps around, >>>> which is the issue you try to fix. You still have a window to stall your >>>> frontend. >>> Remember that time_after_eq is supposed to work even with wraparound >>> occurring, so long as the two times are less than MAX_LONG/2 apart. >> Sorry for my misunderstand explanation. I mean that >> * time_after_eq()/time_before_eq() fix the jiffies wraparound, so >> please think about jiffies in line increasing. >> * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2), >> the judge will be wrong if out of the range. >> >> So please think about three kind environment >> - expires now next_credit >> --------time increases this direction ----------> >> >> - expires [next_credit now next_credit+MAX_LONG/2 >> --------time increase this direction -----------> >> >> - expires next_credit next_credit+MAX_LONG/2 now >> --------time increadse this direction ----------> >> >> The first environment should be netfront consume all credit_byte >> before next_credit, So we should pending one timer to calculator the >> new credit_byte, and don''t transmit until next_credit. >> >> the second environment should be calculator the credit_byte because >> netfront don''t consume all credit_byte before next_credit, and >> time_after_eq() do correct judge. >> >> the third environment should be calculator in time because netfront >> don''t consume all credit_byte until next_credit.But time_after_eq do >> error judge (time_after_eq(now, next_credit) is false), so the >> remaining_byte isn''t be increased. >> >> and I work on the third environment. You know now > >> next_credit+MAX_LONG/2, time_before(now, expire) should be >> true(time_before(now, expire) is false in first environment) > Thanks for staighten this out for me. I''m just too dumb for this, please > be patient with me. :-) > > Could you prove that time_before(now, expire) is always true in third > case? That''s where my main cencern lies. Is it because msecs_to_jiffies > always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?I have wrong judge in third environment. If now large than expires + MAX_UNLONG, time_before(now, expires) will be false. expires next_credit next_credit+MAX_UNLONG/2 expires + MAX_UNLONG now next_credit+MAX_UNLONG --------------------------------------------------------- time increadse this direction ----------------------------------> In the above environment, time_before(now, expires) will return false. But the jiffies elapsed more time and next_credit will be reachable in soon(time_after_eq(now, next_credit) will be true).> > Wei.
jianhai luan
2013-Oct-16 00:15 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-16 0:23, jianhai luan wrote:> > On 2013-10-16 0:03, Wei Liu wrote: >> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote: >> [...] >>>>>>> * time_after_eq(now, next_credit) -> false >>>>>>> * time_before(now, expires) -> false >>>>>> If now is placed in above environment, the result will be correct >>>>>> (Sending package will be not allowed until next_credit). >>>>> No, it is not necessarily correct. Keep in mind that "now" wraps >>>>> around, >>>>> which is the issue you try to fix. You still have a window to >>>>> stall your >>>>> frontend. >>>> Remember that time_after_eq is supposed to work even with wraparound >>>> occurring, so long as the two times are less than MAX_LONG/2 apart. >>> Sorry for my misunderstand explanation. I mean that >>> * time_after_eq()/time_before_eq() fix the jiffies wraparound, so >>> please think about jiffies in line increasing. >>> * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2), >>> the judge will be wrong if out of the range. >>> >>> So please think about three kind environment >>> - expires now next_credit >>> --------time increases this direction ----------> >>> >>> - expires [next_credit now next_credit+MAX_LONG/2 >>> --------time increase this direction -----------> >>> >>> - expires next_credit next_credit+MAX_LONG/2 now >>> --------time increadse this direction ----------> >>> >>> The first environment should be netfront consume all credit_byte >>> before next_credit, So we should pending one timer to calculator the >>> new credit_byte, and don''t transmit until next_credit. >>> >>> the second environment should be calculator the credit_byte because >>> netfront don''t consume all credit_byte before next_credit, and >>> time_after_eq() do correct judge. >>> >>> the third environment should be calculator in time because netfront >>> don''t consume all credit_byte until next_credit.But time_after_eq do >>> error judge (time_after_eq(now, next_credit) is false), so the >>> remaining_byte isn''t be increased. >>> >>> and I work on the third environment. You know now > >>> next_credit+MAX_LONG/2, time_before(now, expire) should be >>> true(time_before(now, expire) is false in first environment) >> Thanks for staighten this out for me. I''m just too dumb for this, please >> be patient with me. :-) >> >> Could you prove that time_before(now, expire) is always true in third >> case? That''s where my main cencern lies. Is it because msecs_to_jiffies >> always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most? > > I have wrong judge in third environment. If now large than expires + > MAX_UNLONG, time_before(now, expires) will be false. > expires next_credit next_credit+MAX_UNLONG/2 expires + > MAX_UNLONG now next_credit+MAX_UNLONG > --------------------------------------------------------- time > increadse this direction ----------------------------------> > > In the above environment, time_before(now, expires) will return > false. But the jiffies elapsed more time and next_credit will be > reachable in soon(time_after_eq(now, next_credit) will be true).After above talk, the window should be exist (expire+MAX_ULONG next_credit+MAX_ULONG, expire + 2MAX_ULONF next_credit+MAX_ULONG ....,expire+<n>ULONG next_credit+<n>MAX_ULONG). Other time window should be not exist (maybe i don''t think about). If so, please think about the below: * If no speed control, vif->credit_usec should be zero. expire = next_credit and the window is zero * If we have done speed control, the scenario should be very likely than first environment except the value is larger (the delta is <n>MAX_UNLONG) - If do speed control. the window should have been think about speed worse time 100M/s 40s 1000M/s 4s - the time should be acceptable.>> >> Wei. >
annie li
2013-Oct-16 07:10 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-15 16:43, Ian Campbell wrote:> On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote: >> On 2013-10-14 19:19, Wei Liu wrote: >>> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote: >>>> Hi Ian, >>>> I meet the DomU''s network interface hung issue recently, and have >>>> been working on the issue from that time. I find that DomU''s network >>>> interface, which send lesser package, will hung if Dom0 running >>>> 32bit and DomU''s up-time is very long. I think that one jiffies >>>> overflow bug exist in the function tx_credit_exceeded(). >>>> I know the inline function time_after_eq(a,b) will process jiffies >>>> overflow, but the function have one limit a should little that (b + >>>> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will >>>> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit >>>> machine. >>>> If DomU''s network interface send lesser package (<0.5k/s if >>>> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out >>>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now, >>>> next_credit) will failure (should be true). So one timer which will >>>> not be trigger in short time, and later process will be aborted when >>>> timer_pending(&vif->credit_timeout) is true. The result will be >>>> DomU''s network interface will be hung in long time (> 40days). >>>> Please think about the below scenario: >>>> Condition: >>>> Dom0 running 32-bit and HZ = 1000 >>>> vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit >>>> = 0xffffffff, vif->credit_usec=0 jiffies=0 >>>> vif receive lesser package (DomU send lesser package). If the >>>> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55 >>>> hours. jiffies will large than 0x7ffffff. we guess jiffies >>>> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and >>>> one time which expire is 0xfffffff will be pended into system. So >>>> the interface will hung until jiffies recount 0xffffffff (that will >>>> need very long time). >>> If I''m not mistaken you meant time_after_eq(now, next_credit) in >>> netback. How does next_credit become 0xffffffff? >> I only assume the value is 0xfffffff, and the value of next_credit >> isn''t point. If the delta between now and next_credit larger than >> ULONG_MAX, time_after_eq will do wrong judge. > So it sounds like we need a timer which is independent of the traffic > being sent to keep credit_timeout.expires rolling over.Resending it because of the wrong format in last email... Is it a timer to be set as less than ULONG_MAX/2 to avoid credit_timeout.expires rolling over? But the problem is that we can not assure where jiffies start from, and this probably results into current issue again. I assume Jason''s patch fix this issue and this patch only uses __mod_timer to add a timer with next_credit when the netback fails to send out current available credits. Thanks Annie> > Can you propose a patch? > > Ian. > >>> Wei. >>> >>>> If some error exist in above explain, please help me point it out. >>>> >>>> Thanks, >>>> Jason > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
jianhai luan
2013-Oct-16 07:35 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-15 20:58, Wei Liu wrote:> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: > [...] >>>>> Can you propose a patch? >>>> Because credit_timeout.expire always after jiffies, i judge the >>>> value over the range of time_after_eq() by time_before(now, >>>> vif->credit_timeout.expires). please check the patch. >>> I don''t think this really fix the issue for you. You still have chance >>> that now wraps around and falls between expires and next_credit. In that >>> case it''s stalled again. >> if time_before(now, vif->credit_timeout.expires) is true, time wrap >> and do operation. Otherwise time_before(now, >> vif->credit_timeout.expires) isn''t true, now - >> vif->credit_timeout.expires should be letter than ULONG_MAX/2. >> Because next_credit large than vif->credit_timeout.expires >> (next_crdit = vif->credit_timeout.expires + >> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and >> next_credit should be in range of time_after_eq(). So >> time_after_eq() do correctly judge. >> > Not sure I understand you. Consider "now" is placed like this: > > expires now next_credit > ----time increases this direction---> > > * time_after_eq(now, next_credit) -> false > * time_before(now, expires) -> false > > Then it''s stuck again. You''re merely narrowing the window, not fixing > the real problem.The above environment isn''t stack again. The netback will pending one timer to process the environment. The attachment program will prove if !(time_after_eq(now, next_credit) || time_before(now, vif->credit_timeout.expires)), now will only be placed in above environment [ expires next_credit), and the above environment will be processed by timer in soon.> > Wei. > >> Jason >>> Wei.Jason. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2013-Oct-16 08:46 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Wed, 2013-10-16 at 15:10 +0800, annie li wrote:> Is it a timer to be set as less than ULONG_MAX/2 to avoid > credit_timeout.expires rolling over? But the problem is that we can > not assure where jiffies start from, and this probably results into > current issue again.Not less than the absolute value ULONG_MAX/2, just less than ULONG_MAX/2 offset from "now". The time_eq macros deal with wraparound. Looks like Jianhai and Wei are discussing a better fix than an additional timer though. Ian
annie li
2013-Oct-16 09:39 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-16 15:35, jianhai luan wrote:> > On 2013-10-15 20:58, Wei Liu wrote: >> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: >> [...] >>>>>> Can you propose a patch? >>>>> Because credit_timeout.expire always after jiffies, i judge the >>>>> value over the range of time_after_eq() by time_before(now, >>>>> vif->credit_timeout.expires). please check the patch. >>>> I don''t think this really fix the issue for you. You still have chance >>>> that now wraps around and falls between expires and next_credit. In >>>> that >>>> case it''s stalled again. >>> if time_before(now, vif->credit_timeout.expires) is true, time wrap >>> and do operation. Otherwise time_before(now, >>> vif->credit_timeout.expires) isn''t true, now - >>> vif->credit_timeout.expires should be letter than ULONG_MAX/2. >>> Because next_credit large than vif->credit_timeout.expires >>> (next_crdit = vif->credit_timeout.expires + >>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and >>> next_credit should be in range of time_after_eq(). So >>> time_after_eq() do correctly judge. >>> >> Not sure I understand you. Consider "now" is placed like this: >> >> expires now next_credit >> ----time increases this direction---> >> >> * time_after_eq(now, next_credit) -> false >> * time_before(now, expires) -> false >> >> Then it''s stuck again. You''re merely narrowing the window, not fixing >> the real problem. > > The above environment isn''t stack again. The netback will pending one > timer to process the environment. > > The attachment program will prove if !(time_after_eq(now, next_credit) > || time_before(now, vif->credit_timeout.expires)), now will only be > placed in above environment [ expires next_credit), and the above > environment will be processed by timer in soon.Or check following to see what the if condition really do, ----------expires-------now-------credit---------- is the only case where we need to add a timer. Other cases like following would match the if condition above, then no timer is added. ----------expires----------credit------now------ -----now-----expires----------credit---------- Or we can consider the extreme condition, when the rate control does not exist, "credit_usec" is zero, and "next_credit" is equal to "expires". The above if condition would cover all conditions, and no rate control really happens. If credit_usec is not zero, the "if condition" would cover the range outside of that from expires to next_credit. Even if "now" is wrapped again into the range from "expires" to "next_credit", the "next_credit" that is set in __mod_timer is reasonable value(this can be gotten from credit_usec), and the timer would be hit soon. Thanks Annie>> >> Wei. >> >>> Jason >>>> Wei. > Jason. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
jianhai luan
2013-Oct-16 13:08 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-16 17:39, annie li wrote:> > On 2013-10-16 15:35, jianhai luan wrote: >> >> On 2013-10-15 20:58, Wei Liu wrote: >>> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote: >>> [...] >>>>>>> Can you propose a patch? >>>>>> Because credit_timeout.expire always after jiffies, i judge the >>>>>> value over the range of time_after_eq() by time_before(now, >>>>>> vif->credit_timeout.expires). please check the patch. >>>>> I don''t think this really fix the issue for you. You still have >>>>> chance >>>>> that now wraps around and falls between expires and next_credit. >>>>> In that >>>>> case it''s stalled again. >>>> if time_before(now, vif->credit_timeout.expires) is true, time wrap >>>> and do operation. Otherwise time_before(now, >>>> vif->credit_timeout.expires) isn''t true, now - >>>> vif->credit_timeout.expires should be letter than ULONG_MAX/2. >>>> Because next_credit large than vif->credit_timeout.expires >>>> (next_crdit = vif->credit_timeout.expires + >>>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and >>>> next_credit should be in range of time_after_eq(). So >>>> time_after_eq() do correctly judge. >>>> >>> Not sure I understand you. Consider "now" is placed like this: >>> >>> expires now next_credit >>> ----time increases this direction---> >>> >>> * time_after_eq(now, next_credit) -> false >>> * time_before(now, expires) -> false >>> >>> Then it''s stuck again. You''re merely narrowing the window, not fixing >>> the real problem. >> >> The above environment isn''t stack again. The netback will pending one >> timer to process the environment. >> >> The attachment program will prove if !(time_after_eq(now, >> next_credit) || time_before(now, vif->credit_timeout.expires)), now >> will only be placed in above environment [ expires next_credit), and >> the above environment will be processed by timer in soon. > > Or check following to see what the if condition really do, > > ----------expires-------now-------credit---------- is the only case > where we need to add a timer. > > Other cases like following would match the if condition above, then no > timer is added. > ----------expires----------credit------now------ > -----now-----expires----------credit---------- > > Or we can consider the extreme condition, when the rate control does > not exist, "credit_usec" is zero, and "next_credit" is equal to > "expires". The above if condition would cover all conditions, and no > rate control really happens. If credit_usec is not zero, the "if > condition" would cover the range outside of that from expires to > next_credit. > > Even if "now" is wrapped again into the range from "expires" to > "next_credit", the "next_credit" that is set in __mod_timer is > reasonable value(this can be gotten from credit_usec), and the timer > would be hit soon.Thanks Annie''s express, my option is consistent with Annie. Jason> > Thanks > Annie >>> >>> Wei. >>> >>>> Jason >>>>> Wei. >> Jason. >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >
Wei Liu
2013-Oct-16 13:47 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Wed, Oct 16, 2013 at 09:08:03PM +0800, jianhai luan wrote: [...]> >>> > >>> expires now next_credit > >>> ----time increases this direction---> > >>> > >>>* time_after_eq(now, next_credit) -> false > >>>* time_before(now, expires) -> false > >>> > >>>Then it''s stuck again. You''re merely narrowing the window, not fixing > >>>the real problem. > >> > >>The above environment isn''t stack again. The netback will > >>pending one timer to process the environment. > >> > >>The attachment program will prove if !(time_after_eq(now, > >>next_credit) || time_before(now, vif->credit_timeout.expires)), > >>now will only be placed in above environment [ expires > >>next_credit), and the above environment will be processed by > >>timer in soon. > > > >Or check following to see what the if condition really do, > > > >----------expires-------now-------credit---------- is the only > >case where we need to add a timer. > > > >Other cases like following would match the if condition above, > >then no timer is added. > >----------expires----------credit------now------ > >-----now-----expires----------credit---------- > > > >Or we can consider the extreme condition, when the rate control > >does not exist, "credit_usec" is zero, and "next_credit" is equal > >to "expires". The above if condition would cover all conditions, > >and no rate control really happens. If credit_usec is not zero, > >the "if condition" would cover the range outside of that from > >expires to next_credit. > > > >Even if "now" is wrapped again into the range from "expires" to > >"next_credit", the "next_credit" that is set in __mod_timer is > >reasonable value(this can be gotten from credit_usec), and the > >timer would be hit soon. > > Thanks Annie''s express, my option is consistent with Annie. >OK, thanks for the explanation. I think I get the idea. In any case, could you use !time_in_range / !time_in_range_open instead of open coded one-liner? Though I presume one-liner open coding would not hurt. Wei.
jianhai luan
2013-Oct-16 15:04 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-16 21:47, Wei Liu wrote:> On Wed, Oct 16, 2013 at 09:08:03PM +0800, jianhai luan wrote: > [...] >>>>> expires now next_credit >>>>> ----time increases this direction---> >>>>> >>>>> * time_after_eq(now, next_credit) -> false >>>>> * time_before(now, expires) -> false >>>>> >>>>> Then it''s stuck again. You''re merely narrowing the window, not fixing >>>>> the real problem. >>>> The above environment isn''t stack again. The netback will >>>> pending one timer to process the environment. >>>> >>>> The attachment program will prove if !(time_after_eq(now, >>>> next_credit) || time_before(now, vif->credit_timeout.expires)), >>>> now will only be placed in above environment [ expires >>>> next_credit), and the above environment will be processed by >>>> timer in soon. >>> Or check following to see what the if condition really do, >>> >>> ----------expires-------now-------credit---------- is the only >>> case where we need to add a timer. >>> >>> Other cases like following would match the if condition above, >>> then no timer is added. >>> ----------expires----------credit------now------ >>> -----now-----expires----------credit---------- >>> >>> Or we can consider the extreme condition, when the rate control >>> does not exist, "credit_usec" is zero, and "next_credit" is equal >>> to "expires". The above if condition would cover all conditions, >>> and no rate control really happens. If credit_usec is not zero, >>> the "if condition" would cover the range outside of that from >>> expires to next_credit. >>> >>> Even if "now" is wrapped again into the range from "expires" to >>> "next_credit", the "next_credit" that is set in __mod_timer is >>> reasonable value(this can be gotten from credit_usec), and the >>> timer would be hit soon. >> Thanks Annie''s express, my option is consistent with Annie. >> > OK, thanks for the explanation. I think I get the idea. > > In any case, could you use !time_in_range / !time_in_range_open instead > of open coded one-liner? Though I presume one-liner open coding would > not hurt.I agree above the suggest. The attachment patch may be better than previous patch. Jason> > Wei._______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Wei Liu
2013-Oct-16 15:17 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On Wed, Oct 16, 2013 at 11:04:34PM +0800, jianhai luan wrote: [...]> >From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001 > From: Jason Luan <jianhai.luan@oracle.com> > Date: Tue, 15 Oct 2013 17:07:49 +0800 > Subject: [PATCH] xen-netback: pending timer only in the range [expire, > next_credit) > > The function time_after_eq() do correct judge in range of MAX_UNLONG/2. > If net-front send lesser package, the delta between now and next_credit > will out of the range and time_after_eq() will do wrong judge in result > to net-front hung. For example: > expire next_credit .... next_credit+MAX_UNLONG/2 now > -----------------time increases this direction-----------------> > > We should be add the environment which now beyond next_credit+MAX_UNLONG/2. > Because the fact now mustn''t before expire, time_before(now, expire) == true > will show the environment. > time_after_eq(now, next_credit) || time_before (now, expire) > => !time_in_range_open(now, expire, next_credit) >OK, you say !time_in_range_open here but the code below has !time_in_range. I suppose it is a typo? Your patch should go against "net" tree so you need to have "[PATCH net]" as prefix. Also don''t send your patch as attachment, send it inline. You can use git format-patch, git send-email to do this. Could you please fix the above issues and resend the patch to netdev, with Ian and me CC''ed. You can find out how netdev works in Documentation/networking/netdev-FAQ.txt. Last but not least, thanks for your contribution! Wei.> Signed-off-by: Jason Luan <jianhai.luan@oracle.com> > --- > drivers/net/xen-netback/netback.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > index f3e591c..62492f0 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -1195,7 +1195,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) > return true; > > /* Passed the point where we can replenish credit? */ > - if (time_after_eq(now, next_credit)) { > + if (!time_in_range(now, vif->credit_timeout.expires, next_credit)) { > vif->credit_timeout.expires = now; > tx_add_credit(vif); > } > -- > 1.7.6.5 >
annie li
2013-Oct-16 15:26 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-16 23:04, jianhai luan wrote:> > From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001 > From: Jason Luan<jianhai.luan@oracle.com> > Date: Tue, 15 Oct 2013 17:07:49 +0800 > Subject: [PATCH] xen-netback: pending timer only in the range [expire, > next_credit) > > The function time_after_eq() do correct judge in range of MAX_UNLONG/2. > If net-front send lesser package, the delta between now and next_credit > will out of the range and time_after_eq() will do wrong judge in result > to net-front hung. For example: > expire next_credit .... next_credit+MAX_UNLONG/2 now > -----------------time increases this direction-----------------> > > We should be add the environment which now beyond next_credit+MAX_UNLONG/2. > Because the fact now mustn''t before expire, time_before(now, expire) == true > will show the environment. > time_after_eq(now, next_credit) || time_before (now, expire) > => !time_in_range_open(now, expire, next_credit) > > Signed-off-by: Jason Luan<jianhai.luan@oracle.com> > --- > drivers/net/xen-netback/netback.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > index f3e591c..62492f0 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -1195,7 +1195,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size) > return true; > > /* Passed the point where we can replenish credit? */I think the comments above can be removed, and it is better to explain the if condition here briefly. Thanks Annie> - if (time_after_eq(now, next_credit)) { > + if (!time_in_range(now, vif->credit_timeout.expires, next_credit)) { > vif->credit_timeout.expires = now; > tx_add_credit(vif); > }
David Vrabel
2013-Oct-16 16:11 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 16/10/13 16:17, Wei Liu wrote:> On Wed, Oct 16, 2013 at 11:04:34PM +0800, jianhai luan wrote: > [...] >> >From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001 >> From: Jason Luan <jianhai.luan@oracle.com> >> Date: Tue, 15 Oct 2013 17:07:49 +0800 >> Subject: [PATCH] xen-netback: pending timer only in the range [expire, >> next_credit) >> >> The function time_after_eq() do correct judge in range of MAX_UNLONG/2. >> If net-front send lesser package, the delta between now and next_credit >> will out of the range and time_after_eq() will do wrong judge in result >> to net-front hung. For example: >> expire next_credit .... next_credit+MAX_UNLONG/2 now >> -----------------time increases this direction-----------------> >> >> We should be add the environment which now beyond next_credit+MAX_UNLONG/2. >> Because the fact now mustn''t before expire, time_before(now, expire) == true >> will show the environment. >> time_after_eq(now, next_credit) || time_before (now, expire) >> =>> !time_in_range_open(now, expire, next_credit) >>I would like the description improved because it''s too hard to understand. How about something like: "time_after_eq() only works if the delta is < MAX_ULONG/2. If netfront sends at a very low rate, the time between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for timer_after_eq() will be incorrect. Credit will not be replenished and the guest may become unable to send (e.g., if prior to the long gap, all credit was exhausted)." But that''s as far as I get because I can''t see how the fix is correct. The time_in_range() test might still return the wrong value if now has advanced even further and wrapped so it is between expire and next_credit again. I think the credit timeout should be always armed to expire in MAX_ULONG/4 jiffies (or some other large value). If credit is exceeded, this timer is then adjusted to fire earlier (at next_credit as it does already). David
jianhai luan
2013-Oct-16 16:44 UTC
Re: DomU''s network interface will hung when Dom0 running 32bit
On 2013-10-17 0:11, David Vrabel wrote:> On 16/10/13 16:17, Wei Liu wrote: >> On Wed, Oct 16, 2013 at 11:04:34PM +0800, jianhai luan wrote: >> [...] >>> >From ef02403a10173896c5c102f768741d0700b8a3a2 Mon Sep 17 00:00:00 2001 >>> From: Jason Luan <jianhai.luan@oracle.com> >>> Date: Tue, 15 Oct 2013 17:07:49 +0800 >>> Subject: [PATCH] xen-netback: pending timer only in the range [expire, >>> next_credit) >>> >>> The function time_after_eq() do correct judge in range of MAX_UNLONG/2. >>> If net-front send lesser package, the delta between now and next_credit >>> will out of the range and time_after_eq() will do wrong judge in result >>> to net-front hung. For example: >>> expire next_credit .... next_credit+MAX_UNLONG/2 now >>> -----------------time increases this direction-----------------> >>> >>> We should be add the environment which now beyond next_credit+MAX_UNLONG/2. >>> Because the fact now mustn''t before expire, time_before(now, expire) == true >>> will show the environment. >>> time_after_eq(now, next_credit) || time_before (now, expire) >>> =>>> !time_in_range_open(now, expire, next_credit) >>> > I would like the description improved because it''s too hard to understand. > > How about something like: > > "time_after_eq() only works if the delta is < MAX_ULONG/2. > > If netfront sends at a very low rate, the time between subsequent calls > to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for > timer_after_eq() will be incorrect. Credit will not be replenished and > the guest may become unable to send (e.g., if prior to the long gap, all > credit was exhausted)."Thanks your description, i will accept it. :)> > But that''s as far as I get because I can''t see how the fix is correct. > The time_in_range() test might still return the wrong value if now has > advanced even further and wrapped so it is between expire and > next_credit again.typo, time_in_range() should be time_in_range_open(). Yes, if now have advanced even further and wrapped, it will always fall in [ expire, next_credit). In the range, please think two scenario: * No transmit limit: expire == next_credit, the range will be zero, replenish will always be done. * Transmit limit: Because guest may be consume all credit_bytes in very short time, other time in [expire, next_credit) will don''t send any package. So the time which don''t send package should be think about when we set the rate parameter. So if now fall in the range, the hung time should be acceptable. (if rate=10000M/s, the worse time will be 4s).> > I think the credit timeout should be always armed to expire in > MAX_ULONG/4 jiffies (or some other large value). If credit is exceeded, > this timer is then adjusted to fire earlier (at next_credit as it does > already).Setting timer may be fixed the issue. But i don''t think how to verify the fixed expect waiting 180 days. I verified the above patch only change expire''s value to emulator the scenario.> > DavidJason.