Hi Ian, with the upstream netback introduction consisting of a single big commit I wonder whether you could point me to where the full history of it is. I''m asking particularly in the context of us being asked to add a safety check similar to the vif->netbk != NULL one at the beginning of xenvif_start_xmit() (also in xenvif_interrupt(), but there we have a similar check in place), which hadn''t been in the legacy tree (obviously) nor in the original multiple-tasklets patch that I retained a copy of. That is, I''d like to understand the reasons for this check as it seems wrong to me to have to do it there - I''d rather think that if an interface got disabled, execution shouldn''t even reach that function anymore. One thing I wonder about in this context is whether the netif_stop_queue() call from xenvif_close() shouldn''t happen before xenvif_down() (not the least for reasons of symmetry with xenvif_open()). Thanks, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Tue, 2011-09-20 at 12:04 +0100, Jan Beulich wrote:> Hi Ian, > > with the upstream netback introduction consisting of a single big commit > I wonder whether you could point me to where the full history of it is.Yeah, that was a bit annoying, luckily I had the foresight to post where the history was. http://www.gossamer-threads.com/lists/xen/devel/202139 says it is: git://xenbits.xen.org/people/ianc/linux-2.6.git upstream/dom0/backend/netback-history> I''m asking particularly in the context of us being asked to add a safety > check similar to the vif->netbk != NULL one at the beginning of > xenvif_start_xmit() (also in xenvif_interrupt(), but there we have a > similar check in place), which hadn''t been in the legacy tree (obviously) > nor in the original multiple-tasklets patch that I retained a copy of.Seems to have come from bc05ada1283eb583c9789c27429af36b034c4a74 in that history tree and was a conversion from a check for group == -1. That commit changes from storing a group index to a group pointer so I think it''s roughly equivalent from a validity point of view. The original group == -1 check appears to be in the 2.6.32.x pvops kernels at least, I expect it is also in the multiple tasklet patch which you have as well?> That > is, I''d like to understand the reasons for this check as it seems wrong > to me to have to do it there - I''d rather think that if an interface got > disabled, execution shouldn''t even reach that function anymore.Yes, I''d think that too but I don''t really know. Maybe Dongxiao remembers why he added it?> One thing I wonder about in this context is whether the > netif_stop_queue() call from xenvif_close() shouldn''t happen before > xenvif_down() (not the least for reasons of symmetry with > xenvif_open()).I seem to recall looking at that too, it was the same in the old kernels too and I didn''t know why so I avoided touching it (I was doing too much other cleanup at the time to risk it). Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 20.09.11 at 13:26, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > On Tue, 2011-09-20 at 12:04 +0100, Jan Beulich wrote: >> with the upstream netback introduction consisting of a single big commit >> I wonder whether you could point me to where the full history of it is. > > Yeah, that was a bit annoying, luckily I had the foresight to post where > the history was. http://www.gossamer-threads.com/lists/xen/devel/202139 > says it is: > git://xenbits.xen.org/people/ianc/linux-2.6.git > upstream/dom0/backend/netback-historyDoes this have a http:// representation somewhere (it doesn''t show up under http://xenbits.xen.org/people/ianc/)?>> I''m asking particularly in the context of us being asked to add a safety >> check similar to the vif->netbk != NULL one at the beginning of >> xenvif_start_xmit() (also in xenvif_interrupt(), but there we have a >> similar check in place), which hadn''t been in the legacy tree (obviously) >> nor in the original multiple-tasklets patch that I retained a copy of. > > Seems to have come from bc05ada1283eb583c9789c27429af36b034c4a74 in that > history tree and was a conversion from a check for group == -1. That > commit changes from storing a group index to a group pointer so I think > it''s roughly equivalent from a validity point of view. > > The original group == -1 check appears to be in the 2.6.32.x pvops > kernels at least, I expect it is also in the multiple tasklet patch > which you have as well?That''s the point - we don''t.>> One thing I wonder about in this context is whether the >> netif_stop_queue() call from xenvif_close() shouldn''t happen before >> xenvif_down() (not the least for reasons of symmetry with >> xenvif_open()). > > I seem to recall looking at that too, it was the same in the old kernels > too and I didn''t know why so I avoided touching it (I was doing too much > other cleanup at the time to risk it).Understandable. Thanks for the really quick response, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 20.09.11 at 13:26, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > On Tue, 2011-09-20 at 12:04 +0100, Jan Beulich wrote: >> One thing I wonder about in this context is whether the >> netif_stop_queue() call from xenvif_close() shouldn''t happen before >> xenvif_down() (not the least for reasons of symmetry with >> xenvif_open()). > > I seem to recall looking at that too, it was the same in the old kernels > too and I didn''t know why so I avoided touching it (I was doing too much > other cleanup at the time to risk it).After looking further I don''t think that would help (though I still think it would be more correct), as netif_stop_queue() basically evaluates to a set_bit() without any other locks taken. So it''s completely asynchronous wrt dev_hard_start_xmit() (and its callers, which are the ones looking at the bit with HARD_TX_LOCK() carried out). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 20.09.11 at 14:10, "Jan Beulich" <JBeulich@suse.com> wrote: >>>> On 20.09.11 at 13:26, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: >> On Tue, 2011-09-20 at 12:04 +0100, Jan Beulich wrote: >>> One thing I wonder about in this context is whether the >>> netif_stop_queue() call from xenvif_close() shouldn''t happen before >>> xenvif_down() (not the least for reasons of symmetry with >>> xenvif_open()). >> >> I seem to recall looking at that too, it was the same in the old kernels >> too and I didn''t know why so I avoided touching it (I was doing too much >> other cleanup at the time to risk it). > > After looking further I don''t think that would help (though I still think > it would be more correct), as netif_stop_queue() basically evaluates > to a set_bit() without any other locks taken. So it''s completely > asynchronous wrt dev_hard_start_xmit() (and its callers, which are > the ones looking at the bit with HARD_TX_LOCK() carried out).Which in turn suggests that the upstream driver isn''t safe either: There''s nothing preventing vif->netbk from becoming NULL between the early check in xenvif_start_xmit() and its actual use in xen_netbk_queue_tx_skb(). I think vif->netbk needs to be latched into a local variable, checked against NULL, and passed instead of vif to xen_netbk_queue_tx_skb(). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Tue, 2011-09-20 at 12:46 +0100, Jan Beulich wrote:> >>> On 20.09.11 at 13:26, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > > On Tue, 2011-09-20 at 12:04 +0100, Jan Beulich wrote: > >> with the upstream netback introduction consisting of a single big commit > >> I wonder whether you could point me to where the full history of it is. > > > > Yeah, that was a bit annoying, luckily I had the foresight to post where > > the history was. http://www.gossamer-threads.com/lists/xen/devel/202139 > > says it is: > > git://xenbits.xen.org/people/ianc/linux-2.6.git > > upstream/dom0/backend/netback-history > > Does this have a http:// representation somewhere (it doesn''t show up > under http://xenbits.xen.org/people/ianc/)?http://xenbits.xen.org/gitweb/?p=people/ianc/linux-2.6.git;a=summary> > >> I''m asking particularly in the context of us being asked to add a safety > >> check similar to the vif->netbk != NULL one at the beginning of > >> xenvif_start_xmit() (also in xenvif_interrupt(), but there we have a > >> similar check in place), which hadn''t been in the legacy tree (obviously) > >> nor in the original multiple-tasklets patch that I retained a copy of. > > > > Seems to have come from bc05ada1283eb583c9789c27429af36b034c4a74 in that > > history tree and was a conversion from a check for group == -1. That > > commit changes from storing a group index to a group pointer so I think > > it''s roughly equivalent from a validity point of view. > > > > The original group == -1 check appears to be in the 2.6.32.x pvops > > kernels at least, I expect it is also in the multiple tasklet patch > > which you have as well? > > That''s the point - we don''t.Wierd, it is in 020ba9067e121b720a3335521698ea9cf31f6166 in Jeremy''s xen/2.6.32-stable branch which is the original "xen/netback: Multiple tasklets support." commit. Did you pick up an earlier posting? I found the original postings v2: http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01578.html v3: http://www.gossamer-threads.com/lists/xen/devel/170582 v4: http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00140.html>From the looks things the check arrived in v3 but there is no comment onthe patch saying why, nor does any review I found of v2 give a hint. I had a bit of a trawl through various list archives of the other postings of the series but didn''t spot anything.> >> One thing I wonder about in this context is whether the > >> netif_stop_queue() call from xenvif_close() shouldn''t happen before > >> xenvif_down() (not the least for reasons of symmetry with > >> xenvif_open()). > > > > I seem to recall looking at that too, it was the same in the old kernels > > too and I didn''t know why so I avoided touching it (I was doing too much > > other cleanup at the time to risk it). > > Understandable. > > Thanks for the really quick response, > Jan >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> On 20.09.11 at 14:40, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > On Tue, 2011-09-20 at 12:46 +0100, Jan Beulich wrote: >> >>> On 20.09.11 at 13:26, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: >> > On Tue, 2011-09-20 at 12:04 +0100, Jan Beulich wrote: >> >> with the upstream netback introduction consisting of a single big commit >> >> I wonder whether you could point me to where the full history of it is. >> > >> > Yeah, that was a bit annoying, luckily I had the foresight to post where >> > the history was. http://www.gossamer-threads.com/lists/xen/devel/202139 >> > says it is: >> > git://xenbits.xen.org/people/ianc/linux-2.6.git >> > upstream/dom0/backend/netback-history >> >> Does this have a http:// representation somewhere (it doesn''t show up >> under http://xenbits.xen.org/people/ianc/)? > > http://xenbits.xen.org/gitweb/?p=people/ianc/linux-2.6.git;a=summary > >> >> >> I''m asking particularly in the context of us being asked to add a safety >> >> check similar to the vif->netbk != NULL one at the beginning of >> >> xenvif_start_xmit() (also in xenvif_interrupt(), but there we have a >> >> similar check in place), which hadn''t been in the legacy tree (obviously) >> >> nor in the original multiple-tasklets patch that I retained a copy of. >> > >> > Seems to have come from bc05ada1283eb583c9789c27429af36b034c4a74 in that >> > history tree and was a conversion from a check for group == -1. That >> > commit changes from storing a group index to a group pointer so I think >> > it''s roughly equivalent from a validity point of view. >> > >> > The original group == -1 check appears to be in the 2.6.32.x pvops >> > kernels at least, I expect it is also in the multiple tasklet patch >> > which you have as well? >> >> That''s the point - we don''t. > > Wierd, it is in 020ba9067e121b720a3335521698ea9cf31f6166 in Jeremy''s > xen/2.6.32-stable branch which is the original "xen/netback: Multiple > tasklets support." commit. Did you pick up an earlier posting? > > I found the original postings > v2: http://lists.xensource.com/archives/html/xen-devel/2010-04/msg01578.html > v3: http://www.gossamer-threads.com/lists/xen/devel/170582 > v4: http://lists.xensource.com/archives/html/xen-devel/2010-05/msg00140.html > > From the looks things the check arrived in v3 but there is no comment on > the patch saying why, nor does any review I found of v2 give a hint. I > had a bit of a trawl through various list archives of the other postings > of the series but didn''t spot anything.Seems like my original is actually v1, and I may not have picked up that later change precisely because it wasn''t mentioned in the description and I didn''t look closely enough at the changes plus I had done quite a bit of other cleanup on v1 already and hence didn''t want to start over. v3 is also where the similar check in netif_be_int() appears, and I know for sure we had to add this due to another bug report, not due to the change there. In any case, thanks a lot for helping with this! Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Tue, 2011-09-20 at 13:38 +0100, Jan Beulich wrote:> >>> On 20.09.11 at 14:10, "Jan Beulich" <JBeulich@suse.com> wrote: > >>>> On 20.09.11 at 13:26, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > >> On Tue, 2011-09-20 at 12:04 +0100, Jan Beulich wrote: > >>> One thing I wonder about in this context is whether the > >>> netif_stop_queue() call from xenvif_close() shouldn''t happen before > >>> xenvif_down() (not the least for reasons of symmetry with > >>> xenvif_open()). > >> > >> I seem to recall looking at that too, it was the same in the old kernels > >> too and I didn''t know why so I avoided touching it (I was doing too much > >> other cleanup at the time to risk it). > > > > After looking further I don''t think that would help (though I still think > > it would be more correct), as netif_stop_queue() basically evaluates > > to a set_bit() without any other locks taken. So it''s completely > > asynchronous wrt dev_hard_start_xmit() (and its callers, which are > > the ones looking at the bit with HARD_TX_LOCK() carried out). > > Which in turn suggests that the upstream driver isn''t safe either: > There''s nothing preventing vif->netbk from becoming NULL between > the early check in xenvif_start_xmit() and its actual use in > xen_netbk_queue_tx_skb(). I think vif->netbk needs to be latched > into a local variable, checked against NULL, and passed instead of > vif to xen_netbk_queue_tx_skb().The xmit function is called with txq->_xmit_lock held. I think we should be calling netif_tx_disable at some point before making vif->netbk == NULL. Need to figure out where (or if we are already doing it indirectly) Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel