Glauber de Oliveira Costa
2006-Dec-19 15:37 UTC
[Xen-devel] [PATCH] Unmatched decrementing of net device reference count
Hello,

This bug was found when heavily stressing the netfront attach/detach mechanism with the following script:

  for i in $(seq 200);
  do
      xm network-attach <domid>;
      xm network-detach <domid> $i;
  done

The guest kernel shows the following message:

  unregister_netdevice: waiting for eth1 to become free. Usage count = -1

After this patch, it ran okay over multiple iterations.

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Dec-19 23:58 UTC
Re: [Xen-devel] [PATCH] Unmatched decrementing of net device reference count
Glauber de Oliveira Costa <gcosta@redhat.com> wrote:
>
> This bug was found when heavily stressing the netfront
> attach/detach mechanism with the following script:
>
> for i in $(seq 200);
> do
>     xm network-attach <domid>;
>     xm network-detach <domid> $i;
> done
>
> Guest kernel shows the following messages:
>
> unregister_netdevice: waiting for eth1 to become free. Usage count = -1
>
> After this patch, it ran okay in multiple iterations

Could you please use in-line patches? They're much easier to comment on.

Your patch description doesn't make sense. unregister_netdev() cannot possibly cause the device to be freed. Otherwise the subsequent free_netdev() call which you kept would be wrong.

So most likely what's happening is that free_netdev() is occurring without a preceding unregister_netdev(), which implies that there is a bug in the frontend state transition.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Glauber de Oliveira Costa
2006-Dec-20 13:21 UTC
Re: [Xen-devel] [PATCH] Unmatched decrementing of net device reference count
On Wed, Dec 20, 2006 at 10:58:22AM +1100, Herbert Xu wrote:
> Glauber de Oliveira Costa <gcosta@redhat.com> wrote:
> >
> > This bug was found when heavily stressing the netfront
> > attach/detach mechanism with the following script:
> >
> > for i in $(seq 200);
> > do
> >     xm network-attach <domid>;
> >     xm network-detach <domid> $i;
> > done
> >
> > Guest kernel shows the following messages:
> >
> > unregister_netdevice: waiting for eth1 to become free. Usage count = -1
> >
> > After this patch, it ran okay in multiple iterations
>
> Could you please use in-line patches? It's much easier to comment on.

It is. I could swear I inlined it, but maybe I forgot.

> Your patch description doesn't make sense. unregister_netdev()
> cannot possibly cause the device to be freed. Otherwise the
> subsequent free_netdev() call which you kept would be wrong.

Indeed. I read it again, and it was confusing (I myself was confused). I'll try to rephrase (I dug deeper and cleared things up, so it should be more precise now):

unregister_netdev() works as a barrier in this case. The call to netif_disconnect_backend() introduces a new carrier watch, which hold()s a reference to be put()'d at a later time. If we call free right after that, put() may end up being called after the free. Nothing in this case prevents the memory region from having been allocated again to another device.

unregister_netdev() holds the rtnl lock. This means that when the lock is released, netdev_run_todo() (which is set up by unregister_netdev() itself, via net_set_todo()) will call netdev_wait_allrefs(), which takes care of the linkwatch_runqueue. Calling unregister_netdev() between the carrier watch and free_netdev() guarantees that the device will only be freed once the watches have been handled.

There are most probably other ways to guarantee that, such as calling linkwatch_runqueue() directly. But I think that we lose nothing by calling unregister_netdev() in the middle, and gain serialization for free.

> So most likely what's happening is that free_netdev() is occurring
> without a preceding unregister_netdev(), which implies that there
> is a bug in the frontend state transition.

That is not the case; see above.
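The race described in this message can be modelled outside the kernel. What follows is a minimal user-space sketch, not netfront code: `toy_dev`, `pending_event` and every `toy_*` function are hypothetical names invented for illustration, and "free and reuse" is simulated by zeroing a stack variable. It assumes that the deferred carrier event drops its reference only when the queue is run, and that the barrier (here a direct call to run the queue) happens before the free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a network device: only the usage count matters here. */
struct toy_dev {
    int refcnt;
};

/* One-slot stand-in for the deferred link-watch queue (hypothetical). */
static struct toy_dev *pending_event;

static void toy_hold(struct toy_dev *d) { d->refcnt++; }
static void toy_put(struct toy_dev *d)  { d->refcnt--; }

/* Models the disconnect path queuing a carrier event that holds a reference. */
static void toy_queue_carrier_event(struct toy_dev *d)
{
    assert(pending_event == NULL);
    toy_hold(d);
    pending_event = d;
}

/* Models the deferred work: the reference is dropped only when this runs. */
static void toy_run_pending_events(void)
{
    if (pending_event != NULL) {
        toy_put(pending_event);
        pending_event = NULL;
    }
}

/* Models free followed by reuse: the memory now belongs to a fresh device. */
static void toy_free_and_reuse(struct toy_dev *d)
{
    memset(d, 0, sizeof(*d));
}

/* Returns the usage count seen by the next owner of the memory. */
int refcnt_after_teardown(int use_barrier)
{
    struct toy_dev slot = { .refcnt = 0 };

    pending_event = NULL;
    toy_queue_carrier_event(&slot);   /* refcnt: 0 -> 1 */
    if (use_barrier)
        toy_run_pending_events();     /* refcnt: 1 -> 0, before the free */
    toy_free_and_reuse(&slot);        /* refcnt reset to 0 for the new owner */
    toy_run_pending_events();         /* late put() if no barrier was used */
    return slot.refcnt;
}
```

In this model `refcnt_after_teardown(0)` leaves the reused memory with a count of -1, which is at least consistent with the "Usage count = -1" message in the original report, while `refcnt_after_teardown(1)` leaves it at 0.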
Herbert Xu
2006-Dec-21 08:49 UTC
Re: [Xen-devel] [PATCH] Unmatched decrementing of net device reference count
On Wed, Dec 20, 2006 at 11:21:51AM -0200, Glauber de Oliveira Costa wrote:
>
> unregister_netdev() works as a barrier in this case. The call to
> netif_disconnect_backend() introduces a new carrier watch, which hold()s a
> reference to be put()'d in a future time. If we call free right after that,
> it might be the case that put() is called after free. Nothing in this
> case prevents this memory region to have been allocated again to another
> device.

Thanks for the explanation. I understand the problem now. However, I think your patch isn't adequate because the closing of the backend no longer shuts down the transmitter in the frontend.

Looking at this again, it comes down to an asymmetry in the setup and tear-down processes. On startup, we have two stages:

1) netfront_probe => create_netdev => open_netdev => register_netdev;
2) network_connect => sets up IRQ/ring buffer/etc.

On tear-down, things occur in the wrong order:

1) netfront_closing => close_netdev => unregister_netdev;
2) netfront_remove => kills IRQ/ring buffer and free_netdev.

The tear-down order should be the opposite of the setup, i.e.,

1) netfront_closing => kills IRQ/ring buffer;
2) netfront_remove => close_netdev => unregister_netdev => free_netdev.

So I suggest we move the netif_disconnect_backend call to netfront_closing and close_netdev to netfront_remove.

Cheers,
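The ordering argument in this message can be checked mechanically. The sketch below is only an illustration under stated assumptions: the `resource` enum, the three arrays and the helper functions are hypothetical names, and each array simply lists which resource is set up or undone at each step, as read from the stages listed in the message. It is not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Resources acquired during setup, as read from the stages in the message. */
enum resource {
    RES_NETDEV_ALLOC,       /* create_netdev / free_netdev */
    RES_NETDEV_REGISTERED,  /* register_netdev / unregister_netdev */
    RES_IRQ_AND_RINGS       /* network_connect / disconnect of IRQ and rings */
};

#define N_STEPS 3

/* Setup: allocate the device, register it, then connect IRQ/ring buffers. */
static const enum resource setup_order[N_STEPS] = {
    RES_NETDEV_ALLOC, RES_NETDEV_REGISTERED, RES_IRQ_AND_RINGS
};

/* Tear-down as criticised: unregister first, then IRQ/rings, then free. */
static const enum resource old_teardown[N_STEPS] = {
    RES_NETDEV_REGISTERED, RES_IRQ_AND_RINGS, RES_NETDEV_ALLOC
};

/* Tear-down as proposed: IRQ/rings first, then unregister, then free. */
static const enum resource proposed_teardown[N_STEPS] = {
    RES_IRQ_AND_RINGS, RES_NETDEV_REGISTERED, RES_NETDEV_ALLOC
};

/* True when teardown undoes the resources in exactly the reverse of setup. */
bool is_reverse_of(const enum resource *setup,
                   const enum resource *teardown, size_t n)
{
    assert(setup != NULL && teardown != NULL);
    for (size_t i = 0; i < n; i++) {
        if (teardown[i] != setup[n - 1 - i])
            return false;
    }
    return true;
}

bool old_teardown_mirrors_setup(void)
{
    return is_reverse_of(setup_order, old_teardown, N_STEPS);
}

bool proposed_teardown_mirrors_setup(void)
{
    return is_reverse_of(setup_order, proposed_teardown, N_STEPS);
}
```

Under these assumptions only the proposed order mirrors the setup; the criticised order fails at the first step because the device is unregistered while the IRQ and ring buffers are still connected.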
Keir Fraser
2006-Dec-21 11:03 UTC
Re: [Xen-devel] [PATCH] Unmatched decrementing of net device reference count
On 21/12/06 08:49, "Herbert Xu" <herbert.xu@redhat.com> wrote:
> Thanks for the explanation. I understand the problem now. However,
> I think your patch isn't adequate because the closing of the backend
> no longer shuts down the transmitter in the frontend.

Changeset 13100:e99ba0c6c is what I checked in to fix this issue. I think it goes far enough, but please do take a look and check.

 -- Keir
Herbert Xu
2006-Dec-21 12:40 UTC
Re: [Xen-devel] [PATCH] Unmatched decrementing of net device reference count
On Thu, Dec 21, 2006 at 11:03:50AM +0000, Keir Fraser wrote:
> On 21/12/06 08:49, "Herbert Xu" <herbert.xu@redhat.com> wrote:
>
> > Thanks for the explanation. I understand the problem now. However,
> > I think your patch isn't adequate because the closing of the backend
> > no longer shuts down the transmitter in the frontend.
>
> Changeset 13100:e99ba0c6c is what I checked in to fix this issue. I think it
> goes far enough, but please do take a look and check.

Yes, it's OK for now because the frontend doesn't hold any resources from the backend. So even if the frontend tries to xmit after the backend starts closing, it won't cause any problems.

Cheers,
Glauber de Oliveira Costa
2006-Dec-21 13:02 UTC
Re: [Xen-devel] [PATCH] Unmatched decrementing of net device reference count
On Thu, Dec 21, 2006 at 11:40:24PM +1100, Herbert Xu wrote:
> On Thu, Dec 21, 2006 at 11:03:50AM +0000, Keir Fraser wrote:
> > On 21/12/06 08:49, "Herbert Xu" <herbert.xu@redhat.com> wrote:
> >
> > > Thanks for the explanation. I understand the problem now. However,
> > > I think your patch isn't adequate because the closing of the backend
> > > no longer shuts down the transmitter in the frontend.
> >
> > Changeset 13100:e99ba0c6c is what I checked in to fix this issue. I think it
> > goes far enough, but please do take a look and check.
>
> Yes it's OK for now because the frontend doesn't hold any resources
> from the backend. So even if the frontend tries to xmit after the
> backend starts closing, it won't cause any problems.

It indeed looks sane, and passes my tests. Thanks!