Jacek Konieczny
2007-Jun-01 14:05 UTC
[Xen-devel] netfront: rx->offset: 12, size: 4294967295
Hello,

I sent the following message to xen-users last week and got no reply. I am quite desperate (I have been compiling Xen and Linux kernels, in different versions and with different options, all week); maybe someone here can help me.

I still have problems with networking in domU, as I previously described here:
http://lists.xensource.com/archives/html/xen-users/2007-02/msg00888.html

The symptoms are:
- no packets received in domU (`ip -s link show` shows only errors)
- messages in dmesg (one line for every frame received):

    netfront: rx->offset: 12, size: 4294967295
    netfront: rx->offset: 12, size: 4294967295

I got rid of the problem for a moment when upgrading one of my test systems to Xen 3.1.0. I had only upgraded the dom0 kernel, and the problem disappeared. When I set up a full Xen 3.1.0 system (hypervisor and both dom0 and domU kernels are Xen 3.1.0 now) I got the problem again.

In the system I also have two old virtual machines, with some old domU kernel. I am not sure which Xen version it was built for, but I guess this should help:

    # zgrep XEN.*INTERFACE /proc/config.gz
    CONFIG_XEN_INTERFACE_VERSION=0x00030101

In that old domU the network works. When starting the same virtual machine (configuration and images) with the new kernel I get the errors again.

I recompiled the kernel again and again, with different options (of course, I am not able to test all possible configurations) and still no go. I am using a vanilla 2.6.18 kernel patched with the output of Xen's 'make mkpatches' and with squashfs (it touches nothing but the filesystem).

Any idea what may be wrong? What else may I try?

Greets,
Jacek

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2007-Jun-01 14:22 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On 1/6/07 15:05, "Jacek Konieczny" <jajcus@jajcus.net> wrote:

> I recompiled the kernel again and again, with different options (of
> course, I am not able to test all possible configurations) and still no
> go. I am using vanilla 2.6.18 kernel patched with output of Xen's
> 'make mkpatches' and with squashfs (it touches nothing but the
> filesystem).
>
> Any idea what may be wrong? What else may I try?

Any weird options on your dom0 or domU command lines?

Netback is sending error responses to your domUs, hence the messages you get from netfront in domU. Have you tried a debug build of Xen, and also redefining the DPRINTK() macro in netback.c to printk(KERN_ALERT)? This will very likely get you some per-packet tracing of what is going wrong.

 -- Keir
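The size value in the reported message, 4294967295, is exactly 2^32 - 1, which is what a 32-bit -1 looks like when printed as unsigned. That fits the explanation that netback is sending error responses. A minimal sketch of the arithmetic follows, assuming the size field is a signed 32-bit status printed with an unsigned format; the helper names and the exact format string are hypothetical and are not taken from the netfront source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the value an unsigned format prints for a signed
 * 32-bit status. Conversion to uint32_t is reduction modulo 2^32. */
uint32_t status_as_unsigned(int32_t status)
{
	return (uint32_t)status;
}

/* Hypothetical helper: build a line shaped like the one in the report.
 * The format string is an assumption made for illustration only. */
int format_rx_line(char *buf, size_t len, unsigned int offset, int32_t status)
{
	return snprintf(buf, len, "netfront: rx->offset: %u, size: %lu",
			offset, (unsigned long)status_as_unsigned(status));
}

/* Small self-check tying the sketch back to the reported message. */
void check_rx_line(void)
{
	char buf[80];

	format_rx_line(buf, sizeof(buf), 12, -1);
	assert(strcmp(buf, "netfront: rx->offset: 12, size: 4294967295") == 0);
}
```

Under that assumption, a status of -1 from the backend is enough to produce the size seen in dmesg; no packet of that length is involved.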
Jacek Konieczny
2007-Jun-01 16:56 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On Fri, Jun 01, 2007 at 03:22:03PM +0100, Keir Fraser wrote:
> On 1/6/07 15:05, "Jacek Konieczny" <jajcus@jajcus.net> wrote:
>
> > I recompiled the kernel again and again, with different options (of
> > course, I am not able to test all possible configurations) and still no
> > go. I am using vanilla 2.6.18 kernel patched with output of Xen's
> > 'make mkpatches' and with squashfs (it touches nothing but the
> > filesystem).
> >
> > Any idea what may be wrong? What else may I try?
>
> Any weird options on your dom0 or domU command lines?

Nothing, just root and console.

> Netback is sending error responses to your domUs, hence the messages you get
> from netfront in domU. Have you tried a debug build of Xen, and also
> redefine the DPRINTK() macro in netback.c to printk(KERN_ALERT)? This will
> very likely get you some tracing per packet about what is going wrong.

dom0 dmesg:

    device vif3.0 entered promiscuous mode
    xenbr0: port 5(vif3.0) entering learning state
    xenbr0: topology change detected, propagating
    xenbr0: port 5(vif3.0) entering forwarding state
    blkback: ring-ref 8, event-channel 6, protocol 1 (unspecified, assuming native)
    blkback: ring-ref 9, event-channel 7, protocol 1 (unspecified, assuming native)
    blkback: ring-ref 10, event-channel 8, protocol 1 (unspecified, assuming native)
    Bad status -1 from grant transfer to DOM3
    Bad status -1 from grant transfer to DOM3
    Bad status -1 from grant transfer to DOM3
    Bad status -1 from grant transfer to DOM3

xm dmesg:

    (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation headroom (40960,40960) or provided a bad grant ref (000001e9) or is dying (0)
    (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation headroom (40960,40960) or provided a bad grant ref (000001ea) or is dying (0)
    (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation headroom (40960,40960) or provided a bad grant ref (000001eb) or is dying (0)

This is generated by the xen 3.0.5-rc1 binary from the "Xen 3.1.0 32-bit PAE" package.

Greets,
Jacek
Jacek Konieczny
2007-Jun-01 17:11 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On Fri, Jun 01, 2007 at 06:56:46PM +0200, Jacek Konieczny wrote:
> xm dmesg:
>
> (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation headroom (40960,40960) or provided a bad grant ref (000001e9) or is dying (0)
> (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation headroom (40960,40960) or provided a bad grant ref (000001ea) or is dying (0)
> (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation headroom (40960,40960) or provided a bad grant ref (000001eb) or is dying (0)
>
> This is generated by xen 3.0.5-rc1 binary from the "Xen 3.1.0 32-bit
> PAE" package.

When I booted my own 3.1.0 Xen build there was nothing special in "xm dmesg", not even the "Transferee has no reservation" messages seen in the 3.0.5-rc1 output. The messages in the domU and dom0 kernel dmesgs are still there.

Greets,
Jacek
Keir Fraser
2007-Jun-01 17:38 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
If your dom0 and domU kernel are both from 3.1.0 then you should not see 'grant transfer' messages. The transfer mechanism was replaced by copying several Xen versions ago. So I think your dom0 and domU kernels are actually of dubious origin. Are you sure you really are running the kernels that you think you are?

 -- Keir

On 1/6/07 17:56, "Jacek Konieczny" <jajcus@jajcus.net> wrote:

> Bad status -1 from grant transfer to DOM3
> Bad status -1 from grant transfer to DOM3
> Bad status -1 from grant transfer to DOM3
> Bad status -1 from grant transfer to DOM3
>
> xm dmesg:
>
> (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation
> headroom (40960,40960) or provided a bad grant ref (000001e9) or is dying (0)
> (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation
> headroom (40960,40960) or provided a bad grant ref (000001ea) or is dying (0)
> (XEN) grant_table.c:877:d0 gnttab_transfer: Transferee has no reservation
> headroom (40960,40960) or provided a bad grant ref (000001eb) or is dying (0)
Jacek Konieczny
2007-Jun-01 19:01 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On Fri, Jun 01, 2007 at 06:38:23PM +0100, Keir Fraser wrote:
> If your dom0 and domU kernel are both from 3.1.0 then you should not see
> 'grant transfer' messages. The transfer mechanism was replaced by copying
> several Xen versions ago.

Is the transfer the same thing as 'page flip'?

> So I think your dom0 and domU kernels are actually
> of dubious origin. Are you sure you really are running the kernels that you
> think you are?

How can I check what Xen version the kernel supports?

The kernels in both dom0 and domU are, according to `uname -a`, both built today. Both are 2.6.18 (Xen 3.0.4 was for 2.6.16, and I have not done any extra patching to apply an older Xen to a newer kernel).

The domU kernel has old Xen compatibility options enabled (currently 3.0.4 and later):

    # CONFIG_XEN_COMPAT_030002_AND_LATER is not set
    CONFIG_XEN_COMPAT_030004_AND_LATER=y
    # CONFIG_XEN_COMPAT_LATEST_ONLY is not set
    CONFIG_XEN_COMPAT=0x030004

Dom0 also has compatibility options enabled (3.0.2 and later, as copied from the Xen-provided configs):

    CONFIG_XEN_COMPAT_030002_AND_LATER=y
    # CONFIG_XEN_COMPAT_030004_AND_LATER is not set
    # CONFIG_XEN_COMPAT_LATEST_ONLY is not set
    CONFIG_XEN_COMPAT=0x030002

Greets,
Jacek
Keir Fraser
2007-Jun-02 10:13 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On 1/6/07 20:01, "Jacek Konieczny" <jajcus@jajcus.net> wrote:

> On Fri, Jun 01, 2007 at 06:38:23PM +0100, Keir Fraser wrote:
>> If your dom0 and domU kernel are both from 3.1.0 then you should not see
>> 'grant transfer' messages. The transfer mechanism was replaced by copying
>> several Xen versions ago.
>
> The transfer is the same thing as 'page flip'?

Yes, that's right.

>> So I think your dom0 and domU kernels are actually
>> of dubious origin. Are you sure you really are running the kernels that you
>> think you are?
>
> How can I check what Xen version the kernel supports?
>
> The kernels in both dom0 and domU are, according to `uname -a`, both
> built today. Both 2.6.18 (Xen 3.0.4 was for 2.6.16, I have not do extra
> patching to apply older xen to newer kerenel).

Well, that should certainly be new enough. When domU boots it should give a message like 'device eth0 has <foo> receive path', where <foo> is copying or flipping. Which mode does your domU claim to be using?

 -- Keir
Jacek Konieczny
2007-Jun-02 18:49 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On Sat, Jun 02, 2007 at 11:13:49AM +0100, Keir Fraser wrote:
> Well, that should certainly be new enough. When domU boots it should give a
> message like 'device eth0 has <foo> receive path', where <foo> is copying or
> flipping. Which mode does your domU claim to be using?

copying

Forcing it to copying (or flipping) with the xennet module option doesn't change anything. Maybe domU is right, but dom0 tries to use flipping because of some error?

Greets,
Jacek
Keir Fraser
2007-Jun-03 08:33 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On 2/6/07 19:49, "Jacek Konieczny" <jajcus@jajcus.net> wrote:

> On Sat, Jun 02, 2007 at 11:13:49AM +0100, Keir Fraser wrote:
>> Well, that should certainly be new enough. When domU boots it should give a
>> message like 'device eth0 has <foo> receive path', where <foo> is copying or
>> flipping. Which mode does your domU claim to be using?
>
> copying
>
> Forcing it to copying (or flipping) with xennet module option doesn't
> change anything. Maybe domU is right, but dom0 tries to use flipping
> because of some error?

Well, if that were the only problem then it's surprising that forcing flip mode does not fix the bug. On the other hand, flip mode isn't much tested these days.

In dom0, what output do you get from 'xenstore-ls /local/domain/<domid>'? (where <domid> is the domain id of your domU). There should be a line for a node called request-rx-copy, with a value of 1. This is how domU declares to netback that it wants to use copying mode.

If you're happy modifying C code a bit, a good place to add some printk() tracing would be connect_rings() in drivers/xen/netback/xenbus.c. It's that function which should read the request-rx-copy node and decide whether it is going to use copying or flipping mode.

 -- Keir
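The negotiation described above comes down to reading one small integer and turning it into a mode. A minimal userspace sketch of that decision follows; it is not the netback code. The function name, the enum, and the rule that a missing or unparsable node means flipping are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

enum rx_mode { RX_FLIPPING = 0, RX_COPYING = 1 };

/* Hypothetical stand-in for the backend's decision. `value` is the string
 * stored in the request-rx-copy node, or NULL if the node is absent.
 * Assumption: an absent or malformed node falls back to flipping. */
enum rx_mode choose_rx_mode(const char *value)
{
	unsigned int rx_copy = 0;

	if (value == NULL || sscanf(value, "%u", &rx_copy) != 1)
		rx_copy = 0;

	/* Any non-zero request selects the copying receive path. */
	return rx_copy ? RX_COPYING : RX_FLIPPING;
}

/* Small self-check: a frontend that wrote "1" should get copying mode. */
void check_rx_mode(void)
{
	assert(choose_rx_mode("1") == RX_COPYING);
	assert(choose_rx_mode(NULL) == RX_FLIPPING);
}
```

Tracing the parsed value and the stored result separately, as suggested for connect_rings(), shows which of the two steps goes wrong.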
Jacek Konieczny
2007-Jun-03 11:51 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On Sun, Jun 03, 2007 at 09:33:59AM +0100, Keir Fraser wrote:
> In dom0, what output do you get from 'xenstore-ls /local/domain/<domid>'?
> (where <domid> is the domain id of your domU). There should be a line for a
> node called request-rx-copy, with a value of 1. This is how domU declares to
> netback that it wants to use copying mode.

    # xenstore-ls /local/domain/1 | grep request-
    request-rx-copy = "1"

> If you're happy modifying C code a bit, a good place to add some printk()
> tracing would be connect_rings() in drivers/xen/netback/xenbus.c. It's that
> function which should read the request-rx-copy node and decide whether it is
> going to use copying or flipping mode.

I will try that on Monday, thanks.

Greets,
Jacek
Jacek Konieczny
2007-Jun-04 12:40 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On Sun, Jun 03, 2007 at 09:33:59AM +0100, Keir Fraser wrote:
> If you're happy modifying C code a bit, a good place to add some printk()
> tracing would be connect_rings() in drivers/xen/netback/xenbus.c. It's that
> function which should read the request-rx-copy node and decide whether it is
> going to use copying or flipping mode.

Ok, got it. It seems like a bug (feature?) of my compiler (gcc 4.2.0 and several earlier snapshots of gcc 4.2) or maybe some race condition (is it possible? does anything else write the netif structure?)

The problem occurs in line 377 of drivers/xen/netback/xenbus.c:

    be->netif->copying_receiver = !!rx_copy;

rx_copy is 1, but copying_receiver is set/left 0 after this instruction. Using "= rx_copy?1:0" didn't work either. I changed it like this:

diff -dur -x '*~' linux-2.6.18.orig/drivers/xen/netback/xenbus.c linux-2.6.18/drivers/xen/netback/xenbus.c
--- linux-2.6.18.orig/drivers/xen/netback/xenbus.c	2007-06-04 07:48:40.000000000 +0000
+++ linux-2.6.18/drivers/xen/netback/xenbus.c	2007-06-04 11:40:22.000000000 +0000
@@ -374,8 +377,16 @@
 				 dev->otherend);
 		return err;
 	}
-	be->netif->copying_receiver = !!rx_copy;
-
+	if (rx_copy) {
+		DPRINTK("rx_copy=%u, setting copying_receiver to -1", rx_copy);
+		be->netif->copying_receiver = -1;
+	}
+	else {
+		DPRINTK("rx_copy=%u, setting copying_receiver to 0", rx_copy);
+		be->netif->copying_receiver = 0;
+	}
+	DPRINTK("be->netif->copying_receiver = %i", (int)be->netif->copying_receiver);
+
 	if (be->netif->dev->tx_queue_len != 0) {
 		if (xenbus_scanf(XBT_NIL, dev->otherend,
 				 "feature-rx-notify", "%d", &val) < 0)

... and the problem is gone. Strange, but good enough for me.

Thank you for your hints.

Greets,
Jacek
Keir Fraser
2007-Jun-04 12:52 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On 4/6/07 13:40, "Jacek Konieczny" <jajcus@jajcus.net> wrote:

> Ok, got it. It seems like a bug (feature?) of my compiler (gcc 4.2.0 and
> several earlier snapshots of gcc 4.2) or maybe some race condition (is
> it possible? anything else writes the netif structure?)
>
> The problem occurs in line 377 of drivers/xen/netback/xenbus.c:
>
> be->netif->copying_receiver = !!rx_copy;

My suspicion is that, strictly speaking, the assignment of 1 to copying_receiver is invalid because a single-bit bitfield can only hold the values -1 and 0. Older gcc perhaps mapped 1 to -1, but 4.2.0 is mapping 1 to 0 (or choosing to do that as an optimisation, since it has the choice, and hence can simplify the code to always write zero in this case). That sucks.

Can you try changing the definition of copying_receiver in netback/common.h to be 'unsigned copying_receiver:1' instead of int:1?

 -- Keir
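The point about single-bit bitfields can be tried outside the kernel. A minimal standalone sketch follows, assuming a two's-complement target; the struct and function names are hypothetical and only mirror the field name from the thread. A signed 1-bit field can represent only -1 and 0, so storing 1 is an out-of-range conversion whose result is implementation-defined, while an unsigned 1-bit field holds 0 and 1 as intended.

```c
#include <assert.h>

/* Hypothetical reduced structs: only the field under discussion. */
struct flags_signed {
	signed int copying_receiver:1;   /* representable values: -1 and 0 */
};

struct flags_unsigned {
	unsigned int copying_receiver:1; /* representable values: 0 and 1 */
};

/* Store `value` in the signed field and read it back. */
int store_signed(int value)
{
	struct flags_signed f;

	f.copying_receiver = value;
	return f.copying_receiver;
}

/* Store `value` in the unsigned field and read it back. */
int store_unsigned(int value)
{
	struct flags_unsigned f;

	f.copying_receiver = value;
	return (int)f.copying_receiver;
}

/* Self-check limited to results that do not depend on the compiler. */
void check_bitfields(void)
{
	assert(store_unsigned(1) == 1);  /* the flag survives */
	assert(store_signed(-1) == -1);  /* in range, so well defined */
	assert(store_signed(1) != 1);    /* 1 cannot be represented */
}
```

What store_signed(1) actually returns is up to the compiler, which is why the sketch asserts only that it is not 1. The workaround of assigning -1 and the suggested switch to an unsigned field both avoid that out-of-range store.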
Jacek Konieczny
2007-Jun-04 14:05 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On Mon, Jun 04, 2007 at 01:52:30PM +0100, Keir Fraser wrote:
> Can you try changing the definition of copying_receiver in netback/common.h
> to be 'unsigned copying_receiver:1' instead of int:1?

It works! See the attached patch (I have changed the other bitfield too).

Thank you very much.

Greets,
Jacek
Keir Fraser
2007-Jun-04 14:08 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
On 4/6/07 15:05, "Jacek Konieczny" <jajcus@jajcus.net> wrote:

> On Mon, Jun 04, 2007 at 01:52:30PM +0100, Keir Fraser wrote:
>> Can you try changing the definition of copying_receiver in netback/common.h
>> to be 'unsigned copying_receiver:1' instead of int:1?
>
> It works! See the attached patch (I have changed the other bitfield
> too).

Great. Thanks for tracking this down!

 -- Keir
Herbert Xu
2007-Jun-05 21:48 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
Keir Fraser <keir@xensource.com> wrote:
>
> My suspicion is that, strictly speaking, the assignment of 1 to
> copying_receiver is invalid because a single-bit bitfield can only hold the
> values -1 and 0. Older gcc perhaps mapped 1 to -1, but 4.2.0 is mapping 1 to
> 0 (or choosing to do that as an optimisation, since it has the choice, and
> hence can simplify the code to always write zero in this case). That sucks.

That's why we always use unsigned for bitfields in Linux.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Mark Williamson
2007-Jun-05 21:57 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
> > My suspicion is that, strictly speaking, the assignment of 1 to
> > copying_receiver is invalid because a single-bit bitfield can only hold
> > the values -1 and 0. Older gcc perhaps mapped 1 to -1, but 4.2.0 is
> > mapping 1 to 0 (or choosing to do that as an optimisation, since it has
> > the choice, and hence can simplify the code to always write zero in this
> > case). That sucks.
>
> That's why we always use unsigned for bitfields in Linux.

I found a load of bitfields that we ought to consider unsigned-ifying when I ran sparse over the tree. I guess I should try and get a patch sent out proactively; maybe that'll head off similar problems in future.

Cheers,
Mark

--
Dave: Just a question. What use is a unicyle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
Mark Williamson
2007-Jun-06 00:30 UTC
Re: [Xen-devel] netfront: rx->offset: 12, size: 4294967295
> On 4/6/07 15:05, "Jacek Konieczny" <jajcus@jajcus.net> wrote:
> > On Mon, Jun 04, 2007 at 01:52:30PM +0100, Keir Fraser wrote:
> >> Can you try changing the definition of copying_receiver in
> >> netback/common.h to be 'unsigned copying_receiver:1' instead of int:1?
> >
> > It works! See the attached patch (I have changed the other bitfield
> > too).
>
> Great. Thanks for tracking this down!
>
> -- Keir

There's also a bitfield that needs changing in netback.c (patch attached). It's not necessarily causing us any problems at the moment, but it'd be nice not to be bitten unexpectedly by these again :-)

Cheers,
Mark

--
Dave: Just a question. What use is a unicyle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!