On Fri, Apr 26, 2013 at 05:21:06AM +0100, Steven Haigh wrote:
> On 26/04/2013 1:36 AM, Wei Liu wrote:
> > On Thu, Apr 25, 2013 at 4:11 PM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> >> On Thu, Apr 25, 2013 at 12:24:22PM +0100, Steven Haigh wrote:
> >>> Hi all,
> >>>
> >>> I've noticed a couple of DomUs have networking freeze with the following
> >>> getting printed to the Dom0's /var/log/messages:
> >>>
> >>> Apr 25 12:09:25 hosting kernel: vif vif-4-0 vif.crc: Frag is bigger than
> >>> frame.
> >>> Apr 25 12:09:25 hosting kernel: vif vif-4-0 vif.crc: fatal error;
> >>> disabling device
> >>> Apr 25 12:09:25 hosting kernel: br0: port 5(vif.crc) entered disabled
> >>> state
> >>>
> >>> I thought this was something to do with MAX_SKB_FRAGS - however the
> >>> kernel I use has this increased to 19 - so in theory I shouldn't hit
> >>> this (as far as I know).
> >>>
> >>> Are there any other things that could trigger this?
> >>>
> >>
> >> You're seeing a netfront bug which is fixed in that series. And it is
> >> not related to MAX_SKB_FRAGS but related to GSO.
> >>
> >> Could you try applying my patch set "Bundle fixes for Xen netfront /
> >> netback" version 7. That series has been applied to DaveM's net-next.
> >>
> >
> > BTW with that series you should be able to get rid of the
> > MAX_SKB_FRAGS -> 19 hack.
>
> This could be quite difficult. The DomU kernel is RHEL based - and not
> easily changed without sending the patch upstream to RH - which may or
> may not apply it.
>
> My google-fu has failed a little here - do you have a link to the
> patches? Is it against Xen or the kernel? Further, is it something that
> just altering the Dom0 part would resolve?
>

They are for the Linux kernel only. Xen is not involved.

To get rid of your MAX_SKB_FRAGS hack, you need to patch Dom0 only.

To fix "Frag is bigger than frame", you need to patch DomU. If that's
not possible at the moment, I remember seeing a thread saying that
disabling guest GSO can work around this problem. You can give it a shot.


Wei.

> --
> Steven Haigh
>
> Email: netwiz@xxxxxxxxx
> Web: https://www.crc.id.au
> Phone: (03) 9001 6090 - 0412 935 897
> Fax: (03) 8338 0299

Wei,

We just hit this bug as well on CentOS 5.9 with kernels
2.6.18-348.4.1.el5 and 2.6.18-348.6.1.el5. However, I checked, and on all
domUs and dom0s GSO is off; only TSO is on. Would TSO still cause this
issue?

Alex

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
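For anyone repeating the check Alex describes: `ethtool -k <iface>` prints the per-interface offload state, and `ethtool -K <iface> tso off gso off` switches both off. The sketch below is not from the thread. It only parses text in the shape `ethtool -k` prints (older builds write "tcp segmentation offload", newer ones "tcp-segmentation-offload") and reports which segmentation offloads are still on. The function names and the sample output are made up for illustration, and whether turning TSO off avoids the bug on these kernels is not something this thread establishes.

```python
import re


def parse_offloads(ethtool_k_output):
    """Return {feature_name: bool} parsed from `ethtool -k <iface>` text."""
    features = {}
    for line in ethtool_k_output.splitlines():
        # Feature lines look like "name: on" or "name: off [fixed]".
        # Header lines such as "Offload parameters for eth0:" do not match.
        match = re.match(r"^\s*([A-Za-z0-9 \-]+):\s+(on|off)\b", line)
        if match:
            # Normalise the old space-separated and new hyphenated
            # spellings to a single hyphenated key.
            name = match.group(1).strip().lower().replace(" ", "-")
            features[name] = match.group(2) == "on"
    return features


def segmentation_offloads_enabled(features):
    """List the TSO/GSO features that are currently switched on."""
    wanted = ("tcp-segmentation-offload", "generic-segmentation-offload")
    return [name for name in wanted if features.get(name)]


# Illustrative sample only; not captured from the systems in this thread.
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
"""

if __name__ == "__main__":
    print(segmentation_offloads_enabled(parse_offloads(SAMPLE)))
```

Feeding it the real output from each domU and dom0 interface gives a quick list of hosts where TSO or GSO is still enabled.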
On Mon, Jun 03, 2013 at 05:49:11PM -0700, Alex A wrote:
[...]
> Wei,
>
> we just hit this bug as well on CentOS 5.9 with kernels
> 2.6.18-348.4.1.el5 and 2.6.18-348.6.1.el5, however I checked it on all
> domUs and dom0s GSO is off, only TSO is on. Would TSO still cause this
> issue?
>

I really think the proper thing to do is to fix your backend instead of
working around that problem -- the patch is available now and you're
running your customized kernel, right?


Wei.

> Alex
On Tue, Jun 4, 2013 at 1:44 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Mon, Jun 03, 2013 at 05:49:11PM -0700, Alex A wrote:
> [...]
> > Wei,
> >
> > we just hit this bug as well on CentOS 5.9 with kernels
> > 2.6.18-348.4.1.el5 and 2.6.18-348.6.1.el5, however I checked it on all
> > domUs and dom0s GSO is off, only TSO is on. Would TSO still cause this
> > issue?
> >
>
> I really think the proper thing to do is to fix your backend instead of
> working around that problem -- the patch is available now and you're
> running your customized kernel, right?
>
>
> Wei.
>
> > Alex
>

I completely agree that fixing the backend is the proper thing to do. Do
you mean these patches?

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=697089dc13c52d668322ac6cb8548520de27ed0e
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=9ecd1a75d977e2e8c48139c7d3efed183f898d94
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=2810e5b9a7731ca5fce22bfbe12c96e16ac44b6f
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=03393fd5cc2b6cdeec32b704ecba64dbb0feae3c

If I'm not mistaken, these patches are against a 2.6.3x or 3.0.x kernel.
I'm running 2.6.18, so I would have to port them to the 2.6.18 base,
unless equivalent patches for 2.6.18 already exist?

Also, you are correct: we are running our own custom-compiled RHEL
kernels, built from the RHEL source RPMs.

Alex
On Tue, Jun 04, 2013 at 12:56:35PM -0700, Alex A wrote:
[...]
> > > we just hit this bug as well on CentOS 5.9 with kernels
> > > 2.6.18-348.4.1.el5 and 2.6.18-348.6.1.el5, however I checked it on all
> > > domUs and dom0s GSO is off, only TSO is on. Would TSO still cause this
> > > issue?
> > >
> >
> > I really think the proper thing to do is to fix your backend instead of
> > working around that problem -- the patch is available now and you're
> > running your customized kernel, right?
> >
> >
> > Wei.
> >
> > > Alex
> >
>
> I completely agree that fixing the backend is the proper thing to do. Do
> you mean these patches?
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=697089dc13c52d668322ac6cb8548520de27ed0e
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=9ecd1a75d977e2e8c48139c7d3efed183f898d94
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=2810e5b9a7731ca5fce22bfbe12c96e16ac44b6f
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=03393fd5cc2b6cdeec32b704ecba64dbb0feae3c
>
>
> If I'm not mistaken aren't these patches against 2.6.3x kernel or 3.0.x?
> I'm running 2.6.18, so I would have to port those patches to 2.6.18 base,
> unless there exist same patches for 2.6.18?
> Also you are correct, we are running our custom compiled rhel kernels, that
> are based on rhel source rpms.
>

Sorry, I don't understand. Do you mean your Dom0 is 2.6.18? My patches
are against 3.10. Backporting is under way, but I don't think they will
be backported to 2.6.18. If you're running a 2.6.18 Dom0, presumably
whoever backported XSA-39 will also backport those patches?

Jan maintains a 2.6.18 tree with the minimum required patches applied to
fix your problem (Frag bigger than frame); you might want to have a look
at the last two patches in that tree.

http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/


Wei.

> Alex
On Tue, Jun 4, 2013 at 1:08 PM, Wei Liu <wei.liu2@citrix.com> wrote:
> On Tue, Jun 04, 2013 at 12:56:35PM -0700, Alex A wrote:
> [...]
>
> Sorry I don't understand. Do you mean your Dom0 is 2.6.18? My patches
> are against 3.10, the backporting is undergoing, however I don't think
> they will be backported to 2.6.18. If you're running 2.6.18 Dom0,
> presumably who backported XSA-39 will also backport those patches?
>
> Jan maintains 2.6.18 tree with minimum required patches applied to fix
> your problem (Frag bigger than frame), you might want to have a look at
> the last two patches in tree.
>
> http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/
>
>
> Wei.
>
> > Alex
>

Yes, my Dom0 is 2.6.18; it's based on RHEL 5. They're the ones who
backported XSA-39, but they're not indicating when they will backport
your fix, and they've made the bug private.

I looked at Jan's tree and found the two patches you mentioned. I will
create my own patch based on those two and rebuild the kernels.

Thank you!

Alex
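Once rebuilt kernels are running, one way to confirm the problem is gone is to watch the Dom0 log for the message from the original report. The sketch below is an illustration rather than anything posted in the thread: it counts "Frag is bigger than frame" lines per backend device, assuming each message sits on a single line in the format Steven quoted. The function name and regular expression are hypothetical.

```python
import re

# Matches e.g. "... kernel: vif vif-4-0 vif.crc: Frag is bigger than frame."
# Group 1 is the backend device (vif-4-0), group 2 the interface name (vif.crc).
FRAG_ERROR = re.compile(r"kernel: vif (\S+) (\S+): Frag is bigger than frame")


def affected_vifs(log_lines):
    """Count 'Frag is bigger than frame' errors per backend device."""
    hits = {}
    for line in log_lines:
        match = FRAG_ERROR.search(line)
        if match:
            device = match.group(1)
            hits[device] = hits.get(device, 0) + 1
    return hits


# The three log lines from the original report, unwrapped to one per line.
SAMPLE = """\
Apr 25 12:09:25 hosting kernel: vif vif-4-0 vif.crc: Frag is bigger than frame.
Apr 25 12:09:25 hosting kernel: vif vif-4-0 vif.crc: fatal error; disabling device
Apr 25 12:09:25 hosting kernel: br0: port 5(vif.crc) entered disabled state
"""

if __name__ == "__main__":
    print(affected_vifs(SAMPLE.splitlines()))
```

To scan a real log, pass an open file object instead of the sample, for example `affected_vifs(open("/var/log/messages"))`; an empty result after the rebuild would suggest the error has stopped, though it says nothing about other failure modes.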