Sylvain Munaut
2012-Aug-30 14:06 UTC
Ceph + RBD + Xen: Complete collapse -> Network issue in domU / Bad data for OSD / OOM Kill
Hi,

A bit of explanation of what I'm trying to achieve:

We have a bunch of homogeneous nodes that have CPU + RAM + storage, and we
want to use them as a generic cluster. The idea is to have Xen on all of
them and run a Ceph OSD in a domU on each to "export" the local storage
space to the entire cluster, and then use RBD to store / access VM images
from any of the machines.

We did set up a working Ceph cluster, and RBD works well as long as we
don't access it from a dom0 that runs a VM hosting an OSD.

When attaching an RBD image to a dom0 that runs a VM hosting an OSD, things
get interesting. It seems to work fine when accessed from the dom0 itself,
but if we try to use that RBD image to boot another domU, then things go
_very_ wrong.

On the dom0, I see a lot of messages in dmesg:

  "osd1 192.168.3.70:6802 socket closed"

and I mean a _lot_, like dozens per second.

On the domU running the OSD, I see in dmesg a bunch of:

  "net eth0: rx->offset: 0, size: 4294967295"

And in the OSD log itself I see a lot of:

  2012-08-30 13:26:48.683948 7f3be9bea700 1 CephxAuthorizeHandler::verify_authorizer isvalid=1
  2012-08-30 13:26:48.684124 7f3be9bea700 0 bad crc in data 1035043868 != exp 606937680
  2012-08-30 13:26:48.684771 7f3be9bea700 0 -- 192.168.3.70:6803/1032 >> 192.168.3.30:0/4221752725 pipe(0x5ba5c00 sd=40 pgs=0 cs=0 l=0).accept peer addr is really 192.168.3.30:0/4221752725 (socket is 192.168.3.30:40477/0)
  2012-08-30 13:26:48.686520 7f3be9bea700 1 CephxAuthorizeHandler::verify_authorizer isvalid=1
  2012-08-30 13:26:48.686723 7f3be9bea700 0 bad crc in data 1385604259 != exp 606937680
  2012-08-30 13:26:48.687306 7f3be9bea700 0 -- 192.168.3.70:6803/1032 >> 192.168.3.30:0/4221752725 pipe(0x5ba5200 sd=40 pgs=0 cs=0 l=0).accept peer addr is really 192.168.3.30:0/4221752725 (socket is 192.168.3.30:40478/0)

The memory usage of the OSD grows very quickly and the process ends up
being killed by the OOM killer.

I tried turning off all the offloading options on the virtual network
interfaces, as suggested in an old 2010 post on the Xen list, but without
any effect.

So something is going _very_ wrong here... any suggestions from anyone?

Note that when using the exact same setup on a dom0 that doesn't run any
OSD in a domU, it works fine. Also, only the OSD running under that dom0 is
affected; the rest of the cluster is working nicely.

Cheers,

Sylvain
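For reference, the offload-disabling attempt described above is typically
done with ethtool inside the domU. The following is only a sketch of that
attempt; the interface name eth0 and the exact set of offload flags are
assumptions, since the post does not list the commands that were actually
run:

    # Inside the OSD domU: disable offload features on the PV interface.
    # "eth0" and the flag list are assumptions; adjust to your setup.
    ethtool -K eth0 tx off rx off sg off tso off gso off gro off

    # Show the resulting offload settings to confirm the change took effect.
    ethtool -k eth0

As it turns out later in the thread, changing these settings inside the
domU was not enough; the setting that actually mattered was on the dom0
side of the vif.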
Smart Weblications GmbH - Florian Wiessner
2012-Aug-30 14:14 UTC
Re: Ceph + RBD + Xen: Complete collapse -> Network issue in domU / Bad data for OSD / OOM Kill
On 30.08.2012 16:06, Sylvain Munaut wrote:
> Hi,
>
> So something is going _very_ wrong here... any suggestions from anyone?
>
> Note that when using the exact same setup on a dom0 that doesn't run
> any OSD in a domU, it works fine. Also, only the OSD running under
> that dom0 is affected; the rest of the cluster is working nicely.

I would suggest trying KVM instead of Xen...

  "net eth0: rx->offset: 0, size: 4294967295"

It seems that there is something wrong with the networking code in Xen?

--
Kind regards,

Florian Wiessner

Smart Weblications GmbH
http://www.smart-weblications.de
Sylvain Munaut
2012-Aug-30 14:38 UTC
Re: Ceph + RBD + Xen: Complete collapse -> Network issue in domU / Bad data for OSD / OOM Kill
Hi,

> I would suggest trying KVM instead of Xen...

There are a lot of different setups that might work; I'm not looking for a
different setup, I want to fix this one. We already have a complete running
infrastructure using Xen, with an iSCSI NAS as the backend for the VM
images and Lustre as the distributed FS, and I want to replace all that
with Ceph.

>   "net eth0: rx->offset: 0, size: 4294967295"
>
> It seems that there is something wrong with the networking code in Xen?

The problem doesn't appear with Ceph on its own or Xen on its own, so it's
a weird interaction between them, and it could very well be that the
network code of Ceph does something unexpected that triggers this behavior
in Xen. At this point I wouldn't exclude anything...

Also, even if the root fault lies with Xen, I think the runaway memory
issue this triggers inside the OSD is worth fixing on its own. A badly
behaving client shouldn't be able to DoS the OSD so easily.

Cheers,

Sylvain
Alex Elder
2012-Aug-30 14:43 UTC
Re: Ceph + RBD + Xen: Complete collapse -> Network issue in domU / Bad data for OSD / OOM Kill
On 08/30/2012 09:06 AM, Sylvain Munaut wrote:
> Hi,

I posted the following comments on IRC but am putting them here just so
they're visible along with the original post. All of the following is
deduced from about 10 minutes of scanning through unfamiliar code, so I may
be way off. (I've edited the comments I posted for readability.)

The code is in drivers/net/xen-netfront.c, in the function
xennet_get_responses(). It appears that when pulling a
xen_netif_rx_response struct off of the response ring buffer, it finds one
with status == -1. Furthermore, when net_ratelimit() is called it finds
that it's time to limit the rate (I think because it came back with
negative status 10 times in a row). That may mean that the poll routine
will drop received messages and log that as a receive error.

I suppose it's possible after all that something in the Ceph messaging
layer is not handling this case properly.

I can't spend any more time on this at the moment, but perhaps we'll get
some insights from others in the meantime.

					-Alex
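One small arithmetic note that is consistent with this reading: the
"size: 4294967295" printed by netfront in the domU is exactly a 32-bit -1
shown as an unsigned value, which matches a response with status == -1.
This is just a sanity check, not something from the thread itself:

    # 0xffffffff is 2^32 - 1, i.e. a 32-bit -1 printed as unsigned.
    printf '%u\n' 0xffffffff    # prints 4294967295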
Fajar A. Nugraha
2012-Aug-30 15:04 UTC
Re: Ceph + RBD + Xen: Complete collapse -> Network issue in domU / Bad data for OSD / OOM Kill
On Thu, Aug 30, 2012 at 9:43 PM, Alex Elder <elder@inktank.com> wrote:

> The code is in drivers/net/xen-netfront.c, in the function
> xennet_get_responses().

IIRC that's the Xen PV driver.

A simple way to test whether the Xen PV net device is at fault is to use an
HVM domU but WITHOUT the PV drivers (there's a kernel command-line argument
that lets you have both the emulated and the PV device in the domU), using
the emulated NIC instead (e.g. e1000).

... and further investigation results might be more useful to @xen-devel
instead of @xen-users.

--
Fajar
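A quick way to confirm which driver is actually handling the domU's NIC
before and after such a change; this is only a sketch, and the interface
name eth0 is an assumption:

    # Inside the domU: report the driver bound to the interface.
    # With the PV driver this names the netfront driver; with an emulated
    # NIC it names the emulated model instead (e.g. e1000).
    ethtool -i eth0

    # An emulated NIC also appears as a PCI Ethernet controller,
    # while a pure PV vif does not.
    lspci | grep -i ethernet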
Sylvain Munaut
2012-Sep-03 09:24 UTC
Re: Ceph + RBD + Xen: Complete collapse -> Network issue in domU / Bad data for OSD / OOM Kill
On Thu, Aug 30, 2012 at 5:04 PM, Fajar A. Nugraha <list@fajar.net> wrote:

> IIRC that's the Xen PV driver.

I've been digging a bit, and it seems that issuing

  ethtool -K vif1.0 tx off

in the dom0 prevents the issue.

Note that it needs to be done in the dom0 on the VIF, and not in the domU
on the eth0 interface as I originally tried. It's also not enough to
disable gso and/or tso; you need to turn off all TX acceleration.

I will start another thread on xen-devel since it appears to be a bug in
the PV driver.

Cheers,

Sylvain
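To apply the same workaround to every guest vif on a dom0 rather than a
single interface, something like the following could be used. This is only
a sketch based on the command above; the vifX.Y naming pattern and the need
to re-run it when domUs are (re)started are assumptions about a typical Xen
setup:

    # In dom0: turn off TX offload on every Xen backend vif.
    # Adjust the glob if your toolstack names interfaces differently.
    # This does not persist across domU restarts, so it would need to be
    # hooked into the vif setup scripts or re-run when a guest comes up.
    for dev in /sys/class/net/vif*; do
        [ -e "$dev" ] || continue
        ethtool -K "$(basename "$dev")" tx off
    done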