I think I've hit a weird and mostly hidden bug in Xen, but I'm not 100%
sure...
Here's the setup: I have an OpenSuSE 11.2 based Dom0 (Xen 3.4.1). Dom0
also acts as a router / firewall and provides WAN connectivity for the
DomUs by means of IPsec (OpenSwan). I use 'bridged' networking for the
DomUs; there are several NICs, as each DomU belongs to a separate
subnet. Each of Dom0's bridge interfaces has an IP address in the
respective subnet, and that address is used as the gateway for the
subnet.
The DomUs are also OpenSuSE 11.2. I use 'cfengine' to centrally manage
most of the configuration and (custom) software distribution.
That's where things go south: when I run cfengine's 'cfagent', it works
up to a point and then just hangs. I can interrupt it with CTRL-C or
wait until it times out (socket timeout). Initially I thought it was a
cfengine problem, but then I noticed that a similar thing happens when
I connect to a DomU with SSH and run 'ls -lR /': it goes through some
directories but eventually just stalls (and I have to disconnect the
SSH session to 'get out').
Every time such a 'hang' happens, I see OpenSwan / IPsec errors like
this one on Dom0:
klips_error:ipsec_xmit_encap_once: tried to skb_put 20, 16
available. This should never happen, please report.
The numbers vary somewhat (sometimes it's 21, 17 instead of 20, 16).
I posted all my 'findings' on the OpenSwan mailing list, thinking it
might be an OpenSwan issue, but one of the developers said it doesn't
look like 'their' issue and that I should talk to the 'Xen guys'. Here
is the relevant part of his reply:
>
> Yeah, this does not seem to be an openswan bug. The code in question is:
> (one instance of it):
>
> /* Set the data pointer */
> skb_reserve(n,skb->data-skb->head+headroom);
> /* Set the tail pointer and length */
> if(skb_tailroom(n) < skb->len) {
>         printk(KERN_WARNING "klips_error:skb_copy_expand: "
>                "tried to skb_put %ld, %d available. This "
>                "should never happen, please report.\n",
>                (unsigned long int)skb->len,
>                skb_tailroom(n));
>         ipsec_kfree_skb(n);
>         return NULL;
> }
>
> I would check with the xen people to see what might be going on.
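For anyone not familiar with the skb terminology, here is my rough
understanding of what that check does, written as a small user-space C
sketch. It is not the KLIPS or kernel code: 'struct fake_skb' and the
fake_* helpers are names I made up, and they only model the four buffer
offsets (head, data, tail, end) that the tailroom check depends on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct sk_buff:
 * only the four offsets that matter for the tailroom check. */
struct fake_skb {
    size_t head;  /* start of the allocated buffer (always 0 here) */
    size_t data;  /* start of the payload */
    size_t tail;  /* end of the payload */
    size_t end;   /* end of the allocated buffer */
};

/* Create an empty buffer of `size` bytes. */
static struct fake_skb fake_alloc(size_t size)
{
    struct fake_skb b = { 0, 0, 0, size };
    return b;
}

/* Modelled on skb_reserve(): move data and tail forward to leave
 * `len` bytes of headroom in an empty buffer. */
static void fake_reserve(struct fake_skb *b, size_t len)
{
    assert(b->tail + len <= b->end);
    b->data += len;
    b->tail += len;
}

/* Modelled on skb_tailroom(): bytes still free after the payload. */
static size_t fake_tailroom(const struct fake_skb *b)
{
    return b->end - b->tail;
}

/* Modelled on the guarded put in the quoted code: returns 0 on
 * success, -1 if `len` bytes do not fit into the remaining tailroom. */
static int fake_put_checked(struct fake_skb *b, size_t len)
{
    if (fake_tailroom(b) < len)
        return -1;
    b->tail += len;
    return 0;
}
```

With made-up sizes (a 64-byte buffer with 48 bytes reserved as
headroom), 16 bytes of tailroom remain, so an attempt to put 20 bytes
is rejected. That has the same shape as the "20, 16" in the message
above, but I don't know the real buffer sizes in my case.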
So here I am, asking the 'Xen guys'.
Does anyone have any idea what might be going on?
Regards, Danilo
_______________________________________________
Xen-community mailing list
Xen-community@lists.xensource.com
http://lists.xensource.com/mailman/listinfo/xen-community