thr3ads.net - Xen devel - [Xen-devel] domU console buffer behaviour [Dec 2008]

If this information is useful, please help other people find it:
Share via:

Ferenc Wagner

2008-Dec-29 20:14 UTC

[Xen-devel] domU console buffer behaviour

Hi,

After the 1 MB domU console buffer fills up, what is the expected
behaviour on further output?  It clearly does not behave as a ring
(under Xen 3.2.1, dom0 2.6.18, domU 2.6.26) as the following shows.

On domU with its console disconnected (no xm console running in dom0):

# i=0; while :; do printf ''%15d\n'' $i >/dev/console;
i=$((i+1)); echo $i; done
0
1
[...]
72317
72318
^Cbash: printf: write error: Interrupted system call

Then on dom0 capturing the output of xm console gives a file of
1048576 (exactly 1 MiB) long that goes like:

            158  -
            159   |
[...]             +- 65408 = 65536-128 lines
          65564   |
          65565  -
          72174  -
          72175   |
[...]             |
          72284   |
          72285   +- 128 lines
          72302   | -
          72303   |  |
[...]             |  + 16 lines
          72316   |  |
          72317  -  -

(Tested with perl -ne ''print $_ if $_ != $p+1; $p=$_'')

That is, the first 158 lines were dropped and the last 128 lines were
overwritten.  As formatting is such that each line is 16 bytes, it
means that the last 2 kiB was sort of following the guest console
output, but in a broken way: lines between 72285--72302 were lost.
Only the last 16 lines (256 bytes) are really fresh.

All of this feels rather fishy.  Can anybody provide some insight,
please?  Is this expected?  Is my config broken?  Or is this a bug?

Why I''m after this: last night all our pvops domUs froze hard with no
sign of anything going wrong.  On two separate dom0s, within a couple
of hours.  The two surviving domUs are those which generate almost no
console output.  (I guess I should keep those buffers emptied, but I
found no best practices on that so left the thing unconfigured.)
-- 
Thanks,
Feri.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2008-Dec-29 21:00 UTC

head link

Re: [Xen-devel] domU console buffer behaviour

On 29/12/2008 20:14, "Ferenc Wagner" <wferi@niif.hu> wrote:
> That is, the first 158 lines were dropped and the last 128 lines were
> overwritten.  As formatting is such that each line is 16 bytes, it
> means that the last 2 kiB was sort of following the guest console
> output, but in a broken way: lines between 72285--72302 were lost.
> Only the last 16 lines (256 bytes) are really fresh.
> 
> All of this feels rather fishy.  Can anybody provide some insight,
> please?  Is this expected?  Is my config broken?  Or is this a bug?
I think the console daemon tries to discard contiguous chunks of data,
rather than odd characters here and there. How effective it really is
I''m
not sure, but certainly you can expect the discards to be in
reasonable-sized chunks and also to be pretty random. Why do you think this
has something to do with pv_ops lockups?

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Ferenc Wagner

2008-Dec-29 21:54 UTC

head link

[Xen-devel] Re: domU console buffer behaviour

Keir Fraser <keir.fraser@eu.citrix.com> writes:
> On 29/12/2008 20:14, "Ferenc Wagner" <wferi@niif.hu> wrote:
>
>> That is, the first 158 lines were dropped and the last 128 lines were
>> overwritten.  As formatting is such that each line is 16 bytes, it
>> means that the last 2 kiB was sort of following the guest console
>> output, but in a broken way: lines between 72285--72302 were lost.
>> Only the last 16 lines (256 bytes) are really fresh.
>> 
>> All of this feels rather fishy.  Can anybody provide some insight,
>> please?  Is this expected?  Is my config broken?  Or is this a bug?
>
> I think the console daemon tries to discard contiguous chunks of
> data, rather than odd characters here and there. How effective it
> really is I''m not sure, but certainly you can expect the discards
to
> be in reasonable-sized chunks and also to be pretty random.
Huh, now I''m no closer to have an idea about the expected behaviour.
What does the console daemon try to achieve?  Does the randomness stem
from the scheduling irregularity?  On which side of the daemon is the
1 MB buffer?
> Why do you think this has something to do with pv_ops lockups?
That''s just the only trace I can start with.  On kernel lockups, I
usually look for clues in the console output.  Now I found it garbled.
Either it is normal and I should look elsewhere, or it is a buffer
handling bug, possibly overwriting some memory and causing havoc
later.  I know that''s a long shot...  But even SysRq didn''t
work, so I
have nothing more to work with.  I''m looking at
tools/console/daemon/io.c
now.
-- 
Thanks,
Feri.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2008-Dec-29 22:01 UTC

head link

Re: [Xen-devel] Re: domU console buffer behaviour

On 29/12/2008 21:54, "Ferenc Wagner" <wferi@niif.hu> wrote:
>> I think the console daemon tries to discard contiguous chunks of
>> data, rather than odd characters here and there. How effective it
>> really is I''m not sure, but certainly you can expect the
discards to
>> be in reasonable-sized chunks and also to be pretty random.
> 
> Huh, now I''m no closer to have an idea about the expected
behaviour.
> What does the console daemon try to achieve?  Does the randomness stem
> from the scheduling irregularity?  On which side of the daemon is the
> 1 MB buffer?
The daemon greedily takes characters from a small ring buffer shared with
the guest, and places them in the much bigger 1MB buffer. The guest can
expect characters to not sit around in the shared ring, and it is possible
that a guest could lock up if that were to happen (although in your example
only the user process writing to the console should hang in that
[impossible] case). Rather than stopping reading characters when the 1MB
buffer fills, the daemon instead discards character sequences from the 1MB
buffer.
>> Why do you think this has something to do with pv_ops lockups?
> 
> That''s just the only trace I can start with.  On kernel lockups, I
> usually look for clues in the console output.  Now I found it garbled.
> Either it is normal and I should look elsewhere, or it is a buffer
> handling bug, possibly overwriting some memory and causing havoc
> later.  I know that''s a long shot...  But even SysRq
didn''t work, so I
> have nothing more to work with.  I''m looking at
tools/console/daemon/io.c
> now.
It does sound like a long shot! Did you expect those VMs to be producing a
lot of console output?

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Ferenc Wagner

2008-Dec-30 01:47 UTC

head link

Re: [Xen-devel] Re: domU console buffer behaviour

Keir Fraser <keir.fraser@eu.citrix.com> writes:
> On 29/12/2008 21:54, "Ferenc Wagner" <wferi@niif.hu> wrote:
>
>>> I think the console daemon tries to discard contiguous chunks of
>>> data, rather than odd characters here and there. How effective it
>>> really is I''m not sure, but certainly you can expect the
discards to
>>> be in reasonable-sized chunks and also to be pretty random.
>> 
>> Huh, now I''m no closer to have an idea about the expected
behaviour.
>> What does the console daemon try to achieve?  Does the randomness stem
>> from the scheduling irregularity?  On which side of the daemon is the
>> 1 MB buffer?
>
> The daemon greedily takes characters from a small ring buffer shared with
> the guest, and places them in the much bigger 1MB buffer. The guest can
> expect characters to not sit around in the shared ring, and it is possible
> that a guest could lock up if that were to happen (although in your example
> only the user process writing to the console should hang in that
> [impossible] case).
Does your above "hang" mean a total lockup or rather a delay until the
dom0 schedules xenconsoled to remove those characters from the shared
ring?

By the example I could probe the console buffer behaviour only, I
never managed to freeze the guest.  In real life user processes
practically never write to the guest console, it''s dominated by
iptables (kernel packet filter) logs.
> Rather than stopping reading characters when the 1MB buffer fills,
> the daemon instead discards character sequences from the 1MB buffer.
>From the test it looks like the daemon discards from the end of thebuffer, but then why are the first 158 lines missing?

(As a side note, wouldn''t it be more useful if the big buffer was a
ring as well, discarding input at the beginning?)
>>> Why do you think this has something to do with pv_ops lockups?
>> 
>> That''s just the only trace I can start with.  On kernel
lockups, I
>> usually look for clues in the console output.  Now I found it garbled.
>> Either it is normal and I should look elsewhere, or it is a buffer
>> handling bug, possibly overwriting some memory and causing havoc
>> later.  I know that''s a long shot...  But even SysRq
didn''t work, so I
>> have nothing more to work with.  I''m looking at
tools/console/daemon/io.c
>> now.
>
> It does sound like a long shot! Did you expect those VMs to be producing a
> lot of console output?
They definitely filled up their 1 MB buffers in a couple of days.
Usually they produce about 100 characters per minute, possibly with
some peaks if they get scanned, but I wouldn''t call this "a
lot".

Now your input reminded me of something else.

Dom0s show considerable latency.  They run heartbeat to form a HA pair,
and hearbeat complains regularly.  To quote some recent extreme value:

    WARN: Gmain_timeout_dispatch: Dispatch function for send local status
    took too long to execute: 320 ms (> 50 ms) (GSource: 0x811b930)

And heartbeat even runs on locked pages!  So other processes probably
experience longer delays, which could cause problem if Xen is
sensitive to such things (but there''s no significant swapping on the
dom0s).  Btw. the above line was logged when 3 VMs were locked up
(constantly running from the dom0 POV, doing nothing otherwise).

Also, dom0 loses characters on its serial ports (console and heartbeat
medium).  This is probably a manifestation of this latency, isn''t it?

Anyway, I''ll look at this some more tomorrow.  I wonder why
xenconsoled on a machine running lots of VMs isn''t much bigger (as VSZ
in ps output) than on another running nothing but dom0.  Isn''t this
1 MB/VM allocated by xenconsoled?
-- 
Thanks,
Feri.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2008-Dec-30 09:00 UTC

head link

Re: [Xen-devel] Re: domU console buffer behaviour

On 30/12/2008 01:47, "Ferenc Wagner" <wferi@niif.hu> wrote:
> Anyway, I''ll look at this some more tomorrow.  I wonder why
> xenconsoled on a machine running lots of VMs isn''t much bigger (as
VSZ
> in ps output) than on another running nothing but dom0.  Isn''t
this
> 1 MB/VM allocated by xenconsoled?
I''m not sure if merely malloc()ing 1MB adds that to VSZ, or if the
memory
also has to be written to, which it may not have entirely been for every
domain.

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Xen devel - Dec 2008 - domU console buffer behaviour

[Xen-devel] domU console buffer behaviour

Re: [Xen-devel] domU console buffer behaviour

[Xen-devel] Re: domU console buffer behaviour

Re: [Xen-devel] Re: domU console buffer behaviour

Re: [Xen-devel] Re: domU console buffer behaviour

Re: [Xen-devel] Re: domU console buffer behaviour