Hi, After the 1 MB domU console buffer fills up, what is the expected behaviour on further output? It clearly does not behave as a ring (under Xen 3.2.1, dom0 2.6.18, domU 2.6.26) as the following shows. On domU with its console disconnected (no xm console running in dom0): # i=0; while :; do printf ''%15d\n'' $i >/dev/console; i=$((i+1)); echo $i; done 0 1 [...] 72317 72318 ^Cbash: printf: write error: Interrupted system call Then on dom0 capturing the output of xm console gives a file of 1048576 (exactly 1 MiB) long that goes like: 158 - 159 | [...] +- 65408 = 65536-128 lines 65564 | 65565 - 72174 - 72175 | [...] | 72284 | 72285 +- 128 lines 72302 | - 72303 | | [...] | + 16 lines 72316 | | 72317 - - (Tested with perl -ne ''print $_ if $_ != $p+1; $p=$_'') That is, the first 158 lines were dropped and the last 128 lines were overwritten. As formatting is such that each line is 16 bytes, it means that the last 2 kiB was sort of following the guest console output, but in a broken way: lines between 72285--72302 were lost. Only the last 16 lines (256 bytes) are really fresh. All of this feels rather fishy. Can anybody provide some insight, please? Is this expected? Is my config broken? Or is this a bug? Why I''m after this: last night all our pvops domUs froze hard with no sign of anything going wrong. On two separate dom0s, within a couple of hours. The two surviving domUs are those which generate almost no console output. (I guess I should keep those buffers emptied, but I found no best practices on that so left the thing unconfigured.) -- Thanks, Feri. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 29/12/2008 20:14, "Ferenc Wagner" <wferi@niif.hu> wrote:> That is, the first 158 lines were dropped and the last 128 lines were > overwritten. As formatting is such that each line is 16 bytes, it > means that the last 2 kiB was sort of following the guest console > output, but in a broken way: lines between 72285--72302 were lost. > Only the last 16 lines (256 bytes) are really fresh. > > All of this feels rather fishy. Can anybody provide some insight, > please? Is this expected? Is my config broken? Or is this a bug?I think the console daemon tries to discard contiguous chunks of data, rather than odd characters here and there. How effective it really is I''m not sure, but certainly you can expect the discards to be in reasonable-sized chunks and also to be pretty random. Why do you think this has something to do with pv_ops lockups? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser <keir.fraser@eu.citrix.com> writes:> On 29/12/2008 20:14, "Ferenc Wagner" <wferi@niif.hu> wrote: > >> That is, the first 158 lines were dropped and the last 128 lines were >> overwritten. As formatting is such that each line is 16 bytes, it >> means that the last 2 kiB was sort of following the guest console >> output, but in a broken way: lines between 72285--72302 were lost. >> Only the last 16 lines (256 bytes) are really fresh. >> >> All of this feels rather fishy. Can anybody provide some insight, >> please? Is this expected? Is my config broken? Or is this a bug? > > I think the console daemon tries to discard contiguous chunks of > data, rather than odd characters here and there. How effective it > really is I''m not sure, but certainly you can expect the discards to > be in reasonable-sized chunks and also to be pretty random.Huh, now I''m no closer to have an idea about the expected behaviour. What does the console daemon try to achieve? Does the randomness stem from the scheduling irregularity? On which side of the daemon is the 1 MB buffer?> Why do you think this has something to do with pv_ops lockups?That''s just the only trace I can start with. On kernel lockups, I usually look for clues in the console output. Now I found it garbled. Either it is normal and I should look elsewhere, or it is a buffer handling bug, possibly overwriting some memory and causing havoc later. I know that''s a long shot... But even SysRq didn''t work, so I have nothing more to work with. I''m looking at tools/console/daemon/io.c now. -- Thanks, Feri. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 29/12/2008 21:54, "Ferenc Wagner" <wferi@niif.hu> wrote:>> I think the console daemon tries to discard contiguous chunks of >> data, rather than odd characters here and there. How effective it >> really is I''m not sure, but certainly you can expect the discards to >> be in reasonable-sized chunks and also to be pretty random. > > Huh, now I''m no closer to have an idea about the expected behaviour. > What does the console daemon try to achieve? Does the randomness stem > from the scheduling irregularity? On which side of the daemon is the > 1 MB buffer?The daemon greedily takes characters from a small ring buffer shared with the guest, and places them in the much bigger 1MB buffer. The guest can expect characters to not sit around in the shared ring, and it is possible that a guest could lock up if that were to happen (although in your example only the user process writing to the console should hang in that [impossible] case). Rather than stopping reading characters when the 1MB buffer fills, the daemon instead discards character sequences from the 1MB buffer.>> Why do you think this has something to do with pv_ops lockups? > > That''s just the only trace I can start with. On kernel lockups, I > usually look for clues in the console output. Now I found it garbled. > Either it is normal and I should look elsewhere, or it is a buffer > handling bug, possibly overwriting some memory and causing havoc > later. I know that''s a long shot... But even SysRq didn''t work, so I > have nothing more to work with. I''m looking at tools/console/daemon/io.c > now.It does sound like a long shot! Did you expect those VMs to be producing a lot of console output? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser <keir.fraser@eu.citrix.com> writes:> On 29/12/2008 21:54, "Ferenc Wagner" <wferi@niif.hu> wrote: > >>> I think the console daemon tries to discard contiguous chunks of >>> data, rather than odd characters here and there. How effective it >>> really is I''m not sure, but certainly you can expect the discards to >>> be in reasonable-sized chunks and also to be pretty random. >> >> Huh, now I''m no closer to have an idea about the expected behaviour. >> What does the console daemon try to achieve? Does the randomness stem >> from the scheduling irregularity? On which side of the daemon is the >> 1 MB buffer? > > The daemon greedily takes characters from a small ring buffer shared with > the guest, and places them in the much bigger 1MB buffer. The guest can > expect characters to not sit around in the shared ring, and it is possible > that a guest could lock up if that were to happen (although in your example > only the user process writing to the console should hang in that > [impossible] case).Does your above "hang" mean a total lockup or rather a delay until the dom0 schedules xenconsoled to remove those characters from the shared ring? By the example I could probe the console buffer behaviour only, I never managed to freeze the guest. In real life user processes practically never write to the guest console, it''s dominated by iptables (kernel packet filter) logs.> Rather than stopping reading characters when the 1MB buffer fills, > the daemon instead discards character sequences from the 1MB buffer.>From the test it looks like the daemon discards from the end of thebuffer, but then why are the first 158 lines missing? (As a side note, wouldn''t it be more useful if the big buffer was a ring as well, discarding input at the beginning?)>>> Why do you think this has something to do with pv_ops lockups? >> >> That''s just the only trace I can start with. On kernel lockups, I >> usually look for clues in the console output. Now I found it garbled. >> Either it is normal and I should look elsewhere, or it is a buffer >> handling bug, possibly overwriting some memory and causing havoc >> later. I know that''s a long shot... But even SysRq didn''t work, so I >> have nothing more to work with. I''m looking at tools/console/daemon/io.c >> now. > > It does sound like a long shot! Did you expect those VMs to be producing a > lot of console output?They definitely filled up their 1 MB buffers in a couple of days. Usually they produce about 100 characters per minute, possibly with some peaks if they get scanned, but I wouldn''t call this "a lot". Now your input reminded me of something else. Dom0s show considerable latency. They run heartbeat to form a HA pair, and hearbeat complains regularly. To quote some recent extreme value: WARN: Gmain_timeout_dispatch: Dispatch function for send local status took too long to execute: 320 ms (> 50 ms) (GSource: 0x811b930) And heartbeat even runs on locked pages! So other processes probably experience longer delays, which could cause problem if Xen is sensitive to such things (but there''s no significant swapping on the dom0s). Btw. the above line was logged when 3 VMs were locked up (constantly running from the dom0 POV, doing nothing otherwise). Also, dom0 loses characters on its serial ports (console and heartbeat medium). This is probably a manifestation of this latency, isn''t it? Anyway, I''ll look at this some more tomorrow. I wonder why xenconsoled on a machine running lots of VMs isn''t much bigger (as VSZ in ps output) than on another running nothing but dom0. Isn''t this 1 MB/VM allocated by xenconsoled? -- Thanks, Feri. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 30/12/2008 01:47, "Ferenc Wagner" <wferi@niif.hu> wrote:> Anyway, I''ll look at this some more tomorrow. I wonder why > xenconsoled on a machine running lots of VMs isn''t much bigger (as VSZ > in ps output) than on another running nothing but dom0. Isn''t this > 1 MB/VM allocated by xenconsoled?I''m not sure if merely malloc()ing 1MB adds that to VSZ, or if the memory also has to be written to, which it may not have entirely been for every domain. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel