Hello!

I'm currently running some domUs on some dom0s. Everything is Debian Lenny, which comes with Xen-3.2.1 and kernel 2.6.26. The domUs are running without any problems.

Now I want to use live migration, which does not really work with Debian Lenny as dom0. I pulled xen-unstable c/s 20346 with linux-2.6-pvops (2.6.31.4) and installed it on an extra box.

But: it is not possible to start any of the known-to-run-very-well domUs on it. The domUs use an iSCSI-based root file system. They connect to the iSCSI target and start to boot, but at some (random) point during boot they lose the iSCSI connection with a timeout and therefore stop running.

WHY DOES THE BEHAVIOR OF A domU CHANGE WHEN THE dom0 IS EXCHANGED? Virtualization is IMHO exactly the abstraction layer which should hide such dependencies on the underlying system.

Yes, it is reproducible: I started the domUs hundreds of times on Xen-3.2.1 boxes without any problems; I tried to start the domUs on the xen-unstable system tens of times, always with the problem described above.

I'm really stuck with this. Any help, remark, question, or idea is welcome!

Kind regards - Andreas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2009-Oct-22 11:01 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
On Thu, Oct 22, 2009 at 09:47:17AM +0200, Andreas Florath wrote:
> Hello!
>
> I'm currently running some domUs on some dom0s. Everything is Debian
> Lenny, which comes with Xen-3.2.1 and kernel 2.6.26. The domUs are
> running without any problems.
> Now I want to use live migration, which does not really work with
> Debian Lenny as dom0. I pulled xen-unstable c/s 20346 with
> linux-2.6-pvops (2.6.31.4) and installed it on an extra box.

Uh oh. Is there some reason why you didn't get the latest stable Xen release, which is Xen 3.4.1?

Also, in production you might want to run linux-2.6.18-xen.hg on dom0, or the OpenSUSE forward-ported (non-pv_ops) dom0 patches for 2.6.31.4.

See: http://wiki.xensource.com/xenwiki/XenDom0Kernels

> But: it is not possible to start any of the known-to-run-very-well
> domUs on it. The domUs use an iSCSI-based root file system. They
> connect to the iSCSI target and start to boot, but at some (random)
> point during boot they lose the iSCSI connection with a timeout and
> therefore stop running.

Maybe you should try with LVM or file-backed domU disks first.

> WHY DOES THE BEHAVIOR OF A domU CHANGE WHEN THE dom0 IS EXCHANGED?

Because there's something clearly wrong; the iSCSI connection shouldn't drop.

Debug that and solve the problem.

> Virtualization is IMHO exactly the abstraction layer which should hide
> such dependencies on the underlying system.

Exactly. You also need to have a working dom0 for that.

> Yes, it is reproducible: I started the domUs hundreds of times on
> Xen-3.2.1 boxes without any problems; I tried to start the domUs on the
> xen-unstable system tens of times, always with the problem described above.
>
> I'm really stuck with this. Any help, remark, question, or idea is welcome!
>

-- Pasi
Andreas Florath
2009-Oct-22 11:55 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
Hello!

Thanks a lot for your ideas!

Quoting Pasi Kärkkäinen <pasik@iki.fi>:
>
> Is there some reason why you didn't get the latest stable Xen release,
> which is Xen 3.4.1?

Yes - I need some features from xen-unstable.

> Also, in production you might want to run linux-2.6.18-xen.hg on dom0, or
> the OpenSUSE forward-ported (non-pv_ops) dom0 patches for 2.6.31.4.
>
> See: http://wiki.xensource.com/xenwiki/XenDom0Kernels

Thanks for the hint - I'll also try other dom0 kernels.

> Maybe you should try with LVM or file-backed domU disks first.

In the meantime I ran some tests with file-backed domUs, which all work fine.

>> WHY DOES THE BEHAVIOR OF A domU CHANGE WHEN THE dom0 IS EXCHANGED?
>
> Because there's something clearly wrong; the iSCSI connection shouldn't
> drop.
>
> Debug that and solve the problem.

This is what I'm currently trying ;-)

What I noticed is that the domU on xen-unstable is somewhat slower than on the working Xen-3.2.1 system. I had a closer look at the boot messages and found one difference:

xen-3.2.1 system:
[    0.000000] Xen reported: 2793.000 MHz processor.

xen-unstable:
[    0.000000] Xen reported: 1000.000 MHz processor.

Excerpt from /proc/cpuinfo of the dom0 (both versions):
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
cpu MHz         : 2793.002

I'm completely unsure whether this has something to do with the problem - but at least it could explain something.

Can anybody please tell me whether this is a bug or a feature - or maybe even a configuration parameter?

Kind regards - Andreas
Jeremy Fitzhardinge
2009-Oct-22 21:08 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
On 10/22/09 04:55, Andreas Florath wrote:
> What I noticed is that the domU on xen-unstable is somewhat slower
> than on the working Xen-3.2.1 system. I had a closer look at the boot
> messages and found one difference:
>
> xen-3.2.1 system:
> [    0.000000] Xen reported: 2793.000 MHz processor.
>
> xen-unstable:
> [    0.000000] Xen reported: 1000.000 MHz processor.
>
> Excerpt from /proc/cpuinfo of the dom0 (both versions):
> model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
> cpu MHz         : 2793.002
>
> I'm completely unsure whether this has something to do with the problem -
> but at least it could explain something.
>
> Can anybody please tell me whether this is a bug or a feature - or maybe
> even a configuration parameter?

xen-unstable currently always traps and emulates rdtsc, so that the TSC appears to be running at 1GHz. Set "tsc_native = 1" in your config file to return to the old behaviour. It shouldn't have a dramatic effect on your domain's performance (perhaps a few percent improvement).

J
Andreas Florath
2009-Oct-23 11:28 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
Hello!

Thanks for your hint - but switching this on or off does not change anything for my problem.

I spent some more time on investigations. My current assumption is that there is some problem with the network driver (which would explain the iSCSI timeout errors).

My test setup: xen-unstable, dom0 with the 2.6.31.4 pvops kernel, and one domU running Debian Lenny (based on 2.6.26). In the domU, 'nuttcp -S' is running. Output from a dom0 shell:

# nuttcp -T 10 -t 192.168.84.31 ; date ; nuttcp -T 10 -r 192.168.84.31 ; date
1344.3750 MB / 10.06 sec = 1120.9618 Mbps 1 %TX 17 %RX
Fri Oct 23 13:00:40 CEST 2009
^C
*** transfer interrupted ***
^C
Fri Oct 23 13:02:52 CEST 2009

The first run transfers data from the dom0 to the domU, which works. The other direction (transferring data from the domU to the dom0) never finishes (I pressed Ctrl-C after about two minutes).

[It's getting ever more curious: when I run the '-r' version under 'strace' it runs fine, at about 400Mbps. So when I have a closer look at the problem, the problem vanishes - looks a bit like Schrödinger's cat ;-).]

I ran another test with sftp transferring data from the domU to the dom0, which gives me (when I'm lucky) about 20kB/s, with a lot of 'stalled' in between.

Were there some changes in the backend network device which could explain this behavior (e.g. rate limiting, additional features, ...)?

Kind regards - Andreas
Andreas Florath
2009-Oct-23 16:01 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
A short update:

Error message in the domU:

netfront: rx->offset: 0, size: 4294967295

(comes over and over...)

The iSCSI error message is:

[   30.649610] connection4:0: ping timeout of 5 secs expired, last rx 4294897458, last ping 4294898708, now 4294899958
[   30.649651] connection4:0: detected conn error (1011)

Typically after a while the following stack trace appears:

[  520.766002] BUG: soft lockup - CPU#0 stuck for 61s! [vgchange:1145]
[  520.766002] Modules linked in: psmouse usbhid hid ff_memless uhci_hcd ohci_hcd ehci_hcd usbkbd usbcore ext3 jbd mbcache raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod crc32c libcrc32c iscsi_tcp libiscsi scsi_transport_iscsi sd_mod scsi_mod
[  520.766002]
[  520.766002] Pid: 1145, comm: vgchange Not tainted (2.6.26-2-xen-686 #1)
[  520.766002] EIP: 0061:[<c01013a7>] EFLAGS: 00000246 CPU: 0
[  520.766002] EIP is at 0xc01013a7
[  520.766002] EAX: 00000000 EBX: deadbeef ECX: deadbeef EDX: 000005bb
[  520.766002] ESI: 00000103 EDI: c30c7e68 EBP: c32b89bc ESP: c30c7e54
[  520.766002] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[  520.766002] CR0: 8005003b CR2: 080bf1f1 CR3: 032ce000 CR4: 00000664
[  520.766002] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  520.766002] DR6: ffff0ff0 DR7: 00000400
[  520.766002] [<c023cd5e>] ? xen_poll_irq+0x57/0x65
[  520.766002] [<c023f547>] ? xen_spin_wait+0xcb/0xff
[  520.766002] [<c02cb9df>] ? _spin_lock+0x31/0x38
[  520.766002] [<d10faabe>] ? start_this_handle+0x247/0x397 [jbd]
[  520.766002] [<d10fac8b>] ? journal_start+0x7d/0xa9 [jbd]
[  520.766002] [<d1130fa0>] ? ext3_dirty_inode+0x21/0x63 [ext3]
[  520.766002] [<c018a4c5>] ? __mark_inode_dirty+0x21/0x148
[  520.766002] [<c0181d01>] ? touch_atime+0xc7/0xce
[  520.766002] [<c014e142>] ? generic_file_mmap+0x2a/0x3e
[  520.766002] [<c01617d3>] ? mmap_region+0x1c7/0x392
[  520.766002] [<c0161cbc>] ? do_mmap_pgoff+0x243/0x296
[  520.766002] [<c0107312>] ? sys_mmap2+0x86/0xa0
[  520.766002] [<c0103f76>] ? syscall_call+0x7/0xb
[  520.766002] [<c02c0000>] ? pci_bus_size_bridges+0x1ae/0x3a0

Kind regards - Andreas
Andreas Florath
2009-Oct-27 14:44 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
Hello!

It seems that the root cause for this is a bug in the 2.6.31.4 pv_ops kernel. I tried a bisection and ended up between d2af313fcf20275d4008725f32f1974bde06ec9e, which is good (i.e. works without the described problems), and 9cf89da8cb207818c029975f261ea672addbb801, which is bad.

There are some other problems in version 0148adde28789aa72f6210add9ac34625309535b, which does not even boot, so it's somewhat tricky to do a bisection when there is more than one problem :-(

If it helps to find the bug, I can try some other versions in between at random.

Note that the merge after the version which works for me (3dd81018a392941fcc722ee521de344527481eb8) changed things in the network layer.

Kind regards - Andreas
Jeremy Fitzhardinge
2009-Oct-27 18:29 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
On 10/27/09 07:44, Andreas Florath wrote:
> It seems that the root cause for this is a bug in the 2.6.31.4 pv_ops
> kernel. I tried a bisection and ended up between
> d2af313fcf20275d4008725f32f1974bde06ec9e, which is good (i.e. works
> without the described problems), and
> 9cf89da8cb207818c029975f261ea672addbb801, which is bad.

Thanks for investigating this.

> There are some other problems in version
> 0148adde28789aa72f6210add9ac34625309535b, which does not even boot, so
> it's somewhat tricky to do a bisection when there is more than one
> problem :-(

Unfortunately, bisecting on the pvops tree is a bit awkward because individual topic branches don't necessarily boot on their own.

> If it helps to find the bug, I can try some other versions in
> between at random.
>
> Note that the merge after the version which works for me
> (3dd81018a392941fcc722ee521de344527481eb8) changed things in the
> network layer.

Yes, that's the netchannel2 merge. Do you have it enabled in your config? If it is not enabled then it definitely shouldn't have any effect on your system. If it is enabled then it could be buggy because it is very new code.

J
Andreas Florath
2009-Oct-29 11:14 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
Hello!

Yesterday's patch fixed the TCP performance problem.

I ran some tests.

Setup:

-1- dom0 kernel 2.6.26, hardware A
-2- dom0 kernel 2.6.26, hardware B
-3- domU kernel 2.6.26 running on -1-
-4- domU kernel 2.6.26 running on -2-

-5- dom0 kernel 2.6.31.4 current pv_ops, hardware C
-6- dom0 kernel 2.6.31.4 current pv_ops, hardware D
-7- domU kernel 2.6.26 running on -5-
-8- domU kernel 2.6.26 running on -6-

All hardware boxes are identical: P4 2.8GHz with 1G RAM and Gigabit Ethernet.

nuttcp results (average of three independent tests without any other load):

Pair            ->           <-
-1- <-> -2-     255 Mbps     219 Mbps
-1- <-> -3-      52 Mbps      75 Mbps
-1- <-> -4-     183 Mbps     146 Mbps
-3- <-> -4-     289 Mbps     271 Mbps

-5- <-> -6-     693 Mbps     675 Mbps
-5- <-> -7-    1045 Mbps    1227 Mbps
-5- <-> -8-     397 Mbps     668 Mbps
-7- <-> -8-     385 Mbps     365 Mbps

So the speed increased.

>
> Yes, that's the netchannel2 merge. Do you have it enabled in your
> config? If it is not enabled then it definitely shouldn't have any
> effect on your system. If it is enabled then it could be buggy because
> it is very new code.

netchannel2 seems to be enabled (it is set in the kernel config). It looks like this is done by default.

The only remaining issues are now the millions of log messages:

netfront: rx->offset: 0, size: 4294967295

which occur only on the 2.6.31.4 pv_ops systems - and, as far as I can tell at the moment, only for iSCSI traffic.

Kind regards - Andreas
Jeremy Fitzhardinge
2009-Oct-29 22:11 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
On 10/29/09 04:14, Andreas Florath wrote:
> Hello!
>
> Yesterday's patch fixed the TCP performance problem.
>
> I ran some tests.
>
> Setup:
>
> -1- dom0 kernel 2.6.26, hardware A
> -2- dom0 kernel 2.6.26, hardware B
> -3- domU kernel 2.6.26 running on -1-
> -4- domU kernel 2.6.26 running on -2-
>
> -5- dom0 kernel 2.6.31.4 current pv_ops, hardware C
> -6- dom0 kernel 2.6.31.4 current pv_ops, hardware D
> -7- domU kernel 2.6.26 running on -5-
> -8- domU kernel 2.6.26 running on -6-
>
> All hardware boxes are identical:
> P4 2.8GHz with 1G RAM and Gigabit Ethernet.
>
> nuttcp results (average of three independent tests without
> any other load):
>
> Pair            ->           <-
> -1- <-> -2-     255 Mbps     219 Mbps
> -1- <-> -3-      52 Mbps      75 Mbps
> -1- <-> -4-     183 Mbps     146 Mbps
> -3- <-> -4-     289 Mbps     271 Mbps
>
> -5- <-> -6-     693 Mbps     675 Mbps
> -5- <-> -7-    1045 Mbps    1227 Mbps
> -5- <-> -8-     397 Mbps     668 Mbps
> -7- <-> -8-     385 Mbps     365 Mbps
>
> So the speed increased.
>
>>
>> Yes, that's the netchannel2 merge. Do you have it enabled in your
>> config? If it is not enabled then it definitely shouldn't have any
>> effect on your system. If it is enabled then it could be buggy because
>> it is very new code.
> netchannel2 seems to be enabled (it is set in the kernel config). It looks
> like this is done by default.

That's an error; it definitely shouldn't be on by default. What happens if you disable it?

> The only remaining issues are now the millions of log messages:
> netfront: rx->offset: 0, size: 4294967295
> which occur only on the 2.6.31.4 pv_ops systems - and, as far as
> I can tell at the moment, only for iSCSI traffic.

If this is still present when you disable netchannel2 then I'll revert the merge.

Either way, Steven, could you look at this?

Thanks,
    J
Andreas Florath
2009-Oct-30 09:24 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
Hello!

> That's an error; it definitely shouldn't be on by default. What
> happens if you disable it?

Updated to:

commit 7ffdaf2a53889c55d76fa83b6fdc45daa2552927
Merge: 7b927d6... b9a24b7...
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Thu Oct 29 15:10:35 2009 -0700

    Merge branch 'xen/netchannel2' into xen/master

    * xen/netchannel2:
      xen/netchannel2: don't enable by default

I checked the kernel config: netchannel2 is now disabled.

>
>> The only remaining issues are now the millions of log messages:
>> netfront: rx->offset: 0, size: 4294967295
>> which occur only on the 2.6.31.4 pv_ops systems - and, as far as
>> I can tell at the moment, only for iSCSI traffic.
>
> If this is still present when you disable netchannel2 then I'll revert
> the merge.

It's still there - even with netchannel2 disabled.

Kind regards - Andreas
Steven Smith
2009-Oct-30 11:11 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
> >> The only remaining issues are now the millions of log messages:
> >> netfront: rx->offset: 0, size: 4294967295
> >> which occur only on the 2.6.31.4 pv_ops systems - and, as far as
> >> I can tell at the moment, only for iSCSI traffic.
> >
> > If this is still present when you disable netchannel2 then I'll revert
> > the merge.
> It's still there - even with netchannel2 disabled.

Does the attached patch help at all?

Steven.
xen@flonatel.org
2009-Nov-02 08:59 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
Hello!

The patch eliminates the 'rx->offset' warnings/errors. But at the moment I have the impression that the iSCSI setup is not as stable as it was. Before going into more detail here, I'll reinstall everything from scratch and send you the results later.

Kind regards - Andreas
Andreas Florath
2009-Nov-03 10:29 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
Hello!

Looks good to me: I reinstalled everything with the patched version - iSCSI functionality and performance are OK. No more errors/warnings about 'netfront: rx->offset'.

Thanks!

Kind regards - Andreas
Jeremy Fitzhardinge
2009-Nov-03 22:52 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
On 11/03/09 02:29, Andreas Florath wrote:
> Hello!
>
> Looks good to me: I reinstalled everything with the patched version - iSCSI
> functionality and performance are OK. No more errors/warnings about
> 'netfront: rx->offset'.

OK, good to know, thanks.

J
Steven Smith
2009-Nov-04 09:04 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
> Looks good to me: I reinstalled everything with the patched version - iSCSI
> functionality and performance are OK. No more errors/warnings about
> 'netfront: rx->offset'.

Great. Thanks for testing this.

Could you apply the patch to your tree, please, Jeremy?

Sorry for making a mess.

Steven.
Jeremy Fitzhardinge
2009-Nov-04 21:04 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
On 11/04/09 01:04, Steven Smith wrote:
>> Looks good to me: I reinstalled everything with the patched version - iSCSI
>> functionality and performance are OK. No more errors/warnings about
>> 'netfront: rx->offset'.
>>
> Great. Thanks for testing this.
>
> Could you apply the patch to your tree, please, Jeremy?
>

It should already be there. I applied it shortly after you posted it, since it looked reasonable independent of whether it solved Andreas's problem.

J
Steven Smith
2009-Nov-05 12:22 UTC
Re: [Xen-devel] Strange error in domU after dom0 update
> >> Looks good to me: I reinstalled everything with the patched version - iSCSI
> >> functionality and performance are OK. No more errors/warnings about
> >> 'netfront: rx->offset'.
> >>
> > Great. Thanks for testing this.
> >
> > Could you apply the patch to your tree, please, Jeremy?
> It should already be there.

Ah, so it is. Thanks.

Steven.