Lionel Raynaud
2009-Apr-07 02:24 UTC
[Xen-users] Fedora 10-VM hung, Time issue and /dev/proc/sys/xen missing, HVM VMs recognized as PV
Xen Users,

We have recently experienced a few issues on Xen 3.3.1, and we would appreciate it if one of you could shed some light on them.

First of all, our system configuration is:
- a dual Xeon 2.5 GHz (8 cores) with 16 GB of RAM
- Xen 3.3.1 from the latest XenSource distribution, shipped with Linux kernel 2.6.18.8-xen
- Dom0 is CentOS 5.2, upgraded a few days ago to CentOS 5.3
- There are 6 HVM domUs running: 5 are Fedora 10 x86_64 with sporadic issues (see below), and 1 is Windows 2003 (no issue so far)
- The 5 Fedora 10 domUs have the latest package upgrades, including Linux kernel 2.6.27.19-170.2.35.fc10.x86_64. They have 2 vCPUs each, between 512 MB and 1 GB of memory, and 30 GB of disk space stored on an internal SATA disk
- Dom0's vCPU is pinned to core 0 (dom0_vcpus_pin)
- The domUs are visibly sharing cores 1 to 7 (xm vcpu-list), although nothing was configured to map them to specific CPUs/cores

Now here are our observations:

(1) The Fedora 10 domUs described above freeze randomly and partially (see below) after running for some hours.

- If there is a pre-existing ssh session on a hung domU, some commands such as 'ls', 'ps', 'tail -f <file>' and 'free' can be executed, while commands such as 'top' and 'vmstat' hang. Sometimes no command works at all
- xentop displays 0% activity on a hung domU, although I once observed 100% on another hung one
- There is nothing significant in domU:/var/log/messages, and nothing in dom0:/var/log/xen/qemu-dm-… either
- Nagios running on dom0 doesn't really pick this condition up, as the hung domUs still answer ping and still pass the Nagios ssh check; note that ssh to a hung domU doesn't work, although the basic Nagios TCP check gets an answer on port 22
- Their time is completely off (see the next observation below), with or without ntpd running
- I had the occasion to run 'free' on a few of them, and they appeared to have enough free memory, i.e. they were not swapping at all

=> I don't want to speculate on the potential root cause. Nevertheless, what would be the most effective next troubleshooting steps?
o Force a domU system dump? And then?
o Dive deep into the dom0 logs, although a quick browse turned up nothing?
o Disable most of the processes on one of these domUs to identify whether a user process can cause this issue (may be very time consuming)?
o Set the run-level to 3 instead of 5?
o etc.

(2) The 5 Fedora 10 domUs are not keeping their time in sync.

We have read different pages concerning time management for a Linux domU, but we haven't yet found anything conclusive and/or haven't been able to set this up properly. The facts are:

- Our dom0 runs ntpd and is perfectly synchronized with external public NTP sources
- We initially tried running ntpd on the Fedora 10 domUs, configured against external public sources. This was unsuccessful; the time is usually off by a few minutes
- We then tried without ntpd. According to our reading this should be the proper configuration, as the domUs' hardware clock should sync with their dom0's hardware clock. Alas, this was still unsuccessful: the domUs end up lagging significantly behind their dom0's time
- We have read on a few occasions that there is a parameter to set with echo 1 > /proc/sys/xen/independent_wallclock in order to run ntpd on a domU, but /proc/sys/xen doesn't exist on these Fedora 10 domUs. Is this expected behavior? Should we assume the independent_wallclock setting is only for PV domUs?
- Note that one of the domUs is a 32-bit Windows 2003 server and is perfectly on time, i.e. in sync with its dom0. It runs the default Windows time service, with no ntpd installed

(3) The 5 Fedora 10 domUs were installed as HVM domUs, but their kernels see them as PV.

This may be a misunderstanding on our side. However, dmesg on the 5 Fedora 10 domUs shows the message:
"Booting paravirtualized kernel on bare hardware"

We just installed an HVM CentOS 5.3 domU, and this time the kernel boot message "Booting …" doesn't appear.

Therefore, can we conclude that the presumed HVM Fedora 10 domUs are in fact PV domUs?
Should a /proc/sys/xen be present on a PV domU only, or on any type of domU?

Thank you,
Lionel Raynaud.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
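Observation (2) describes the lag only as "a few minutes" or "significant". To put a number on it, one option is to record paired clock readings from dom0 and a domU and fit a drift rate. The following is a minimal sketch, not something from the original post: the function name, the sample format (dom0 epoch seconds, domU epoch seconds) and the demo readings are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def drift_seconds_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of the (domU - dom0) offset, scaled to seconds per hour.

    Each sample is (dom0_epoch_seconds, domu_epoch_seconds), read as close
    together in time as possible. A negative result means the domU clock
    runs slower than the dom0 clock.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [ref for ref, _ in samples]
    ys = [guest - ref for ref, guest in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must span a non-zero time interval")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x * 3600.0


if __name__ == "__main__":
    # Hypothetical readings: the guest loses 2 seconds every 10 minutes.
    readings = [(0.0, 0.0), (600.0, 598.0), (1200.0, 1196.0), (1800.0, 1794.0)]
    print(f"drift: {drift_seconds_per_hour(readings):+.1f} s/h")
```

A steady drift rate and a sudden jump would point in different directions, so keeping the raw samples alongside the fitted rate is probably worthwhile.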
Lionel Raynaud
2009-Apr-08 11:35 UTC
[Xen-users] Fedora 10-VM hung, Time issue and /dev/proc/sys/xen missing, HVM VMs recognized as PV
Lionel Raynaud
2009-Apr-11 23:38 UTC
[Xen-users] RE: Fedora 10-VM hung, Time issue and /dev/proc/sys/xen missing, HVM VMs recognized as PV
Peter Booth
2009-Apr-12 15:48 UTC
Re: [Xen-users] RE: Fedora 10-VM hung, Time issue and /dev/proc/sys/xen missing, HVM VMs recognized as PV
Two approaches:

1. Try to get to the root cause of the problem.
2. Work around the problem, because you're not paid to do (1).

WORK-AROUND

Build a VM with the same kernel version as Dom0.

ROOT CAUSE

1. Install sar (sysstat) on one of your DomUs, collecting data at a 1-second interval.
2. If you don't see much activity on this VM, run some Linux compiles or the Volano benchmark to keep the VM busy.
3. Look at the sar data with ksar. What happens when you get this "hang"?

ALSO

Can you see if you have a /proc/xen or /sys/xen?

Peter
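With samples collected every second, a hang should show up as a hole in the series. The sketch below illustrates that idea only; it is not part of sar or ksar. It assumes the sample times have already been converted to seconds, and the function name and the `slack` threshold are hypothetical. Timestamps taken inside a guest whose clock is already misbehaving are suggestive at best, so a gap found this way is a hint, not proof.

```python
from typing import List, Sequence, Tuple


def find_stalls(timestamps: Sequence[float], interval: float = 1.0,
                slack: float = 2.0) -> List[Tuple[float, float]]:
    """Return (last_good_timestamp, gap_length) for each suspicious gap.

    A gap is suspicious when two consecutive samples are more than
    interval * slack seconds apart.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval * slack:
            stalls.append((prev, gap))
    return stalls


if __name__ == "__main__":
    # Hypothetical series: samples stop for 7 seconds after t=3.
    for start, length in find_stalls([0, 1, 2, 3, 10, 11]):
        print(f"no samples for {length} s after t={start}")
```

Comparing the flagged windows against the xentop figures on dom0 for the same period may help tell a guest-side stall from a scheduling problem.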
Lionel Raynaud
2009-Apr-14 18:29 UTC
RE: [Xen-users] RE: Fedora 10-VM hung, Time issue and /dev/proc/sys/xen missing, HVM VMs recognized as PV
Thanks Peter,

The resolution is simply not to use Fedora 10 on a domU for production-like systems; as you certainly know, Fedora 10 aims to be on the cutting edge. We are now using CentOS 5.3 domUs, which are visibly running flawlessly.

As a follow-up to your question: I have a few domUs running CentOS 5.3, SLES 11, and Fedora 10. All of them are HVM (fully virtualized), but none of them has a /proc/sys/xen or /sys/xen. Is this the expected behavior for HVM domUs, i.e. would only a PV domU get a /proc/sys/xen created? Is there a specific module to load in order to have this ../xen directory created?

Thank you,
Lionel Raynaud.
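When comparing several guests (CentOS 5.3, SLES 11, Fedora 10), it can help to run the same probe on each and compare the output side by side. The following is a minimal sketch: the path list contains only the paths mentioned in this thread, and whether a given kernel creates any of them is exactly the open question, so nothing here should be read as a statement of what Xen guarantees. The `root` parameter is a hypothetical convenience for trying the function against a scratch directory.

```python
import os
from typing import Dict, Iterable

# Paths named in this thread; their presence on any given guest is not assumed.
XEN_PATHS = (
    "/proc/xen",
    "/sys/xen",
    "/proc/sys/xen",
    "/proc/sys/xen/independent_wallclock",
)


def probe_paths(paths: Iterable[str] = XEN_PATHS, root: str = "/") -> Dict[str, bool]:
    """Map each path to True if it exists under root, False otherwise."""
    return {p: os.path.exists(os.path.join(root, p.lstrip("/"))) for p in paths}


if __name__ == "__main__":
    for path, present in probe_paths().items():
        print(f"{'present' if present else 'missing':8} {path}")
```

Running it on both an HVM and a PV guest of the same distribution would give a direct comparison for the question above.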