Hello, when the Dom0 needs to be rebooted manually while several DomUs are running, the Dom0 will crash during shutdown, so that only a manual hardware reset will bring it back up. Current hypervisor: Xen 4.1.3. Current kernel: 2.6.32-5-xen-amd64. Is there a known issue, or what might cause this? When rebooting the Dom0 without any DomU running, the reboot is successful. Can someone help me? Thanks. Cheers, Maik
Casey DeLorme
2012-Sep-03 16:27 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
This could have changed, but I am fairly certain rebooting Dom0 reboots Xen, hence any running DomUs will have to be shut down.

A 2010 reference: http://old-list-archives.xen.org/archives/html/xen-users/2010-04/msg00710.html

You could always try aliasing the shutdown command to shut down all running domUs by name; otherwise forcing them down as Dom0 does could damage your data.

On Mon, Sep 3, 2012 at 7:49 AM, Maik Brauer <maik.brauer@mbs-systems.net> wrote:
> Hello,
>
> when the Dom0 needs to be rebooted manually while several DomUs are running,
> the Dom0 will crash during shutdown, so that only a manual hardware reset
> will bring it back up.
>
> Current hypervisor: Xen 4.1.3
> Current kernel: 2.6.32-5-xen-amd64
>
> Is there a known issue, or what might cause this?
> When rebooting the Dom0 without any DomU running, the reboot is successful.
> Can someone help me?
> Thanks.
>
> Cheers,
> Maik
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@lists.xen.org
> http://lists.xen.org/xen-users
Hi,

that is true, I understand that, but even if a rebooting Dom0 has to destroy the DomUs, the Dom0 itself should not crash and hang so that a hard reset is needed.

Is there not any shutdown control which can shut down the DomUs from the Dom0 automatically in case of a reboot or shutdown? When I manually shut down all DomUs before rebooting Dom0, the Dom0 will reboot and come up again.

On Sep 3, 2012, at 6:27 PM, Casey DeLorme wrote:
> This could have changed, but I am fairly certain rebooting Dom0 reboots Xen, hence any running DomUs will have to be shut down.
>
> You could always try aliasing the shutdown command to shut down all running domUs by name; otherwise forcing them down as Dom0 does could damage your data.
> [...]
Casey DeLorme
2012-Sep-03 17:10 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
As stated, you can alias shutdown to do exactly what you need; it can be anything from a series of hard-coded operations to a complex custom shell script that parses your domains and closes each with feedback. I don't know of a specific xl toolstack solution, but if you are using the xm toolstack you can try this: http://www.novell.com/support/kb/doc.php?id=3029956

Keep in mind that without PV-on-HVM drivers or paravirtualized DomUs you'll have to use the `destroy` command from Dom0, which is not a graceful shutdown.

On Mon, Sep 3, 2012 at 12:41 PM, Maik Brauer <maik.brauer@mbs-systems.net> wrote:
> Hi,
>
> that is true, I understand that, but even if a rebooting Dom0 has to
> destroy the DomUs, the Dom0 itself should not crash and hang so that a
> hard reset is needed.
>
> Is there not any shutdown control which can shut down the DomUs from the
> Dom0 automatically in case of a reboot or shutdown?
> When I manually shut down all DomUs before rebooting Dom0, the Dom0
> will reboot and come up again.
> [...]
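A minimal sketch of the kind of shutdown wrapper Casey describes, assuming the xm toolstack (the `guest_names` helper and the canned `xm list` output below are illustrative, not part of any shipped tool):

```shell
#!/bin/sh
# Sketch of a pre-reboot hook that shuts guests down gracefully.
# Assumes the xm toolstack; "guest_names" is an illustrative helper.

# Extract guest domain names from `xm list`-style output:
# skip the header line and Domain-0 itself.
guest_names() {
    awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

# On a live Xen host one would run (commented out here):
# for dom in $(xm list | guest_names); do
#     xm shutdown --wait "$dom"
# done
# /sbin/reboot

# Demonstration on canned `xm list` output:
sample='Name                ID  Mem VCPUs State  Time(s)
Domain-0             0 1024     2 r-----   100.0
web01                1  512     1 -b----    10.0
db01                 2  512     1 -b----    12.0'
printf '%s\n' "$sample" | guest_names
```

The --wait flag makes xm block until each guest has actually halted, which is the important part if the wrapper runs right before a reboot.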
Ian Campbell
2012-Sep-04 08:11 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
Could you not top post please, it makes it rather hard to follow the flow of the conversation.

On Mon, 2012-09-03 at 18:10 +0100, Casey DeLorme wrote:
> As stated, you can alias shutdown to do exactly what you need; it can
> be anything from a series of hard-coded operations to a complex custom
> shell script that parses your domains and closes each with feedback.

Xen ships the "xendomains" initscript, which can halt guests on shutdown as well as automatically start specific guests on boot. It can also be configured to suspend/resume them or (I think) migrate them away.

For diagnosing the crash itself, more details will be required than were provided in the original post. Please see http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance. At a minimum we would need a capture (serial console or photo) of the crash backtrace.

Ian.
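For reference, the behaviour Ian mentions is controlled by the xendomains sysconfig file (on Debian, /etc/default/xendomains). A sketch with illustrative values; check the file shipped with your packages for the exact variable names and defaults:

```shell
# /etc/default/xendomains -- illustrative excerpt, not a complete file.
XENDOMAINS_SAVE=""                         # empty: shut guests down rather than saving them
XENDOMAINS_SHUTDOWN="--halt --wait"        # wait for each guest to halt cleanly
XENDOMAINS_SHUTDOWN_ALL="--all --halt --wait"
XENDOMAINS_RESTORE=false                   # nothing saved, so nothing to restore
XENDOMAINS_AUTO=/etc/xen/auto              # guest configs (symlinks) started at boot
```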
On Sep 4, 2012, at 10:11 AM, Ian Campbell wrote:
> Could you not top post please, it makes it rather hard to follow the
> flow of the conversation.
> [...]
> Xen ships the "xendomains" initscript, which can halt guests on shutdown
> as well as automatically start specific guests on boot. It can also be
> configured to suspend/resume them or (I think) migrate them away.
>
> For diagnosing the crash itself, more details will be required than were
> provided in the original post. Please see
> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance.
> At a minimum we would need a capture (serial console or photo) of the
> crash backtrace.
>
> Ian.

I found out that it hangs during reboot of dom0 when more network interfaces are involved, like:

vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '', 'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '', 'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]

In case you use just one, or have the basic line in place, it works:

vif = [ '' ]

The system stops after initiating the reboot at the following line on the console: System will restart...........

In the logfile /var/log/messages you can find these as the last lines:

Sep 8 15:44:28 rootsrv01 shutdown[2445]: shutting down for system reboot
Sep 8 15:44:31 rootsrv01 kernel: [   73.716246] VLAN20: port 1(vif2.3) entering forwarding state
Sep 8 15:44:31 rootsrv01 kernel: [   74.500111] VLAN40: port 1(vif2.5) entering forwarding state
Sep 8 15:44:34 rootsrv01 kernel: [   77.317431] VLAN20: port 1(vif2.3) entering disabled state
Sep 8 15:44:34 rootsrv01 kernel: [   77.317490] VLAN20: port 1(vif2.3) entering disabled state
Sep 8 15:44:36 rootsrv01 kernel: [   79.368685] VLAN40: port 1(vif2.5) entering disabled state
Sep 8 15:44:36 rootsrv01 kernel: [   79.369156] VLAN40: port 1(vif2.5) entering disabled state
Sep 8 15:44:37 rootsrv01 kernel: Kernel logging (proc) stopped.
Sep 8 15:44:37 rootsrv01 rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="890" x-info="http://www.rsyslog.com"] exiting on signal 15.

In /var/log/daemon.log you can find these messages:

Sep 8 15:44:37 rootsrv01 acpid: exiting
Sep 8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, un-registering and exiting
Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

Cheers,
Maik
On Sep 8, 2012, at 4:50 PM, Maik Brauer wrote:
> I found out that it hangs during reboot of dom0 when more network
> interfaces are involved, like:
> vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '', 'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '', 'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]
> [...]
> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

Hi Ian,

I found an interesting article on the internet describing exactly the same issue: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630984

Can you confirm?
Ian Campbell
2012-Sep-10 08:39 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Sat, 2012-09-08 at 15:50 +0100, Maik Brauer wrote:
> I found out that it hangs during reboot of dom0 when more
> network interfaces are involved, like:
> vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '',
> 'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '',
> 'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]

6 interfaces total, 3 of which have a random MAC on each reboot and all get put on the default bridge?

Is your default script vif-bridge or something else? Have you modified any of these scripts?

> In case you use just one, or have the basic line in place, it works:
> vif = [ '' ]
>
> The system stops after initiating the reboot at the following line on the console: System will restart...........

So this is a hang, not a crash as suggested originally?

If it is a hang then you might have some luck using the magic SysRq keys to print lists of blocked tasks. I'm not sure about Squeeze, but you might need to enable this as described in Documentation/sysrq.txt in the Linux source.

Blocked tasks are listed with SysRq-'w'. If you have a serial console then 't' will list all tasks, but that list can be quite long, so it is useless without a serial console.

> In the logfile /var/log/messages you can find these as the last lines:
> [...]
> In /var/log/daemon.log you can find these messages:
> Sep 8 15:44:37 rootsrv01 acpid: exiting
> Sep 8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, un-registering and exiting

All the above (both messages and daemon.log) look like normal parts of shutting down to me.

> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

This might be worth following up on.

I would do this by adding near the top of vif-setup and/or vif-bridge (or whichever script you use):

exec 1>>/var/log/vif-setup.log
exec 2>&1

I would then also annotate vif-bridge all through the offline path with echo statements showing how far it got and what command was to be run next.

Ian.
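The exec-redirection idiom Ian suggests can be tried standalone before editing the hotplug scripts. A self-contained sketch using a temporary file in place of /var/log/vif-setup.log (the marker text and the subshell standing in for vif-setup are made up for the demo):

```shell
#!/bin/sh
# Demonstrates the exec-redirection idiom for instrumenting a hotplug
# script: everything written after the two execs lands in the log file.
log=$(mktemp)

# This subshell stands in for vif-setup; in the real script the two
# exec lines go near the top and the path is /var/log/vif-setup.log.
(
    exec 1>>"$log"
    exec 2>&1
    echo "vif-setup: entered offline path"   # progress marker
    ls /nonexistent-path-for-demo            # stderr is captured too
)

markers=$(grep -c 'vif-setup' "$log")
echo "progress markers logged: $markers"
rm -f "$log"
```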
On Sep 10, 2012, at 10:39 AM, Ian Campbell wrote:
> 6 interfaces total, 3 of which have a random MAC on each reboot and all
> get put on the default bridge?

No, not really. The bridge is different for each interface. We have VLAN20, VLAN40, etc. as bridges. These are also created at the beginning, when the system starts up (create_bridges.sh):

/usr/sbin/brctl addbr VLAN11
/usr/sbin/brctl addbr VLAN12
/usr/sbin/brctl addbr VLAN20
/usr/sbin/brctl addbr VLAN30
/usr/sbin/brctl addbr VLAN40
/sbin/ifconfig VLAN11 down -arp up
/sbin/ifconfig VLAN12 down -arp up
/sbin/ifconfig VLAN20 down -arp up
/sbin/ifconfig VLAN30 down -arp up
/sbin/ifconfig VLAN40 down -arp up

> Is your default script vif-bridge or something else? Have you modified
> any of these scripts?

No, I didn't modify anything. Still the original scripts.

> So this is a hang, not a crash as suggested originally?

Yes, you are right. It is just a hang.

> If it is a hang then you might have some luck using the magic SysRq keys
> to print lists of blocked tasks.
> [...]
> Blocked tasks are listed with SysRq-'w'. If you have a serial console then
> 't' will list all tasks, but that list can be quite long, so it is useless
> without a serial console.

The list is empty. SysRq-w and SysRq-t show nothing at all. There is nothing running anymore. It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds

It seems that xenwatch is blocking the reboot here; is that assumption correct? But strangely enough, I can't see any process anymore with SysRq-t or SysRq-w.

> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f
>
> This might be worth following up on.

When putting a "sleep 5" in the stop section of /etc/init.d/xendomains:

case "$1" in
  start)
    start
    rc_status
    if test -f $LOCKFILE; then rc_status -v; fi
    ;;
  stop)
    stop
    rc_status -v
    sleep 5
    ;;

then the system shuts down as expected and reboots properly. In daemon.log I couldn't find the error

Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

anymore. It seems that it disappeared after putting a delay inside. Could it be a race condition here during shutdown, with the udev daemon?

> I would do this by adding near the top of vif-setup and/or vif-bridge
> (or whichever script you use):
> exec 1>>/var/log/vif-setup.log
> exec 2>&1
>
> I would then also annotate vif-bridge all through the offline path
> with echo statements showing how far it got and what command was to be
> run next.
>
> Ian.
Ian Campbell
2012-Sep-10 15:10 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Mon, 2012-09-10 at 16:00 +0100, Maik Brauer wrote:
> > 6 interfaces total, 3 of which have a random MAC on each reboot and all
> > get put on the default bridge?
>
> No, not really. The bridge is different for each interface.

You have three lots of '' which will all go onto the same bridge AFAICT (whichever one is determined to be the default).

> > Blocked tasks are listed with SysRq-'w'. If you have a serial console then
> > 't' will list all tasks, but that list can be quite long, so it is useless
> > without a serial console.
>
> The list is empty. SysRq-w and SysRq-t show nothing at all.

You might need to increase the log verbosity with SysRq-9 first?

> There is nothing running anymore.
> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds

What is the very last thing printed before this?

> It seems that xenwatch is blocking the reboot here; is that assumption correct?
> But strangely enough, I can't see any process anymore with SysRq-t or SysRq-w.

The xenwatch thread ought to count as a process for at least the purposes of SysRq-t, if not -w.

> When putting a "sleep 5" in the stop section of /etc/init.d/xendomains:
> [...]
> then the system shuts down as expected and reboots properly.
> In daemon.log I couldn't find the error
> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f
> anymore. It seems that it disappeared after putting a delay inside. Could it be a race condition here during shutdown, with the udev daemon?

It could be a race between the guests actually shutting down and the rest of the initscripts running.

Really the initscript ought to wait; the default, at least with the script shipped with Xen, is to do so by using shutdown --wait. Can you confirm whether or not this is happening for you?

Possibly someone is trying to talk to xenstore after xenstored has exited -- I expect that would cause the sorts of "blocked for 120 seconds" messages you are seeing.
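The fixed "sleep 5" workaround above can be replaced by a bounded wait that polls until the condition it is papering over actually clears. A generic sketch; the helper name and the vif check are illustrative, not part of the shipped xendomains script:

```shell
#!/bin/sh
# Poll until a condition clears or a timeout expires, instead of
# sleeping for a fixed interval. "wait_until_gone" is an illustrative name.
wait_until_gone() {
    # $1 = max seconds; remaining args = a command that succeeds while
    # we still have to wait.
    max=$1; shift
    n=0
    while "$@" && [ "$n" -lt "$max" ]; do
        sleep 1
        n=$((n + 1))
    done
    [ "$n" -lt "$max" ]   # true if the condition cleared in time
}

# In the xendomains stop path one might wait for vif interfaces to go
# away (requires a live Xen host, hence commented out):
# wait_until_gone 30 sh -c "ip link show | grep -q ': vif'"

# Demonstration with a condition that is already clear:
wait_until_gone 5 false && echo "cleared"
```

The advantage over a fixed sleep is that the script returns as soon as the teardown has actually finished, and fails loudly if it never does.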
On Sep 10, 2012, at 5:10 PM, Ian Campbell wrote:> On Mon, 2012-09-10 at 16:00 +0100, Maik Brauer wrote: >> On Sep 10, 2012, at 10:39 AM, Ian Campbell wrote: >> >>> On Sat, 2012-09-08 at 15:50 +0100, Maik Brauer wrote: >>>> On Sep 4, 2012, at 10:11 AM, Ian Campbell wrote: >>>> >>>>> Could you not top post please, it makes it rather hard to follow the >>>>> flow of the conversation. >>>>> On Mon, 2012-09-03 at 18:10 +0100, Casey DeLorme wrote: >>>>>> As stated, you can alias shutdown to do exactly what you need, it can >>>>>> be as simple as a series of hard-coded operations to a complex custom >>>>>> shell script that parses your domains and closes each with feedback. >>>>> >>>>> Xen ships the "xendomains" initscript which can halt guest on shutdown >>>>> as well as automatically start specific guests on boot. It can also be >>>>> configured to suspend/resume them or (I think) migrate them away. >>>>> >>>>> For diagnosing the crash itself more details will be required than were >>>>> provided in the original post. Please see >>>>> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance. >>>>> At a minimum we would need a capture (serial console or photo) of the >>>>> crash backtrace. >>>>> >>>>> Ian. >>>>> >>>>> >>>> I found out that it hangs during re-boot of dom0 when having more >>>> Network interfaces involved, like: >>>> vif = [ ''mac=06:46:AB:CC:11:01, ip=<myIPadress>'', '''', '''', >>>> ''mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge'', '''', >>>> ''mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge'' ] >>> >>> 6 interfaces total, 3 of which have a random mac on each reboot and all >>> get put on the default bridge? >> >> No, not really. The bridge is different for each interface. > > You have three lots of '''' which will all go onto the same bridge AFAICT > (whichever one is determined to be the default)That is right. 
As long as I put nothing inside that it should be a different script to execute, it will use default for ''''> >>> If it is a hang then you might have some luck using hte magic sysrq keys >>> to print lists of blocked tasks. I''m not sure in Squeeze but you might >>> need to enable this as described in Documentation/sysrq.txt in the Linux >>> source. >>> >>> Blocked tasks are listed with SysRQ-''w''. If you have serial console then >>> ''t'' will list all task, but that list can be quite long so it is useless >>> without a serial console. >> >> List is empty. SysRQ -w and SysRQ-t shows nothing at all. > > You might need to increase the log verbosity with SysRQ-9 first?I did and now I got more Information. But due to the amount of data which slips over the console screen I am not able to record properly. Can you advice what to do here?> >> There is nothing running anymore. >> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds > > What is the very last thing printed before this?There is nothing before. Just that message pops up periodically.> >> Seems that the xenwatch is blocking the reboot here, is that assumption correct? But strange enough that I can''t >> see any process anymore with the SysRQ -t or SysRQ -w > > The xenwatch thread ought to count as a process for at least the > purposes of SysRQ-t if not -w.Could be, but due to the amount it slips over the screen, that I am not able to read it line by line. 
Please advice a procedure to record.> >> >>>> In the Logfile of /var/log/message you can find this as the last line: >>>> Sep 8 15:44:28 rootsrv01 shutdown[2445]: shutting down for system reboot >>>> Sep 8 15:44:31 rootsrv01 kernel: [ 73.716246] VLAN20: port 1(vif2.3) entering forwarding state >>>> Sep 8 15:44:31 rootsrv01 kernel: [ 74.500111] VLAN40: port 1(vif2.5) entering forwarding state >>>> Sep 8 15:44:34 rootsrv01 kernel: [ 77.317431] VLAN20: port 1(vif2.3) entering disabled state >>>> Sep 8 15:44:34 rootsrv01 kernel: [ 77.317490] VLAN20: port 1(vif2.3) entering disabled state >>>> Sep 8 15:44:36 rootsrv01 kernel: [ 79.368685] VLAN40: port 1(vif2.5) entering disabled state >>>> Sep 8 15:44:36 rootsrv01 kernel: [ 79.369156] VLAN40: port 1(vif2.5) entering disabled state >>>> Sep 8 15:44:37 rootsrv01 kernel: Kernel logging (proc) stopped. >>>> Sep 8 15:44:37 rootsrv01 rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="890" x-info="http://www.rsyslog.com"] exiting on signal 15. >>>> >>>> In the /var/log/daemong.log you can find this message: >>>> Sep 8 15:44:37 rootsrv01 acpid: exiting >>>> Sep 8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, un-registering and exiting >>> >>> All the above (both message and daemon.log) look like normal parts of >>> shutting down to me. >>> >>>> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: ''/etc/xen/scripts/vif-setup offline type_if=vif'' unexpected exit with status 0x000f >>> >>> This might be worth following up on. >> >> When putting a "sleep 5" in stop section of the /etc/init.d/xendomains: >> case "$1" in >> start) >> start >> rc_status >> if test -f $LOCKFILE; then rc_status -v; fi >> ;; >> >> stop) >> stop >> rc_status -v >> sleep 5 >> ;; >> >> then the system shuts down as expected and is rebooting properly. >> In the daemon.log file I couldn''t find the error: Sep 8 15:44:37 rootsrv01 udevd-work[2276]: ''/etc/xen/scripts/vif-setup offline type_if=vif'' unexpected exit with status 0x000f >> anymore. 
It seems that it disappeared after putting a delay inside. Could it be a race condition here during shutdown, with the udev-daemon?? > > It could be a race with the guests actually shuting down vs the rest of > the initscripts running. > > Really the initscript ought to wait, the default at least with the > script shipped with xen is to do so, by using shutdown --wait. can you > confirm whether or not this is happening for you?At least I can see that the shutdown --wait is in the scripts. So it seems that the init script is waiting. But independent from that, something must be still in use. Which block the reboot process.> > Possibly someone is trying to talk to xenstore after xenstored has > exited -- I expect that would cause the sorts of blocked for 120 > messages you are seeing. >Could be, but we need to find out what is blocking the shutdown. I do not know what else I can do in order to measure and collect data for investigation. Let me know what else I can do? You can easiliy reproduce this issue, when using more that 3 Network devices. I installed that now on several machines at home and I have on all the same issue when using more than 2-3 network Interfaces.> > > _______________________________________________ > Xen-users mailing list > Xen-users@lists.xen.org > http://lists.xen.org/xen-users
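The `sleep 5` workaround above papers over the race with a fixed delay. A more robust variant is to poll until no guest domains remain, with a timeout. This is a sketch, not the stock xendomains code; `count_guests` is a hypothetical helper standing in for something like `xm list | awk 'NR > 2' | wc -l` (rows after the header and Domain-0):

```shell
#!/bin/sh
# Sketch: instead of a fixed "sleep 5" in the stop) branch of
# /etc/init.d/xendomains, wait (bounded) until the guest count is zero.
# "count_guests" is a hypothetical stand-in for a real counting command.
wait_for_guests() {
    count_cmd=$1
    timeout=${2:-60}            # seconds to wait before giving up
    while [ "$timeout" -gt 0 ]; do
        if [ "$($count_cmd)" -eq 0 ]; then
            return 0            # all guests gone
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1                    # guests still present after timeout
}
```

In the init script's `stop)` case one would call `wait_for_guests count_guests 60` after `stop`, so init only proceeds to tear down udev and the network once the vifs are really gone.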
Ian Campbell
2012-Sep-12 08:13 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:> >>>> I found out that it hangs during re-boot of dom0 when having more > >>>> Network interfaces involved, like: > >>>> vif = [ ''mac=06:46:AB:CC:11:01, ip=<myIPadress>'', '''', '''', > >>>> ''mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge'', '''', > >>>> ''mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge'' ] > >>> > >>> 6 interfaces total, 3 of which have a random mac on each reboot and all > >>> get put on the default bridge? > >> > >> No, not really. The bridge is different for each interface. > > > > You have three lots of '''' which will all go onto the same bridge AFAICT > > (whichever one is determined to be the default) > > That is right. As long as I put nothing inside that it should be a > different script to execute, it will use default for ''''The default is "vif-bridge". Have you changed the default? If not then your configuration as shown will put three interfaces on the *same* bridge. Is this really what you want? You claim above that the bridge is different for each interface, but unless you have changed something somewhere then this is not the case. Since you are having problems it is important to identify everything which you have changed from the defaults.> >> List is empty. SysRQ -w and SysRQ-t shows nothing at all. > > > > You might need to increase the log verbosity with SysRQ-9 first? > > I did and now I got more Information. But due to the amount of data which slips over the console screen I am not able > to record properly. Can you advice what to do here?Like I said "that list can be quite long so it is useless without a serial console": http://wiki.xen.org/wiki/Xen_Serial_Console Depending on your distro you might also find this info in the logs under /var/log somewhere.> > > >> There is nothing running anymore. > >> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds > > > > What is the very last thing printed before this? 
> > There is nothing before.So the output is silent from boot until this message comes up? That seems unlikely, since there should be plenty of messages from the shutdown process itself if nothing else. What is the last message one the screen before this one? In fact what is the entire last screenfull of output?> > Really the initscript ought to wait, the default at least with the > > script shipped with xen is to do so, by using shutdown --wait. can you > > confirm whether or not this is happening for you? > > At least I can see that the shutdown --wait is in the scripts. So it seems that the init script is waiting. > But independent from that, something must be still in use. Which block the reboot process. > > > > Possibly someone is trying to talk to xenstore after xenstored has > > exited -- I expect that would cause the sorts of blocked for 120 > > messages you are seeing. > > > Could be, but we need to find out what is blocking the shutdown. I do not know what else I can do in order to measure and collect > data for investigation.Did you add debugging to the hotplug scripts like I suggested a couple of mails back? If you run the xendomains script by hand and then *immediately* after it exits run "xl list" have the domains actually gone? You could even stick some calls to xl list into the script itself and verify that the domains are indeed shutting down as expected. BTW Are you using xl or xend? Ian.
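For reference, capturing the hypervisor and dom0 console over serial (as the wiki page Ian links describes) mostly comes down to boot-loader configuration. The lines below are illustrative only; COM1, 115200 baud, and the GRUB-legacy syntax are assumptions to check against your hardware and the wiki:

```shell
# /boot/grub/menu.lst (GRUB legacy, as on Squeeze) -- illustrative sketch.
# Give Xen its own console on the first serial port:
#   kernel /boot/xen-4.1.3.gz com1=115200,8n1 console=com1,vga
# Have dom0 Linux log to the Xen console as well, keeping VGA:
#   module /boot/vmlinuz-2.6.32-5-xen-amd64 console=hvc0 console=tty0
# On the machine at the other end of the null-modem cable:
#   screen /dev/ttyS0 115200
```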
Ian Campbell
2012-Sep-12 10:23 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
> You can easiliy reproduce this issue, when using more that 3 Network
> devices.

I installed a PV guest on Debian Wheezy using 5 vifs all on the same bridge and could not reproduce this; the domain was successfully shut down on reboot.

I also replaced the Debian xendomains script with the one from xen-4.1-testing (since Debian's differs). Still no repro.

Ian.
On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:

> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
>> You can easiliy reproduce this issue, when using more that 3 Network
>> devices.
>
> I installed a PV guest on Debian Wheezy using 5 vifs all on the same
> bridge and could not reproduce this, the domain was successfully
> shutdown on reboot.
>
> I also replaced the Debian xendomains script with the one from
> xen-4.1-testing (since Debian's differs). Still no repro.
>

Hi Ian,

please try it with the stable Squeeze one, because I am currently not using an unstable version. I have customers running on those machines, so I can't experiment. Can you please try it with Debian 64-bit?

> Ian.
On Sep 12, 2012, at 10:13 AM, Ian Campbell wrote:> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote: >>>>>> I found out that it hangs during re-boot of dom0 when having more >>>>>> Network interfaces involved, like: >>>>>> vif = [ ''mac=06:46:AB:CC:11:01, ip=<myIPadress>'', '''', '''', >>>>>> ''mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge'', '''', >>>>>> ''mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge'' ] >>>>> >>>>> 6 interfaces total, 3 of which have a random mac on each reboot and all >>>>> get put on the default bridge? >>>> >>>> No, not really. The bridge is different for each interface. >>> >>> You have three lots of '''' which will all go onto the same bridge AFAICT >>> (whichever one is determined to be the default) >> >> That is right. As long as I put nothing inside that it should be a >> different script to execute, it will use default for '''' > > The default is "vif-bridge". Have you changed the default?No I did not change this default script. Everything is at is has been delivered in the Xen Source package.> > If not then your configuration as shown will put three interfaces on the > *same* bridge. Is this really what you want?No, because it will "not" put everything on the same bridge, because the default setting is "routed mode" due to the fact that my providers network configuration has changed the routing. Therefore in xend-config.sxp we have the following disabled: #(network-script network-route) #(vif-script vif-route) and the next one enabled: (network-script network-route) (vif-script vif-route) So basically you can see them as placeholder for the eth1, eth2 and eth4 devices in the Guest domU. For the other 2 interfaces it is different. They should be bridged (different from default). Therefore I have to put the "script=vif-bridge" in the config as shown above. 
See below the output of brctl show:

bridge name     bridge id               STP enabled     interfaces
VLAN11          8000.000000000000       no
VLAN12          8000.000000000000       no
VLAN20          8000.feffffffffff       no              vif2.3
VLAN30          8000.000000000000       no
VLAN40          8000.feffffffffff       no              vif2.5

> You claim above that the bridge is different for each interface, but
> unless you have changed something somewhere then this is not the case.
> Since you are having problems it is important to identify everything
> which you have changed from the defaults.

No, I am saying that the bridge name is different. Not that the script is different. I am just creating isolated bridges VLAN20, VLAN30, VLAN40, and so on in order to connect special network interfaces together from different domU's.

>>>> List is empty. SysRQ -w and SysRQ-t shows nothing at all.
>>>
>>> You might need to increase the log verbosity with SysRQ-9 first?
>>
>> I did and now I got more Information. But due to the amount of data which slips over the console screen I am not able
>> to record properly. Can you advice what to do here?
>
> Like I said "that list can be quite long so it is useless
> without a serial console": http://wiki.xen.org/wiki/Xen_Serial_Console

This will be a challenge.

> Depending on your distro you might also find this info in the logs
> under /var/log somewhere.

There is not really any useful information available there. It does not contain the info we need; I checked it already.

>>>> There is nothing running anymore.
>>>> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds
>>>
>>> What is the very last thing printed before this?
>>
>> There is nothing before.
>
> So the output is silent from boot until this message comes up? That
> seems unlikely, since there should be plenty of messages from the
> shutdown process itself if nothing else.

Yes, there are plenty of messages. Let me put some lines below:

Stopping NFS Daemon
Stopping portmap daemon.
Deconfiguring network interfaces
Listening on LPF/eth0/00:1c:42:77:7a:29
Sending on   LPF/eth0/00:1c:42:77:7a:29
DHCPRELEASE on eth0
Cleaning up ifdown.
Saving system clock.
Deactivating swap.
Will now restart.
[  720.213710] INFO: task xenwatch:12 blocked for more than 120 seconds
[  720.213753] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[  840.212745] INFO: task reboot:3347 blocked for more than 120 seconds
[  840.212785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

(the last INFO messages repeat indefinitely)

> What is the last message one the screen before this one? In fact what is
> the entire last screenfull of output?

See above.

>>> Really the initscript ought to wait, the default at least with the
>>> script shipped with xen is to do so, by using shutdown --wait. can you
>>> confirm whether or not this is happening for you?
>>
>> At least I can see that the shutdown --wait is in the scripts. So it seems that the init script is waiting.
>> But independent from that, something must be still in use. Which block the reboot process.
>>>
>>> Possibly someone is trying to talk to xenstore after xenstored has
>>> exited -- I expect that would cause the sorts of blocked for 120
>>> messages you are seeing.
>>>
>> Could be, but we need to find out what is blocking the shutdown. I do not know what else I can do in order to measure and collect
>> data for investigation.
>
> Did you add debugging to the hotplug scripts like I suggested a couple
> of mails back?

No, I didn't up to now.

> If you run the xendomains script by hand and then *immediately* after it
> exits run "xl list" have the domains actually gone?
> You could even stick some calls to xl list into the script itself and
> verify that the domains are indeed shutting down as expected.

root@xenserver:/etc/xen/scripts# xm list
Name                                  ID   Mem VCPUs      State   Time(s)
Domain-0                               0   880     1     r-----     82.4
dnssrv01-v6                            3   128     1     -b----      4.5
root@xenserver:/etc/xen/scripts# /etc/init.d/xendomains stop
Shutting down Xen domains: dnssrv01-v6(save)... [done].
root@xenserver:/etc/xen/scripts# xm list
Name                                  ID   Mem VCPUs      State   Time(s)
Domain-0                               0   880     1     r-----     89.4
root@xenserver:/etc/xen/scripts#

> BTW Are you using xl or xend?

I am using xend.

> Ian.
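Following Ian's suggestion, the same check can be made permanent so it runs on every shutdown rather than only interactively. A sketch for the `stop)` branch of `/etc/init.d/xendomains`; the log path is our choice, and `xm` is used because this setup runs xend:

```shell
# Sketch: log the domain list before and after the existing "stop"
# function, so the shutdown logs show whether guests were really gone
# before init continued.  The log path is arbitrary.
log=/var/log/xen/xendomains-stop.log
echo "=== $(date '+%F %T') before stop ===" >>"$log"
xm list >>"$log" 2>&1
stop                       # the script's existing stop function
echo "=== $(date '+%F %T') after stop ===" >>"$log"
xm list >>"$log" 2>&1
```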
Ian Campbell
2012-Sep-13 05:38 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>
> > On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
> >> You can easiliy reproduce this issue, when using more that 3 Network
> >> devices.
> >
> > I installed a PV guest on Debian Wheezy using 5 vifs all on the same
> > bridge and could not reproduce this, the domain was successfully
> > shutdown on reboot.
> >
> > I also replaced the Debian xendomains script with the one from
> > xen-4.1-testing (since Debian's differs). Still no repro.
> >
> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
> I have customers running on that machines so I can't experiment. Can you please try with Debian 64bit?

In your initial mail you said you were running a 4.1.3 hypervisor.
Squeeze has 4.0. Which is it?

Ian.
Ian Campbell
2012-Sep-13 05:51 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>
> > On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
> >> You can easiliy reproduce this issue, when using more that 3 Network
> >> devices.
> >
> > I installed a PV guest on Debian Wheezy using 5 vifs all on the same
> > bridge and could not reproduce this, the domain was successfully
> > shutdown on reboot.
> >
> > I also replaced the Debian xendomains script with the one from
> > xen-4.1-testing (since Debian's differs). Still no repro.
> >
> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
> I have customers running on that machines so I can't experiment.

What about one of the several machines you have installed at home which show this issue?
Ian Campbell
2012-Sep-13 05:51 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Wed, 2012-09-12 at 22:31 +0100, Maik Brauer wrote:> On Sep 12, 2012, at 10:13 AM, Ian Campbell wrote: > > The default is "vif-bridge". Have you changed the default? > > No I did not change this default script. Everything is at is has been delivered in the Xen Source package.No, it isn''t, you say below that you have changed xend-config.sxp> > > > If not then your configuration as shown will put three interfaces on the > > *same* bridge. Is this really what you want? > > No, because it will "not" put everything on the same bridge, because the default setting is "routed mode" due to the fact that > my providers network configuration has changed the routing. Therefore in xend-config.sxp we have the following disabled: > #(network-script network-route) > #(vif-script vif-route) > > and the next one enabled: > (network-script network-route) > (vif-script vif-route)So you have changed the default then, haven''t you! You have edited xend-config.sxp to change the default vif-script and network-script! What else have you changed from the defaults?> So basically you can see them as placeholder for the eth1, eth2 and eth4 devices in the Guest domU. > For the other 2 interfaces it is different. They should be bridged (different from default).Do you mean routed here? Do you understand the difference between routing and bridging?> Therefore I have to > put the "script=vif-bridge" in the config as shown above. See below the output of brctl show: > bridge name bridge id STP enabled interfaces > VLAN11 8000.000000000000 no > VLAN12 8000.000000000000 no > VLAN20 8000.feffffffffff no vif2.3 > VLAN30 8000.000000000000 no > VLAN40 8000.feffffffffff no vif2.5 > > > > > You claim above that the bridge is different for each interface, but > > unless you have changed something somewhere then this is not the case. > > Since you are having problems it is important to identify everything > > which you have changed from the defaults. > > No, I am saying that the bridge name is different. 
Not that the script is different. I am just creating isolated bridges > VLAN20, VLAN30, VLAN40, and so on in order to connect special network interfaces together from different domU''s.Except it turns out that half your interfaces aren''t even using bridging and aren''t using vif-bridge at all! Please, it is important to give all the facts and to be precise when you are asking people to debug a remote system.> >>>> List is empty. SysRQ -w and SysRQ-t shows nothing at all. > >>> > >>> You might need to increase the log verbosity with SysRQ-9 first? > >> > >> I did and now I got more Information. But due to the amount of data > which slips over the console screen I am not able > >> to record properly. Can you advice what to do here?Did you try SysRQ-w -- given the point at which your logs show the hang this should provide a much smaller amount of output than Sysrq-l and be much more manageable. In particular it will be useful to know what the "xenwatch" and "reboot" processes are waiting for.> > Did you add debugging to the hotplug scripts like I suggested a couple > > of mails back? > > No I didn''t up to now.Please do. For both the vif-bridge and vif-route scripts. Ian.
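The debugging Ian asks for can be done with a couple of lines at the top of each hotplug script. A sketch, to paste just after the shebang line of `/etc/xen/scripts/vif-bridge` and `/etc/xen/scripts/vif-route`; the log path is our choice, not a Xen default:

```shell
# Sketch: trace every hotplug invocation to a file, so the shutdown-time
# vif teardown can be inspected after the hang.  Log path is arbitrary.
exec 2>>/var/log/xen/hotplug-debug.log   # send the trace somewhere persistent
set -x                                   # echo every command as it runs
echo "$(date '+%F %T') $0 $* XENBUS_PATH=$XENBUS_PATH" >&2
```

Remember to remove the `set -x` again afterwards, since hotplug scripts run on every vif event.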
On Sep 13, 2012, at 7:38 AM, Ian Campbell wrote:

> On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
>> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>>
>>> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
>>>> You can easiliy reproduce this issue, when using more that 3 Network
>>>> devices.
>>>
>>> I installed a PV guest on Debian Wheezy using 5 vifs all on the same
>>> bridge and could not reproduce this, the domain was successfully
>>> shutdown on reboot.
>>>
>>> I also replaced the Debian xendomains script with the one from
>>> xen-4.1-testing (since Debian's differs). Still no repro.
>>>
>> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
>> I have customers running on that machines so I can't experiment. Can you please try with Debian 64bit?
>
> In your initial mail you said you were running a 4.1.3 hypervisor.
> Squeeze has 4.0. Which is it?

I am using Debian 5.0.7 (SQUEEZE), with the 4.1.3 hypervisor and the 2.6.32-5-xen-amd64 kernel. I am not using the standard packaged hypervisor (4.0) shipped with Debian SQUEEZE.

> Ian.
On Sep 13, 2012, at 7:51 AM, Ian Campbell wrote:

> On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
>> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>>
>>> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
>>>> You can easiliy reproduce this issue, when using more that 3 Network
>>>> devices.
>>>
>>> I installed a PV guest on Debian Wheezy using 5 vifs all on the same
>>> bridge and could not reproduce this, the domain was successfully
>>> shutdown on reboot.
>>>
>>> I also replaced the Debian xendomains script with the one from
>>> xen-4.1-testing (since Debian's differs). Still no repro.
>>>
>> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
>> I have customers running on that machines so I can't experiment.
>
> What about one of the several machines you have installed at home which
> show this issue?
>

I will try to install it on a WHEEZY machine for testing. But this will not help me, because at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
Ian Campbell
2012-Sep-13 08:58 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.

Why not install a test system with exactly the same software as you use in production?
On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote:

> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
>
> Why not install a test system with exactly the same software as you use
> in production?
>

Yes, as I said, I will install a test system with the software I want to use. But even if this works afterwards, I will not use it in production, because Wheezy is still not stable.
Ian Campbell
2012-Sep-13 09:28 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote:> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote: > > > On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote: > >> I will try to install it on a WHEEZY machine for testing. But this will not help me, because > >> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production. > > > > Why not install a test system with exactly the same software as you use > > in production? > > > Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will > not use this in production, because wheezy is still not stable.Why Wheezy? What we need here is a system which has the same software as you use in production, which reproduces the issue and which you can play with and experiment with as much as you like without disturbing your customers. I''m not asking you to install Wheezy here, although if you think it will help and you can reproduce the issue with that configuration then go ahead. Please just be sure to be very clear about which exact environment any specific results you report were obtained in. Ian.
On Sep 13, 2012, at 11:28 AM, Ian Campbell wrote:

> On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote:
>> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote:
>>
>>> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
>>>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
>>>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
>>>
>>> Why not install a test system with exactly the same software as you use
>>> in production?
>>>
>> Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will
>> not use this in production, because wheezy is still not stable.
>
> Why Wheezy? What we need here is a system which has the same software as
> you use in production, which reproduces the issue and which you can play
> with and experiment with as much as you like without disturbing your
> customers.
>

I have that already, as mentioned in my last threads. What I can do is put the VM on a webserver for you, where you can download it. Then you have the environment where you can reproduce the issue. Is that OK for you? I am using Parallels for the test-setup system.

> I'm not asking you to install Wheezy here, although if you think it will
> help and you can reproduce the issue with that configuration then go
> ahead.
>
> Please just be sure to be very clear about which exact environment any
> specific results you report were obtained in.
>
> Ian.
Ian Campbell
2012-Sep-13 09:51 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Thu, 2012-09-13 at 10:44 +0100, Maik Brauer wrote:> On Sep 13, 2012, at 11:28 AM, Ian Campbell wrote: > > > On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote: > >> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote: > >> > >>> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote: > >>>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because > >>>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production. > >>> > >>> Why not install a test system with exactly the same software as you use > >>> in production? > >>> > >> Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will > >> not use this in production, because wheezy is still not stable. > > > > Why Wheezy? What we need here is a system which has the same software as > > you use in production, which reproduces the issue and which you can play > > with and experiment with as much as you like without disturbing your > > customers. > > > I have that already as mentioned in my last threads.The reason we are having this sub thread is that you said: "I have customers running on that machines so I can''t experiment.". If you have a test machine where you can reproduce the issue which you can experiment with then please feel free to use it.> What I can do, is that I can > provide the VM on a webserver for you, where you can download the VM. > Than you have the environment where you can reproduce the issue. Is that OK for you. > I am using Parallels for the Test-Setup System.I''m afraid I don''t have time any more to do anything other than suggest avenues for you to investigate and experiments for you to try. Ian.
On Sep 13, 2012, at 11:28 AM, Ian Campbell wrote:

> On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote:
>> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote:
>>
>>> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
>>>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
>>>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
>>>
>>> Why not install a test system with exactly the same software as you use
>>> in production?
>>>
>> Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will
>> not use this in production, because wheezy is still not stable.
>
> Why Wheezy? What we need here is a system which has the same software as
> you use in production, which reproduces the issue and which you can play
> with and experiment with as much as you like without disturbing your
> customers.

I tried it on several VMs and dedicated servers with SQUEEZE and the issue still persists.

> I'm not asking you to install Wheezy here, although if you think it will
> help and you can reproduce the issue with that configuration then go
> ahead.

When using the back-ported kernel in SQUEEZE (vmlinuz-3.2.0-0.bpo.3-amd64) there is no issue anymore. So it seems that it only happens with the latest SQUEEZE kernel (vmlinuz-2.6.32-5-xen-amd64). This is definitely something related to this kernel.

> Please just be sure to be very clear about which exact environment any
> specific results you report were obtained in.
>
> Ian.