Keir Fraser
2004-May-14 06:35 UTC
[Xen-devel] Re: system lockup when starting secondary domains
> Good news:
> I can now mount LUNs over iSCSI using the Adaptec HW initiator running
> Adaptec's driver in DOM0.
>
> The bad news is that when I try to export the LUN to another domain, the
> machine stops responding. I've attached the kernel config for both dom0
> and the non-privileged domains as well as the configuration file I'm
> using.
>
> Please let me know of anything I can do to help track this down.
>
>
> Trivia:
> DOM0 stops responding to ping after this. The second domain will start
> responding to ping at some point - but ssh does not appear to be
> starting.
>
> The LUN contains the same contents as the local IDE drive except for
> /etc/sysconfig/network-scripts/ifcfg-eth0 and /etc/fstab.

When you create a new domain, its virtual interface gets bridged to
eth0. Unfortunately this means that eth0 loses IP abilities. The fix
for now is to run a script something like the following before
creating the first domain:

/sbin/ifconfig nbe-br 128.232.38.20 netmask 255.255.240.0 up
/usr/sbin/brctl addif nbe-br eth0
/sbin/ip r d 128.232.32.0/20 dev eth0
/sbin/ip r a 128.232.32.0/20 dev nbe-br
/sbin/ip r d default via 128.232.32.1 dev eth0
/sbin/ip r a default via 128.232.32.1 dev nbe-br

i.e., attach your IP/netmask to device nbe-br. Also, any routes that
reference eth0 should be replaced with one that refers to nbe-br.

 -- Keir

-------------------------------------------------------
This SF.Net email is sponsored by: SourceForge.net Broadband
Sign-up now for SourceForge Broadband and get the fastest
6.0/768 connection for only $19.95/mo for the first 3 months!
http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
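A minimal sketch of the workaround Keir describes, for adapting it to another address. It only builds the command strings and does not run them. The function name and parameters are hypothetical; the tool paths, the nbe-br bridge name and the example addresses are the ones from the message.

```python
import ipaddress


def bridge_commands(ip, netmask, gateway, bridge="nbe-br", nic="eth0"):
    """Build the commands that move an IP address and its routes from nic to bridge.

    Hypothetical helper: it mirrors the six commands in the message and
    returns them as strings instead of executing anything.
    """
    # Derive the directly connected network (e.g. 128.232.32.0/20)
    # from the host address and netmask.
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    return [
        f"/sbin/ifconfig {bridge} {ip} netmask {netmask} up",
        f"/usr/sbin/brctl addif {bridge} {nic}",
        f"/sbin/ip r d {network} dev {nic}",
        f"/sbin/ip r a {network} dev {bridge}",
        f"/sbin/ip r d default via {gateway} dev {nic}",
        f"/sbin/ip r a default via {gateway} dev {bridge}",
    ]


if __name__ == "__main__":
    for command in bridge_commands("128.232.38.20", "255.255.240.0", "128.232.32.1"):
        print(command)
```

With the addresses from the message this reproduces the six commands listed there; other routes that mention eth0 would still need to be moved by hand.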
Kip Macy
2004-May-14 20:56 UTC
Re: [Xen-devel] Re: system lockup when starting secondary domains
Thanks. Next question.

On the console of the domain I'm creating I see:

Checking root filesystem
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda3
/dev/sda3 is mounted. e2fsck: Cannot continue, aborting.


[FAILED]

*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot

I'm going to disable fsck to work around this - but what am I likely
doing wrong?

    -Kip

> /usr/sbin/brctl addif nbe-br eth0
> /sbin/ip r d 128.232.32.0/20 dev eth0
> /sbin/ip r a 128.232.32.0/20 dev nbe-br
> /sbin/ip r d default via 128.232.32.1 dev eth0
> /sbin/ip r a default via 128.232.32.1 dev nbe-br
>
> i.e., attach your IP/netmask to device nbe-br. Also, any routes that
> reference eth0 should be replaced with one that refers to nbe-br.
>
> -- Keir
Kip Macy
2004-May-14 21:02 UTC
never mind was Re: [Xen-devel] Re: system lockup when starting secondary domains
root has to be read only:

cmdline_root = "root=/dev/sda3 ro"

I had it set to rw

    -Kip

On Fri, 14 May 2004, Kip Macy wrote:
> Thanks. Next question.
>
> On the console of the domain I'm creating I see:
>
> Checking root filesystem
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda3
> /dev/sda3 is mounted. e2fsck: Cannot continue, aborting.
>
>
> [FAILED]
>
> *** An error occurred during the file system check.
> *** Dropping you to a shell; the system will reboot
>
> I'm going to disable fsck to work around this - but what am I likely
> doing wrong?
>
> -Kip
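A small sketch for catching this mistake before booting a domain: it reports whether a kernel command line asks for the root filesystem read-only or read-write. The function name is hypothetical. It relies on the Linux convention that the last bare ro or rw token on the command line wins; the cmdline_root value is the one from the message.

```python
def root_mount_mode(cmdline):
    """Return 'ro', 'rw', or None for a kernel command line string.

    Hypothetical helper: only bare 'ro' and 'rw' tokens are considered,
    and a later token overrides an earlier one.
    """
    mode = None
    for token in cmdline.split():
        if token in ("ro", "rw"):
            mode = token
    return mode


if __name__ == "__main__":
    cmdline_root = "root=/dev/sda3 ro"
    if root_mount_mode(cmdline_root) != "ro":
        print("warning: root is not mounted read-only; the boot-time fsck may abort")
    else:
        print("root is mounted read-only")
```

The usual boot sequence mounts root read-only, runs fsck, and only then remounts it read-write, which would explain why the rw setting made e2fsck refuse to continue.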
Please remind me what the proper way is to interact with non-privileged
consoles. Without any telnet negotiation no control characters get
transmitted and there is no notion of what type of terminal the client
is running.

    -Kip
> Please remind me what the proper way is to interact with non-privileged
> consoles. Without any telnet negotiation no control characters get
> transmitted and there is no notion of what type of terminal the client
> is running.

"xencons <machine> <port>" is what we use. Any raw terminal
program should work.

xend should probably have support to spot a telnet client and do
the necessary negotiation to put the client into raw character
mode.

Ian
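For reference, a sketch of the kind of negotiation Ian mentions. This is not xend code and the function names are hypothetical; it only shows the standard telnet option bytes (RFC 854, 857, 858) that a server conventionally sends to ask a client for character-at-a-time mode, plus a filter that drops the client's simple replies from the data stream. Subnegotiation (SB ... SE) is deliberately not handled.

```python
# Telnet command and option codes from RFC 854/857/858.
IAC, WILL, WONT, DO, DONT = 255, 251, 252, 253, 254
ECHO, SGA = 1, 3


def char_mode_request():
    """Bytes a server can send to request character-at-a-time mode.

    Offering to echo and to suppress go-ahead is the conventional way to
    switch a telnet client out of line mode.
    """
    return bytes([IAC, WILL, ECHO, IAC, WILL, SGA])


def strip_negotiation(data):
    """Drop simple IAC sequences from received bytes and unescape IAC IAC."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != IAC:
            out.append(byte)
            i += 1
        elif i + 1 < len(data) and data[i + 1] == IAC:
            out.append(IAC)  # escaped literal 0xff
            i += 2
        elif i + 1 < len(data) and data[i + 1] in (WILL, WONT, DO, DONT):
            i += 3  # IAC <verb> <option>
        else:
            i += 2  # any other two-byte command
    return bytes(out)
```

A raw client such as xencons needs none of this, which is why it works as-is.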
That works great.

I just did a shutdown -r now in the non-privileged domain and now I'm
seeing an endless stream of the messages below. DOM0 is now
unresponsive. Any suggestions?

Thanks for your help.


KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
KERNEL: assertion (skb==NULL || before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags&(MSG_PEEK|MSG_TRUNC))) failed at tcp.c(1603)
KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
KERNEL: assertion (skb==NULL || before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags&(MSG_PEEK|MSG_TRUNC))) failed at tcp.c(1603)
KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
KERNEL: assertion (skb==NULL || before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags&(MSG_PEEK|MSG_TRUNC))) failed at tcp.c(1603)
KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
KERNEL: assertion (skb==NULL || before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || (flags&(MSG_PEEK|MSG_TRUNC))) failed at tcp.c(1603)
KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)


On Fri, 14 May 2004, Ian Pratt wrote:
> "xencons <machine> <port>" is what we use. Any raw terminal
> program should work.
>
> xend should probably have support to spot a telnet client and do
> the necessary negotiation to put the client into raw character
> mode.
>
> Ian
The machine now locks up while spitting out the error message below when
the non-privileged domain is initially *started*.

Let me know what information you need.

    -Kip

On Fri, 14 May 2004, Kip Macy wrote:
> That works great.
>
> I just did a shutdown -r now in the non-privileged domain and now I'm
> seeing an endless stream of the messages below. DOM0 is now
> unresponsive. Any suggestions?
>
> Thanks for your help.
>
>
> KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
> KERNEL: assertion (skb==NULL || before(tp->copied_seq,
> TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
> KERNEL: assertion (tp->copied_seq == tp->rcv_nxt ||
> (flags&(MSG_PEEK|MSG_TRUNC))) failed at tcp.c(1603)
> [...]
And this is the last thing that the non-priv domain prints out before
the network goes dead:

Binding to the NIS domain: [ OK ]
Listening for an NIS domain server.
Starting automount:

On Fri, 14 May 2004, Kip Macy wrote:
> The machine now locks up while spitting out the error message below when
> the non-privileged domain is initially *started*.
>
> Let me know what information you need.
>
> -Kip
> The machine now locks up while spitting out the error message below when
> the non-privileged domain is initially *started*.
>
> > KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
> > KERNEL: assertion (skb==NULL || before(tp->copied_seq,
> > TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
> > KERNEL: assertion (tp->copied_seq == tp->rcv_nxt ||

I've never seen anything like this. Did you build the kernel
yourself? What version of gcc? (We use 3.2.2 as per RH9)

Can you reproduce with one of our nightly builds?

The TCP stack is clearly seriously confused. It's hard to imagine
how Xen could cause this.

Ian
Do your nightly builds have xenolinux binaries for nodev xen?

    -Kip

On Fri, 14 May 2004, Ian Pratt wrote:
> I've never seen anything like this. Did you build the kernel
> yourself? What version of gcc? (We use 3.2.2 as per RH9)
>
> Can you reproduce with one of our nightly builds?
>
> The TCP stack is clearly seriously confused. It's hard to imagine
> how Xen could cause this.
>
> Ian
Does xc_linux_save.c need to change for ngio?

The following command:
./xc_dom_control.py suspend 4 /tmp/xen-vm0.core

never completes.

This is all I see in the output of strace (many times over):

mlock(0xbffff170, 72) = 0
ioctl(3, SNDCTL_DSP_RESET, 0xbffff130) = 0
munlock(0xbffff170, 72) = 0
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
mlock(0xbffff170, 72) = 0
ioctl(3, SNDCTL_DSP_RESET, 0xbffff130) = 0
munlock(0xbffff170, 72) = 0
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
mlock(0xbffff170, 72) = 0
ioctl(3, SNDCTL_DSP_RESET, 0xbffff130) = 0

    -Kip
It looks like a change in the DOM0 interface:

>>> xc.domain_getinfo()
[{'cpu_time': 297676674252L, 'stopped': 0, 'name': 'Domain-0',
  'mem_kb': 257112, 'dom': 0L, 'running': 1, 'maxmem_kb': 262144,
  'cpu': 0},
 {'cpu_time': 9552521165L, 'stopped': 0, 'name': 'This is VM 2',
  'mem_kb': 65536, 'dom': 6L, 'running': 0, 'maxmem_kb': 65536,
  'cpu': 0}]

but the domain is in fact stopped.

    -Kip

On Fri, 14 May 2004, Kip Macy wrote:
> Does xc_linux_save.c need to change for ngio?
>
> The following command:
> ./xc_dom_control.py suspend 4 /tmp/xen-vm0.core
>
> never completes.
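A short sketch for spotting the condition Kip reports: entries that claim to be neither running nor stopped. It works on a plain list of dicts shaped like the output in the message (with the Python 2 long suffixes dropped) and does not call xc itself. The function name is hypothetical, and whether that flag combination can also be legitimate, for example for an idle domain, is not established by the thread.

```python
def neither_running_nor_stopped(domains):
    """Return the names of domains whose 'running' and 'stopped' flags are both 0."""
    return [d["name"] for d in domains if not d["running"] and not d["stopped"]]


if __name__ == "__main__":
    # Values copied from the message; in the real tools this list would
    # come from xc.domain_getinfo().
    info = [
        {"cpu_time": 297676674252, "stopped": 0, "name": "Domain-0",
         "mem_kb": 257112, "dom": 0, "running": 1, "maxmem_kb": 262144, "cpu": 0},
        {"cpu_time": 9552521165, "stopped": 0, "name": "This is VM 2",
         "mem_kb": 65536, "dom": 6, "running": 0, "maxmem_kb": 65536, "cpu": 0},
    ]
    print(neither_running_nor_stopped(info))
```

A tool that waits for 'stopped' to become 1 would wait forever on such an entry, which fits the suspend command never completing.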
Suspend/resume won't work with ngio at the moment. It'll be a few
weeks at least before we tackle merging the two features.

 -- Keir

> Does xc_linux_save.c need to change for ngio?
>
> The following command:
> ./xc_dom_control.py suspend 4 /tmp/xen-vm0.core
>
> never completes.
Ok - thanks. Is this because all the dependencies for stopping a domain
are not in place? Or are the interfaces in flux in general? What I'm
trying to do is get the bits in place for core debugging of
non-privileged domains.

    -Kip

On Sat, 15 May 2004, Keir Fraser wrote:
> Suspend/resume won't work with ngio at the moment. It'll be a few
> weeks at least before we tackle merging the two features.
>
> -- Keir
> Ok - thanks. Is this because all the dependencies for stopping a domain
> are not in place? Or are the interfaces in flux in general? What I'm
> trying to do is get the bits in place for core debugging of
> non-privileged domains.

xend doesn't do setup/teardown of i/o connections properly yet. What's
there is a very basic lashup to create very simple configurations --
but not enough to suspend/resume them.

 -- Keir
questions:

1)
Does this extend as far as not being able to stop without destroying at
all?

kmacy@curly r ./xc_dom_control.py list
Dom  Name            Mem(kb)  CPU  State  Time(ms)
  0  Domain-0         257128    0  r-        28516
  1  This is VM 2      64920    0  --         5227
kmacy@curly r ./xc_dom_control.py stop 1
return code 0
kmacy@curly r ./xc_dom_control.py list
Dom  Name            Mem(kb)  CPU  State  Time(ms)
  0  Domain-0         257052    0  r-        28769
  1  This is VM 2      64996    0  --         5245

I can still interact with the domain over its console.

===========================================================
2)
I take it that many of the following are expected right now when
destroying a domain with I/O in flight:

(XEN) DOM0: (file=memory.c, line=935) Unknown domain '2'
(file=main.c, line=266) Failed MMU update transferring to DOM2

=======================================================
3)
I just did the following:

[root@xen-vm0 ~]$ while (1)
while? dd if=/dev/zero of=/tmp/bwout count=1024 bs=1024k
while? end
1024+0 records in
1024+0 records out
1024+0 records in
1024+0 records out
1024+0 records in
1024+0 records out
1024+0 records in
1024+0 records out
1024+0 records in
1024+0 records out

and then I saw this on the machine console:

__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process python
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process syslogd
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process sendmail
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0xf0/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process ypbind
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process ypbind
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process sshd
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process sshd
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process tcsh
__alloc_pages: 0-order allocation failed (gfp=0xf0/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0xf0/0)
__alloc_pages: 0-order allocation failed (gfp=0xf0/0)
__alloc_pages: 0-order allocation failed (gfp=0xf0/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process crond
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process crond
(XEN) (file=traps.c, line=469) GPF (0004): fc520e08 -> fc52e2d2
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process portmap
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process umount

I guess memory management is a work in progress?

Thanks.

 -Kip

On Sat, 15 May 2004, Keir Fraser wrote:
> > Ok - thanks. Is this because all the dependencies for stopping a domain
> > are not in place? Or are the interfaces in flux in general? What I'm
> > trying to do is get the bits in place for core debugging of
> > non-privileged domains.
>
> xend doesn't do setup/teardown of i/o connections properly yet. What's
> there is a very basic lashup to create very simple configurations --
> but not enough to suspend/resume them.
>
>  -- Keir
> questions:
> 1)
> Does this extend as far as not being able to stop without destroying at
> all?
> I can still interact with the domain over its console.

I checked in a fix for this a couple of hours ago -- it was stopping a
domain from dying except via a forced destroy from DOM0 (e.g.,
/sbin/reboot within the domain itself wouldn't work).

So a stop request should now stop the domain. Won't be much use though
as I'm pretty sure it won't start up again happily!

> ===========================================================
> 2)
> I take it that many of the following are expected right now when
> destroying a domain with I/O in flight:
>
> (XEN) DOM0: (file=memory.c, line=935) Unknown domain '2'
> (file=main.c, line=266) Failed MMU update transferring to DOM2

Yep, I see this. As I said: xend can just about set up a basic
interface between a guest and a device-driver backend. It's not got
functionality for tearing the interface down properly, which leaves
the backend driver in a confused state, getting you a bunch of (fairly
harmless) errors.

> [root@xen-vm0 ~]$ while (1)
> while? dd if=/dev/zero of=/tmp/bwout count=1024 bs=1024k
> while? end
> 1024+0 records in
> 1024+0 records out
> 1024+0 records in
> 1024+0 records out
> 1024+0 records in
> 1024+0 records out
> 1024+0 records in
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> VM: killing process umount
> I guess memory management is a work in progress?

This is within DOM1 (i.e., not DOM0) right? If so, I guess that doing
this 'dd' test within DOM0 doesn't get you similar messages?

This is rather unexpected -- if you could add a stack backtrace to the
out-of-memory path in the page allocator (page_alloc.c in Xenolinux)
and post me that with the kernel image (vmlinux) then I'll see what I
can work out. I guess I haven't tested all that hard so there might be
a memory leak.

 -- Keir
The dd is running in DOM1. The OOM killer is getting run in DOM0.
There is clearly a memory leak in the block I/O path.

DOM0 is curly and DOM1 is xen-vm0.

A large amount of memory has already been leaked:

kmacy@curly cat /proc/meminfo
        total:     used:     free:  shared:  buffers:   cached:
Mem:  262565888 205619200  56946688        0  23339008  28123136

[root@xen-vm0 ~]$ dd if=/dev/zero of=/tmp/bwout bs=1024k count=256

kmacy@curly cat /proc/meminfo
        total:     used:     free:  shared:  buffers:   cached:
Mem:  262565888 214687744  47878144        0  23339008  28123136

[root@xen-vm0 ~]$ dd if=/dev/zero of=/tmp/bwout count=256 bs=1024k
256+0 records in
256+0 records out

kmacy@curly cat /proc/meminfo | head -3
        total:     used:     free:  shared:  buffers:   cached:
Mem:  262565888 223727616  38838272        0  23339008  28123136

[root@xen-vm0 ~]$ dd if=/dev/zero of=/tmp/bwout count=256 bs=1024k
256+0 records in
256+0 records out

kmacy@curly cat /proc/meminfo | head -2
        total:     used:     free:  shared:  buffers:   cached:
Mem:  262565888 232873984  29691904        0  23339008  28123136

So ~40MB is leaked for every 1GB transferred.

I can give you a stack backtrace of the memory allocation failure in
DOM0 if you like, but as far as I can tell the horse has long since left
the barn at that point.

> This is within DOM1 (i.e., not DOM0) right? If so, I guess that doing
> this 'dd' test within DOM0 doesn't get you similar messages?
>
> This is rather unexpected -- if you could add a stack backtrace to the
> out-of-memory path in the page allocator (page_alloc.c in Xenolinux)
> and post me that with the kernel image (vmlinux) then I'll see what I
> can work out. I guess I haven't tested all that hard so there might be
> a memory leak.

On a side note - I don't need suspend/restore, I just need coredump and
almost immediately after that PTRACE_STOP. So long as I can stop the
domain long enough to write out its state I have what I need.
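The leak rate quoted above can be cross-checked from the four
/proc/meminfo samples. The following is only a sketch: the helper name
leak_per_gib is hypothetical, and it assumes that all growth in the
DOM0 "used" column between samples is attributable to the 256 MiB dd
runs in DOM1 (bs=1024k count=256).

```python
# Hypothetical helper, not part of Xen or its tools: estimate how much
# DOM0 "used" memory grows per GiB written by dd in DOM1.
MIB = 1024 * 1024

# "used" column (bytes) from the four /proc/meminfo samples on curly,
# taken before the first dd run and after each of the three runs.
used_samples = [205619200, 214687744, 223727616, 232873984]

# Assumption: each dd run wrote 256 blocks of 1024k, i.e. 256 MiB.
bytes_per_run = 256 * MIB


def leak_per_gib(samples, run_bytes):
    """Average growth in 'used' bytes, scaled to one GiB written."""
    deltas = [after - before for before, after in zip(samples, samples[1:])]
    avg_delta = sum(deltas) / len(deltas)
    return avg_delta * (1024 * MIB) / run_bytes


rate = leak_per_gib(used_samples, bytes_per_run)
print(f"{rate / MIB:.1f} MiB leaked per GiB written")  # prints 34.7
```

That works out to roughly 35 MiB (about 36 MB) per GiB, in line with
the "~40MB" estimate. The buffers and cached columns are identical in
all four samples, so the growth is not ordinary page-cache use.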
> The dd is running in DOM1. The OOM killer is getting run in DOM0.
> There is clearly a memory leak in the block I/O path.

Now fixed. It turned out to be rather blatant.

> On a side note - I don't need suspend/restore, I just need coredump and
> almost immediately after that PTRACE_STOP. So long as I can stop the
> domain long enough to write out its state I have what I need.

A pause operation will be coming up soon, as part of a cleanup of the
scheduler interface in Xen. This will fix the problem that there's
currently no way to stop a domain without having it suspend itself.

 -- Keir