Hi all,

Running Xen 4.2.3 with all the current XSA fixes.

Whenever I shut down / reboot a Windows HVM DomU, it ends up going into a
(null) state, which I can't seem to kill / destroy / clean up from.

# xl list
Name                                        ID   Mem VCPUs     State    Time(s)
Domain-0                                     0  1579     2     r-----  322927.2
(null)                                       1     0     1     --psrd   14075.9
(null)                                       2     0     1     --psrd   58467.6
(null)                                       3     0     0     --ps-d   11604.8
(null)                                       4     0     2     --p--d   24186.1
(null)                                       5     0     2     --ps-d   22831.0

The config is very simple:
# cat /etc/xen/remotedesktop.vm
name = "remotedesktop.vm"
memory = 1536
vcpus = 2
cpus = "1-3"
cpu_weight = 128
disk = [ 'phy:/dev/vg_raid1/remotedesktop.vm,hda,w' ,
         'file:/root/win7x86.iso,hdc:cdrom,r' ]
vif = [ 'mac=98:95:00:07:07:07, bridge=br203, vifname=vm.rdp' ]
builder = "hvm"
usbdevice = "tablet"
vnc = 1
vnclisten = "10.1.1.1"
vncdisplay = 1 # port 5901
vncpasswd = ''
localtime = 1
viridian = 1
xen_platform_pci= 1

It seems that this happens no matter what I do.

Has anyone come across this before?

--
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
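Every (null) entry in the listing above ends in "d" in the State column. As a reading aid, here is a minimal Python sketch that decodes that six-character column. The flag positions and the shutdown-reason letters are an assumption based on our reading of xl's list output in this era, so check them against your own xl version; decode_state is a hypothetical helper, not part of any Xen tool.

```python
# Assumed layout of the six-character State column printed by `xl list`:
#   0 running 'r', 1 blocked 'b', 2 paused 'p', 3 shutdown 's',
#   4 shutdown reason letter, 5 dying 'd'.
SHUTDOWN_REASONS = {
    "-": "poweroff",
    "r": "reboot",
    "s": "suspend",
    "c": "crash",
    "w": "watchdog",
}


def decode_state(state: str) -> dict:
    """Turn a State string such as '--psrd' into named flags."""
    if len(state) != 6:
        raise ValueError(f"expected 6 state characters, got {state!r}")
    shutdown = state[3] == "s"
    return {
        "running": state[0] == "r",
        "blocked": state[1] == "b",
        "paused": state[2] == "p",
        "shutdown": shutdown,
        # The reason letter only means something once the domain has shut down.
        "reason": SHUTDOWN_REASONS.get(state[4], "unknown") if shutdown else None,
        "dying": state[5] == "d",
    }
```

Under this reading, "--psrd" is a paused, shut-down (reason: reboot), dying domain and "--ps-d" is the same after a poweroff, which fits the shutdown / reboot trigger described in the report.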
On 23/11/13 16:14, Steven Haigh wrote:
> Hi all,
>
> Running Xen 4.2.3 with all the current XSA fixes.
>
> Whenever I shutdown / reboot a Windows HVM DomU, it ends up going into a
> (null) state - which I can't seem to kill / destroy / clean up from.
[...]
> It seems that this happens no matter what I do.
>
> Has anyone come across this before?

When you have a system in this state, can you run

xl debug-keys q
xl dmesg > xen-dmesg.log

And provide the log file.

Most likely, there will be one unfreed page keeping the domain around as
a zombie.

~Andrew
Saturday, November 23, 2013, 5:18:43 PM, you wrote:

> On 23/11/13 16:14, Steven Haigh wrote:
>> Whenever I shutdown / reboot a Windows HVM DomU, it ends up going into a
>> (null) state - which I can't seem to kill / destroy / clean up from.
[...]
> Most likely, there will be one unfreed page keeping the domain around as
> a zombie.

> ~Andrew

Would it be possible to set the domain name to something other than
"(null)" when such a state occurs? The xendomains script, for example,
seems to interpret it literally and bails out without shutting down any
other domains.

--
Sander
On Sun, Nov 24, Steven Haigh wrote:

> Running Xen 4.2.3 with all the current XSA fixes.

How exactly did you start the guests?
Does 'ps faxu' show qemu processes for the listed domain_ids?
What is the 'xenstore-ls -f | sort' output?

Olaf
On 24/11/13 06:27, Olaf Hering wrote:
> On Sun, Nov 24, Steven Haigh wrote:
>
>> Running Xen 4.2.3 with all the current XSA fixes.
>
> How exactly did you start the guests?

The DomUs were started with: xl create /etc/xen/<configfile>

> Does 'ps faxu' show qemu processes for the listed domain_ids?
> What is the 'xenstore-ls -f | sort' output?

I'll have to check this when I manage to reproduce it. So far, I have
been unable to find a reliable way to reproduce it. I managed to get a
system to do it every time an HVM DomU was shut down OR restarted, but
after a reboot of the Dom0 I can't get it into that state again.

As soon as I can get a system in this state again, I'll leave it to see
what information I can extract.

--
Steven Haigh
On 24/11/13 06:38, Steven Haigh wrote:
[...]
> As soon as I can get a system in this state again, I'll leave it to see
> what information I can extract.

Ha! As always, as soon as I send this, I notice it's happened on a Dom0.

# xl list
Name                                        ID   Mem VCPUs     State    Time(s)
Domain-0                                     0  1579     2     r-----    2731.3
planner.vm                                   1  1013     1     -b----     189.3
(null)                                       2     0     1     --psrd     301.1
tracker.vm                                   3  1013     2     -b----     834.4

Attached is the output of:
# xl debug-keys q
# xl dmesg > xen-dmesg.log
# gzip xen-dmesg.log

> Does 'ps faxu' show qemu processes for the listed domain_ids?

I only see a qemu process for the running DomUs, no dead or extra ones.

> What is the 'xenstore-ls -f | sort' output?

Attached as xenstore-ls.log.gz

--
Steven Haigh
On 23/11/13 19:56, Steven Haigh wrote:
[...]
> Attached is the output of:
> # xl debug-keys q
> # xl dmesg > xen-dmesg.log
> # gzip xen-dmesg.log

Ok - from dmesg.

(XEN) General information for domain 2:
(XEN) refcnt=1 dying=2 pause_count=2
(XEN) nr_pages=2 xenheap_pages=0 shared_pages=0 paged_pages=0 dirty_cpus={} max_pages=262400
(XEN) handle=ef58ef1a-784d-4e59-8079-42bdee87f219 vm_assist=00000000
(XEN) paging assistance: hap refcounts translate external
...
(XEN) Memory pages belonging to domain 2:
(XEN) DomPage 00000000000866e0: caf=00000001, taf=0000000000000000
(XEN) DomPage 00000000000866e1: caf=00000001, taf=0000000000000000
(XEN) PoD entries=0 cachesize=0

So there are indeed two outstanding pages causing this domain to become
a zombie. They are normal pages, with 1 outstanding ref.

Can you collect "xl debug-keys g" as well?
~Andrew
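The step Andrew does by eye, picking the zombie out of the 'q' dump, can be scripted. A minimal Python sketch, assuming the log format shown in the excerpt above (the dying= and nr_pages= fields and the DomPage lines) and assuming no other section of the dump reuses those field names; find_zombies is a hypothetical name, not a Xen tool.

```python
import re

# Patterns for the per-domain blocks printed after `xl debug-keys q`.
# They are searched anywhere in a line, so "(XEN)" prefixes and indentation
# do not matter.
DOMAIN_HDR = re.compile(r"General information for domain (\d+):")
PAGES_HDR = re.compile(r"Memory pages belonging to domain (\d+):")
DYING = re.compile(r"\bdying=(\d+)")
NR_PAGES = re.compile(r"\bnr_pages=(\d+)")
DOMPAGE = re.compile(
    r"DomPage ([0-9a-fA-F]+): caf=([0-9a-fA-F]+), taf=([0-9a-fA-F]+)")


def find_zombies(log: str) -> dict:
    """Map domid -> {"dying", "nr_pages", "pages"} for domains with dying != 0.

    "pages" holds (mfn, caf, taf) integer tuples taken from DomPage lines.
    """
    domains = {}
    current = None  # domain whose block we are currently reading
    for line in log.splitlines():
        header = DOMAIN_HDR.search(line) or PAGES_HDR.search(line)
        if header:
            current = int(header.group(1))
            domains.setdefault(current, {"dying": 0, "nr_pages": 0, "pages": []})
            continue
        if current is None:
            continue  # lines before the first domain block
        entry = domains[current]
        if m := DYING.search(line):
            entry["dying"] = int(m.group(1))
        if m := NR_PAGES.search(line):
            entry["nr_pages"] = int(m.group(1))
        if m := DOMPAGE.search(line):
            entry["pages"].append(tuple(int(g, 16) for g in m.groups()))
    return {domid: e for domid, e in domains.items() if e["dying"]}
```

On the excerpt above it would report only domain 2, with nr_pages 2 and the two pages at MFNs 0x866e0 and 0x866e1, each with caf=1.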
On 24/11/13 07:03, Andrew Cooper wrote:
> On 23/11/13 19:56, Steven Haigh wrote:
[...]
> So there are indeed two outstanding pages causing this domain to become
> a zombie. They are normal pages, with 1 outstanding ref.
>
> Can you collect "xl debug-keys g" as well?

Sure - attached.

--
Steven Haigh
On 23/11/13 20:09, Steven Haigh wrote:
> On 24/11/13 07:03, Andrew Cooper wrote:
[...]
>> Can you collect "xl debug-keys g" as well?
>
> Sure - attached.

(XEN) -------- active -------- -------- shared --------
(XEN) [ref] localdom mfn pin localdom gmfn flags
(XEN) grant-table for remote domain: 2 (v1)
(XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
(XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19

Ok - so domain 2 has two outstanding grants. This explains why it is a
zombie.

Both these grants are GTF_writing | GTF_reading | GTF_permit_access, but
seemingly unmapped.

I will have to defer to someone who knows the grant code better. Is it
possible for a domain to be a zombie just because it has two grants it
hasn't manually invalidated?

~Andrew
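The flags column in the grant-table dump can be decoded by hand. A minimal Python sketch using the bit values from Xen's public header xen/include/public/grant_table.h for grant entries; only the entry type and the readonly/reading/writing bits are decoded, and higher bits (cache attributes, sub-page) are ignored. decode_gtf is a hypothetical helper.

```python
# Grant entry flag bits, following Xen's public grant_table.h.
GTF_TYPE_MASK = 0x3  # low two bits hold the entry type
GTF_TYPES = {
    0: "GTF_invalid",
    1: "GTF_permit_access",
    2: "GTF_accept_transfer",
    3: "GTF_transitive",  # only meaningful for v2 grant tables
}
FLAG_BITS = {
    1 << 2: "GTF_readonly",
    1 << 3: "GTF_reading",  # set while the granted page is mapped for reading
    1 << 4: "GTF_writing",  # set while the granted page is mapped for writing
}


def decode_gtf(flags: int) -> list:
    """Return symbolic names for the type and status bits of a grant entry."""
    names = [GTF_TYPES[flags & GTF_TYPE_MASK]]
    for bit, name in FLAG_BITS.items():
        if flags & bit:
            names.append(name)
    return names
```

decode_gtf(0x19) yields GTF_permit_access, GTF_reading and GTF_writing (0x19 = 0x10 | 0x08 | 0x01), matching the reading of both entries above.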
On Sat, Nov 23, 2013 at 8:26 PM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
[...]
> (XEN) -------- active -------- -------- shared --------
> (XEN) [ref] localdom mfn pin localdom gmfn flags
> (XEN) grant-table for remote domain: 2 (v1)
> (XEN) [16302] 0 0x0866e1 0x00000001 0 0x0064e1 0x19
> (XEN) [16320] 0 0x0866e0 0x00000001 0 0x0064e0 0x19
>
> Ok - so domain 2 has two outstanding grants. This explains why it is a
> zombie.
>
> Both these grants are GTF_writing | GTF_reading | GTF_permit_access, but
> seemingly unmapped.
>

I didn't go through the whole thread; is there any chance you upgraded
your Dom0 kernel?

It is possible that you are missing some upstream patches. Check out
<527B8465.6050901@citrix.com>

Wei.
On 25/11/2013 4:14 AM, Wei Liu wrote:
> On Sat, Nov 23, 2013 at 8:26 PM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
[...]
> I didn't go through the whole thread, is there any chance you upgraded
> your Dom0 kernel?
>
> It is possible that you miss some upstream patches.

The Dom0 kernel is currently 3.11.7 on the system where I've seen the
problem after only a few hours of uptime. I'm in the middle of pushing
3.11.9 to that system. I use the vanilla kernel from kernel.org for all
my Dom0 systems.

--
Steven Haigh
On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
> any other domains.

I guess it should be a one-liner, so please submit a patch. Not sure
what alternative string should be used, since you would want to avoid
clashing with any potential real domain's name.

From that PoV it might be better to teach xendomains to ignore such
domains.

Ian.
On 25/11/13 21:36, Ian Campbell wrote:
> On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
[...]
> From that PoV it might be better to teach xendomains to ignore such
> domains.

This is how I actually found this problem in the first place: xendomains
(I rewrote the default script) waited until the failsafe timeout before
it rebooted the system.

I could filter out DomUs that have (null) as the 'name', but I wasn't
sure of the correct course of action here.

--
Steven Haigh
Monday, November 25, 2013, 11:36:05 AM, you wrote:

> On Sat, 2013-11-23 at 17:38 +0100, Sander Eikelenboom wrote:
>> Would it be possible to leave the domainname to something else as "(null)" when such a state occurs,
>> the xendomains script f.e. seems to interpret this literally and bails out without shutting down
>> any other domains.

> I guess it should be a one liner, so please submit a patch. Not sure
> what alternative string should be used, since you would want to avoid
> clashing with any potential real domain's name.

I didn't immediately spot the place where it was set to "null".
Yes, that's a problem, though domain naming has more restrictions in the
"just don't do that" category (like using "0", or any other number that
is also a domain ID, as a domain name).

> From that PoV it might be better to teach xendomains to ignore such
> domains.

From what I remember, I also couldn't use "xl destroy" on such a domain
(though I probably should have used the domain number instead of the
name). Perhaps the tool scripts should just use the domain ID numbers
instead of names for anything except printk's and echoing to the user?

> Ian.
On 25/11/13 21:50, Sander Eikelenboom wrote:
[...]
> From what i remember i also couldn't use "xl destroy" on such a domain (though i probably should by using the domain number instead of the name).
> Perhaps the toolscripts should just uses the domain-id numbers instead of names for anything except printk's and echoing to the user ?

Correct: once a domain enters the (null) state, you cannot use 'xl
destroy' to kill the domain. As in my first post, the domain ID still
exists, but it cannot be used. Is this a toolset bug?

--
Steven Haigh
On Mon, 2013-11-25 at 21:56 +1100, Steven Haigh wrote:
[...]
> Correct - once a domain enters the (null) state, you cannot use 'xl
> destroy' to kill the domain. As in my first post, the domain ID still
> exists, but it cannot be used. Is this a toolset bug?

No. It is not possible for the toolstack to kill a domain which is in
this state. If it were, the domain would already have died; a memory
reference is keeping it alive and there is nothing the toolstack can do
about that.

Ian.
On 25/11/13 10:56, Steven Haigh wrote:
[...]
> Correct - once a domain enters the (null) state, you cannot use 'xl
> destroy' to kill the domain. As in my first post, the domain ID still
> exists, but it cannot be used. Is this a toolset bug?

Not really - it is a current Xen limitation. Once a domain enters this
state, there is literally nothing the toolstack can do to further kill
the domain.

One solution to the problem is for the outstanding granted pages to
transfer ownership to Xen, which would allow the rest of the domain to
be cleaned up. However, that would make it far less obvious when
problems like this do occur.

~Andrew
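To make the point about why the toolstack is powerless concrete, here is a deliberately simplified toy model in Python. It is not how Xen's teardown code is written; it only assumes that a domain can be freed once it owns no pages, and that a page still referenced from outside (for example through a grant held by another domain) cannot be released. Domain, Page and destroy are hypothetical names, and the transfer_to_xen switch sketches the ownership-transfer idea from the message above.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    mfn: int
    foreign_refs: int = 0  # references held from outside the domain


@dataclass
class Domain:
    domid: int
    pages: list = field(default_factory=list)
    dying: bool = False


def destroy(dom: Domain, xen_pool: list, transfer_to_xen: bool = False) -> bool:
    """Toy teardown: release every page nobody else references.

    Returns True once the domain owns no pages and could be freed.
    With transfer_to_xen=True, still-referenced pages are handed to
    `xen_pool` instead of keeping the domain alive as a zombie.
    """
    dom.dying = True
    remaining = []
    for page in dom.pages:
        if page.foreign_refs == 0:
            continue                   # released back to the free heap
        if transfer_to_xen:
            xen_pool.append(page)      # Xen now owns the stuck page
        else:
            remaining.append(page)     # this page pins the domain
    dom.pages = remaining
    return not dom.pages
```

Calling destroy() again on the zombie changes nothing until the outside references go away, which mirrors why 'xl destroy' cannot help. With transfer_to_xen=True the domain is freed, but the stuck pages now sit silently in Xen's pool, which is the visibility trade-off Andrew describes.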
On Mon, Nov 25, 2013 at 12:19:40PM +1100, Steven Haigh wrote:
> On 25/11/2013 4:14 AM, Wei Liu wrote:
[...]
> > I didn't go through the whole thread, is there any chance you upgraded
> > your Dom0 kernel?
> >
> > It is possible that you miss some upstream patches.
>
> The Dom0 kernel is currently 3.11.7 on the system I've seen the problem
> on after only a few hours of uptime. I'm in the middle of pushing 3.11.9
> to that system. I use the vanilla kernel from kernel.org for all my Dom0
> systems.

Yes, 3.11.7 is missing those two patches, which 3.11.9 has. They should
fix your issue.

Wei.
On Mon, 2013-11-25 at 11:50 +0100, Sander Eikelenboom wrote:
> Monday, November 25, 2013, 11:36:05 AM, you wrote:
[...]
> I didn't immediately spot the place where it was set to "null".

I think it is what you get from "printf("%s", NULL)" with glibc.

> Perhaps the toolscripts should just uses the domain-id numbers instead
> of names for anything except printk's and echoing to the user ?

"xl list" is the latter. You could perhaps add an option to print the
numeric domid instead (-n is commonly used for this, I think) and use
that option in the xendomains script?

Ian.
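Along the lines discussed here, a xendomains-style helper can work with domids and skip dying domains instead of trusting the name column. A minimal Python sketch (the real xendomains is a shell script): it assumes the xl list column layout shown earlier in the thread, reads columns from the right so that names containing spaces do not shift the fields, and treats a trailing "d" in State as dying. shutdown_candidates is a hypothetical name.

```python
def shutdown_candidates(xl_list_output: str) -> list:
    """Return the domids of guests a shutdown script should act on."""
    domids = []
    for line in xl_list_output.splitlines():
        fields = line.split()
        # Expect at least: Name ID Mem VCPUs State Time(s)
        if len(fields) < 6 or not fields[-5].isdigit():
            continue  # header, blank or wrapped line
        domid, state = int(fields[-5]), fields[-2]
        if domid == 0:
            continue  # never shut down Domain-0 from here
        if state.endswith("d"):
            continue  # dying zombie, such as the "(null)" entries
        domids.append(domid)
    return domids
```

Keying on the dying flag rather than on the literal string "(null)" sidesteps the clash with a real domain name raised earlier in the thread.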