Hello,

We have encountered a strange problem where "virsh list" is attempting
to look up some non-existent domains.

alpha # virsh list
 Id Name                 State
----------------------------------
  0 Domain-0             running
  1 hosting:smpst-1      blocked
  3 hosting:cpy1-1       blocked
  4 hosting:liferay-1    blocked
libvir: Xen Daemon error : GET operation failed:
libvir: Xen Daemon error : GET operation failed:
  8 hosting-dct:dct-1    blocked

Looking at xend-debug.log I see a couple of messages (sans stack traces):

XendInvalidDomain: <Fault 3: '6'>
XendInvalidDomain: <Fault 3: '7'>

The odd thing is that "xm list" doesn't have the same problem. We tried
restarting libvirtd since, I'm assuming, virsh talks to libvirtd but xm
does not. That did not help. I'm a little reluctant to restart any of
the other daemons since there are running domains on the hypervisor.

I also tried the xenstore commands (xenstore-list, xenstore-ls), but
they don't show any domains with ID 6 or 7:

alpha # /usr/lib/xen/bin/xenstore-list /local/domain
0
1
3
4
8

Is there some file or "database" that virsh consults that might need to
be cleaned up?

Thanks...

  --joe
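[Editor's note: the symptom above boils down to a set difference: the
hypervisor still reports IDs 6 and 7, but xenstore does not. A minimal
sketch of that comparison; `find_stale_domains` is a hypothetical helper,
not an existing Xen or libvirt tool, and the ID lists simply mirror the
transcripts above.]

```python
def find_stale_domains(hypervisor_ids, xenstore_ids):
    """Return domain IDs the hypervisor reports but xenstore does not.

    These are the IDs that make virsh emit "GET operation failed",
    since xend can no longer resolve them.
    """
    return sorted(set(hypervisor_ids) - set(xenstore_ids))

# virsh/xm see 0, 1, 3, 4, 8 plus errors for 6 and 7;
# xenstore-list /local/domain shows only 0, 1, 3, 4, 8.
hypervisor_ids = [0, 1, 3, 4, 6, 7, 8]
xenstore_ids = [0, 1, 3, 4, 8]
print(find_stale_domains(hypervisor_ids, xenstore_ids))  # → [6, 7]
```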
On Mon, Feb 23, 2009 at 04:12:07PM -0800, Joseph Mocker wrote:

> We have encountered a strange problem where "virsh list" is attempting
> to look up some non-existent domains.
>
> alpha # virsh list

This can happen occasionally when the hypervisor still thinks a domain
exists. Try running this script (passing 6 and 7):

$ cat ~johnlev/bin/domstate
#!/usr/bin/amd64/python

import sys
import xen.lowlevel.xc

xc = xen.lowlevel.xc.xc()

print "%s" % xc.domain_getinfo(int(sys.argv[1]))

This is always a bug, but I don't know which one it might be. I've not
had a reproducible case of this for quite some time.

regards
john
The script does return information about 6 and 7. Any way to tell the
hypervisor to clear them out?

alpha # ./domstate 6
[{'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0,
'shutdown_reason': 0, 'dying': 1, 'mem_kb': 4096L, 'domid': 6,
'max_vcpu_id': 1, 'crashed': 0, 'running': 0, 'maxmem_kb': 4096L,
'shutdown': 0, 'online_vcpus': 0, 'handle': [216, 238, 227, 14, 148,
51, 177, 178, 249, 66, 188, 4, 149, 144, 142, 147], 'blocked': 1},
{'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0,
'shutdown_reason': 0, 'dying': 1, 'mem_kb': 4096L, 'domid': 7,
'max_vcpu_id': 1, 'crashed': 0, 'running': 0, 'maxmem_kb': 4194304L,
'shutdown': 0, 'online_vcpus': 0, 'handle': [216, 238, 227, 14, 148,
51, 177, 178, 249, 66, 188, 4, 149, 144, 142, 147], 'blocked': 1},
{'paused': 0, 'cpu_time': 222810037566L, 'ssidref': 0, 'hvm': 0,
'shutdown_reason': 0, 'dying': 0, 'mem_kb': 4194304L, 'domid': 8,
'max_vcpu_id': 1, 'crashed': 0, 'running': 0, 'maxmem_kb': 4194304L,
'shutdown': 0, 'online_vcpus': 2, 'handle': [216, 238, 227, 14, 148,
51, 177, 178, 249, 66, 188, 4, 149, 144, 142, 147], 'blocked': 1}]

alpha # ./domstate 7
[{'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0,
'shutdown_reason': 0, 'dying': 1, 'mem_kb': 4096L, 'domid': 7,
'max_vcpu_id': 1, 'crashed': 0, 'running': 0, 'maxmem_kb': 4194304L,
'shutdown': 0, 'online_vcpus': 0, 'handle': [216, 238, 227, 14, 148,
51, 177, 178, 249, 66, 188, 4, 149, 144, 142, 147], 'blocked': 1},
{'paused': 0, 'cpu_time': 223547284066L, 'ssidref': 0, 'hvm': 0,
'shutdown_reason': 0, 'dying': 0, 'mem_kb': 4194304L, 'domid': 8,
'max_vcpu_id': 1, 'crashed': 0, 'running': 0, 'maxmem_kb': 4194304L,
'shutdown': 0, 'online_vcpus': 2, 'handle': [216, 238, 227, 14, 148,
51, 177, 178, 249, 66, 188, 4, 149, 144, 142, 147], 'blocked': 1}]

John Levon wrote:
> On Mon, Feb 23, 2009 at 04:12:07PM -0800, Joseph Mocker wrote:
>
>> We have encountered a strange problem where "virsh list" is attempting
>> to look up some non-existent domains.
>>
>> alpha # virsh list
>
> This can happen occasionally when the hypervisor still thinks a domain
> exists. Try running this script (passing 6 and 7):
>
> $ cat ~johnlev/bin/domstate
> #!/usr/bin/amd64/python
>
> import sys
> import xen.lowlevel.xc
>
> xc = xen.lowlevel.xc.xc()
>
> print "%s" % xc.domain_getinfo(int(sys.argv[1]))
>
> This is always a bug, but I don't know which one it might be. I've not
> had a reproducible case of this for quite some time.
>
> regards
> john
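[Editor's note: the telltale pattern in the output above is 'dying': 1
combined with 'online_vcpus': 0 and zero cpu_time — domains stuck in
teardown. A small sketch that flags such zombies from a list of
domain-info dicts shaped like the ones domstate prints; `find_zombies`
is a hypothetical helper, and only the fields it reads are included in
the sample data.]

```python
def find_zombies(dominfos):
    """Return domids of domains that are dying but never finished teardown.

    Mirrors the pattern in the domstate output: 'dying' set, no online
    vcpus, and no accumulated cpu_time.
    """
    return [d['domid'] for d in dominfos
            if d['dying'] == 1 and d['online_vcpus'] == 0]

# Trimmed-down versions of the three records from "./domstate 6" above.
dominfos = [
    {'domid': 6, 'dying': 1, 'online_vcpus': 0, 'cpu_time': 0},
    {'domid': 7, 'dying': 1, 'online_vcpus': 0, 'cpu_time': 0},
    {'domid': 8, 'dying': 0, 'online_vcpus': 2, 'cpu_time': 222810037566},
]
print(find_zombies(dominfos))  # → [6, 7]
```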
On Mon, Feb 23, 2009 at 09:36:46PM -0800, Joseph Mocker wrote:

> The script does return information about 6 and 7. Any way to tell the
> hypervisor to clear them out?

No.

> alpha # ./domstate 6
> [{'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0,
> 'shutdown_reason': 0, 'dying': 1, 'mem_kb': 4096L, 'domid': 6,
                                    ^^^^^

Somehow we mis-counted and ended up with all this memory apparently
still referenced.

regards
john
Bummer. Guess I can migrate the domains off the hypervisor and reboot.

My coworker is the one who made this occur. I'll see if he can
reproduce his steps to make it happen again. He was trying to change
the memory allocation, then noticed he'd fubared it and specified 4k
instead of 4g.

On Feb 23, 2009, at 9:56 PM, John Levon <john.levon@sun.com> wrote:

> On Mon, Feb 23, 2009 at 09:36:46PM -0800, Joseph Mocker wrote:
>
>> The script does return information about 6 and 7. Any way to tell the
>> hypervisor to clear them out?
>
> No.
>
>> alpha # ./domstate 6
>> [{'paused': 1, 'cpu_time': 0L, 'ssidref': 0, 'hvm': 0,
>> 'shutdown_reason': 0, 'dying': 1, 'mem_kb': 4096L, 'domid': 6,
> ^^^^^
>
> Somehow we mis-counted and ended up with all this memory apparently
> still referenced.
>
> regards
> john
On Tue, Feb 24, 2009 at 07:15:41AM -0800, Joseph Mocker wrote:

> My coworker is the one who made this occur. I'll see if he can
> reproduce his steps to make it happen again.

That'd be great. If we can get a reproducible test case, we can start
debugging the hypervisor so we don't get these zombies...

> He was trying to change the memory allocation, then noticed he'd
> fubared it and specified 4k instead of 4g.

I don't blame him. The interface is truly horrible for this; I've
wanted it to take human-readable amounts for a long time...

regards
john
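[Editor's note: John's complaint is that the memory interface takes raw
numbers, which is how "4k instead of 4g" slips through. A sketch of the
kind of human-readable parser he is asking for; `parse_mem_kb` is
hypothetical, not part of xm or libvirt, and returns kilobytes to match
the 'mem_kb'/'maxmem_kb' fields seen earlier in the thread.]

```python
def parse_mem_kb(text):
    """Parse '4g', '512m', '4096k', or a bare number into kilobytes."""
    text = text.strip().lower()
    units = {'k': 1, 'm': 1024, 'g': 1024 * 1024, 't': 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # no suffix: assume the value is already in KB

print(parse_mem_kb('4g'))  # → 4194304  (matches maxmem_kb above)
print(parse_mem_kb('4k'))  # → 4        (the coworker's typo: 4 KB, not 4 GB)
```

With suffixes accepted, the 4k/4g slip would at least be an explicit
unit choice rather than a silently misinterpreted raw number.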