Hi,

I am currently using c/s 25371:e9058654ca08.
When I try to start a HVM guest I get this failure:

xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019bd04
  TOTAL:         0000000000000000->00000000ff800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000003fb
  1GB PAGES: 0x0000000000000002
libxl: error: libxl.c:3208:libxl_sched_credit_domain_set: Cpu weight out of range, valid values are within range from 1 to 65535
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 1: Bad file descriptor
libxl: error: libxl_event.c:468:libxl__ev_xswatch_register: create watch for path /local/domain/0/device-model/1/state: Bad file descriptor
libxl: error: libxl_dm.c:1069:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
assertion "ao->in_initiator" failed: file "libxl_event.c", line 1388, function "libxl__ao_complete_check_progress_reports"
[1]   Abort trap (core dumped) xl create -c ${F...

Christoph

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85689 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

On Fri, 2012-05-18 at 14:17 +0100, Christoph Egger wrote:
> Hi,
>
> I am currently using c/s 25371:e9058654ca08.
> When I try to start a HVM guest I get this failure:
>
> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>   Loader:        0000000000100000->000000000019bd04
>   TOTAL:         0000000000000000->00000000ff800000
>   ENTRY ADDRESS: 0000000000100000
> xc: info: PHYSICAL MEMORY ALLOCATION:
>   4KB PAGES: 0x0000000000000200
>   2MB PAGES: 0x00000000000003fb
>   1GB PAGES: 0x0000000000000002
> libxl: error: libxl.c:3208:libxl_sched_credit_domain_set: Cpu weight out
> of range, valid values are within range from 1 to 65535
> libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get
> dompath for 1: Bad file descriptor

This is on NetBSD?

These sorts of symptoms are similar to those fixed by 25364:8dce7a4121b9
but you've already got that. It might be worth doing a full clean and
rebuild, just in case.

What does your guest config look like? What is your command line? Do you
know when it last worked?

The places which close ctx->xsh are very few -- might be worth
annotating any call to xs_daemon_close() with a printf.

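The printf annotation suggested above could look roughly like the following. This is only a sketch: `struct demo_handle`, `demo_close()` and `TRACED_CLOSE()` are hypothetical stand-ins so that the example compiles without the xenstore headers; in the real tree the macro body would wrap the existing xs_daemon_close() call instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the xenstore handle, so that the sketch
 * builds on its own. */
struct demo_handle { int open; };

static int traced_closes;   /* number of closes seen so far */

/* Hypothetical stand-in for the real close call. */
static void demo_close(struct demo_handle *h)
{
    assert(h != NULL);
    h->open = 0;
}

/* Print the call site to stderr, then perform the close.  Using a macro
 * means __FILE__, __LINE__ and __func__ name the caller, which is the
 * information needed to find out who closes the handle too early. */
#define TRACED_CLOSE(h)                                          \
    do {                                                         \
        fprintf(stderr, "%s:%d:%s: closing handle %p\n",         \
                __FILE__, __LINE__, __func__, (void *)(h));      \
        traced_closes++;                                         \
        demo_close(h);                                           \
    } while (0)
```

Replacing each close call with the macro gives one stderr line per close, in the same file:line:function shape as the libxl messages, so an unexpected early close stands out in the log.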
On 05/18/12 15:30, Ian Campbell wrote:
> On Fri, 2012-05-18 at 14:17 +0100, Christoph Egger wrote:
>> Hi,
>>
>> I am currently using c/s 25371:e9058654ca08.
>> When I try to start a HVM guest I get this failure:
>>
>> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>>   Loader:        0000000000100000->000000000019bd04
>>   TOTAL:         0000000000000000->00000000ff800000
>>   ENTRY ADDRESS: 0000000000100000
>> xc: info: PHYSICAL MEMORY ALLOCATION:
>>   4KB PAGES: 0x0000000000000200
>>   2MB PAGES: 0x00000000000003fb
>>   1GB PAGES: 0x0000000000000002
>> libxl: error: libxl.c:3208:libxl_sched_credit_domain_set: Cpu weight out
>> of range, valid values are within range from 1 to 65535
>> libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get
>> dompath for 1: Bad file descriptor
>
> This is on NetBSD?

Yes.

> These sorts of symptoms are similar to those fixed by 25364:8dce7a4121b9
> but you've already got that. It might be worth doing a full clean and
> rebuild, just in case.

This is a clean build.

> What does your guest config look like?

builder='hvm'
memory=4096
nestedhvm=1
name="guest"
vcpus=4
cpuid="host,page1gb=k,hypervisor=0"
acpi=1
apic=1
vif = [ 'type=ioemu, bridge=bridge0, model=e1000' ]
disk = [ 'file:/hvm-guest/guest.img,ioemu:hda,w' ]
serial='pty'
vnc=1
sdl=0

> What is your command line?

xl create -c guest.conf

> Do you know when it last worked?

Changeset 24462 worked. I need to bisect the exact
changeset.

> The places which close ctx->xsh are very few -- might be worth
> annotating any call to xs_daemon_close() with a printf.

ok.

Christoph

On 05/18/12 16:23, Christoph Egger wrote:
> On 05/18/12 15:30, Ian Campbell wrote:
>> The places which close ctx->xsh are very few -- might be worth
>> annotating any call to xs_daemon_close() with a printf.
>
> ok.

In libxl__build_post() I now check the return value
of libxl__sched_set_params() (a local change).
Now trying to start a guest results in this failure:

xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019bd04
  TOTAL:         0000000000000000->00000000ff800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000003fb
  1GB PAGES: 0x0000000000000002
libxl: error: libxl.c:3211:libxl_sched_credit_domain_set: Cpu weight out of range, valid values are within range from 1 to 65535
libxl: error: libxl_create.c:694:domcreate_bootloader_done: cannot (re-)build domain: -6
libxl: error: libxl_dm.c:1104:libxl__destroy_device_model: Couldn't find device model's pid: No such file or directory
libxl: error: libxl.c:1162:libxl_domain_destroy: libxl__destroy_device_model failed for 1
libxl: error: libxl.c:155:libxl_ctx_free: libxl_ctx_free: call xs_daemon_close   <-- the printf annotation

Christoph

> In libxl__build_post() I check the return value
> of libxl__sched_set_params().

The messages about scheduler params are a known and benign issue.

> Now trying to start a guest results in this failure:
>
> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>   Loader:        0000000000100000->000000000019bd04
>   TOTAL:         0000000000000000->00000000ff800000
>   ENTRY ADDRESS: 0000000000100000
> xc: info: PHYSICAL MEMORY ALLOCATION:
>   4KB PAGES: 0x0000000000000200
>   2MB PAGES: 0x00000000000003fb
>   1GB PAGES: 0x0000000000000002
> libxl: error: libxl.c:3211:libxl_sched_credit_domain_set: Cpu weight out
> of range, valid values are within range from 1 to 65535
> libxl: error: libxl_create.c:694:domcreate_bootloader_done: cannot
> (re-)build domain: -6
> libxl: error: libxl_dm.c:1104:libxl__destroy_device_model: Couldn't find
> device model's pid: No such file or directory

Is your device model dying for some reason? Anything
in /var/log/xen/*guest*.log about it?

You could try "xl -vvv cr ..." too, not sure what it will say.

> libxl: error: libxl.c:1162:libxl_domain_destroy:
> libxl__destroy_device_model failed for 1
> libxl: error: libxl.c:155:libxl_ctx_free: libxl_ctx_free: call
> xs_daemon_close <-- the printf annotation

On 05/18/12 17:58, Ian Campbell wrote:
> Is your device model dying for some reason? Anything
> in /var/log/xen/*guest*.log about it?

The guest logfile doesn't exist.
Does that mean the error happens before the device model has been started at all?

> You could try "xl -vvv cr ..." too, not sure what it will say.

libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk vdev=hda spec.backend=unknown
libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk vdev=hda, using backend phy
xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019bd04
  TOTAL:         0000000000000000->00000000ff800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000003fb
  1GB PAGES: 0x0000000000000002
xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
libxl: error: libxl.c:3211:libxl_sched_credit_domain_set: Cpu weight out of range, valid values are within range from 1 to 65535
libxl: error: libxl_create.c:694:domcreate_bootloader_done: cannot (re-)build domain: -6
libxl: error: libxl_dm.c:1104:libxl__destroy_device_model: Couldn't find device model's pid: No such file or directory
libxl: error: libxl.c:1162:libxl_domain_destroy: libxl__destroy_device_model failed for 2
xc: debug: hypercall buffer: total allocations:1251 total releases:1251
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:1248 misses:2 toobig:1
libxl: error: libxl.c:155:libxl_ctx_free: libxl_ctx_free: call xs_daemon_close   <-- the printf annotation

Christoph

On Mon, 2012-05-21 at 11:26 +0100, Christoph Egger wrote:
> On 05/18/12 17:58, Ian Campbell wrote:
> > Is your device model dying for some reason? Anything
> > in /var/log/xen/*guest*.log about it?
>
> The guest logfile doesn't exist.

Sorry, I meant guest as in $GUEST_NAME rather than literally "guest" (I
was totally non-obvious about that, sorry!).

> Does that mean the errors happens before device model has been started at all?

I think/hope if that were the case you would get messages about failure
to exec etc. rather than timeouts.

> > You could try "xl -vvv cr ..." too, not sure what it will say.
>
> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
> vdev=hda spec.backend=unknown
> libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk
> vdev=hda, using backend phy
> xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
> xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>   Loader:        0000000000100000->000000000019bd04
>   TOTAL:         0000000000000000->00000000ff800000
>   ENTRY ADDRESS: 0000000000100000
> xc: info: PHYSICAL MEMORY ALLOCATION:
>   4KB PAGES: 0x0000000000000200
>   2MB PAGES: 0x00000000000003fb
>   1GB PAGES: 0x0000000000000002

No messages about "xs transaction failed: Bad file descriptor" any more?

> xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
> libxl: error: libxl.c:3211:libxl_sched_credit_domain_set: Cpu weight out
> of range, valid values are within range from 1 to 65535
> libxl: error: libxl_create.c:694:domcreate_bootloader_done: cannot
> (re-)build domain: -6
> libxl: error: libxl_dm.c:1104:libxl__destroy_device_model: Couldn't find
> device model's pid: No such file or directory
> libxl: error: libxl.c:1162:libxl_domain_destroy:
> libxl__destroy_device_model failed for 2

Hrm, actually, the device model stuff might be a red herring -- that's
trying to tear down the device model on failure and it is entirely
reasonable for the device model to not be running if we didn't get as
far as starting it...

The interesting message is just:

> libxl: error: libxl_create.c:694:domcreate_bootloader_done: cannot
> (re-)build domain: -6

Which is unhelpfully just a general failure from libxl__domain_build.

It seems that we have a non-logging failure path in there somewhere. I'm
afraid that the easiest way to fix this is probably just to dive into
libxl__domain_build and add prints on the various error cases of sub
functions, then recurse as you identify which one is failing etc.

Ian.

On 05/21/12 14:15, Ian Campbell wrote:
> On Mon, 2012-05-21 at 11:26 +0100, Christoph Egger wrote:
>> The guest logfile doesn't exist.
>
> Sorry, I meant guest as in $GUEST_NAME rather than literally "guest" (I
> was totally non-obvious about that, sorry!).

I understood it that way. The guest logfile doesn't exist.

> It seems that we have a non-logging failure path in there somewhere. I'm
> afraid that the easiest way to fix this is probably just to dive into
> libxl__domain_build and add prints on the various error cases of sub
> functions, then recurse as you identify which one is failing etc.

I did that:

Parsing config from /root/hvm-guest/netbsd_64b.conf
libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk vdev=hda spec.backend=unknown
libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk vdev=hda, using backend phy
xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019bd04
  TOTAL:         0000000000000000->00000000ff800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000003fb
  1GB PAGES: 0x0000000000000002
xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
libxl: error: libxl.c:3213:libxl_sched_credit_domain_set: Cpu weight out of range, valid values are within range from 1 to 65535
libxl: error: libxl_dom.c:74:libxl__sched_set_params: libxl_sched_credit_domain_set failed -6
libxl: error: libxl_dom.c:192:libxl__build_post: libxl__sched_set_params failed -6
libxl: error: libxl_create.c:322:libxl__domain_build: libxl__build_post failed: -6
libxl: error: libxl_create.c:709:domcreate_bootloader_done: cannot (re-)build domain: -6
libxl: error: libxl_dm.c:1104:libxl__destroy_device_model: Couldn't find device model's pid: No such file or directory
libxl: error: libxl.c:1162:libxl_domain_destroy: libxl__destroy_device_model failed for 6
xc: debug: hypercall buffer: total allocations:1264 total releases:1264
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:1261 misses:2 toobig:1
libxl: error: libxl.c:155:libxl_ctx_free: libxl_ctx_free: call xs_daemon_close

So it is indeed that ERROR_INVAL from that 'benign' error.

Christoph

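The instrumented call chain visible in the log above can be sketched as follows. This is a self-contained illustration under stated assumptions, not the actual libxl code: the function names are shortened hypothetical stand-ins for the layers named in the log, `LOG_FAIL()` is an invented helper, and the only facts taken from the thread are the 1 to 65535 weight range and the -6 return code.

```c
#include <assert.h>
#include <stdio.h>

#define ERR_INVAL (-6)   /* the code reported in the log above */

/* Report a failing call together with its location, mirroring the
 * "file:line:function: message" shape of the libxl log lines. */
#define LOG_FAIL(what, rc)                                   \
    fprintf(stderr, "%s:%d:%s: %s failed: %d\n",             \
            __FILE__, __LINE__, __func__, (what), (rc))

/* Hypothetical stand-ins for the real call chain. */
static int credit_domain_set(int weight)
{
    if (weight < 1 || weight > 65535)
        return ERR_INVAL;            /* weight outside the valid range */
    return 0;
}

static int sched_set_params(int weight)
{
    int rc = credit_domain_set(weight);
    if (rc)
        LOG_FAIL("credit_domain_set", rc);
    return rc;
}

static int build_post(int weight)
{
    int rc = sched_set_params(weight);
    if (rc)
        LOG_FAIL("sched_set_params", rc);
    return rc;
}

int domain_build(int weight)
{
    int rc = build_post(weight);
    if (rc)
        LOG_FAIL("build_post", rc);
    return rc;
}

/* Usage: an out-of-range weight surfaces unchanged at the top of the chain. */
void demo_error_chain(void)
{
    assert(domain_build(256) == 0);
    assert(domain_build(0) == ERR_INVAL);
}
```

Because every layer logs before passing the code on, the log names each function between the range check and the final "cannot (re-)build domain" message, which is what made the source of the -6 visible here.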
On Mon, 2012-05-21 at 14:10 +0100, Christoph Egger wrote:
> I did that:
>
> libxl: error: libxl.c:3213:libxl_sched_credit_domain_set: Cpu weight out
> of range, valid values are within range from 1 to 65535
> libxl: error: libxl_dom.c:74:libxl__sched_set_params:
> libxl_sched_credit_domain_set failed -6
> libxl: error: libxl_dom.c:192:libxl__build_post: libxl__sched_set_params
> failed -6
> libxl: error: libxl_create.c:322:libxl__domain_build: libxl__build_post
> failed: -6
> libxl: error: libxl_create.c:709:domcreate_bootloader_done: cannot
> (re-)build domain: -6
>
> So it is indeed that ERROR_INVAL from that 'benign' error

In my version of libxl libxl__build_post doesn't even look at the return
value of libxl__sched_set_params.
    ....
    libxl__sched_set_params (gc, domid, &(info->sched_params));
    ....
the only other exit path from that function is:
    dom_path = libxl__xs_get_dompath(gc, domid);
    if (!dom_path) {
        return ERROR_FAIL;
    }
which is consistent with the original errors you had (but is ERROR_FAIL,
not ERROR_INVAL). This doesn't really help me figure out what is going
on though :-/

Ian.

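The difference between the two variants of libxl__build_post discussed above can be illustrated with a small sketch. Everything here is hypothetical scaffolding: `set_params()` and `get_dompath()` are stubs, the `xs_ok` flag merely simulates whether the xenstore lookup succeeds, and the numeric value -3 for ERROR_FAIL is an assumption (the log only shows "rc=-3" for the spawn failure), while -6 is the value the log reports for ERROR_INVAL.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_FAIL  (-3)   /* assumed numeric value of ERROR_FAIL */
#define ERR_INVAL (-6)   /* value seen in the log for ERROR_INVAL */

/* Hypothetical stub: the scheduler call always fails, as in the thread. */
static int set_params(void)
{
    return ERR_INVAL;
}

/* Hypothetical stub: returns NULL when the xenstore lookup fails. */
static const char *get_dompath(int xs_ok)
{
    return xs_ok ? "/local/domain/1" : NULL;
}

/* Variant that ignores the scheduler result: the only failure exit is
 * the dompath lookup. */
int build_post_unchecked(int xs_ok)
{
    (void)set_params();              /* result deliberately discarded */
    if (!get_dompath(xs_ok))
        return ERR_FAIL;
    return 0;
}

/* Variant with the local change: the scheduler failure ends the build
 * before the dompath lookup is reached. */
int build_post_checked(int xs_ok)
{
    int rc = set_params();
    if (rc)
        return rc;
    if (!get_dompath(xs_ok))
        return ERR_FAIL;
    return 0;
}

/* Usage: the two variants disagree even when xenstore is healthy. */
void demo_build_post(void)
{
    assert(build_post_unchecked(1) == 0);
    assert(build_post_unchecked(0) == ERR_FAIL);
    assert(build_post_checked(1) == ERR_INVAL);
}
```

In this sketch the unchecked variant carries on past the scheduler error and fails later, if at all, with a different code, which would match the two different failure patterns reported in this thread.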
On 05/21/12 15:49, Ian Campbell wrote:
>> So it is indeed that ERROR_INVAL from that 'benign' error
>
> In my version of libxl libxl__build_post doesn't even look at the return
> value of libxl__sched_set_params.
>     ....
>     libxl__sched_set_params (gc, domid, &(info->sched_params));
>     ....

I reverted my local change and retried. See below.

> the only other exit path from that function is:
>     dom_path = libxl__xs_get_dompath(gc, domid);
>     if (!dom_path) {
>         return ERROR_FAIL;
>     }
> which is consistent with the original errors you had (but is ERROR_FAIL,
> not ERROR_INVAL). This doesn't really help me figure out what is going
> on though :-/

libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk vdev=hda spec.backend=unknown
libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk vdev=hda, using backend phy
xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019bd04
  TOTAL:         0000000000000000->00000000ff800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x00000000000003fb
  1GB PAGES: 0x0000000000000002
xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
libxl: error: libxl.c:3213:libxl_sched_credit_domain_set: Cpu weight out of range, valid values are within range from 1 to 65535
libxl: error: libxl_dom.c:74:libxl__sched_set_params: libxl_sched_credit_domain_set failed -6
libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk vdev=hda spec.backend=phy
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 7: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk vdev=hdb spec.backend=phy
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 7: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 7: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 7: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 7: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 0: Bad file descriptor
libxl: error: libxl_device.c:107:libxl__device_generic_add: xs transaction failed: Bad file descriptor
libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get dompath for 7: Bad file descriptor
libxl: debug: libxl_dm.c:1008:libxl__spawn_local_dm: Spawning device-model /usr/local.25371.netbsd/libexec/qemu-dm with arguments:
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   /usr/local.25371.netbsd/libexec/qemu-dm
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -d
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   7
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -domain-name
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   HVM64-NetBSD
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -vnc
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   0.0.0.0:0
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -vncunused
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -serial
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   pty
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -videoram
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   8
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -boot
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   cd
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -acpi
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -vcpus
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   4
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -vcpu_avail
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   0xf
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -net
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   nic,vlan=0,macaddr=00:16:3e:00:ce:01,model=e1000
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -net
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   tap,vlan=0,ifname=vif7.0-emu,bridge=bridge0,script=/usr/local.25371.netbsd/etc/xen/scripts/qemu-ifup,downscript=/usr/local.25371.netbsd/etc/xen/scripts/qemu-ifup
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   -M
libxl: debug: libxl_dm.c:1010:libxl__spawn_local_dm:   xenfv
libxl: error: libxl_event.c:468:libxl__ev_xswatch_register: create watch for path /local/domain/0/device-model/7/state: Bad file descriptor
libxl: error: libxl_dm.c:1072:device_model_spawn_outcome: domain 7 device model: spawn failed (rc=-3)
assertion "ao->in_initiator" failed: file "libxl_event.c", line 1388, function "libxl__ao_complete_check_progress_reports"
Abort (core dumped)

(gdb) bt
#0  0x00007f7ff65059aa in _lwp_kill () from /usr/lib/libc.so.12
#1  0x00007f7ff6505612 in abort () from /usr/lib/libc.so.12
#2  0x00007f7ff65052dd in
__assert13 () from /usr/lib/libc.so.12 #3 0x00007f7ff742d114 in libxl__ao_complete_check_progress_reports ( egc=0x7f7fffffd140, ao=0x7f7ff7b210e0) at libxl_event.c:1388 #4 0x00007f7ff742d2ec in egc_run_callbacks (egc=0x7f7fffffd140) at libxl_event.c:971 #5 libxl__egc_cleanup (egc=0x7f7fffffd140) at libxl_event.c:991 #6 0x00007f7ff741890f in do_domain_create (ctx=0x7f7ff7b210b8, d_config=<optimized out>, domid=<optimized out>, restore_fd=<optimized out>, ao_how=<optimized out>, aop_console_how=0x7f7fffffffff) at libxl_create.c:905 #7 0x00007f7ff741893e in libxl_domain_create_new (ctx=<optimized out>, d_config=<optimized out>, domid=<optimized out>, ao_how=<optimized out>, aop_console_how=<optimized out>) at libxl_create.c:926 #8 0x000000000040c4d9 in create_domain (dom_info=0x7f7fffffd630) at xl_cmdimpl.c:1760 #9 0x0000000000410161 in main_create (argc=3, argv=<optimized out>) at xl_cmdimpl.c:3730 #10 0x0000000000406d86 in main (argc=3, argv=0x7f7fffffdba0) at xl.c:208 Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85689 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632
On Mon, 2012-05-21 at 16:44 +0100, Christoph Egger wrote:
> I reverted my local change and retried. See below.
>
> > the only other exit path from that function is:
> >     dom_path = libxl__xs_get_dompath(gc, domid);
> >     if (!dom_path) {
> >         return ERROR_FAIL;
> >     }
> > which is consistent with the original errors you had (but if ERROR_FAIL,
> > not ERROR_INVAL). This doesn't really help me figure out what is going
> > on though :-/
>
> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
> vdev=hda spec.backend=unknown
> libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk
> vdev=hda, using backend phy
> xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
> xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>   Loader:        0000000000100000->000000000019bd04
>   TOTAL:         0000000000000000->00000000ff800000
>   ENTRY ADDRESS: 0000000000100000
> xc: info: PHYSICAL MEMORY ALLOCATION:
>   4KB PAGES: 0x0000000000000200
>   2MB PAGES: 0x00000000000003fb
>   1GB PAGES: 0x0000000000000002
> xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
> libxl: error: libxl.c:3213:libxl_sched_credit_domain_set: Cpu weight out
> of range, valid values are within range from 1 to 65535
> libxl: error: libxl_dom.c:74:libxl__sched_set_params:
> libxl_sched_credit_domain_set failed -6
> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
> vdev=hda spec.backend=phy
> libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get
> dompath for 7: Bad file descriptor

This is back to the original issue. I think the last couple of mails
have been something of a tangent, since you weren't getting as far as
this failure.

I'm not really sure what to suggest here -- something is either closing
the fd or scribbling over the memory which contains it.

I suppose you could sprinkle calls to libxl__xs_get_dompath() around
between libxl__sched_set_params and libxl__device_disk_set_backend and
see where it starts failing -- that's going to be pretty tedious though.

If you've got the gdb-fu you might be able to set a write watch on the
location in the ctx with the fd -- could tell you something perhaps.

Otherwise perhaps bisection is the best bet?

> for path /local/domain/0/device-model/7/state: Bad file descriptor
> libxl: error: libxl_dm.c:1072:device_model_spawn_outcome: domain 7
> device model: spawn failed (rc=-3)
> assertion "ao->in_initiator" failed: file "libxl_event.c", line 1388,
> function "libxl__ao_complete_check_progress_reports"
> Abort (core dumped)
>
> (gdb) bt

Can you tell if the xs fd is still actually open at this point? On Linux
I would look in /proc/<pid>/fd for the socket.

Also can you print out the xsh from the ctx (perhaps that's easier from
e.g. frame #7 below?)

Also the ao failure smells like bad error handling resulting from the
underlying issue, which might be worth someone investigating separately.

> #0  0x00007f7ff65059aa in _lwp_kill () from /usr/lib/libc.so.12
> #1  0x00007f7ff6505612 in abort () from /usr/lib/libc.so.12
> #2  0x00007f7ff65052dd in __assert13 () from /usr/lib/libc.so.12
> #3  0x00007f7ff742d114 in libxl__ao_complete_check_progress_reports (
>     egc=0x7f7fffffd140, ao=0x7f7ff7b210e0) at libxl_event.c:1388
> #4  0x00007f7ff742d2ec in egc_run_callbacks (egc=0x7f7fffffd140)
>     at libxl_event.c:971
> #5  libxl__egc_cleanup (egc=0x7f7fffffd140) at libxl_event.c:991
> #6  0x00007f7ff741890f in do_domain_create (ctx=0x7f7ff7b210b8,
>     d_config=<optimized out>, domid=<optimized out>,
>     restore_fd=<optimized out>, ao_how=<optimized out>,
>     aop_console_how=0x7f7fffffffff) at libxl_create.c:905
> #7  0x00007f7ff741893e in libxl_domain_create_new (ctx=<optimized out>,
>     d_config=<optimized out>, domid=<optimized out>, ao_how=<optimized out>,
>     aop_console_how=<optimized out>) at libxl_create.c:926
> #8  0x000000000040c4d9 in create_domain (dom_info=0x7f7fffffd630)
>     at xl_cmdimpl.c:1760
> #9  0x0000000000410161 in main_create (argc=3, argv=<optimized out>)
>     at xl_cmdimpl.c:3730
> #10 0x0000000000406d86 in main (argc=3, argv=0x7f7fffffdba0) at xl.c:208
>
> Christoph
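The two suggestions in this message, sprinkling probes between calls and checking whether the descriptor is still open, can be combined in one small helper. The following is only a sketch: `fd_is_open`, `probe_fd` and `PROBE_FD` are hypothetical names that do not exist in libxl, and it relies on nothing beyond the POSIX behaviour that `fcntl(fd, F_GETFD)` fails with `EBADF` for a descriptor that is not open.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if fd refers to an open descriptor, 0 otherwise.
 * fcntl(F_GETFD) fails with EBADF when fd is closed or invalid. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Hypothetical probe: reports the call site and the descriptor state,
 * so the first failing probe brackets the call that broke the fd. */
static int probe_fd(int fd, const char *file, int line)
{
    int ok = fd_is_open(fd);

    assert(file != NULL);
    fprintf(stderr, "%s:%d: fd %d is %s\n",
            file, line, fd, ok ? "open" : "NOT open");
    return ok;
}

#define PROBE_FD(fd) probe_fd((fd), __FILE__, __LINE__)
```

Placed before and after each suspect call, with the descriptor taken from the xenstore handle (for instance via `xs_fileno()`, assuming that accessor is available in the tree being debugged), the first probe that reports "NOT open" narrows down the culprit without depending on /proc.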
Ian Campbell writes ("Re: [Xen-devel] libxl: cannot start guest"):
> On Mon, 2012-05-21 at 16:44 +0100, Christoph Egger wrote:
> > libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get
> > dompath for 7: Bad file descriptor
>
> This is back to the original issue, I think the last couple of mails
> have been something of a tangent since you weren't getting as far as
> this failure.
>
> I'm not really sure what to suggest here -- something is either closing
> the fd or scribbling over the memory which contains it.

I would strace (on BSD, ktrace?) the process. That would tell you
whether the fd was being closed and if so when. If it's not being
closed then the fd value is being overwritten and a gdb hardware
watchpoint will find where.

Ian.
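The distinction drawn here (descriptor closed versus stored value overwritten) can also be checked from inside the process when tracing is inconvenient. The sketch below assumes the descriptor number was recorded right after the connection was opened; `classify_fd` and `enum fd_state` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum fd_state { FD_OK, FD_CLOSED, FD_OVERWRITTEN };

/* saved:   descriptor number recorded right after it was opened
 * current: descriptor number read back from the handle now */
static enum fd_state classify_fd(int saved, int current)
{
    assert(saved >= 0);

    /* The original descriptor no longer exists: somebody closed it. */
    if (fcntl(saved, F_GETFD) == -1 && errno == EBADF)
        return FD_CLOSED;

    /* Still open, but the handle holds a different number:
     * the stored value was changed in memory. */
    if (current != saved)
        return FD_OVERWRITTEN;

    return FD_OK;
}
```

The saved descriptor is tested first on purpose: a library that closes its own descriptor may also reset the stored value (to -1, say), and that case should be reported as a close rather than as memory corruption.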
On 05/21/12 17:57, Ian Campbell wrote:
>> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
>> vdev=hda spec.backend=unknown
>> libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk
>> vdev=hda, using backend phy
>> xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
>> xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
>> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>>   Loader:        0000000000100000->000000000019bd04
>>   TOTAL:         0000000000000000->00000000ff800000
>>   ENTRY ADDRESS: 0000000000100000
>> xc: info: PHYSICAL MEMORY ALLOCATION:
>>   4KB PAGES: 0x0000000000000200
>>   2MB PAGES: 0x00000000000003fb
>>   1GB PAGES: 0x0000000000000002
>> xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
>> libxl: error: libxl.c:3213:libxl_sched_credit_domain_set: Cpu weight out
>> of range, valid values are within range from 1 to 65535
>> libxl: error: libxl_dom.c:74:libxl__sched_set_params:
>> libxl_sched_credit_domain_set failed -6
>> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
>> vdev=hda spec.backend=phy
>> libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get
>> dompath for 7: Bad file descriptor
>
> This is back to the original issue, I think the last couple of mails
> have been something of a tangent since you weren't getting as far as
> this failure.
>
> I'm not really sure what to suggest here -- something is either closing
> the fd or scribbling over the memory which contains it.
>
> I suppose you could sprinkle calls to libxl__xs_get_dompath() around
> between libxl__sched_set_params and libxl__device_disk_set_backend and
> see where it starts failing -- that's going to be pretty tedious though.

It starts failing in libxl__build_post() right after
xs_introduce_domain().

Christoph
On Tue, 2012-05-22 at 13:35 +0100, Christoph Egger wrote:
> On 05/21/12 17:57, Ian Campbell wrote:
>
> >> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
> >> vdev=hda spec.backend=unknown
> >> libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk
> >> vdev=hda, using backend phy
> >> xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
> >> xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
> >> xc: info: VIRTUAL MEMORY ARRANGEMENT:
> >>   Loader:        0000000000100000->000000000019bd04
> >>   TOTAL:         0000000000000000->00000000ff800000
> >>   ENTRY ADDRESS: 0000000000100000
> >> xc: info: PHYSICAL MEMORY ALLOCATION:
> >>   4KB PAGES: 0x0000000000000200
> >>   2MB PAGES: 0x00000000000003fb
> >>   1GB PAGES: 0x0000000000000002
> >> xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
> >> libxl: error: libxl.c:3213:libxl_sched_credit_domain_set: Cpu weight out
> >> of range, valid values are within range from 1 to 65535
> >> libxl: error: libxl_dom.c:74:libxl__sched_set_params:
> >> libxl_sched_credit_domain_set failed -6
> >> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
> >> vdev=hda spec.backend=phy
> >> libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get
> >> dompath for 7: Bad file descriptor
> >
> > This is back to the original issue, I think the last couple of mails
> > have been something of a tangent since you weren't getting as far as
> > this failure.
> >
> > I'm not really sure what to suggest here -- something is either closing
> > the fd or scribbling over the memory which contains it.
> >
> > I suppose you could sprinkle calls to libxl__xs_get_dompath() around
> > between libxl__sched_set_params and libxl__device_disk_set_backend and
> > see where it starts failing -- that's going to be pretty tedious though.
>
>
> It starts failing in libxl__build_post() right after
> xs_introduce_domain().

What method did you use to determine that?

So at the xs_transaction_end right before that ctx->xsh is valid, but
right after...
    xs_introduce_domain(ctx->xsh, domid, state->store_mfn, state->store_port);
...it is invalid? i.e. before the free(vmpath) it is already corrupt?

(Aside: why isn't vmpath in the gc, instead of done manually,
nevermind...)

Does the xs_introduce_domain itself succeed? Or do you mean that the
next use of xsh after this fails (where is that, somewhere back up the
callchain? store_libxl_entry perhaps?)

xs_introduce_domain doesn't seem to do much which is untoward with the
handle. The only thing which springs to mind is that it may generate an
@IntroduceDomain watch event. However xl is single threaded so we won't
process that event until we unwind to whichever point we do an event
loop iteration, in which case the corruption would have to happen later
than right after xs_introduce_domain().

Did you manage to determine if "Bad file descriptor" was due to it being
closed vs. the value being corrupted?

Ian.
On 05/22/12 14:53, Ian Campbell wrote:
> On Tue, 2012-05-22 at 13:35 +0100, Christoph Egger wrote:
>> On 05/21/12 17:57, Ian Campbell wrote:
>>
>>>> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
>>>> vdev=hda spec.backend=unknown
>>>> libxl: debug: libxl_device.c:219:libxl__device_disk_set_backend: Disk
>>>> vdev=hda, using backend phy
>>>> xc: detail: elf_parse_binary: phdr: paddr=0x100000 memsz=0x9bd04
>>>> xc: detail: elf_parse_binary: memory: 0x100000 -> 0x19bd04
>>>> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>>>>   Loader:        0000000000100000->000000000019bd04
>>>>   TOTAL:         0000000000000000->00000000ff800000
>>>>   ENTRY ADDRESS: 0000000000100000
>>>> xc: info: PHYSICAL MEMORY ALLOCATION:
>>>>   4KB PAGES: 0x0000000000000200
>>>>   2MB PAGES: 0x00000000000003fb
>>>>   1GB PAGES: 0x0000000000000002
>>>> xc: detail: elf_load_binary: phdr 0 at 0x0x7f7ff7f42000 -> 0x0x7f7ff7fd4b74
>>>> libxl: error: libxl.c:3213:libxl_sched_credit_domain_set: Cpu weight out
>>>> of range, valid values are within range from 1 to 65535
>>>> libxl: error: libxl_dom.c:74:libxl__sched_set_params:
>>>> libxl_sched_credit_domain_set failed -6
>>>> libxl: debug: libxl_device.c:183:libxl__device_disk_set_backend: Disk
>>>> vdev=hda spec.backend=phy
>>>> libxl: error: libxl_xshelp.c:102:libxl__xs_get_dompath: failed to get
>>>> dompath for 7: Bad file descriptor
>>>
>>> This is back to the original issue, I think the last couple of mails
>>> have been something of a tangent since you weren't getting as far as
>>> this failure.
>>>
>>> I'm not really sure what to suggest here -- something is either closing
>>> the fd or scribbling over the memory which contains it.
>>>
>>> I suppose you could sprinkle calls to libxl__xs_get_dompath() around
>>> between libxl__sched_set_params and libxl__device_disk_set_backend and
>>> see where it starts failing -- that's going to be pretty tedious though.
>>
>>
>> It starts failing in libxl__build_post() right after
>> xs_introduce_domain().
>
> What method did you use to determine that?

What you said:

"sprinkle calls to libxl__xs_get_dompath() around between
libxl__sched_set_params and libxl__device_disk_set_backend and
see where it starts failing"

> So at the xs_transaction_end right before that ctx->xsh is valid, but
> right after...
>     xs_introduce_domain(ctx->xsh, domid, state->store_mfn, state->store_port);
> ...it is invalid? i.e. before the free(vmpath) it is already corrupt?

Yes, you got it.

> (Aside: why isn't vmpath in the gc, instead of done manually,
> nevermind...)
>
> Does the xs_introduce_domain itself succeed?

No, it fails.

> Or do you mean that the next use of xsh after this fails
> (where is that, somewhere back up the callchain? store_libxl_entry
> perhaps?)
>
> xs_introduce_domain doesn't seem to do much which is untoward with the
> handle.

I think, in xs_talkv(), something must fail.

> The only thing which springs to mind is that it may generate an
> @IntroduceDomain watch event. However xl is single threaded so we won't
> process that event until we unwind to whichever point we do an event
> loop iteration, in which case the corruption would have to happen later
> than right after xs_introduce_domain().
>
> Did you manage to determine if "Bad file descriptor" was due to it being
> closed vs. the value being corrupted?

I'm looking into it. My suspicion is that

    if (msg.type != type)

in xs_talkv() is true.

Christoph
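To illustrate why a reply-type mismatch would fit the symptoms, here is a toy model. It is not the libxenstore code: `struct toy_handle` and `toy_talk` are hypothetical, and the model simply assumes that the library treats a mismatched reply as a broken connection, closes its descriptor and reports `EBADF`. Whether the real xs_talkv() behaves this way has to be confirmed against the source.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for a xenstore-style handle. */
struct toy_handle {
    int fd;
};

/* Models one request/reply exchange. If the reply type differs from the
 * request type, the connection is considered unusable: the descriptor is
 * closed and the handle is marked invalid (assumed behaviour). */
static int toy_talk(struct toy_handle *h, int req_type, int reply_type)
{
    assert(h != NULL);

    if (h->fd < 0) {            /* handle already torn down */
        errno = EBADF;
        return -1;
    }
    if (reply_type != req_type) {
        close(h->fd);
        h->fd = -1;
        errno = EBADF;
        return -1;
    }
    return 0;
}
```

Under that assumption a single failed exchange produces exactly the pattern in the logs: the failing call itself returns an error, and every later call on the same handle fails with "Bad file descriptor" as well, without any memory corruption being involved.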
On Tue, 2012-05-22 at 14:18 +0100, Christoph Egger wrote:
> I think in xs_talkv() something must fail.
>
> > The only thing which springs to mind is that it may generate an
> > @IntroduceDomain watch event. However xl is single threaded so we won't
> > process that event until we unwind to whichever point we do an event
> > loop iteration, in which case the corruption would have to happen later
> > than right after xs_introduce_domain().
> >
> > Did you manage to determine if "Bad file descriptor" was due to it being
> > closed vs. the value being corrupted?
>
> My suspicion is that
>
>     if (msg.type != type)
>
> in xs_talkv() is true.

Yes, that definitely seems worth investigating.

Ian.
On 05/22/12 15:21, Ian Campbell wrote:
> On Tue, 2012-05-22 at 14:18 +0100, Christoph Egger wrote:
>> My suspicion is that
>>
>> if (msg.type != type)
>>
>> in xs_talkv() is true.
>
> Yes, that definitely seems worth investigating.

Ok, I got it.

xenstored crashes due to dereferencing a NULL pointer.

In xenstored_domain.c, map_interface(), *xcg_handle is NULL,
and in xc_gnttab.c, xc_gnttab_map_grant_ref(), it is dereferenced.

Christoph

On Tue, 2012-05-22 at 15:03 +0100, Christoph Egger wrote:
> Ok, I got it.
>
> xenstored crashes due to dereferencing NULL pointer.

Huh, xenstore hasn't materially changed for quite a while (since February).

> In xenstored_domain.c, map_interface() *xcg_handle is NULL
> and in xc_gnttab.c, xc_gnttab_map_grant_ref() it is dereferenced.

This comes from 24757:aae516b78fce. Diego and Alex aren't around any
more but CCing Daniel in case he remembers anything.

I guess the original xc_gnttab_open which sets *xcg_handle is failing
for you, I suppose that is to be expected on NetBSD? Either way it
should still work after this has failed.

All the >= checks on *xcg_handle seem wrong to me. Really they should be
checking != NULL, since otherwise they don't actually discriminate the
two cases! Does making that change help?

Ian.

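For readers following the thread, the point about the ">= 0" checks can be sketched in isolation. The snippet below is only an illustration under assumptions: struct handle, old_check(), new_check() and demo() are hypothetical names, not xenstored or libxc code, and the old test is modelled by comparing the address as an unsigned integer, which is roughly how such a pointer comparison behaves in practice. It shows that the old test is true for a NULL handle as well as for an opened one, so it cannot choose between the grant-table path and the fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an opaque library handle (not a real libxc type). */
struct handle { int fd; };

/* Models the old test. An address viewed as an unsigned integer is never
 * negative, so the result is 1 for NULL as well as for a valid handle. */
static int old_check(const struct handle *h)
{
    uintptr_t addr = (uintptr_t)h;
    uintptr_t zero = 0;
    return addr >= zero;
}

/* Models the fixed test: true only when a handle was actually opened. */
static int new_check(const struct handle *h)
{
    return h != NULL;
}

/* Small self-check: only new_check() tells the two cases apart. */
static void demo(void)
{
    struct handle opened = { 3 };
    assert(old_check(NULL) && old_check(&opened));   /* indistinguishable */
    assert(!new_check(NULL) && new_check(&opened));  /* distinguishable */
}
```

With the old predicate the caller always takes the "handle is available" branch and passes NULL on to code that dereferences it, which matches the crash described above.
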
On 05/22/12 16:20, Ian Campbell wrote:
> All the >= checks on *xcg_handle seem wrong to me. Really they should be
> checking != NULL, since otherwise they don't actually discriminate the
> two cases! Does making that change help?

Yes, that helps! I can start guests again.

Christoph

On Tue, 2012-05-22 at 16:16 +0100, Christoph Egger wrote:
> On 05/22/12 16:20, Ian Campbell wrote:
> > All the >= checks on *xcg_handle seem wrong to me. Really they should be
> > checking != NULL, since otherwise they don't actually discriminate the
> > two cases! Does making that change help?
>
> Yes, that helps! I can start guests again.

Excellent, I assume you are going to submit the patch (i.e. I don't need
to..)

Ian.

On 05/22/12 17:21, Ian Campbell wrote:
> On Tue, 2012-05-22 at 16:16 +0100, Christoph Egger wrote:
>> On 05/22/12 16:20, Ian Campbell wrote:
>>> All the >= checks on *xcg_handle seem wrong to me. Really they should be
>>> checking != NULL, since otherwise they don't actually discriminate the
>>> two cases! Does making that change help?
>>
>> Yes, that helps! I can start guests again.
>
> Excellent, I assume you are going to submit the patch (i.e. I don't need
> to..)

Yes, patch attached.

Fix pointer checks introduced in changeset 24757:aae516b78fce.
This fixes a xenstored crash on platforms with no gnttab implementation.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

On Tue, 2012-05-22 at 16:32 +0100, Christoph Egger wrote:
> On 05/22/12 17:21, Ian Campbell wrote:
> > Excellent, I assume you are going to submit the patch (i.e. I don't need
> > to..)
>
> Yes, patch attached.

I fixed up the commit message as follows. I'll apply if IanJ agrees or
acks it.

8<-----------------------------

From 6b43ca97f5f8c4fa9bf24101253af21bc66ddf96 Mon Sep 17 00:00:00 2001
From: Christoph Egger <Christoph.Egger@amd.com>
Date: Tue, 22 May 2012 17:32:21 +0200
Subject: [PATCH] xenstore: fix crash on platforms with no gntdev driver implementation.

Fix pointer checks introduced in changeset 24757:aae516b78fce.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/xenstore/xenstored_domain.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index f8c822f..bf83d58 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -167,7 +167,7 @@ static int readchn(struct connection *conn, void *data, unsigned int len)
 
 static void *map_interface(domid_t domid, unsigned long mfn)
 {
-	if (*xcg_handle >= 0) {
+	if (*xcg_handle != NULL) {
 		/* this is the preferred method */
 		return xc_gnttab_map_grant_ref(*xcg_handle, domid,
 			GNTTAB_RESERVED_XENSTORE, PROT_READ|PROT_WRITE);
@@ -179,7 +179,7 @@ static void *map_interface(domid_t domid, unsigned long mfn)
 
 static void unmap_interface(void *interface)
 {
-	if (*xcg_handle >= 0)
+	if (*xcg_handle != NULL)
 		xc_gnttab_munmap(*xcg_handle, interface, 1);
 	else
 		munmap(interface, getpagesize());
-- 
1.7.2.5

On 05/23/12 12:11, Ian Campbell wrote:
> I fixed up the commit message as follows. I'll apply if IanJ agrees or
> acks it.

Thank you. Ian J. what do you say?

Christoph

> Subject: [PATCH] xenstore: fix crash on platforms with no gntdev driver implementation.
>
> Fix pointer checks introduced in changeset 24757:aae516b78fce.
>
> Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Christoph Egger wrote:
> On 05/23/12 12:11, Ian Campbell wrote:
>> I fixed up the commit message as follows. I'll apply if IanJ agrees or
>> acks it.
>
> Thank you. Ian J. what do you say?
>
>> Subject: [PATCH] xenstore: fix crash on platforms with no gntdev driver implementation.
>>
>> Fix pointer checks introduced in changeset 24757:aae516b78fce.

I also see an error when starting xencommons on NetBSD:

test# /usr/xen42/etc/rc.d/xencommons onestart
Cleaning xenstore database.
Starting xenservices: xenstored, xenconsoled, xenbackendd.xc: error: OSDEP: interface 2 (gnttab) not supported on this platform: Internal error

Which is quite annoying, but I'm not really sure of the most elegant way
to solve this. The error comes from tools/libxc/xc_private.c:177, so
maybe just removing that message would be ok, or something like this:

--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -265,8 +265,12 @@ int xc_evtchn_close(xc_evtchn *xce)
 xc_gnttab *xc_gnttab_open(xentoollog_logger *logger,
                           unsigned open_flags)
 {
+#ifndef __NetBSD__
     return xc_interface_open_common(logger, NULL, open_flags,
                                     XC_OSDEP_GNTTAB);
+#else
+    return NULL;
+#endif
 }

Which is not really pretty.

On Fri, 2012-05-25 at 15:56 +0100, Roger Pau Monne wrote:
> I also see an error when starting xencommons on NetBSD:
>
> test# /usr/xen42/etc/rc.d/xencommons onestart
> Cleaning xenstore database.
> Starting xenservices: xenstored, xenconsoled, xenbackendd.xc: error:
> OSDEP: interface 2 (gnttab) not supported on this platform: Internal error
>
> Which is quite annoying, but I'm not really sure of the most elegant way
> to solve this. The error comes from tools/libxc/xc_private.c:177, so
> maybe just removing that message would be ok,

I think removing the message is fine. This interface is intentionally
"optional" so making a load of noise when the option is exercised seems
silly...

If you make it a DPRINTF is it silent in this context? If not then just
nuke it entirely...

Ian.

> or something like this:
>
> --- a/tools/libxc/xc_private.c
> +++ b/tools/libxc/xc_private.c
> @@ -265,8 +265,12 @@ int xc_evtchn_close(xc_evtchn *xce)
>  xc_gnttab *xc_gnttab_open(xentoollog_logger *logger,
>                            unsigned open_flags)
>  {
> +#ifndef __NetBSD__
>      return xc_interface_open_common(logger, NULL, open_flags,
>                                      XC_OSDEP_GNTTAB);
> +#else
> +    return NULL;
> +#endif
>  }
>
> Which is not really pretty.

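The idea of demoting the message to debug level can be illustrated with a small stand-alone sketch. Everything in it is an assumption made for illustration: log_msg(), threshold, emitted, struct iface and open_optional() are hypothetical names and do not reflect libxc's actual logging macros or xc_interface_open_common(). It only shows the general pattern: when an optional interface is missing, the opener returns NULL and reports the fact at a level that is normally filtered out.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical log levels; messages below 'threshold' are dropped. */
enum log_level { LOG_DEBUG, LOG_ERROR };

static enum log_level threshold = LOG_ERROR;
static int emitted;   /* counts messages that were actually printed */

static void log_msg(enum log_level lvl, const char *fmt, ...)
{
    va_list ap;

    if (lvl < threshold)
        return;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    emitted++;
}

/* Hypothetical optional interface. */
struct iface { int id; };

/* Returns NULL when the platform lacks the interface. The absence is
 * reported at debug level only, so a default run stays silent. */
static struct iface *open_optional(int supported)
{
    static struct iface the_iface = { 2 };

    if (!supported) {
        log_msg(LOG_DEBUG, "interface not supported on this platform\n");
        return NULL;
    }
    return &the_iface;
}

/* Self-check: an unsupported interface yields NULL and no output by default. */
static void demo(void)
{
    assert(open_optional(0) == NULL);
    assert(emitted == 0);
}
```

Callers then only need the != NULL test from the xenstored patch to pick the fallback path, and the user sees a message only when debug output has been requested.
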
Ian Campbell writes ("Re: [Xen-devel] libxl: cannot start guest"):
> From: Christoph Egger <Christoph.Egger@amd.com>
> Date: Tue, 22 May 2012 17:32:21 +0200
> Subject: [PATCH] xenstore: fix crash on platforms with no gntdev driver implementation.
>
> Fix pointer checks introduced in changeset 24757:aae516b78fce.
>
> Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

On Tue, 2012-05-29 at 11:02 +0100, Ian Jackson wrote:
> Ian Campbell writes ("Re: [Xen-devel] libxl: cannot start guest"):
> > From: Christoph Egger <Christoph.Egger@amd.com>
> > Date: Tue, 22 May 2012 17:32:21 +0200
> > Subject: [PATCH] xenstore: fix crash on platforms with no gntdev driver implementation.
> >
> > Fix pointer checks introduced in changeset 24757:aae516b78fce.
> >
> > Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Committed, thanks.