flight 16779 qemu-upstream-4.2-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-i386-i386-xl-qemuu-winxpsp3          7 windows-install       fail REGR. vs. 15122

Tests which did not succeed, but are not blocking:
 build-armhf                               4 xen-build              fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     13 guest-stop             fail never pass
 test-amd64-amd64-xl-qemuu-win            13 guest-stop             fail never pass
 test-amd64-i386-qemuu-win-vcpus1         16 leak-check/check       fail never pass
 test-i386-i386-qemuu-win                 16 leak-check/check       fail never pass
 test-amd64-i386-qemuu-win                16 leak-check/check       fail never pass
 test-amd64-i386-xl-qemuu-win-vcpus1      13 guest-stop             fail never pass
 test-amd64-i386-xend-qemuu-winxpsp3      16 leak-check/check       fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 13 guest-stop             fail never pass
 test-i386-i386-xl-qemuu-win              12 guest-localmigrate/x10 fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       13 guest-stop             fail never pass
 test-amd64-amd64-qemuu-win               16 leak-check/check       fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64      13 guest-stop             fail never pass

version targeted for testing:
 qemuu                eccc68722696864fc4823f048c7be58d11281b97
baseline version:
 qemuu                70454385eeee6f0b3f7a9eddca9f7340b5060824

------------------------------------------------------------
People who touched revisions under test:
  Alex Bligh <alex@alex.org.uk>
  Anthony PERARD <anthony.perard@citrix.com>
  Stefano Stabellini <stefano.stabellini@eu.citrix.com>
------------------------------------------------------------

jobs:
 build-amd64                                 pass
 build-armhf                                 fail
 build-i386                                  pass
 build-amd64-oldkern                         pass
 build-i386-oldkern                          pass
 build-amd64-pvops                           pass
 build-i386-pvops                            pass
 test-amd64-i386-qemuu-rhel6hvm-amd          pass
 test-amd64-amd64-xl-qemuu-win7-amd64        fail
 test-amd64-i386-xl-qemuu-win7-amd64         fail
 test-amd64-i386-qemuu-rhel6hvm-intel        pass
 test-amd64-i386-qemuu-win-vcpus1            fail
 test-amd64-i386-xl-qemuu-win-vcpus1         fail
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1    fail
 test-amd64-amd64-qemuu-win                  fail
 test-amd64-i386-qemuu-win                   fail
 test-i386-i386-qemuu-win                    fail
 test-amd64-amd64-xl-qemuu-win               fail
 test-i386-i386-xl-qemuu-win                 fail
 test-amd64-i386-xend-qemuu-winxpsp3         fail
 test-amd64-amd64-xl-qemuu-winxpsp3          fail
 test-i386-i386-xl-qemuu-winxpsp3            fail

------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
 http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
 http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Not pushing.

------------------------------------------------------------
commit eccc68722696864fc4823f048c7be58d11281b97
Author: Anthony PERARD <anthony.perard@citrix.com>
Date:   Thu Feb 21 12:16:42 2013 +0000

    xen: Set the vram dirty when an error occurs.

    If the call to xc_hvm_track_dirty_vram() fails, then we set the dirty
    bit on all the video ram. This case happens during migration.

    Backport of 8aba7dc02d5660df7e7d8651304b3079908358be

    This backport is less clean than it might be because there is no
    memory_region_set_dirty that copes with more than one page in 4.2,
    and the case where the call to xc_hvm_track_dirty_vram is successful
    also needs to ensure xen_modified_memory is called (which would on
    unstable have been done within memory_region_set_dirty).

    Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
    Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Alex Bligh <alex@alex.org.uk>
Ian Jackson
2013-Mar-01 16:15 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
xen.org writes ("[qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> flight 16779 qemu-upstream-4.2-testing real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-i386-i386-xl-qemuu-winxpsp3  7 windows-install  fail REGR. vs. 15122

This is a persistent failure.

> version targeted for testing:
>  qemuu  eccc68722696864fc4823f048c7be58d11281b97
> baseline version:
>  qemuu  70454385eeee6f0b3f7a9eddca9f7340b5060824

Looking at the changes, it seems likely to be related to the migration dirty bit tracking series. These were originally written by Anthony Perard and committed to qemu-upstream-unstable where they work, and then backported by Alex Bligh.

So the problem is probably something in Alex's backports?

The logs show this, most relevantly:

 Mar 1 09:19:52 field-cricket kernel: [ 74.765641] qemu-system-i38[2474]: segfault at b909fed0 ip b77b3b3e sp bff1bae0 error 6 in qemu-system-i386[b7554000+2a5000]

I'm afraid we don't have a stack backtrace. qemu seems to have printed this:

 DSsDSsDSsDSs

which is a bit odd!

We want to be able to release Xen 4.2 soon. Our next step is probably to revert the migration changes so that we have a tree which works as well as it did before.

Ian.
Alex Bligh
2013-Mar-01 21:18 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Ian, Stefano,

--On 1 March 2013 16:15:16 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Looking at the changes, it seems likely to be related to the migration
> dirty bit tracking series. These were originally written by Anthony
> Perard and committed to qemu-upstream-unstable where they work, and
> then backported by Alex Bligh.
>
> So the problem is probably something in Alex's backports?

Do we know from the test results that it's definitely the last patch in the series (i.e. the specific one indicated)? That last patch is the only one with substantive modifications to it, and I wasn't entirely clear what the original code did (as posted here).

Also, do we know how to trigger/replicate this? (I'm afraid I don't know how to read the output.)

> We want to be able to release Xen 4.2 soon. Our next step is probably
> to revert the migration changes so that we have a tree which works as
> well as it did before.

Stefano had an alternate way of fixing this, though it involved un-inlining something IIRC. Perhaps I should go back and look at that.

Any ideas, Stefano?

-- 
Alex Bligh
Stefano Stabellini
2013-Mar-03 01:57 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
On Fri, 1 Mar 2013, Alex Bligh wrote:
> Ian, Stefano,
>
> --On 1 March 2013 16:15:16 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
>
> > Looking at the changes, it seems likely to be related to the migration
> > dirty bit tracking series. These were originally written by Anthony
> > Perard and committed to qemu-upstream-unstable where they work, and
> > then backported by Alex Bligh.
> >
> > So the problem is probably something in Alex's backports?
>
> Do we know from the test results that it's definitely the last patch
> in the series (i.e. the specific one indicated)? That last patch is
> the only one with substantive modifications to it, and I wasn't
> entirely clear what the original code did (as posted here).
>
> Also do we know how to trigger/replicate this (I'm afraid I don't
> know how to read the output).

I think it's just a standard Windows XP SP3 install in a VM.

> > We want to be able to release Xen 4.2 soon. Our next step is probably
> > to revert the migration changes so that we have a tree which works as
> > well as it did before.
>
> Stefano had an alternate way of fixing this though it involved un-inlining
> something IIRC. Perhaps I should go back and look at that.
>
> Any ideas Stefano?

It could be any of the last 6 commits: I would start by bisecting the problem further until you manage to pinpoint the precise backport that caused it. But like you say, it is probably the last one.

Consider that the changes introduced by the last commit are only useful for live migration, so you can try to remove them one by one to figure out exactly what's breaking the Windows install.
My guess would be this chunk:

@@ -470,7 +470,21 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
     rc = xc_hvm_track_dirty_vram(xen_xc, xen_domid,
                                  start_addr >> TARGET_PAGE_BITS, npages,
                                  bitmap);
-    if (rc) {
+    if (rc < 0) {
+        if (rc != -ENODATA) {
+            ram_addr_t addr, end;
+
+            xen_modified_memory(start_addr, size);
+
+            end = TARGET_PAGE_ALIGN(start_addr + size);
+            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
+                cpu_physical_memory_set_dirty(addr);
+            }
+
+            DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
+                    ", 0x" TARGET_FMT_plx "): %s\n",
+                    start_addr, start_addr + size, strerror(-rc));
+        }
         return rc;
     }

in particular the loop around cpu_physical_memory_set_dirty.
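To make the suspect loop easier to reason about, here is an illustrative sketch (not QEMU source) of the page arithmetic it uses, assuming 4 KiB target pages; the macro names mirror QEMU's but the definitions here are stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for QEMU's page macros, assuming 4 KiB pages. */
#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE ((uint64_t)1 << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1))
#define TARGET_PAGE_ALIGN(a) (((a) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)

/* Count how many pages the loop
 *   for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE)
 * visits for a given start_addr/size, where end = TARGET_PAGE_ALIGN(start_addr + size). */
static uint64_t pages_touched(uint64_t start_addr, uint64_t size)
{
    uint64_t end = TARGET_PAGE_ALIGN(start_addr + size);
    uint64_t n = 0;
    for (uint64_t addr = start_addr & TARGET_PAGE_MASK; addr < end;
         addr += TARGET_PAGE_SIZE) {
        n++;
    }
    return n;
}
```

Note the iteration itself is straightforward; the question raised later in the thread is whether the addresses it marks (guest-physical start_addr) are the right kind of address at all.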
Ian Jackson
2013-Mar-04 11:38 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Alex Bligh writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> --On 1 March 2013 16:15:16 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
> > Looking at the changes, it seems likely to be related to the migration
> > dirty bit tracking series. These were originally written by Anthony
> > Perard and committed to qemu-upstream-unstable where they work, and
> > then backported by Alex Bligh.
> >
> > So the problem is probably something in Alex's backports?
>
> Do we know from the test results that it's definitely the last patch
> in the series (i.e. the specific one indicated)? That last patch is
> the only one with substantive modifications to it, and I wasn't
> entirely clear what the original code did (as posted here).

No, we don't. (In fact I think it's a bug that it doesn't show the whole series of 6 commits.)

> Also do we know how to trigger/replicate this (I'm afraid I don't
> know how to read the output).

As Stefano says, it's a 32-bit Windows XP install, on a 32-bit hypervisor. The full details can be found from various links here:

 http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/

The key part is this, which is from the link at the top of the column with the regression (whose cell is highlighted in pink):

 http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/test-i386-i386-xl-qemuu-winxpsp3/info.html

There you can find various logfiles, but crucially also the domain configuration. I'm afraid the ISO used for the installation is not available for obvious reasons.

Ian.
Ian Jackson
2013-Mar-04 11:40 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Stefano Stabellini writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> It could be any of the last 6 commits: I would start by bisecting the
> problem further until you manage to pinpoint the precise backport that
> caused it. But like you say, it is probably the last one.

Unfortunately the automatic bisection has been scuppered by the switch from xen.hg to xen.git. Alex, if you can't repro the problem let me know and I can investigate it.

Ian.
Alex Bligh
2013-Mar-04 15:08 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Stefano,

> It could be any of the last 6 commits: I would start by bisecting the
> problem further until you manage to pinpoint the precise backport that
> caused it.

My problem is trying to replicate it...

> But like you say, it is probably the last one.
> Consider that the changes introduced by the last commit are only useful
> for live-migration, you can try to remove one by one to figure out
> exactly what's breaking Windows install.
>
> My guess would be this chunk:

Me too. This was the bit where I couldn't actually work out what it was doing and why.

In the original patch (below) it called memory_region_set_dirty, which doesn't exist in 4.2, with the parameter 'framebuffer', which also doesn't exist in 4.2. This only gets called if xc_hvm_track_dirty_vram returns an error other than ENODATA. I had presumed the intent was to mark the whole framebuffer as dirty if xc_hvm_track_dirty_vram failed this way.

Does the loop start (and end) need to start from vram_offset (i.e. physmap->phys_offset), rather than start_addr? I suspect my confusion is that both start_addr and vram_offset are target_phys_addr_t, but one is one side of a mapping and the other is the other side.

> @@ -470,7 +470,21 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
>      rc = xc_hvm_track_dirty_vram(xen_xc, xen_domid,
>                                   start_addr >> TARGET_PAGE_BITS, npages,
>                                   bitmap);
> -    if (rc) {
> +    if (rc < 0) {
> +        if (rc != -ENODATA) {
> +            ram_addr_t addr, end;
> +
> +            xen_modified_memory(start_addr, size);
> +
> +            end = TARGET_PAGE_ALIGN(start_addr + size);
> +            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
> +                cpu_physical_memory_set_dirty(addr);
> +            }
> +
> +            DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
> +                    ", 0x" TARGET_FMT_plx "): %s\n",
> +                    start_addr, start_addr + size, strerror(-rc));
> +        }
>          return rc;
>      }
>
> in particular the loop around cpu_physical_memory_set_dirty

Original patch:

diff --git a/xen-all.c b/xen-all.c
index b11542c..e6308be 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -507,7 +507,8 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            fprintf(stderr, "xen: track_dirty_vram failed (0x" TARGET_FMT_plx
+            memory_region_set_dirty(framebuffer, 0, size);
+            DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
         }

-- 
Alex Bligh
Alex Bligh
2013-Mar-04 15:14 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Ian,

--On 4 March 2013 11:40:00 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Stefano Stabellini writes ("Re: [qemu-upstream-4.2-testing test] 16779:
> regressions - FAIL"):
> > It could be any of the last 6 commits: I would start by bisecting the
> > problem further until you manage to pinpoint the precise backport that
> > caused it. But like you say, it is probably the last one.
>
> Unfortunately the automatic bisection has been scuppered by the switch
> from xen.hg to xen.git. Alex, if you can't repro the problem let me
> know and I can investigate it.

I haven't yet (though possibly from failure to get hold of the relevant OS version). I'd actually appreciate some code review of the last of the qemu patches, as I was unsure what was meant to be going on (as per my last message to Stefano).

-- 
Alex Bligh
Ian Jackson
2013-Mar-05 17:10 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Alex Bligh writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> I haven't yet (though possibly from failure to get hold of the relevant
> OS version). I'd actually appreciate some code review of the last of the
> qemu patches as I was unsure what was meant to be going on (as per
> last message to Stefano).

Well, I have reproduced this. I was hoping to actually debug it, but gdb is having no luck getting a useful stack trace out of qemu-system-i386, even after I removed a bunch of bizarre compiler options and rebuilt it.

I can bisect it, so that's what I'll do next, I guess.

Ian.
When xc_hvm_track_dirty_vram fails, iterate through pages based on vram_offset and npages, rather than start_addr and size. DPRINTF before the loop in the hope of seeing output before SEGV.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 xen-all.c | 15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen-all.c b/xen-all.c
index dbd759c..96a34a9 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -472,18 +472,17 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            ram_addr_t addr, end;
-
-            xen_modified_memory(start_addr, size);
-
-            end = TARGET_PAGE_ALIGN(start_addr + size);
-            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
-                cpu_physical_memory_set_dirty(addr);
-            }
+            target_phys_addr_t todirty;

             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
+
+            xen_modified_memory(vram_offset, npages * TARGET_PAGE_SIZE);
+
+            for (todirty = vram_offset, i = 0; i < npages; todirty += TARGET_PAGE_SIZE, i++) {
+                cpu_physical_memory_set_dirty(todirty);
+            }
         }
         return rc;
     }
-- 
1.7.4.1
Ian Jackson
2013-Mar-06 14:14 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Alex Bligh writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> In the original patch (below) it called memory_region_set_dirty, which
> doesn't exist in 4.2, with the parameter 'framebuffer', which also doesn't
> exist in 4.2. This only gets called if xc_hvm_track_dirty_vram returns
> an error other than ENODATA. I had presumed the intent was to mark the
> whole framebuffer as dirty if xc_hvm_track_dirty_vram failed this way.

I have managed to get a stack trace out of this crash. See below.

Looking at the code, we seem to be calling cpu_physical_memory_set_dirty with an invalid address. It has no checking and directly uses the supplied address (shifted to index one byte per page) in an array lookup. cpu_physical_*'s array for this, ram_list.phys_dirty, is allocated in find_ram_offset and qemu_ram_alloc_from_ptr.

It seems likely to me that the code in xen_sync_dirty_bitmap which calls cpu_physical_memory_set_dirty should only be entered, at least via this path, with a valid ram address, as it is itself being called from cpu_physical_sync_dirty_bitmap (the assumptions here aren't stated, but it seems reasonable to assume that all the ram_addr_t's supplied to cpu_physical_* are supposed to be valid ram)? But it seems that this is not the case. At least, the address here is not in any of the blocks in ram_list.

I'm afraid I have no idea whether the right fix is for xen_sync_dirty_bitmap to somehow check whether the address is relevant, or whether the bug is higher up in the call chain.

Ian.

Program received signal SIGSEGV, Segmentation fault.
0x082483a2 in cpu_physical_memory_set_dirty (addr=4026531840)
    at /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h:540
540     /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h: No such file or directory.
        in /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h
(gdb) bt
#0  0x082483a2 in cpu_physical_memory_set_dirty (addr=4026531840) at /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h:540
#1  0x08249735 in xen_sync_dirty_bitmap (state=0x946e250, start_addr=4026531840, size=8388608) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:481
#2  0x08249953 in xen_client_sync_dirty_bitmap (client=0x946e270, start_addr=4026531840, end_addr=4034920448) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:527
#3  0x081b43cb in cpu_notify_sync_dirty_bitmap (start=4026531840, end=4034920448) at /u/iwj/work/1/qemu-upstream-4.2-testing/exec.c:1751
#4  0x081b506d in cpu_physical_sync_dirty_bitmap (start_addr=4026531840, end_addr=4034920448) at /u/iwj/work/1/qemu-upstream-4.2-testing/exec.c:2139
#5  0x081df005 in as_memory_range_del (as=0x832783c, fr=0x948da30) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:336
#6  0x081e0b84 in address_space_update_topology_pass (as=0x832783c, old_view=..., new_view=..., adding=false) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:711
#7  0x081e0ccd in address_space_update_topology (as=0x832783c) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:746
#8  0x081e0d5b in memory_region_update_topology () at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:761
#9  0x081e2aa5 in memory_region_del_subregion (mr=0x969ed38, subregion=0x96b4628) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:1304
#10 0x080e8a33 in pci_update_mappings (d=0x96a3b00) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci.c:997
#11 0x080e8d86 in pci_default_write_config (d=0x96a3b00, addr=4, val=0, l=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci.c:1050
#12 0x080ebbbe in pci_host_config_write_common (pci_dev=0x96a3b00, addr=4, limit=256, val=0, len=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci_host.c:54
#13 0x080ebc80 in pci_data_write (s=0x969fe98, addr=2147487748, val=0, len=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci_host.c:75
#14 0x080ebda7 in pci_host_data_write (opaque=0x969f018, addr=0, val=0, len=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci_host.c:125
#15 0x081ded13 in memory_region_write_accessor (opaque=0x969fdf4, addr=0, value=0xbfc1a700, size=4, shift=0, mask=4294967295) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:265
#16 0x081dedde in access_with_adjusted_size (addr=0, value=0xbfc1a700, size=4, access_size_min=1, access_size_max=4, access=0x81dec8f <memory_region_write_accessor>, opaque=0x969fdf4) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:295
#17 0x081df6ed in memory_region_iorange_write (iorange=0x969fe30, offset=0, width=4, data=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:456
#18 0x081db000 in ioport_writel_thunk (opaque=0x969fe30, addr=3324, data=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/ioport.c:225
#19 0x081dabaf in ioport_write (index=2, address=3324, data=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/ioport.c:82
#20 0x081db319 in cpu_outl (addr=3324, val=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/ioport.c:288
#21 0x08249bd0 in do_outp (addr=3324, size=4, val=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:653
#22 0x08249d5d in cpu_ioreq_pio (req=0xb7798000) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:680
#23 0x0824a25c in handle_ioreq (req=0xb7798000) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:748
#24 0x0824a539 in cpu_handle_ioreq (opaque=0x946e250) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:823
#25 0x080a10f6 in qemu_iohandler_poll (readfds=0xbfc1aa90, writefds=0xbfc1aa10, xfds=0xbfc1a990, ret=1) at /u/iwj/work/1/qemu-upstream-4.2-testing/iohandler.c:121
#26 0x081177d7 in main_loop_wait (nonblocking=1) at /u/iwj/work/1/qemu-upstream-4.2-testing/main-loop.c:464
#27 0x0810e16f in main_loop () at /u/iwj/work/1/qemu-upstream-4.2-testing/vl.c:1481
#28 0x08112e28 in main (argc=36, argv=0xbfc1aec4, envp=0xbfc1af58) at /u/iwj/work/1/qemu-upstream-4.2-testing/vl.c:3485
(gdb)
(gdb) print ram_list
$1 = {phys_dirty = 0x96b6ee8 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., blocks = {lh_first = 0x946e748}}
(gdb) print ram_list.blocks
$2 = {lh_first = 0x946e748}
...
(gdb) print *ram_list.blocks.lh_first
$4 = {host = 0x0, offset = 0, length = 528482304, flags = 0, idstr = "xen.ram", '\000' <repeats 248 times>, next = {le_next = 0x948c228, le_prev = 0x8745768}, fd = 0}
...
(gdb) print *ram_list.blocks.lh_first->next.le_next
$5 = {host = 0xb4b62000 <Address 0xb4b62000 out of bounds>, offset = 536936448, length = 65536, flags = 0, idstr = "0000:00:04.0/rtl8139.rom", '\000' <repeats 231 times>, next = {le_next = 0x96a28f8, le_prev = 0x946e860}, fd = 0}
(gdb) print *ram_list.blocks.lh_first->next.le_next->next.le_next
$6 = {host = 0xb5cf9000 <Address 0xb5cf9000 out of bounds>, offset = 536870912, length = 65536, flags = 0, idstr = "0000:00:02.0/cirrus_vga.rom", '\000' <repeats 228 times>, next = {le_next = 0x96b6dc0, le_prev = 0x948c340}, fd = 0}
(gdb) print *ram_list.blocks.lh_first->next.le_next->next.le_next->next.le_next
$7 = {host = 0xb5e37000 <Address 0xb5e37000 out of bounds>, offset = 528482304, length = 8388608, flags = 0, idstr = "vga.vram", '\000' <repeats 247 times>, next = {le_next = 0x0, le_prev = 0x96a2a10}, fd = 0}
(gdb)
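The failure mode described above can be sketched in a few lines of standalone C (not QEMU source). As the analysis says, the dirty bitmap is one byte per page, indexed by `addr >> TARGET_PAGE_BITS` with no bounds check, so a guest-physical address that is not a valid ram_addr indexes far past the allocation. The numbers below are taken from the gdb session: "xen.ram" is 528482304 bytes long, while the faulting addr was 4026531840 (0xF0000000); helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12  /* assuming 4 KiB target pages, as on this build */

/* Byte index into a phys_dirty-style array: one byte per page,
 * computed directly from the address with no validity check. */
static uint64_t dirty_index(uint64_t addr)
{
    return addr >> PAGE_BITS;
}

/* Would that index fall inside a dirty array sized for ram_bytes of RAM? */
static int index_in_bounds(uint64_t addr, uint64_t ram_bytes)
{
    return dirty_index(addr) < (ram_bytes >> PAGE_BITS);
}
```

With the flight's numbers, an address inside xen.ram indexes in bounds, while the guest-physical framebuffer address 4026531840 lands hundreds of thousands of entries past the end of the array, which matches the SIGSEGV in frame #0.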
Alex Bligh
2013-Mar-06 14:15 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Ian,

--On 5 March 2013 17:10:41 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> > I haven't yet (though possibly from failure to get hold of the relevant
> > OS version). I'd actually appreciate some code review of the last of
> > the qemu patches as I was unsure what was meant to be going on (as per
> > last message to Stefano).
>
> Well, I have reproduced this. I was hoping to actually debug it but
> gdb is having no luck getting a useful stack trace out of
> qemu-system-i386, even after I removed a bunch of bizarre compiler
> options and rebuilt it.
>
> I can bisect it so that's what I'll do next I guess.

We still can't reproduce this here, but I have recoded that routine to be more obviously correct, and it also doesn't crash here. I suspect the problem is that I can't get xc_hvm_track_dirty_vram to fail.

I think this patch makes the routine more obviously correct, and the problem might be that I was not translating through get_physmapping()->phys_offset in the failure case (assuming the success case is correct).

If you can reproduce it, perhaps you'd care to try this. Otherwise, any comments welcome. Patch appended below in no-doubt mailer-mangled form for context, but already sent to the list with git send-email under Message-Id: <1362579002-7981-1-git-send-email-alex@alex.org.uk>

-- 
Alex Bligh

Subject: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Date: Wed, 6 Mar 2013 14:10:02 +0000
Message-Id: <1362579002-7981-1-git-send-email-alex@alex.org.uk>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <89D1720972979969F5F2FA5A@Ximines.local>
References: <89D1720972979969F5F2FA5A@Ximines.local>

When xc_hvm_track_dirty_vram fails, iterate through pages based on vram_offset and npages, rather than start_addr and size. DPRINTF before the loop in the hope of seeing output before SEGV.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 xen-all.c | 15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen-all.c b/xen-all.c
index dbd759c..96a34a9 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -472,18 +472,17 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            ram_addr_t addr, end;
-
-            xen_modified_memory(start_addr, size);
-
-            end = TARGET_PAGE_ALIGN(start_addr + size);
-            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
-                cpu_physical_memory_set_dirty(addr);
-            }
+            target_phys_addr_t todirty;

             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
+
+            xen_modified_memory(vram_offset, npages * TARGET_PAGE_SIZE);
+
+            for (todirty = vram_offset, i = 0; i < npages; todirty += TARGET_PAGE_SIZE, i++) {
+                cpu_physical_memory_set_dirty(todirty);
+            }
         }
         return rc;
     }
-- 
1.7.4.1
Ian Jackson
2013-Mar-06 14:16 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Alex Bligh writes ("[PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV"):
> When xc_hvm_track_dirty_vram fails, iterate through pages based on
> vram_offset and npages, rather than start_addr and size. DPRINTF
> before the loop in the hope of seeing output before SEGV.

See the email just sent. I have a comprehensible stack trace. If you'd still like me to try this patch I can do so, but it might be easier for you to ask me to poke my gdb ...

Ian.
Alex Bligh
2013-Mar-06 14:35 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Ian,

--On 6 March 2013 14:16:22 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Alex Bligh writes ("[PATCH] xen: xen_sync_dirty_bitmap: attempt to fix
> SEGV"):
> > When xc_hvm_track_dirty_vram fails, iterate through pages based on
> > vram_offset and npages, rather than start_addr and size. DPRINTF
> > before the loop in the hope of seeing output before SEGV.
>
> See the email just sent. I have a comprehensible stack trace. If you'd
> still like me to try this patch I can do so but it might be easier for
> you to ask me to poke my gdb ...

Thanks. So the good news is that it is xen_sync_dirty_bitmap that is the culprit, as we suspected.

My question is: does the call to cpu_physical_memory_set_dirty occur within the "if (rc < 0)" block, as I suspect, or does it occur within the "for (i = 0; i < ARRAY_SIZE(bitmap); i++)" block? (It's called in two places from xen_sync_dirty_bitmap.)

If the former, then I think the problem is that the address that cpu_physical_memory_set_dirty (and xen_modified_memory) takes needs to be passed through get_physmapping()->phys_offset before use, like the second case already does. In this case the patch I've just sent should (hopefully) work.

If the latter, then something else is broken.

-- 
Alex Bligh
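The translation being discussed can be sketched as a standalone fragment (not QEMU source; the single-entry "physmap" struct and helper name are illustrative). Dirty marking must use the ram-side address base (vram_offset, i.e. the physmap's phys_offset), not the guest-physical start_addr, even though both have the same integer type:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative single mapping: a guest-physical range backed by a
 * range of ram_addr space. Field names mirror the thread's discussion. */
struct physmap {
    uint64_t start_addr;   /* guest-physical base of the mapping */
    uint64_t size;         /* length of the mapping in bytes */
    uint64_t phys_offset;  /* ram_addr base the range is backed by */
};

/* Translate a guest-physical address inside the mapping into the
 * ram_addr that the dirty-tracking machinery actually indexes by. */
static uint64_t to_ram_addr(const struct physmap *pm, uint64_t gpa)
{
    return pm->phys_offset + (gpa - pm->start_addr);
}
```

With the values from the gdb session (vga.vram at ram offset 528482304, guest framebuffer at 4026531840, size 8388608), translating first yields an address inside the vga.vram block; passing the untranslated guest address is exactly what indexed past the dirty array.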
Ian Jackson
2013-Mar-06 14:45 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Alex Bligh writes ("Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV"):
> Thanks. So the good news is that it is xen_sync_dirty_bitmap that is
> the culprit, as we suspected.
>
> My question is: does the call to cpu_physical_memory_set_dirty
> occur within the "if (rc < 0)" block, as I suspect, or does it occur
> within the "for (i = 0; i < ARRAY_SIZE(bitmap); i++)" block?
> (It's called in two places from xen_sync_dirty_bitmap.)

Line 481, see below. (You should be able to see that from the stack trace I sent, which was generated from the tip of qemu-upstream-4.2-testing.git, eccc68722696864fc4823f048c7be58d11281b97.)

> If the former, then I think the problem is that the address that
> cpu_physical_memory_set_dirty (and xen_modified_memory) takes needs
> to be passed through get_physmapping()->phys_offset before use,
> like the second case already does. In this case the patch I've
> just sent should (hopefully) work.

  479             end = TARGET_PAGE_ALIGN(start_addr + size);
  480             for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
  481                 cpu_physical_memory_set_dirty(addr);
  482             }
  483
  484             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx

I shall try your patch.

Ian.
Ian Jackson
2013-Mar-06 17:12 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]
Alex Bligh writes ("Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV"):
> If the former, then I think the problem is that the address that
> cpu_physical_memory_set_dirty (and xen_modified_memory) takes needs
> to be passed through get_physmapping()->phys_offset before use,
> like the second case already does. In this case the patch I've
> just sent should (hopefully) work.

Indeed. Your patch fixes the problem.

I discussed this with Stefano before he went away and he said I should push a fix to qemu-xen-upstream-4.2-testing.git staging, so I have done that. I edited the commit message slightly.

Thanks for your help!

Regards,
Ian.

commit 351f94ff4bf3a7795ca5b282305aa610e598eec0
Author: Alex Bligh <alex@alex.org.uk>
Date:   Wed Mar 6 14:59:27 2013 +0000

    xen: xen_sync_dirty_bitmap: attempt to fix SEGV

    When xc_hvm_track_dirty_vram fails, iterate through pages based on
    vram_offset and npages, rather than start_addr and size. DPRINTF
    before the loop too.

    [ Fixes a regression introduced by
      eccc68722696864fc4823f048c7be58d11281b97 - iwj ]

    Signed-off-by: Alex Bligh <alex@alex.org.uk>
    Tested-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff --git a/xen-all.c b/xen-all.c
index dbd759c..96a34a9 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -472,18 +472,17 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            ram_addr_t addr, end;
-
-            xen_modified_memory(start_addr, size);
-
-            end = TARGET_PAGE_ALIGN(start_addr + size);
-            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
-                cpu_physical_memory_set_dirty(addr);
-            }
+            target_phys_addr_t todirty;

             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
+
+            xen_modified_memory(vram_offset, npages * TARGET_PAGE_SIZE);
+
+            for (todirty = vram_offset, i = 0; i < npages; todirty += TARGET_PAGE_SIZE, i++) {
+                cpu_physical_memory_set_dirty(todirty);
+            }
         }
         return rc;
     }
Alex Bligh
2013-Mar-06 18:15 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]
Ian,

--On 6 March 2013 17:12:12 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Indeed. Your patch fixes the problem.
>
> I discussed this with Stefano before he went away and he said I should
> push a fix to qemu-xen-upstream-4.2-testing.git staging, so I have
> done that. I edited the commit message slightly.
>
> Thanks for your help!

No problem. Unless I have misunderstood how the new git tree works, I think you also need to fix QEMU_UPSTREAM_REVISION in Config.mk in xen. I'd send a patch, but I'm not sure whether it's the stable commit you want or the staging commit.

-- 
Alex Bligh
Ian Jackson
2013-Mar-07 10:57 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]
Alex Bligh writes ("Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]"):
> No problem. Unless I have misunderstood how the new git tree works,
> I think you also need to fix QEMU_UPSTREAM_REVISION in Config.mk in xen.
> I'd send a patch but I'm not sure whether it's the stable commit you
> want or the staging commit.

stable-4.2 will get the same commit as staging-4.2. But I think this time I'll wait with updating QEMU_UPSTREAM_REVISION until we get a test pass ...

Thanks,
Ian.