flight 16779 qemu-upstream-4.2-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-i386-i386-xl-qemuu-winxpsp3          7 windows-install       fail REGR. vs. 15122

Tests which did not succeed, but are not blocking:
 build-armhf                               4 xen-build              fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     13 guest-stop             fail never pass
 test-amd64-amd64-xl-qemuu-win            13 guest-stop             fail never pass
 test-amd64-i386-qemuu-win-vcpus1         16 leak-check/check       fail never pass
 test-i386-i386-qemuu-win                 16 leak-check/check       fail never pass
 test-amd64-i386-qemuu-win                16 leak-check/check       fail never pass
 test-amd64-i386-xl-qemuu-win-vcpus1      13 guest-stop             fail never pass
 test-amd64-i386-xend-qemuu-winxpsp3      16 leak-check/check       fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 13 guest-stop             fail never pass
 test-i386-i386-xl-qemuu-win              12 guest-localmigrate/x10 fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       13 guest-stop             fail never pass
 test-amd64-amd64-qemuu-win               16 leak-check/check       fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64      13 guest-stop             fail never pass

version targeted for testing:
 qemuu                eccc68722696864fc4823f048c7be58d11281b97
baseline version:
 qemuu                70454385eeee6f0b3f7a9eddca9f7340b5060824

------------------------------------------------------------
People who touched revisions under test:
  Alex Bligh <alex@alex.org.uk>
  Anthony PERARD <anthony.perard@citrix.com>
  Stefano Stabellini <stefano.stabellini@eu.citrix.com>
------------------------------------------------------------

jobs:
 build-amd64                                 pass
 build-armhf                                 fail
 build-i386                                  pass
 build-amd64-oldkern                         pass
 build-i386-oldkern                          pass
 build-amd64-pvops                           pass
 build-i386-pvops                            pass
 test-amd64-i386-qemuu-rhel6hvm-amd          pass
 test-amd64-amd64-xl-qemuu-win7-amd64        fail
 test-amd64-i386-xl-qemuu-win7-amd64         fail
 test-amd64-i386-qemuu-rhel6hvm-intel        pass
 test-amd64-i386-qemuu-win-vcpus1            fail
 test-amd64-i386-xl-qemuu-win-vcpus1         fail
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1    fail
 test-amd64-amd64-qemuu-win                  fail
 test-amd64-i386-qemuu-win                   fail
 test-i386-i386-qemuu-win                    fail
 test-amd64-amd64-xl-qemuu-win               fail
 test-i386-i386-xl-qemuu-win                 fail
 test-amd64-i386-xend-qemuu-winxpsp3         fail
 test-amd64-amd64-xl-qemuu-winxpsp3          fail
 test-i386-i386-xl-qemuu-winxpsp3            fail

------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
 http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
 http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Not pushing.

------------------------------------------------------------
commit eccc68722696864fc4823f048c7be58d11281b97
Author: Anthony PERARD <anthony.perard@citrix.com>
Date:   Thu Feb 21 12:16:42 2013 +0000

    xen: Set the vram dirty when an error occurs.

    If the call to xc_hvm_track_dirty_vram() fails, then we set the dirty
    bit on all the video ram. This case happens during migration.

    Backport of 8aba7dc02d5660df7e7d8651304b3079908358be

    This backport is less clean than it might be because there is no
    memory_region_set_dirty that copes with more than one page in 4.2,
    and the case where the call to xc_hvm_track_dirty_vram is successful
    also needs to ensure xen_modified_memory is called (which would on
    unstable have been done within memory_region_set_dirty).

    Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
    Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Alex Bligh <alex@alex.org.uk>
Ian Jackson
2013-Mar-01 16:15 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
xen.org writes ("[qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> flight 16779 qemu-upstream-4.2-testing real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-i386-i386-xl-qemuu-winxpsp3  7 windows-install  fail REGR. vs. 15122

This is a persistent failure.

> version targeted for testing:
>  qemuu  eccc68722696864fc4823f048c7be58d11281b97
> baseline version:
>  qemuu  70454385eeee6f0b3f7a9eddca9f7340b5060824

Looking at the changes, it seems likely to be related to the migration dirty bit tracking series. These were originally written by Anthony Perard and committed to qemu-upstream-unstable where they work, and then backported by Alex Bligh.

So the problem is probably something in Alex's backports?

The logs show this, most relevantly:

 Mar 1 09:19:52 field-cricket kernel: [ 74.765641] qemu-system-i38[2474]: segfault at b909fed0 ip b77b3b3e sp bff1bae0 error 6 in qemu-system-i386[b7554000+2a5000]

I'm afraid we don't have a stack backtrace. qemu seems to have printed this:

 DSsDSsDSsDSs

which is a bit odd!

We want to be able to release Xen 4.2 soon. Our next step is probably to revert the migration changes so that we have a tree which works as well as it did before.

Ian.
Alex Bligh
2013-Mar-01 21:18 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Ian, Stefano,

--On 1 March 2013 16:15:16 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Looking at the changes, it seems likely to be related to the migration
> dirty bit tracking series. These were originally written by Anthony
> Perard and committed to qemu-upstream-unstable where they work, and
> then backported by Alex Bligh.
>
> So the problem is probably something in Alex's backports?

Do we know from the test results that it's definitely the last patch in the series (i.e. the specific one indicated)? That last patch is the only one with substantive modifications to it, and I wasn't entirely clear what the original code did (as posted here).

Also, do we know how to trigger/replicate this? (I'm afraid I don't know how to read the output.)

> We want to be able to release Xen 4.2 soon. Our next step is probably
> to revert the migration changes so that we have a tree which works as
> well as it did before.

Stefano had an alternate way of fixing this, though it involved un-inlining something IIRC. Perhaps I should go back and look at that.

Any ideas, Stefano?

-- 
Alex Bligh
Stefano Stabellini
2013-Mar-03 01:57 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
On Fri, 1 Mar 2013, Alex Bligh wrote:
> Ian, Stefano,
>
> --On 1 March 2013 16:15:16 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
>
> > Looking at the changes, it seems likely to be related to the migration
> > dirty bit tracking series. These were originally written by Anthony
> > Perard and committed to qemu-upstream-unstable where they work, and
> > then backported by Alex Bligh.
> >
> > So the problem is probably something in Alex's backports?
>
> Do we know from the test results that it's definitely the last patch
> in the series (i.e. the specific one indicated)? That last patch is
> the only one with substantive modifications to it, and I wasn't
> entirely clear what the original code did (as posted here).
>
> Also do we know how to trigger/replicate this (I'm afraid I don't
> know how to read the output).

I think it's just a standard Windows XP SP3 install in a VM.

> > We want to be able to release Xen 4.2 soon. Our next step is probably
> > to revert the migration changes so that we have a tree which works as
> > well as it did before.
>
> Stefano had an alternate way of fixing this though it involved un-inlining
> something IIRC. Perhaps I should go back and look at that.
>
> Any ideas Stefano?

It could be any of the last 6 commits: I would start by bisecting the problem further until you manage to pinpoint the precise backport that caused it. But like you say, it is probably the last one.

Consider that the changes introduced by the last commit are only useful for live migration, so you can try to remove them one by one to figure out exactly what's breaking the Windows install.
My guess would be this chunk:

@@ -470,7 +470,21 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
     rc = xc_hvm_track_dirty_vram(xen_xc, xen_domid,
                                  start_addr >> TARGET_PAGE_BITS, npages,
                                  bitmap);
-    if (rc) {
+    if (rc < 0) {
+        if (rc != -ENODATA) {
+            ram_addr_t addr, end;
+
+            xen_modified_memory(start_addr, size);
+
+            end = TARGET_PAGE_ALIGN(start_addr + size);
+            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
+                cpu_physical_memory_set_dirty(addr);
+            }
+
+            DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
+                    ", 0x" TARGET_FMT_plx "): %s\n",
+                    start_addr, start_addr + size, strerror(-rc));
+        }
         return rc;
     }

in particular the loop around cpu_physical_memory_set_dirty.
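To make the suspect loop easier to reason about, here is an illustrative sketch (not QEMU source) of the page arithmetic it uses, assuming 4 KiB target pages; the macro names mirror QEMU's but the definitions here are stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for QEMU's page macros, assuming 4 KiB pages. */
#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE ((uint64_t)1 << TARGET_PAGE_BITS)
#define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1))
#define TARGET_PAGE_ALIGN(a) (((a) + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK)

/* Count how many pages the loop
 *   for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE)
 * visits for a given start_addr/size, where end = TARGET_PAGE_ALIGN(start_addr + size). */
static uint64_t pages_touched(uint64_t start_addr, uint64_t size)
{
    uint64_t end = TARGET_PAGE_ALIGN(start_addr + size);
    uint64_t n = 0;
    for (uint64_t addr = start_addr & TARGET_PAGE_MASK; addr < end;
         addr += TARGET_PAGE_SIZE) {
        n++;
    }
    return n;
}
```

Note the iteration itself is straightforward; the question raised later in the thread is whether the addresses it marks (guest-physical start_addr) are the right kind of address at all.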
Ian Jackson
2013-Mar-04 11:38 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Alex Bligh writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> --On 1 March 2013 16:15:16 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com>
> wrote:
> > Looking at the changes, it seems likely to be related to the migration
> > dirty bit tracking series. These were originally written by Anthony
> > Perard and committed to qemu-upstream-unstable where they work, and
> > then backported by Alex Bligh.
> >
> > So the problem is probably something in Alex's backports?
>
> Do we know from the test results that it's definitely the last patch
> in the series (i.e. the specific one indicated)? That last patch is
> the only one with substantive modifications to it, and I wasn't
> entirely clear what the original code did (as posted here).

No, we don't. (In fact I think it's a bug that it doesn't show the whole series of 6 commits.)

> Also do we know how to trigger/replicate this (I'm afraid I don't
> know how to read the output).

As Stefano says, it's a 32-bit Windows XP install, on a 32-bit hypervisor. The full details can be found from various links here:

 http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/

The key part is this, which is from the link at the top of the column with the regression (whose cell is highlighted in pink):

 http://www.chiark.greenend.org.uk/~xensrcts/logs/16779/test-i386-i386-xl-qemuu-winxpsp3/info.html

There you can find various logfiles, but crucially also the domain configuration. I'm afraid the ISO used for the installation is not available for obvious reasons.

Ian.
Ian Jackson
2013-Mar-04 11:40 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Stefano Stabellini writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> It could be any of the last 6 commits: I would start by bisecting the
> problem further until you manage to pinpoint the precise backport that
> caused it. But like you say, it is probably the last one.

Unfortunately the automatic bisection has been scuppered by the switch from xen.hg to xen.git. Alex, if you can't repro the problem let me know and I can investigate it.

Ian.
Alex Bligh
2013-Mar-04 15:08 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Stefano,

> It could be any of the last 6 commits: I would start by bisecting the
> problem further until you manage to pinpoint the precise backport that
> caused it.

My problem is trying to replicate it...

> But like you say, it is probably the last one.
> Consider that the changes introduced by the last commit are only useful
> for live-migration, you can try to remove one by one to figure out
> exactly what's breaking Windows install.
>
> My guess would be this chunk:

Me too. This was the bit where I couldn't actually work out what it was doing and why.

In the original patch (below) it called memory_region_set_dirty, which doesn't exist in 4.2, with the parameter 'framebuffer', which also doesn't exist in 4.2. This only gets called if xc_hvm_track_dirty_vram returns an error other than ENODATA. I had presumed the intent was to mark the whole framebuffer as dirty if xc_hvm_track_dirty_vram failed this way.

Does the loop start (and end) need to start from vram_offset (i.e. physmap->phys_offset), rather than start_addr? I suspect my confusion is that both start_addr and vram_offset are target_phys_addr_t, but one is one side of a mapping and the other is the other side.

> @@ -470,7 +470,21 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
>      rc = xc_hvm_track_dirty_vram(xen_xc, xen_domid,
>                                   start_addr >> TARGET_PAGE_BITS, npages,
>                                   bitmap);
> -    if (rc) {
> +    if (rc < 0) {
> +        if (rc != -ENODATA) {
> +            ram_addr_t addr, end;
> +
> +            xen_modified_memory(start_addr, size);
> +
> +            end = TARGET_PAGE_ALIGN(start_addr + size);
> +            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
> +                cpu_physical_memory_set_dirty(addr);
> +            }
> +
> +            DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
> +                    ", 0x" TARGET_FMT_plx "): %s\n",
> +                    start_addr, start_addr + size, strerror(-rc));
> +        }
>          return rc;
>      }
>
> in particular the loop around cpu_physical_memory_set_dirty

Original patch:

diff --git a/xen-all.c b/xen-all.c
index b11542c..e6308be 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -507,7 +507,8 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            fprintf(stderr, "xen: track_dirty_vram failed (0x" TARGET_FMT_plx
+            memory_region_set_dirty(framebuffer, 0, size);
+            DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
         }

-- 
Alex Bligh
Alex Bligh
2013-Mar-04 15:14 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Ian,

--On 4 March 2013 11:40:00 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Stefano Stabellini writes ("Re: [qemu-upstream-4.2-testing test] 16779:
> regressions - FAIL"):
> > It could be any of the last 6 commits: I would start by bisecting the
> > problem further until you manage to pinpoint the precise backport that
> > caused it. But like you say, it is probably the last one.
>
> Unfortunately the automatic bisection has been scuppered by the switch
> from xen.hg to xen.git. Alex, if you can't repro the problem let me
> know and I can investigate it.

I haven't yet (though possibly from failure to get hold of the relevant OS version). I'd actually appreciate some code review of the last of the qemu patches, as I was unsure what was meant to be going on (as per my last message to Stefano).

-- 
Alex Bligh
Ian Jackson
2013-Mar-05 17:10 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Alex Bligh writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> I haven't yet (though possibly from failure to get hold of the relevant
> OS version). I'd actually appreciate some code review of the last of the
> qemu patches as I was unsure what was meant to be going on (as per
> last message to Stefano).

Well, I have reproduced this. I was hoping to actually debug it, but gdb is having no luck getting a useful stack trace out of qemu-system-i386, even after I removed a bunch of bizarre compiler options and rebuilt it.

I can bisect it, so that's what I'll do next, I guess.

Ian.
When xc_hvm_track_dirty_vram fails, iterate through pages based on vram_offset and npages, rather than start_addr and size. DPRINTF before the loop in the hope of seeing output before SEGV.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 xen-all.c | 15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen-all.c b/xen-all.c
index dbd759c..96a34a9 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -472,18 +472,17 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            ram_addr_t addr, end;
-
-            xen_modified_memory(start_addr, size);
-
-            end = TARGET_PAGE_ALIGN(start_addr + size);
-            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
-                cpu_physical_memory_set_dirty(addr);
-            }
+            target_phys_addr_t todirty;

             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
+
+            xen_modified_memory(vram_offset, npages * TARGET_PAGE_SIZE);
+
+            for (todirty = vram_offset, i = 0; i < npages; todirty += TARGET_PAGE_SIZE, i++) {
+                cpu_physical_memory_set_dirty(todirty);
+            }
         }
         return rc;
     }
-- 
1.7.4.1
Ian Jackson
2013-Mar-06 14:14 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Alex Bligh writes ("Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL"):
> In the original patch (below) it called memory_region_set_dirty, which
> doesn't exist in 4.2, with the parameter 'framebuffer', which also doesn't
> exist in 4.2. This only gets called if xc_hvm_track_dirty_vram returns
> an error other than ENODATA. I had presumed the intent was to mark the
> whole framebuffer as dirty if xc_hvm_track_dirty_vram failed this way.

I have managed to get a stack trace out of this crash. See below.

Looking at the code, we seem to be calling cpu_physical_memory_set_dirty with an invalid address. It has no checking and directly uses the supplied address (shifted to index one byte per page) in an array lookup. cpu_physical_*'s array for this, ram_list.phys_dirty, is allocated in find_ram_offset and qemu_ram_alloc_from_ptr.

It seems likely to me that the code in xen_sync_dirty_bitmap which calls cpu_physical_memory_set_dirty should only be entered, at least via this path, with a valid ram address, as it is itself being called from cpu_physical_sync_dirty_bitmap (the assumptions here aren't stated, but it seems reasonable to assume that all the ram_addr_t's supplied to cpu_physical_* are supposed to be valid ram)? But it seems that this is not the case. At least, the address here is not in any of the blocks in ram_list.

I'm afraid I have no idea whether the right fix is for xen_sync_dirty_bitmap to somehow check whether the address is relevant, or whether the bug is higher up in the call chain.

Ian.

Program received signal SIGSEGV, Segmentation fault.
0x082483a2 in cpu_physical_memory_set_dirty (addr=4026531840)
    at /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h:540
540     /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h: No such file or directory.
        in /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h
(gdb) bt
#0  0x082483a2 in cpu_physical_memory_set_dirty (addr=4026531840) at /u/iwj/work/1/qemu-upstream-4.2-testing/cpu-all.h:540
#1  0x08249735 in xen_sync_dirty_bitmap (state=0x946e250, start_addr=4026531840, size=8388608) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:481
#2  0x08249953 in xen_client_sync_dirty_bitmap (client=0x946e270, start_addr=4026531840, end_addr=4034920448) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:527
#3  0x081b43cb in cpu_notify_sync_dirty_bitmap (start=4026531840, end=4034920448) at /u/iwj/work/1/qemu-upstream-4.2-testing/exec.c:1751
#4  0x081b506d in cpu_physical_sync_dirty_bitmap (start_addr=4026531840, end_addr=4034920448) at /u/iwj/work/1/qemu-upstream-4.2-testing/exec.c:2139
#5  0x081df005 in as_memory_range_del (as=0x832783c, fr=0x948da30) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:336
#6  0x081e0b84 in address_space_update_topology_pass (as=0x832783c, old_view=..., new_view=..., adding=false) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:711
#7  0x081e0ccd in address_space_update_topology (as=0x832783c) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:746
#8  0x081e0d5b in memory_region_update_topology () at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:761
#9  0x081e2aa5 in memory_region_del_subregion (mr=0x969ed38, subregion=0x96b4628) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:1304
#10 0x080e8a33 in pci_update_mappings (d=0x96a3b00) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci.c:997
#11 0x080e8d86 in pci_default_write_config (d=0x96a3b00, addr=4, val=0, l=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci.c:1050
#12 0x080ebbbe in pci_host_config_write_common (pci_dev=0x96a3b00, addr=4, limit=256, val=0, len=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci_host.c:54
#13 0x080ebc80 in pci_data_write (s=0x969fe98, addr=2147487748, val=0, len=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci_host.c:75
#14 0x080ebda7 in pci_host_data_write (opaque=0x969f018, addr=0, val=0, len=4) at /u/iwj/work/1/qemu-upstream-4.2-testing/hw/pci_host.c:125
#15 0x081ded13 in memory_region_write_accessor (opaque=0x969fdf4, addr=0, value=0xbfc1a700, size=4, shift=0, mask=4294967295) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:265
#16 0x081dedde in access_with_adjusted_size (addr=0, value=0xbfc1a700, size=4, access_size_min=1, access_size_max=4, access=0x81dec8f <memory_region_write_accessor>, opaque=0x969fdf4) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:295
#17 0x081df6ed in memory_region_iorange_write (iorange=0x969fe30, offset=0, width=4, data=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/memory.c:456
#18 0x081db000 in ioport_writel_thunk (opaque=0x969fe30, addr=3324, data=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/ioport.c:225
#19 0x081dabaf in ioport_write (index=2, address=3324, data=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/ioport.c:82
#20 0x081db319 in cpu_outl (addr=3324, val=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/ioport.c:288
#21 0x08249bd0 in do_outp (addr=3324, size=4, val=0) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:653
#22 0x08249d5d in cpu_ioreq_pio (req=0xb7798000) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:680
#23 0x0824a25c in handle_ioreq (req=0xb7798000) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:748
#24 0x0824a539 in cpu_handle_ioreq (opaque=0x946e250) at /u/iwj/work/1/qemu-upstream-4.2-testing/xen-all.c:823
#25 0x080a10f6 in qemu_iohandler_poll (readfds=0xbfc1aa90, writefds=0xbfc1aa10, xfds=0xbfc1a990, ret=1) at /u/iwj/work/1/qemu-upstream-4.2-testing/iohandler.c:121
#26 0x081177d7 in main_loop_wait (nonblocking=1) at /u/iwj/work/1/qemu-upstream-4.2-testing/main-loop.c:464
#27 0x0810e16f in main_loop () at /u/iwj/work/1/qemu-upstream-4.2-testing/vl.c:1481
#28 0x08112e28 in main (argc=36, argv=0xbfc1aec4, envp=0xbfc1af58) at /u/iwj/work/1/qemu-upstream-4.2-testing/vl.c:3485
(gdb)
(gdb) print ram_list
$1 = {phys_dirty = 0x96b6ee8 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., blocks = {lh_first = 0x946e748}}
(gdb) print ram_list.blocks
$2 = {lh_first = 0x946e748}
...
(gdb) print *ram_list.blocks.lh_first
$4 = {host = 0x0, offset = 0, length = 528482304, flags = 0, idstr = "xen.ram", '\000' <repeats 248 times>, next = {le_next = 0x948c228, le_prev = 0x8745768}, fd = 0}
...
(gdb) print *ram_list.blocks.lh_first->next.le_next
$5 = {host = 0xb4b62000 <Address 0xb4b62000 out of bounds>, offset = 536936448, length = 65536, flags = 0, idstr = "0000:00:04.0/rtl8139.rom", '\000' <repeats 231 times>, next = {le_next = 0x96a28f8, le_prev = 0x946e860}, fd = 0}
(gdb) print *ram_list.blocks.lh_first->next.le_next->next.le_next
$6 = {host = 0xb5cf9000 <Address 0xb5cf9000 out of bounds>, offset = 536870912, length = 65536, flags = 0, idstr = "0000:00:02.0/cirrus_vga.rom", '\000' <repeats 228 times>, next = {le_next = 0x96b6dc0, le_prev = 0x948c340}, fd = 0}
(gdb) print *ram_list.blocks.lh_first->next.le_next->next.le_next->next.le_next
$7 = {host = 0xb5e37000 <Address 0xb5e37000 out of bounds>, offset = 528482304, length = 8388608, flags = 0, idstr = "vga.vram", '\000' <repeats 247 times>, next = {le_next = 0x0, le_prev = 0x96a2a10}, fd = 0}
(gdb)
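The failure mode described above can be sketched in a few lines of standalone C (not QEMU source). As the analysis says, the dirty bitmap is one byte per page, indexed by `addr >> TARGET_PAGE_BITS` with no bounds check, so a guest-physical address that is not a valid ram_addr indexes far past the allocation. The numbers below are taken from the gdb session: "xen.ram" is 528482304 bytes long, while the faulting addr was 4026531840 (0xF0000000); helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS 12  /* assuming 4 KiB target pages, as on this build */

/* Byte index into a phys_dirty-style array: one byte per page,
 * computed directly from the address with no validity check. */
static uint64_t dirty_index(uint64_t addr)
{
    return addr >> PAGE_BITS;
}

/* Would that index fall inside a dirty array sized for ram_bytes of RAM? */
static int index_in_bounds(uint64_t addr, uint64_t ram_bytes)
{
    return dirty_index(addr) < (ram_bytes >> PAGE_BITS);
}
```

With the flight's numbers, an address inside xen.ram indexes in bounds, while the guest-physical framebuffer address 4026531840 lands hundreds of thousands of entries past the end of the array, which matches the SIGSEGV in frame #0.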
Alex Bligh
2013-Mar-06 14:15 UTC
Re: [qemu-upstream-4.2-testing test] 16779: regressions - FAIL
Ian,

--On 5 March 2013 17:10:41 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> > I haven't yet (though possibly from failure to get hold of the relevant
> > OS version). I'd actually appreciate some code review of the last of
> > the qemu patches as I was unsure what was meant to be going on (as per
> > last message to Stefano).
>
> Well, I have reproduced this. I was hoping to actually debug it but
> gdb is having no luck getting a useful stack trace out of
> qemu-system-i386, even after I removed a bunch of bizarre compiler
> options and rebuilt it.
>
> I can bisect it so that's what I'll do next I guess.

We still can't reproduce this here, but I have recoded that routine to be more obviously correct, and it also doesn't crash here. I suspect the problem is that I can't get xc_hvm_track_dirty_vram to fail.

I think this patch makes the routine more obviously correct, and the problem might be that I was not translating through get_physmapping()->phys_offset in the failure case (assuming the success case is correct).

If you can reproduce it, perhaps you'd care to try this. Otherwise, any comments welcome. Patch appended below in no-doubt mailer-mangled form for context, but already sent to the list with git send-email under Message-Id: <1362579002-7981-1-git-send-email-alex@alex.org.uk>

-- 
Alex Bligh

Subject: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Date: Wed, 6 Mar 2013 14:10:02 +0000
Message-Id: <1362579002-7981-1-git-send-email-alex@alex.org.uk>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <89D1720972979969F5F2FA5A@Ximines.local>
References: <89D1720972979969F5F2FA5A@Ximines.local>

When xc_hvm_track_dirty_vram fails, iterate through pages based on vram_offset and npages, rather than start_addr and size. DPRINTF before the loop in the hope of seeing output before SEGV.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 xen-all.c | 15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen-all.c b/xen-all.c
index dbd759c..96a34a9 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -472,18 +472,17 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            ram_addr_t addr, end;
-
-            xen_modified_memory(start_addr, size);
-
-            end = TARGET_PAGE_ALIGN(start_addr + size);
-            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
-                cpu_physical_memory_set_dirty(addr);
-            }
+            target_phys_addr_t todirty;

             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
+
+            xen_modified_memory(vram_offset, npages * TARGET_PAGE_SIZE);
+
+            for (todirty = vram_offset, i = 0; i < npages; todirty += TARGET_PAGE_SIZE, i++) {
+                cpu_physical_memory_set_dirty(todirty);
+            }
         }
         return rc;
     }
-- 
1.7.4.1
Ian Jackson
2013-Mar-06 14:16 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Alex Bligh writes ("[PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV"):
> When xc_hvm_track_dirty_vram fails, iterate through pages based on
> vram_offset and npages, rather than start_addr and size. DPRINTF
> before the loop in the hope of seeing output before SEGV.

See the email just sent. I have a comprehensible stack trace. If you'd still like me to try this patch I can do so, but it might be easier for you to ask me to poke my gdb ...

Ian.
Alex Bligh
2013-Mar-06 14:35 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Ian,

--On 6 March 2013 14:16:22 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Alex Bligh writes ("[PATCH] xen: xen_sync_dirty_bitmap: attempt to fix
> SEGV"):
> > When xc_hvm_track_dirty_vram fails, iterate through pages based on
> > vram_offset and npages, rather than start_addr and size. DPRINTF
> > before the loop in the hope of seeing output before SEGV.
>
> See the email just sent. I have a comprehensible stack trace. If you'd
> still like me to try this patch I can do so but it might be easier for
> you to ask me to poke my gdb ...

Thanks. So the good news is that it is xen_sync_dirty_bitmap that is the culprit, as we suspected.

My question is: does the call to cpu_physical_memory_set_dirty occur within the "if (rc < 0)" block, as I suspect, or does it occur within the "for (i = 0; i < ARRAY_SIZE(bitmap); i++)" block? (It's called in two places from xen_sync_dirty_bitmap.)

If the former, then I think the problem is that the address that cpu_physical_memory_set_dirty (and xen_modified_memory) takes needs to be passed through get_physmapping()->phys_offset before use, like the second case already does. In this case the patch I've just sent should (hopefully) work.

If the latter, then something else is broken.

-- 
Alex Bligh
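The translation being discussed can be sketched as a standalone fragment (not QEMU source; the single-entry "physmap" struct and helper name are illustrative). Dirty marking must use the ram-side address base (vram_offset, i.e. the physmap's phys_offset), not the guest-physical start_addr, even though both have the same integer type:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative single mapping: a guest-physical range backed by a
 * range of ram_addr space. Field names mirror the thread's discussion. */
struct physmap {
    uint64_t start_addr;   /* guest-physical base of the mapping */
    uint64_t size;         /* length of the mapping in bytes */
    uint64_t phys_offset;  /* ram_addr base the range is backed by */
};

/* Translate a guest-physical address inside the mapping into the
 * ram_addr that the dirty-tracking machinery actually indexes by. */
static uint64_t to_ram_addr(const struct physmap *pm, uint64_t gpa)
{
    return pm->phys_offset + (gpa - pm->start_addr);
}
```

With the values from the gdb session (vga.vram at ram offset 528482304, guest framebuffer at 4026531840, size 8388608), translating first yields an address inside the vga.vram block; passing the untranslated guest address is exactly what indexed past the dirty array.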
Ian Jackson
2013-Mar-06 14:45 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV
Alex Bligh writes ("Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV"):
> Thanks. So the good news is that it is xen_sync_dirty_bitmap that is
> the culprit, as we suspected.
>
> My question is: does the call to cpu_physical_memory_set_dirty
> occur within the "if (rc < 0)" block, as I suspect, or does it occur
> within the "for (i = 0; i < ARRAY_SIZE(bitmap); i++)" block?
> (It's called in two places from xen_sync_dirty_bitmap.)

Line 481, see below. (You should be able to see that from the stack trace I sent, which was generated from the tip of qemu-upstream-4.2-testing.git, eccc68722696864fc4823f048c7be58d11281b97.)

> If the former, then I think the problem is that the address that
> cpu_physical_memory_set_dirty (and xen_modified_memory) takes needs
> to be passed through get_physmapping()->phys_offset before use,
> like the second case already does. In this case the patch I've
> just sent should (hopefully) work.

  479             end = TARGET_PAGE_ALIGN(start_addr + size);
  480             for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
  481                 cpu_physical_memory_set_dirty(addr);
  482             }
  483
  484             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx

I shall try your patch.

Ian.
Ian Jackson
2013-Mar-06 17:12 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]
Alex Bligh writes ("Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV"):
> If the former, then I think the problem is that the address that
> cpu_physical_memory_set_dirty (and xen_modified_memory) takes needs
> to be passed through get_physmapping()->phys_offset before use,
> like the second case already does. In this case the patch I've
> just sent should (hopefully) work.

Indeed. Your patch fixes the problem.

I discussed this with Stefano before he went away and he said I should push a fix to qemu-xen-upstream-4.2-testing.git staging, so I have done that. I edited the commit message slightly.

Thanks for your help!

Regards,
Ian.

commit 351f94ff4bf3a7795ca5b282305aa610e598eec0
Author: Alex Bligh <alex@alex.org.uk>
Date:   Wed Mar 6 14:59:27 2013 +0000

    xen: xen_sync_dirty_bitmap: attempt to fix SEGV

    When xc_hvm_track_dirty_vram fails, iterate through pages based on
    vram_offset and npages, rather than start_addr and size. DPRINTF
    before the loop too.

    [ Fixes a regression introduced by
      eccc68722696864fc4823f048c7be58d11281b97 - iwj ]

    Signed-off-by: Alex Bligh <alex@alex.org.uk>
    Tested-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff --git a/xen-all.c b/xen-all.c
index dbd759c..96a34a9 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -472,18 +472,17 @@ static int xen_sync_dirty_bitmap(XenIOState *state,
                                  bitmap);
     if (rc < 0) {
         if (rc != -ENODATA) {
-            ram_addr_t addr, end;
-
-            xen_modified_memory(start_addr, size);
-
-            end = TARGET_PAGE_ALIGN(start_addr + size);
-            for (addr = start_addr & TARGET_PAGE_MASK; addr < end; addr += TARGET_PAGE_SIZE) {
-                cpu_physical_memory_set_dirty(addr);
-            }
+            target_phys_addr_t todirty;

             DPRINTF("xen: track_dirty_vram failed (0x" TARGET_FMT_plx
                     ", 0x" TARGET_FMT_plx "): %s\n",
                     start_addr, start_addr + size, strerror(-rc));
+
+            xen_modified_memory(vram_offset, npages * TARGET_PAGE_SIZE);
+
+            for (todirty = vram_offset, i = 0; i < npages; todirty += TARGET_PAGE_SIZE, i++) {
+                cpu_physical_memory_set_dirty(todirty);
+            }
         }
         return rc;
     }
Alex Bligh
2013-Mar-06 18:15 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]
Ian,

--On 6 March 2013 17:12:12 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Indeed. Your patch fixes the problem.
>
> I discussed this with Stefano before he went away and he said I should
> push a fix to qemu-xen-upstream-4.2-testing.git staging, so I have
> done that. I edited the commit message slightly.
>
> Thanks for your help!

No problem. Unless I have misunderstood how the new git tree works, I think you also need to fix QEMU_UPSTREAM_REVISION in Config.mk in xen. I'd send a patch, but I'm not sure whether it's the stable commit you want or the staging commit.

-- 
Alex Bligh
Ian Jackson
2013-Mar-07 10:57 UTC
Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]
Alex Bligh writes ("Re: [PATCH] xen: xen_sync_dirty_bitmap: attempt to fix SEGV [and 1 more messages]"):
> No problem. Unless I have misunderstood how the new git tree works,
> I think you also need to fix QEMU_UPSTREAM_REVISION in Config.mk in xen.
> I'd send a patch but I'm not sure whether it's the stable commit you
> want or the staging commit.

stable-4.2 will get the same commit as staging-4.2. But I think this time I'll wait with updating QEMU_UPSTREAM_REVISION until we get a test pass ...

Thanks,
Ian.