I hit this problem in 4.0.1 (still unresolved), and it persists in
4.1.0-rc6-pre.
Apparently I am not the only one facing this issue:
http://lists.xensource.com/archives/html/xen-users/2011-02/msg00362.html
reports the same problem on Xen 4.0.2-rc2.
My workload was a simple 2.6.18 domU (512M) with just two threads constantly
malloc()ing, touching and freeing memory.
I enabled Remus on the domain (memory replication only), which essentially
exercises the xc_domain_save/xc_domain_restore paths.
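For anyone trying to reproduce this, below is a minimal sketch of a comparable guest workload. It is an illustration under stated assumptions, not the program used for this report: the names churn_thread and run_workload, the 32 MiB chunk size and the 4 KiB page size are all hypothetical choices. Each of two POSIX threads repeatedly malloc()s a chunk, writes one byte per page so every page is actually faulted in and dirtied, then frees it. That should keep both the guest's page tables and the set of dirty pages changing while Remus checkpoints.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical parameters for illustration; not taken from the original test. */
#define CHUNK_BYTES (32UL * 1024 * 1024)
#define PAGE_BYTES  4096UL

struct churn_arg {
    long iterations;    /* malloc/touch/free cycles; <= 0 means run forever */
    long pages_touched; /* written by the worker, read after pthread_join */
};

/* One worker: allocate a chunk, dirty every page, free it, repeat. */
static void *churn_thread(void *p)
{
    struct churn_arg *arg = p;

    for (long i = 0; arg->iterations <= 0 || i < arg->iterations; i++) {
        /* volatile keeps the compiler from eliding the dead stores */
        volatile char *buf = malloc(CHUNK_BYTES);
        if (buf == NULL)
            continue; /* transient allocation failure: just try again */
        for (size_t off = 0; off < CHUNK_BYTES; off += PAGE_BYTES) {
            buf[off] = (char)i; /* one write per page faults it in and dirties it */
            arg->pages_touched++;
        }
        free((void *)buf);
    }
    return NULL;
}

/* Run two workers; returns total pages touched, or -1 if a thread fails to start. */
long run_workload(long iterations)
{
    pthread_t tid[2];
    struct churn_arg args[2] = { { iterations, 0 }, { iterations, 0 } };
    int started = 0;

    for (; started < 2; started++)
        if (pthread_create(&tid[started], NULL, churn_thread, &args[started]) != 0)
            break;
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    if (started < 2)
        return -1;
    return args[0].pages_touched + args[1].pages_touched;
}
```

Inside the domU, calling run_workload(0) from main() runs the churn indefinitely; a positive count is only useful for a quick sanity check. Build with -pthread.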
Issue 1:
On the primary, during replication, the xm dmesg log is flooded with messages like:
........
(XEN) mm.c:889:d0 Error getting mfn 468900 (pfn 1fdd1) from L1 entry 8000000468900625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4688fd (pfn 1fdd4) from L1 entry 80000004688fd625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4688f8 (pfn 1fdd9) from L1 entry 80000004688f8625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46889f (pfn 1fe32) from L1 entry 800000046889f625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46888c (pfn 1fe45) from L1 entry 800000046888c625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468877 (pfn 1fe5a) from L1 entry 8000000468877625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468876 (pfn 1fe5b) from L1 entry 8000000468876625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468825 (pfn 1feac) from L1 entry 8000000468825625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468824 (pfn 1fead) from L1 entry 8000000468824625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468820 (pfn 1feb1) from L1 entry 8000000468820625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46881c (pfn 1feb5) from L1 entry 800000046881c625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46881b (pfn 1feb6) from L1 entry 800000046881b625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46881a (pfn 1feb7) from L1 entry 800000046881a625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468817 (pfn 1feba) from L1 entry 8000000468817625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4687ec (pfn 1fee5) from L1 entry 80000004687ec625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4687e7 (pfn 1feea) from L1 entry 80000004687e7625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4687c8 (pfn 1ff09) from L1 entry 80000004687c8625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4687a9 (pfn 1ff28) from L1 entry 80000004687a9625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468799 (pfn 1ff38) from L1 entry 8000000468799625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468798 (pfn 1ff39) from L1 entry 8000000468798625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468791 (pfn 1ff40) from L1 entry 8000000468791625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 468790 (pfn 1ff41) from L1 entry 8000000468790625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46878d (pfn 1ff44) from L1 entry 800000046878d625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46872d (pfn 1ffa4) from L1 entry 800000046872d625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 46870d (pfn 1ffc4) from L1 entry 800000046870d625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4686fe (pfn 1ffd3) from L1 entry 80000004686fe625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4686e3 (pfn 1ffee) from L1 entry 80000004686e3625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4686dd (pfn 1fff4) from L1 entry 80000004686dd625 for l1e_owner=0, pg_owner=17
(XEN) mm.c:889:d0 Error getting mfn 4686dc (pfn 1fff5) from L1 entry 80000004686dc625 for l1e_owner=0, pg_owner=17
...........
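For reading these entries: on x86-64 a page-table entry holds its flags in bits 0-11, the frame number in bits 12-51 and NX in bit 63. The decoder below is a small sketch applying that standard layout; the macro and function names are hypothetical (not Xen's own), and it assumes 4 KiB pages and 52-bit physical addresses. For the first entry, 8000000468900625, it gives mfn 468900 (matching the message) and flags 625: present, user and accessed plus software-available bits 9 and 10, with RW and dirty clear and NX set. Every logged entry ends in 625, so all of them appear to be read-only, non-executable mappings in a dom0-owned L1 table (l1e_owner=0) of pages belonging to domain 17.

```c
#include <stdint.h>
#include <stdio.h>

/* Standard x86-64 PTE bits (hypothetical local names, not Xen's macros). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_NX       (1ULL << 63)

/* Frame number: bits 12..51 (assumes 52-bit physical addresses, 4 KiB pages). */
uint64_t pte_mfn(uint64_t pte)
{
    return (pte >> 12) & ((1ULL << 40) - 1);
}

/* The low 12 flag bits. */
unsigned pte_flags(uint64_t pte)
{
    return (unsigned)(pte & 0xfff);
}

/* Print the frame number and the commonly relevant flags of one entry. */
void pte_dump(uint64_t pte)
{
    printf("mfn %llx flags %03x:%s%s%s%s%s%s\n",
           (unsigned long long)pte_mfn(pte), pte_flags(pte),
           (pte & PTE_PRESENT)  ? " P"  : "",
           (pte & PTE_RW)       ? " RW" : " RO",
           (pte & PTE_USER)     ? " U"  : "",
           (pte & PTE_ACCESSED) ? " A"  : "",
           (pte & PTE_DIRTY)    ? " D"  : "",
           (pte & PTE_NX)       ? " NX" : "");
}
```

For example, pte_dump(0x8000000468900625ULL) prints "mfn 468900 flags 625: P RO U A NX".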
Issue 2:
The VM fails to recover on the secondary when I destroy it on the primary.
xm dmesg on the secondary again shows page-table pinning errors:
(XEN) mm.c:802:d0 Bad L1 flags 400010
(XEN) mm.c:1204:d0 Failure in alloc_l1_table: entry 16
(XEN) mm.c:2142:d0 Error while validating mfn 4229c1 (pfn 1bf44) for type 1000000000000000: caf=8000000000000002 taf=1000000000000001
(XEN) mm.c:897:d0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1348:d0 Failure in alloc_l2_table: entry 433
(XEN) mm.c:2142:d0 Error while validating mfn 421639 (pfn 1e654) for type 2000000000000000: caf=8000000000000002 taf=2000000000000001
(XEN) mm.c:1458:d0 Failure in alloc_l3_table: entry 1
(XEN) mm.c:2142:d0 Error while validating mfn 44d975 (pfn 1e62d) for type 3000000000000000: caf=8000000000000002 taf=3000000000000001
(XEN) mm.c:2965:d0 Error while pinning mfn 44d975
and xend.log on the target machine shows:
[2011-02-24 13:25:25 2868] DEBUG (XendCheckpoint:278) restore:shadow=0x0, _static_max=0x20000000, _static_min=0x0,
[2011-02-24 13:25:25 2868] DEBUG (XendCheckpoint:305) [xc_restore]: /usr/lib/xen/bin/xc_restore 16 9 1 2 0 0 0 0
[2011-02-24 13:28:14 2868] INFO (XendCheckpoint:423) xc: error: 0-length read: Internal error
[2011-02-24 13:28:14 2868] INFO (XendCheckpoint:423) xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
[2011-02-24 13:28:14 2868] INFO (XendCheckpoint:423) xc: error: Error when reading ctxt (0 = Success): Internal error
[2011-02-24 13:28:14 2868] INFO (XendCheckpoint:423) xc: error: error buffering image tail, finishing: Internal error
[2011-02-24 13:28:14 2868] INFO (XendCheckpoint:423) xc: error: Failed to pin batch of 18 page tables (22 = Invalid argument): Internal error
I wager this has something to do with the
canonicalization/uncanonicalization code, but I cannot pinpoint
exactly where at the moment.
shriram
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel