Shriram Rajagopalan
2010-Sep-07  05:56 UTC
[Xen-devel] remus failure -xen 4.0.1: xc_domain_restore cannot pin page tables
Hardware: Dell Poweredge R510 (32G ram, 8 CPU- Xeon)
64bit - xen 4.0.1 stable
64bit - 2.6.32.18 dom0 (.config attached) running Ubuntu 10.04
32 bit - 2.6.18.8 domU (.config attached) running ubuntu 8.04
domU has 3 tap2 disks, on lvm snapshots.
 domU has 2G mem, 2 VCPU
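
As a quick sanity check on the guest size, the values that appear in the xend.log and xm info output further down can be compared with each other. This is only a sketch: it assumes `memory/target` is expressed in KiB, and the variable names are illustrative.

```python
PAGE_SIZE = 4096                # xen_pagesize from xm info
p2m_size_pages = 0x7d000        # "p2m_size = 7d000" in xend.log
static_max_bytes = 0x7d000000   # "_static_max=0x7d000000" in xend.log
memory_target_kib = 2048000     # "memory/target"; the KiB unit is an assumption

p2m_bytes = p2m_size_pages * PAGE_SIZE

# All three figures describe the same amount of memory.
print(p2m_bytes == static_max_bytes)
print(memory_target_kib * 1024 == p2m_bytes)
print(p2m_bytes // (1024 * 1024), "MiB")
```

Under that assumption the numbers agree with one another, so the "2G" above is 2000 MiB and the p2m size seen by the restore side matches the configured maximum.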
Workload on domU: ssh + top running, then destroy domain. This works.
But if I run a heavier workload, say a postgres db (just starting the db, no
queries), remus fails to recover. Note that this is not the spurious timeout
error.
On destroying the vm on the primary, the backup fails to recover the vm, with
the following errors in xm dmesg:
(XEN) mm.c:779:d0 Bad L1 flags 98
(XEN) mm.c:1186:d0 Failure in alloc_l1_table: entry 1
(XEN) mm.c:2117:d0 Error while validating mfn 4101af (pfn 2cc08) for type
1000000000000000: caf=8000000000000003 taf=1000000000000001
(XEN) mm.c:868:d0 Attempt to create linear p.t. with write perms
(XEN) mm.c:1330:d0 Failure in alloc_l2_table: entry 113
(XEN) mm.c:2117:d0 Error while validating mfn 40fc4c (pfn 2d1ce) for type
2000000000000000: caf=8000000000000003 taf=2000000000000001
(XEN) mm.c:1440:d0 Failure in alloc_l3_table: entry 2
(XEN) mm.c:2117:d0 Error while validating mfn 40fcdf (pfn 2d08d) for type
3000000000000000: caf=8000000000000003 taf=3000000000000001
(XEN) mm.c:2733:d0 Error while pinning mfn 40fcdf
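
To read the "Bad L1 flags 98" line, it may help to decode the value as x86 page-table-entry flag bits. The sketch below assumes the number is hexadecimal and is a mask of the low PTE flag bits; the log itself does not say so. The bit names follow the x86 architecture definition, and `decode_pte_flags` is a hypothetical helper, not part of Xen.

```python
# Low flag bits of an x86 page-table entry.
PTE_FLAGS = {
    0x001: "PRESENT",
    0x002: "RW",
    0x004: "USER",
    0x008: "PWT",
    0x010: "PCD",
    0x020: "ACCESSED",
    0x040: "DIRTY",
    0x080: "PAT",     # PAT in a 4 KiB L1 entry; the same bit is PSE in L2/L3
    0x100: "GLOBAL",
}

def decode_pte_flags(value):
    """Return the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if value & bit]

# Assumption: "98" in the dmesg line is hex.
print(decode_pte_flags(0x98))
```

Read this way, 0x98 has only the cache-attribute bits (PWT, PCD, PAT) set. Whether these are the bits the hypervisor rejected, or the full flag word of the entry, depends on how mm.c formats the message and is not established by the log alone.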
===========
Error in xend.log @ backup
-----------------------------
[2010-09-06 21:38:16 2392] DEBUG (XendDomainInfo:1804) Storing domain
details: {'image/entry': '3222274048',
'console/port': '2',
'image/loader': 'generic',
'vm': '/vm/7be5f9bf-da53-6c10-d4e5-330940210966',
'control/platform-feature-multiprocessor-suspend': '1',
'image/hv-start-low': '4118806528',
'image/guest-os': 'linux',
'cpu/1/availability': 'online',
'image/features/writable-descriptor-tables': '1',
'image/virt-base': '3221225472',
'memory/target': '2048000',
'image/guest-version': '2.6',
'image/features/supervisor-mode-kernel': '1',
'image/pae-mode': 'yes',
'description': '',
'console/limit': '1048576',
'image/paddr-offset': '3221225472',
'image/hypercall-page': '3222278144',
'image/suspend-cancel': '1',
'cpu/0/availability': 'online',
'image/features/pae-pgdir-above-4gb': '1',
'image/features/writable-page-tables': '1',
'console/type': 'xenconsoled',
'image/features/auto-translated-physmap': '1',
'name': 'tpccExpt-remus',
'domid': '6',
'image/xen-version': 'xen-3.0',
'store/port': '1'}
[2010-09-06 21:38:16 2392] DEBUG (XendCheckpoint:286) restore:shadow=0x0,
_static_max=0x7d000000, _static_min=0x0,
[2010-09-06 21:38:16 2392] DEBUG (XendCheckpoint:305) [xc_restore]:
/usr/lib/xen/bin/xc_restore 4 6 1 2 0 0 0 0
[2010-09-06 21:38:16 2392] INFO (XendCheckpoint:423) xc_domain_restore
start: p2m_size = 7d000
[2010-09-06 21:38:16 2392] INFO (XendCheckpoint:423) Reloading memory
pages:   0%
[2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
Error when reading batch size
[2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
error when buffering batch, finishing
[2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423)
[2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
Failed to pin batch of 18 page tables (22 = Invalid argument)
[2010-09-06 21:40:25 2392] INFO (XendCheckpoint:423) Restore exit with rc=1
The number of page tables in the failing batch also varies between runs
(16, 18, 20)...
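
For anyone comparing runs, the xend.log excerpt above can be summarised mechanically. The following is a minimal sketch, assuming every entry starts with `[timestamp pid] LEVEL (module:line)` as in the excerpt and that wrapped lines belong to the preceding entry. `parse_xend_log` and `SAMPLE` are illustrative names, and the sample is a shortened copy of the log above.

```python
import re
from datetime import datetime

# [timestamp pid] LEVEL (module:line) message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<src>\w+):(?P<line>\d+)\) ?(?P<msg>.*)$"
)

def parse_xend_log(text):
    """Return a list of [timestamp, level, source, message], joining wrapped lines."""
    entries = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            entries.append([ts, m.group("level"), m.group("src"), m.group("msg").strip()])
        elif entries:
            # A line without a header continues the previous entry.
            entries[-1][3] = (entries[-1][3] + " " + raw.strip()).strip()
    return entries

SAMPLE = """\
[2010-09-06 21:38:16 2392] INFO (XendCheckpoint:423) Reloading memory
pages:   0%
[2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
Error when reading batch size
[2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
Failed to pin batch of 18 page tables (22 = Invalid argument)
[2010-09-06 21:40:25 2392] INFO (XendCheckpoint:423) Restore exit with rc=1
"""

entries = parse_xend_log(SAMPLE)
errors = [e for e in entries if e[3].startswith("ERROR")]
elapsed = (errors[0][0] - entries[0][0]).total_seconds()
print(len(errors), "errors; first one", elapsed, "s after 'Reloading memory pages'")
```

Running the same parser over logs from several failed failovers would make it easy to tabulate the batch size reported in the "Failed to pin batch" line for each run.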
============
xm info output (stripped)
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 2
cores_per_socket   : 4
threads_per_core    : 1
cpu_mhz                : 2133
hw_caps                 :
bfebfbff:28100800:00000000:00001b40:009ce3bd:00000000:00000001:00000000
virt_caps                : hvm hvm_directio
total_memory           : 32758
free_memory            : 28985
node_to_cpu            : node0:0,2,4,6
                         node1:1,3,5,7
node_to_memory         : node0:12731
                         node1:16254
node_to_dma32_mem      : node0:0
                         node1:2993
max_node_id            : 1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_commandline        : dummy=dummy dom0_mem=4096M
cc_compiler            : gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
xend_config_format     : 4
------------------------------------------------------
I need the 2.6.18 domU because of the suspend event channel support.
-- 
perception is but an offspring of its own self
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Bruce Edge
2010-Nov-17  17:13 UTC
Re: [Xen-devel] remus failure -xen 4.0.1: xc_domain_restore cannot pin page tables
On Mon, Sep 6, 2010 at 10:56 PM, Shriram Rajagopalan <rshriram@gmail.com> wrote:
> [2010-09-06 21:40:24 2392] INFO (XendCheckpoint:423) ERROR Internal error:
> Failed to pin batch of 18 page tables (22 = Invalid argument)
> [2010-09-06 21:40:25 2392] INFO (XendCheckpoint:423) Restore exit with rc=1
>
> the number of page tables falling under the error category also varies
> (16,18,20)...

I'm seeing this too. Here's my config:

xen unstable - 22395:deb438d43e79 Tue Nov 16 15:41:28 2010 +0000
dom0 - xen/stable-2.6.32.x a504ac446b2ca0d308000bdf5a3b96b2afd79261 Thu Aug 12 10:51:38
domU - mainline 2.6.37-rc2

-Bruce
Shriram Rajagopalan
2010-Nov-17  17:16 UTC
Re: [Xen-devel] remus failure -xen 4.0.1: xc_domain_restore cannot pin page tables
What kind of workload?

shriram

On Wed, Nov 17, 2010 at 9:13 AM, Bruce Edge <bruce.edge@gmail.com> wrote:
> I'm seeing this too. Here's my config:
>
> xen unstable - 22395:deb438d43e79 Tue Nov 16 15:41:28 2010 +0000
> dom0 - xen/stable-2.6.32.x a504ac446b2ca0d308000bdf5a3b96b2afd79261
> Thu Aug 12 10:51:38
> domU - mainline 2.6.37-rc2
>
> -Bruce