Hi, I've got a question regarding live migration: we have enabled our mainframe OS BS2000 to run as a pv-hvm domain on Xen. Live migration works rather well, but we have timing problems with large memory (e.g. 128 GB). The reason for the timing problem is our own backend driver for the BS2000 devices, which has to map all of the domain memory to avoid too many changes in BS2000. The mapping occurs at reconnection to the backend driver and has to be synchronous. For a 128 GB domain this leads to a 16 second stall of the domain when it resumes on the target machine.

To avoid this stall I tried to start a little daemon on the target machine and watch for a new BS2000 domain to show up due to live migration. I wanted to map the domain memory as soon as the needed mapping information, located at a fixed guest mfn, had been transferred. Discovery of the new domain works as expected, but I'm not able to do any memory mapping until the restore of the domain is finished. The mapping ioctl using IOCTL_PRIVCMD_MMAP returns EINVAL until xc_restore is finished (more or less).

Why can xc_restore do the mapping while I can't? I know xc_restore is using IOCTL_PRIVCMD_MMAPBATCH_V2, but I can't see a difference which should matter between those two, as both are using the same hypercall to update the dom0 page tables.

Xen version is 4.0.2, the Linux kernel is 2.6.32 (both SLES11 SP1), and the machine is an x86_64 Intel-based one.


Juergen

-- 
Juergen Gross                    Principal Developer Operating Systems
PDG ES&S SWE OS6                 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions     e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                    Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
>>> On 30.01.12 at 13:24, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> To avoid this stall I tried to start a little daemon on the target machine
> and watch for a new BS2000 domain to show up due to live migration. I wanted
> to map the domain memory as soon as the needed mapping information located
> in a fixed guest mfn was transferred. Discovery of the new domain works as
> expected, but I'm not capable doing any memory mapping until the restore of
> the domain is finished. The mapping ioctl using IOCTL_PRIVCMD_MMAP returns
> EINVAL until xc_restore is finished (more or less).
>
> Why can xc_restore do the mapping while I can't? I know xc_restore is using
> IOCTL_PRIVCMD_MMAPBATCH_V2, but I can't see a difference which should matter
> between those two, as both are using the same hypercall to update the dom0
> page tables.

I cannot immediately think of a reason (and indeed the difference between the two is only how errors get handled), so I wonder whether you checked where the - pretty generic - -EINVAL is coming from. You also didn't mention whether any hypervisor log entries are associated with your failed attempts.

Jan
On 01/30/2012 05:26 PM, Jan Beulich wrote:
>>>> On 30.01.12 at 13:24, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
>> To avoid this stall I tried to start a little daemon on the target machine
>> and watch for a new BS2000 domain to show up due to live migration. I wanted
>> to map the domain memory as soon as the needed mapping information located
>> in a fixed guest mfn was transferred. Discovery of the new domain works as
>> expected, but I'm not capable doing any memory mapping until the restore of
>> the domain is finished. The mapping ioctl using IOCTL_PRIVCMD_MMAP returns
>> EINVAL until xc_restore is finished (more or less).
>>
>> Why can xc_restore do the mapping while I can't? I know xc_restore is using
>> IOCTL_PRIVCMD_MMAPBATCH_V2, but I can't see a difference which should matter
>> between those two, as both are using the same hypercall to update the dom0
>> page tables.
> I cannot immediately think of a reason (and indeed the difference
> between the two is only how errors get handled), so I wonder
> whether you checked where the - pretty generic - -EINVAL is
> coming from. You also didn't mention whether any hypervisor log
> entries are associated with your failed attempts.

I'll start to add some logging to the hypervisor today.

No hypervisor logs were produced in my tests, despite setting

    debug=yes loglvl=all guest_loglvl=all

as boot parameters.

I've made an additional test using xm save/xm restore to see if the same problem shows up. It does NOT. Mapping succeeds at once while restoring memory is still running. I always thought xm restore and live migration on the target machine were more or less the same. This seems not to be true.

Juergen
On 01/31/2012 08:50 AM, Juergen Gross wrote:
> On 01/30/2012 05:26 PM, Jan Beulich wrote:
>> I cannot immediately think of a reason (and indeed the difference
>> between the two is only how errors get handled), so I wonder
>> whether you checked where the - pretty generic - -EINVAL is
>> coming from. You also didn't mention whether any hypervisor log
>> entries are associated with your failed attempts.
>
> I'll start to add some logging to the hypervisor today.
>
> No hypervisor logs were produced in my tests, despite setting
>
>     debug=yes loglvl=all guest_loglvl=all
>
> as boot parameters.
>
> I've made an additional test using xm save/xm restore to see if the same
> problem shows up. It does NOT. Mapping succeeds at once while restoring
> memory is still running. I always thought xm restore and live migration on
> the target machine are more or less the same. This seems not to be true.

Okay, here are my results (so far):

do_mmu_update() calls mod_l1_entry(), which fails with -EINVAL due to an invalid mfn and p2m-type == 4 returned by:

    mfn_x(gfn_to_mfn(pg_dom, l1e_get_pfn(nl1e), &p2mt))

I still don't see why xc_restore is able to do the mapping while my daemon is not. And I can't find any difference between a domain creation due to xm restore and a live migration.

Juergen
On 31/01/2012 12:42, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> I've made an additional test using xm save/xm restore to see if the same
>> problem shows up. It does NOT. Mapping succeeds at once while restoring
>> memory is still running. I always thought xm restore and live migration on
>> the target machine are more or less the same. This seems not to be true.
>
> Okay, here are my results (so far):
>
> do_mmu_update() calls mod_l1_entry() which fails with -EINVAL due to an
> invalid mfn and p2m-type == 4 returned by:
>
>     mfn_x(gfn_to_mfn(pg_dom, l1e_get_pfn(nl1e), &p2mt))
>
> I still don't see why xc_restore is able to do the mapping while my daemon
> is not. And I can't find any difference between a domain creation due to
> xm restore and a live migration.

Memory pages are populated as they appear in the migration data stream. Perhaps you are trying to map a guest page that has simply not yet been allocated.

Bear in mind that IOCTL_PRIVCMD_MMAPBATCH_V2 will note, but proceed past, failed individual mappings, while IOCTL_PRIVCMD_MMAP, for example, will fail the entire ioctl() in that situation. There's a reason that xc_restore uses the former ioctl!

 -- Keir
On 01/31/2012 01:56 PM, Keir Fraser wrote:
> On 31/01/2012 12:42, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> I've made an additional test using xm save/xm restore to see if the same
>>> problem shows up. It does NOT. Mapping succeeds at once while restoring
>>> memory is still running. I always thought xm restore and live migration on
>>> the target machine are more or less the same. This seems not to be true.
>> Okay, here are my results (so far):
>>
>> do_mmu_update() calls mod_l1_entry() which fails with -EINVAL due to an
>> invalid mfn and p2m-type == 4 returned by:
>>
>>     mfn_x(gfn_to_mfn(pg_dom, l1e_get_pfn(nl1e), &p2mt))
>>
>> I still don't see why xc_restore is able to do the mapping while my daemon
>> is not. And I can't find any difference between a domain creation due to
>> xm restore and a live migration.
> Memory pages are populated as they appear in the migration data stream.
> Perhaps you are trying to map a guest page that has simply not yet been
> allocated.

As far as I can tell, memory is transferred from low to high mfns. The first iteration takes about 12 minutes in my test case. Why is the mapping possible only after the last iteration has finished? The mfn I try to map is in the first 16 MB of the domain, so it should arrive in the first second!

> Bear in mind that IOCTL_PRIVCMD_MMAPBATCH_V2 will note, but proceed past,
> failed individual mappings. While IOCTL_PRIVCMD_MMAP, for example, will fail
> the entire ioctl() in that situation. There's a reason that xc_restore uses
> the former ioctl!

I understand that. But what is the difference between xm restore and live migration? Somehow the memory seems to be treated differently.

Juergen
On 31/01/2012 13:58, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> I still don't see why xc_restore is able to do the mapping while my daemon
>>> is not. And I can't find any difference between a domain creation due to
>>> xm restore and a live migration.
>> Memory pages are populated as they appear in the migration data stream.
>> Perhaps you are trying to map a guest page that has simply not yet been
>> allocated.
>
> As far as I can tell, memory is transferred from low to high mfns. The first
> iteration takes about 12 minutes in my test case. Why is the mapping possible
> only after the last iteration has finished? The mfn I try to map is in the
> first 16 MB of the domain, so it should arrive in the first second!

It's pointless to speculate further until you add some hypervisor tracing to determine when your guest pfn of interest is actually allocated.

>> Bear in mind that IOCTL_PRIVCMD_MMAPBATCH_V2 will note, but proceed past,
>> failed individual mappings. While IOCTL_PRIVCMD_MMAP, for example, will fail
>> the entire ioctl() in that situation. There's a reason that xc_restore uses
>> the former ioctl!
>
> I understand that. But what is the difference between xm restore and live
> migration? Somehow the memory seems to be treated different.

As you already noted, they are almost identical; however, they are driven by the stream of memory data they are provided from the original VM. The ordering of pages used to be randomised in xc_domain_save in the live migration case.

 -- Keir