Andres Lagar-Cavilla
2012-Nov-14 20:14 UTC
Question on Xen management of qemu ram blocks and memory regions
Stefano, and Xen-qemu team, I have a question.

The standard Xen-qemu workflow has Xen manage the physmap for a VM, and allocate all the backing memory for valid pfns, regardless of whether they are MMIO, RAM, etc. On save/migrate, when using upstream qemu, a special monitor command is used to save the device model state, while the save/restore code blocks in libxc takes care of the memory.

Qemu has a chain of ram blocks with offsets, each of which is further subdivided into memory regions that map to specific chunks of the physmap.

AFAICT, the restore code in libxc has no knowledge of qemu's ram blocks and offsets. My question is, how is a mismatch avoided?

How does the workflow ensure that all the sub regions in each ram block map to the same physmap chunks on restore? Is this an implicit guarantee from qemu when building the VM (with the same command line) on the restore side?

Are the regions and physmap offsets contained in the device state that is saved?

If, for example, I were to save/restore a VM with four e1000 emulated devices, how does the workflow guarantee that each physmap region backing each e1000 ROM gets reconstructed with exactly the same ram block, offset, and physmap chunk coordinates?

Code inspection seems to suggest qemu will lay out things deterministically given the command line. I want to make sure I am not missing anything.

Thanks!
Andres
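To make the "deterministic given the command line" point concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the `RamBlock` type, the `lay_out` helper, the block names, and the ROM size are all hypothetical, and the bump allocator stands in for whatever qemu really does when it picks ram block offsets. The one property it shows is that a layout computed purely from the ordered device list comes out identical on the save side and the restore side.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page granularity for this toy allocator


@dataclass(frozen=True)
class RamBlock:
    name: str    # hypothetical label, e.g. "e1000.rom#2"
    offset: int  # offset of the block within the ram block chain
    size: int    # page-aligned size in bytes


def lay_out(devices):
    """Assign offsets strictly in command-line order (toy bump allocator)."""
    blocks = []
    next_offset = 0
    for index, (kind, rom_size) in enumerate(devices):
        size = -(-rom_size // PAGE_SIZE) * PAGE_SIZE  # round up to a page
        blocks.append(RamBlock(f"{kind}.rom#{index}", next_offset, size))
        next_offset += size
    return blocks


# Four emulated NICs; the 0x40000 ROM size is a made-up value.
four_nics = [("e1000", 0x40000)] * 4
saved_side = lay_out(four_nics)
restore_side = lay_out(four_nics)
print(saved_side == restore_side)
```

If the device order on the restore-side command line differed, the offsets would differ too, which is why the same command line matters in this model.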
Stefano Stabellini
2012-Nov-15 14:33 UTC
Re: Question on Xen management of qemu ram blocks and memory regions
On Wed, 14 Nov 2012, Andres Lagar-Cavilla wrote:
> Stefano, and Xen-qemu team, I have a question.
>
> The standard Xen-qemu workflow has Xen manage the physmap for a VM, and allocate all the backing memory for valid pfns, regardless of whether they are MMIO, RAM, etc. On save/migrate, when using upstream qemu, a special monitor command is used to save the device model state, while the save/restore code blocks in libxc takes care of the memory.
>
> Qemu has a chain of ram blocks with offsets, each of which is further subdivided into memory regions that map to specific chunks of the physmap.
>
> AFAICT, the restore code in libxc has no knowledge of qemu's ram blocks and offsets. My question is, how is a mismatch avoided?
>
> How does the workflow ensure that all the sub regions in each ram block map to the same physmap chunks on restore? Is this an implicit guarantee from qemu when building the VM (with the same command line) on the restore side?
>
> Are the regions and physmap offsets contained in the device state that is saved?
>
> If, for example, I were to save/restore a VM with four e1000 emulated devices, how does the workflow guarantee that each physmap region backing each e1000 ROM gets reconstructed with exactly the same ram block, offset, and physmap chunk coordinates?
>
> Code inspection seems to suggest qemu will lay out things deterministically given the command line. I want to make sure I am not missing anything.

Yes, it does. Moreover QEMU is going to save everything it needs to restore the state of the devices exactly the way it was, MMIO regions addresses and sizes included.

The only issue is the videoram: even though it is an MMIO region, it is saved by Xen because it looks like normal ram to the hypervisor. To solve the problem QEMU writes the location and the size of the videoram to xenstore and keeps the records up to date. The toolstack reads those records and adds them to the savefile.

At restore time the toolstack writes back the records to xenstore and QEMU at boot time uses them to populate a list of physmap regions, see xen_read_physmap.
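The round trip described above (QEMU publishes, the toolstack saves and writes back, QEMU reads at boot) can be sketched roughly as follows. This is a simplified model, not the actual QEMU or toolstack code: a plain dict stands in for xenstore, the key layout under `physmap/` is an assumption made for illustration, and every function name as well as the addresses and domain ids are hypothetical.

```python
def _prefix(domid):
    # Assumed key layout; the real xenstore paths may differ.
    return f"/local/domain/0/device-model/{domid}/physmap/"


def publish_physmap(xenstore, domid, phys_offset, start_addr, size):
    """QEMU side: record where a region that Xen saves as RAM now lives."""
    base = f"{_prefix(domid)}{phys_offset:x}"
    xenstore[f"{base}/start_addr"] = f"{start_addr:x}"
    xenstore[f"{base}/size"] = f"{size:x}"


def toolstack_save(xenstore, domid):
    """Toolstack side: copy the records (domid-relative) into the save file."""
    prefix = _prefix(domid)
    return {k[len(prefix):]: v for k, v in xenstore.items() if k.startswith(prefix)}


def toolstack_restore(xenstore, domid, records):
    """Toolstack side: write the saved records back for the new domain."""
    for relative_key, value in records.items():
        xenstore[_prefix(domid) + relative_key] = value


def read_physmap(xenstore, domid):
    """QEMU side at boot: rebuild the physmap list from the records."""
    prefix = _prefix(domid)
    regions = {}
    for key, value in xenstore.items():
        if key.startswith(prefix):
            phys_offset, field = key[len(prefix):].split("/")
            regions.setdefault(int(phys_offset, 16), {})[field] = int(value, 16)
    return regions


# Save side: domain 5 has its videoram relocated (all numbers are made up).
source_store = {}
publish_physmap(source_store, 5, 0xFF000000, 0xF0000000, 0x800000)
save_file_records = toolstack_save(source_store, 5)

# Restore side: the domain comes back with a different id.
target_store = {}
toolstack_restore(target_store, 9, save_file_records)
restored_regions = read_physmap(target_store, 9)
```

Storing the records relative to the domain id is a design choice of this sketch, so that they can be replayed under whatever id the restored domain gets.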
Andres Lagar-Cavilla
2012-Nov-15 17:00 UTC
Re: Question on Xen management of qemu ram blocks and memory regions
On Nov 15, 2012, at 9:33 AM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 14 Nov 2012, Andres Lagar-Cavilla wrote:
>> Stefano, and Xen-qemu team, I have a question.
>>
>> The standard Xen-qemu workflow has Xen manage the physmap for a VM, and allocate all the backing memory for valid pfns, regardless of whether they are MMIO, RAM, etc. On save/migrate, when using upstream qemu, a special monitor command is used to save the device model state, while the save/restore code blocks in libxc takes care of the memory.
>>
>> Qemu has a chain of ram blocks with offsets, each of which is further subdivided into memory regions that map to specific chunks of the physmap.
>>
>> AFAICT, the restore code in libxc has no knowledge of qemu's ram blocks and offsets. My question is, how is a mismatch avoided?
>>
>> How does the workflow ensure that all the sub regions in each ram block map to the same physmap chunks on restore? Is this an implicit guarantee from qemu when building the VM (with the same command line) on the restore side?
>>
>> Are the regions and physmap offsets contained in the device state that is saved?
>>
>> If, for example, I were to save/restore a VM with four e1000 emulated devices, how does the workflow guarantee that each physmap region backing each e1000 ROM gets reconstructed with exactly the same ram block, offset, and physmap chunk coordinates?
>>
>> Code inspection seems to suggest qemu will lay out things deterministically given the command line. I want to make sure I am not missing anything.
>
> Yes, it does. Moreover QEMU is going to save everything it needs to
> restore the state of the devices exactly the way it was, MMIO regions
> addresses and sizes included.
>
> The only issue is the videoram: even though it is an MMIO region, it is
> saved by Xen because it looks like normal ram to the hypervisor.
> To solve the problem QEMU writes the location and the size of the
> videoram to xenstore and keeps the records up to date.
> The toolstack reads those records and adds them to the save file.
>

Thanks Stefano. That matches my read. IIUC, the xen MemoryListener callbacks will write to xenstore any ram block that is relocated, for restore's benefit. It just happens to only be the cirrus vram in the standard workflow.

Cheers!
Andres

> At restore time the toolstack writes back the records to xenstore and
> QEMU at boot time uses them to populate a list of physmap regions, see
> xen_read_physmap.
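A rough sketch of the listener behaviour described in the reply above, i.e. a callback that records a ram block in xenstore only when it is relocated and keeps that record current. Everything here is an assumption for illustration: the `PhysmapListener` class, its `region_add` signature, the relative key names, and the addresses are hypothetical and do not reproduce QEMU's MemoryListener API; a dict again stands in for xenstore.

```python
class PhysmapListener:
    """Toy listener: records ram blocks that end up away from where Xen put them."""

    def __init__(self, xenstore):
        self.xenstore = xenstore  # dict standing in for xenstore

    def region_add(self, phys_offset, start_addr, size):
        """Called when a ram block is mapped at guest address start_addr.

        phys_offset is where the block's memory was originally populated.
        Returns True if a relocation record was written or updated.
        """
        if start_addr == phys_offset:
            return False  # block is still in place, nothing for restore to learn
        base = f"physmap/{phys_offset:x}"
        self.xenstore[f"{base}/start_addr"] = f"{start_addr:x}"
        self.xenstore[f"{base}/size"] = f"{size:x}"
        return True


store = {}
listener = PhysmapListener(store)

# Ordinary RAM mapped where it was populated: no record.
in_place = listener.region_add(0x0, 0x0, 0xA0000)

# Video RAM moved by the guest (made-up addresses): a record is written,
# and moving it again overwrites the record so it stays up to date.
moved = listener.region_add(0xFF000000, 0xF0000000, 0x800000)
moved_again = listener.region_add(0xFF000000, 0xE0000000, 0x800000)
```

In this model only the relocated block leaves a trace, which matches the observation that in the standard workflow it is just the cirrus vram.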