On 02/10/19 19:04, Jerome Glisse wrote:
> On Wed, Oct 02, 2019 at 06:18:06PM +0200, Paolo Bonzini wrote:
>>>> If the mapping of the source VMA changes, mirroring can update the
>>>> target VMA via insert_pfn.  But what ensures that KVM's MMU notifier
>>>> dismantles its own existing page tables (so that they can be recreated
>>>> with the new mapping from the source VMA)?
>>
>> The KVM inspector process is also (or can be) a QEMU that will have to
>> create its own KVM guest page table.  So if a page in the source VMA is
>> unmapped we want:
>>
>> - the source KVM to invalidate its guest page table (done by the KVM MMU
>> notifier)
>>
>> - the target VMA to be invalidated (easy using mirroring)
>>
>> - the target KVM to invalidate its guest page table, as a result of
>> invalidation of the target VMA
>
> You can do the target KVM invalidation inside the mirroring invalidation
> code.

Why should the source and target KVMs behave differently?  If the source
invalidates its guest page table via MMU notifiers, so should the target.

The KVM MMU notifier exists so that nothing (including mirroring) needs
to know that there is KVM on the other side.  Any interaction between
KVM page tables and VMAs must be mediated by MMU notifiers, anything
else is unacceptable.

If it is possible to invoke the MMU notifiers around the calls to
insert_pfn, that of course would be perfect.

Thanks,

Paolo

On Wed, Oct 02, 2019 at 10:10:18PM +0200, Paolo Bonzini wrote:
> On 02/10/19 19:04, Jerome Glisse wrote:
> > On Wed, Oct 02, 2019 at 06:18:06PM +0200, Paolo Bonzini wrote:
> >>>> If the mapping of the source VMA changes, mirroring can update the
> >>>> target VMA via insert_pfn.  But what ensures that KVM's MMU notifier
> >>>> dismantles its own existing page tables (so that they can be recreated
> >>>> with the new mapping from the source VMA)?
> >>
> >> The KVM inspector process is also (or can be) a QEMU that will have to
> >> create its own KVM guest page table.  So if a page in the source VMA is
> >> unmapped we want:
> >>
> >> - the source KVM to invalidate its guest page table (done by the KVM MMU
> >> notifier)
> >>
> >> - the target VMA to be invalidated (easy using mirroring)
> >>
> >> - the target KVM to invalidate its guest page table, as a result of
> >> invalidation of the target VMA
> >
> > You can do the target KVM invalidation inside the mirroring invalidation
> > code.
>
> Why should the source and target KVMs behave differently?  If the source
> invalidates its guest page table via MMU notifiers, so should the target.
>
> The KVM MMU notifier exists so that nothing (including mirroring) needs
> to know that there is KVM on the other side.  Any interaction between
> KVM page tables and VMAs must be mediated by MMU notifiers, anything
> else is unacceptable.
>
> If it is possible to invoke the MMU notifiers around the calls to
> insert_pfn, that of course would be perfect.

OK, and yes, you can do exactly that, i.e. inside the MMU notifier
callback from the target.
For instance it is as easy as:

    target_mirror_notifier_start_callback(start, end)
    {
        struct kvm_mirror_struct *kvmms = from_mmun(...);
        unsigned long target_foff, size;

        size = end - start;
        /* translate the invalidated address into a mirror file offset */
        target_foff = kvmms_convert_mirror_address(start);

        /* keep mirror faults out while the range is being torn down */
        take_lock(kvmms->mirror_fault_exclusion_lock);
        unmap_mapping_range(kvmms->address_space, target_foff, size, 1);
        drop_lock(kvmms->mirror_fault_exclusion_lock);
    }

All that is needed is to make sure that vm_normal_page() will see those
ptes (inside the process that is mirroring the other process) as special,
which is the case either because insert_pfn() marks the pte as special,
or because the kvm device driver which controls the vm_operations_struct
sets a find_special_page() callback that always returns NULL, or because
the vma has either VM_PFNMAP or VM_MIXEDMAP set (which is the case with
insert_pfn).

So you can keep the existing kvm code unmodified.

Cheers,
Jérôme

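As a rough illustration of the address conversion step in the callback
above, here is a userspace model in plain C.  It is only a sketch under
assumptions that are not in the thread: the struct mirror_range layout,
the function name mirror_clip_and_convert() and the idea that one mirror
covers a single linear source range are all hypothetical.  It also clips
the notifier range to the mirrored range first, since an invalidation
may be wider than what the mirror covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one linearly mirrored range. */
struct mirror_range {
    uint64_t src_start; /* first mirrored address in the source process */
    uint64_t len;       /* length of the mirrored range, in bytes */
    uint64_t file_off;  /* offset of that range inside the mirror file */
};

/*
 * Clip the invalidated range [start, end) to the mirrored range and
 * translate it into a (file offset, size) pair, i.e. the two values
 * the callback hands to unmap_mapping_range().
 * Returns false when the two ranges do not overlap at all.
 */
bool mirror_clip_and_convert(const struct mirror_range *m,
                             uint64_t start, uint64_t end,
                             uint64_t *foff, uint64_t *size)
{
    uint64_t src_end;

    assert(m != NULL && foff != NULL && size != NULL);
    src_end = m->src_start + m->len;

    if (start < m->src_start)
        start = m->src_start;
    if (end > src_end)
        end = src_end;
    if (start >= end)
        return false; /* nothing of ours is being invalidated */

    *foff = m->file_off + (start - m->src_start);
    *size = end - start;
    return true;
}
```

When the function returns false the callback would have nothing to
unmap and could return without taking the lock.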
On 03/10/19 17:42, Jerome Glisse wrote:
> All that is needed is to make sure that vm_normal_page() will see those
> ptes (inside the process that is mirroring the other process) as special,
> which is the case either because insert_pfn() marks the pte as special,
> or because the kvm device driver which controls the vm_operations_struct
> sets a find_special_page() callback that always returns NULL, or because
> the vma has either VM_PFNMAP or VM_MIXEDMAP set (which is the case with
> insert_pfn).
>
> So you can keep the existing kvm code unmodified.

Great, thanks.  And KVM is already able to handle VM_PFNMAP/VM_MIXEDMAP,
so that should work.

Paolo

On Thu, Oct 03, 2019 at 04:42:20PM +0000, Mircea CIRJALIU - MELIU wrote:
> > On 03/10/19 17:42, Jerome Glisse wrote:
> > > All that is needed is to make sure that vm_normal_page() will see
> > > those ptes (inside the process that is mirroring the other process)
> > > as special, which is the case either because insert_pfn() marks the
> > > pte as special, or because the kvm device driver which controls the
> > > vm_operations_struct sets a find_special_page() callback that always
> > > returns NULL, or because the vma has either VM_PFNMAP or VM_MIXEDMAP
> > > set (which is the case with insert_pfn).
> > >
> > > So you can keep the existing kvm code unmodified.
> >
> > Great, thanks.  And KVM is already able to handle
> > VM_PFNMAP/VM_MIXEDMAP, so that should work.
>
> This means setting VM_PFNMAP/VM_MIXEDMAP on the anon VMA that acts as
> the VM's system RAM.
> Will it have any side effects?

You do not set it up on the anonymous vma but on the mmap of the kvm
device file; the resulting vma is under the control of the kvm device
file and is not an anonymous vma but a "device" special vma.

So in summary, the source qemu process has anonymous vmas (regular
libc malloc for instance).  The introspector qemu process which
mirrors the source qemu uses mmap on /dev/kvm (assuming you can
reuse the kvm device file for this, otherwise you can introduce a
new kvm device file).  The resulting mmap inside the introspector qemu
process is a vma which has vma->vm_file pointing to the kvm device file
and has VM_PFNMAP or VM_MIXEDMAP (I think you want the former).  On
architectures with ARCH_HAS_PTE_SPECIAL the pte will be marked as
special when using insert_pfn(); on other architectures you can either
rely on the VM_PFNMAP/VM_MIXEDMAP flag or set a specific
find_special_page() callback in vm_ops.

I am at a conference right now but I will put up an example of what I
mean next week.

Cheers,
Jérôme

On 03/10/19 20:31, Jerome Glisse wrote:
> So in summary, the source qemu process has anonymous vmas (regular
> libc malloc for instance).  The introspector qemu process which
> mirrors the source qemu uses mmap on /dev/kvm (assuming you can
> reuse the kvm device file for this, otherwise you can introduce a
> new kvm device file).

It should be a new device, something like /dev/kvmmem.  BitDefender's
RFC patches already have the right userspace API, that was not an issue.

Paolo

On 04/10/19 11:41, Mircea CIRJALIU - MELIU wrote:
> I get it so far. I have a patch that does mirroring in a separate VMA.
> We create an extra VMA with VM_PFNMAP/VM_MIXEDMAP that mirrors the
> source VMA in the other QEMU and is refreshed by the device MMU notifier.

So for example on the host you'd have a new ioctl on the kvm file
descriptor.  You pass a size and you get back a file descriptor for that
guest's physical memory, which is mmap-able up to the size you specified
in the ioctl.

In turn, the file descriptor would have ioctls to map/unmap ranges of
the guest memory into its mmap-able range.  Accessing an unmapped range
produces a SIGSEGV.

When asked via the QEMU monitor, QEMU will create the file descriptor
and pass it back via SCM_RIGHTS.  The management application can then
use it to hotplug memory into the destination...

> Create a new memslot based on the mirror VMA, hotplug it into the guest as
> new memory device (is this possible?) and have a guest-side driver allocate
> pages from that area.

... using the existing ivshmem device, whose BAR can be accessed and
mmap-ed from the guest via sysfs.  In other words, the hotplugging will
use the file descriptor returned by QEMU when creating the ivshmem device.

We then need an additional mechanism to invoke the map/unmap ioctls from
the guest.  Without writing a guest-side driver it is possible to:

- pass a socket into the "create guest physical memory view" ioctl
above.  KVM will then associate that KVMI socket with the newly created
file descriptor.

- use KVMI messages to that socket to map/unmap sections of memory

> Redirect (some) GFN->HVA translations into the new VMA based on a table
> of addresses required by the introspector process.

That would be tricky because there are multiple paths (gfn_to_page,
gfn_to_pfn, etc.).

There is some complication in this because the new device has to be
plumbed at multiple levels (KVM, QEMU, libvirt).  But it seems like a
very easily separated piece of code (except for the KVMI socket part,
which can be added later), so I suggest that you contribute the KVM
parts first.

Paolo
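
For readers not familiar with the SCM_RIGHTS step mentioned above, the
following is a minimal userspace sketch of handing a file descriptor to
another process over an AF_UNIX socket.  It uses only the standard
sendmsg()/recvmsg() ancillary-data interface; the helper names send_fd()
and recv_fd() are made up for illustration and are not taken from QEMU
or from the RFC patches, and the descriptor being passed stands in for
the proposed guest-memory file descriptor.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd over a connected AF_UNIX socket. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char dummy = 'F'; /* one byte of ordinary data (needed for stream sockets on Linux) */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from the socket. Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open
file, so the management application could mmap it just as the sender
could.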