On 02/10/19 21:27, Jerome Glisse wrote:
> On Tue, Sep 10, 2019 at 07:49:51AM +0000, Mircea CIRJALIU - MELIU wrote:
>>> On 05/09/19 20:09, Jerome Glisse wrote:
>>>> Not sure i understand, you are saying that the solution i outline
>>>> above does not work ? If so then i think you are wrong, in the above
>>>> solution the importing process mmap a device file and the resulting
>>>> vma is then populated using insert_pfn() and constantly keep
>>>> synchronize with the target process through mirroring which means that
>>>> you never have to look at the struct page ... you can mirror any kind
>>>> of memory from the remote process.
>>>
>>> If insert_pfn in turn calls MMU notifiers for the target VMA (which would be
>>> the KVM MMU notifier), then that would work. Though I guess it would be
>>> possible to call MMU notifier update callbacks around the call to insert_pfn.
>>
>> Can't do that.
>> First, insert_pfn() uses set_pte_at() which won't trigger the MMU notifier on
>> the target VMA. It's also static, so I'll have to access it thru vmf_insert_pfn()
>> or vmf_insert_mixed().
>
> Why would you need to target mmu notifier on target vma ?

If the mapping of the source VMA changes, mirroring can update the
target VMA via insert_pfn. But what ensures that KVM's MMU notifier
dismantles its own existing page tables (so that they can be recreated
with the new mapping from the source VMA)?

Thanks,

Paolo

> You do not need
> that. The workflow is:
>
> userspace:
>     ptr = mmap(/dev/kvm-mirroring-device, virtual_addresse_of_target)
>
> Then when the mirroring process access ptr it triggers page fault that
> endup in the vm_operation_struct->fault() which is just doing:
>
> kernel-kvm-mirroring-function:
>     kvm_mirror_page_fault(struct vm_fault *vmf) {
>         struct kvm_mirror_struct *kvmms;
>
>         kvmms = kvm_mirror_struct_from_file(vmf->vma->vm_file);
>         ...
>     again:
>         hmm_range_register(&range);
>         hmm_range_snapshot(&range);
>         take_lock(kvmms->update);
>         if (!hmm_range_valid(&range)) {
>             vm_insert_pfn();
>             drop_lock(kvmms->update);
>             hmm_range_unregister(&range);
>             return VM_FAULT_NOPAGE;
>         }
>         drop_lock(kvmms->update);
>         goto again;
>     }
>
> The notifier callback:
>     kvmms_notifier_start() {
>         take_lock(kvmms->update);
>         clear_pte(start, end);
>         drop_lock(kvmms->update);
>     }
>
>>
>> Our model (the importing process is encapsulated in another VM) forces us
>> to mirror certain pages from the anon VMA backing one VM's system RAM to
>> the other VM's anon VMA.
>
> The mirror does not have to be an anon vma it can very well be a
> device vma ie mmap of a device file. I do not see any reasons why
> the mirror need to be an anon vma. Please explain why.
>
>>
>> Using the functions above means setting VM_PFNMAP|VM_MIXEDMAP on
>> the target anon VMA, but I guess this breaks the VMA. Is this recommended?
>
> The mirror vma should not be an anon vma.
>
>>
>> Then, mapping anon pages from one VMA to another without fixing the
>> refcount and the mapcount breaks the daemons that think they're working
>> on a pure anon VMA (kcompactd, khugepaged).
>
> Note here the target vma ie the mirroring one is a mmap of device file
> and thus is skip by all of the above (kcompactd, khugepaged, ...) it is
> fully ignore by core mm.
>
> Thus you do not need to fix the refcount in any way. If any of the core
> mm try to reclaim memory from the original vma then you will get mmu
> notifier callbacks and all you have to do is clear the page table of your
> device vma.
>
> I did exactly that as a tools in the past and it works just fine with
> no change to core mm whatsoever.
>
> Cheers,
> Jérôme
>
On Wed, Oct 02, 2019 at 03:46:30PM +0200, Paolo Bonzini wrote:
> On 02/10/19 21:27, Jerome Glisse wrote:
> > On Tue, Sep 10, 2019 at 07:49:51AM +0000, Mircea CIRJALIU - MELIU wrote:
> >>> On 05/09/19 20:09, Jerome Glisse wrote:
> >>>> Not sure i understand, you are saying that the solution i outline
> >>>> above does not work ? If so then i think you are wrong, in the above
> >>>> solution the importing process mmap a device file and the resulting
> >>>> vma is then populated using insert_pfn() and constantly keep
> >>>> synchronize with the target process through mirroring which means that
> >>>> you never have to look at the struct page ... you can mirror any kind
> >>>> of memory from the remote process.
> >>>
> >>> If insert_pfn in turn calls MMU notifiers for the target VMA (which would be
> >>> the KVM MMU notifier), then that would work. Though I guess it would be
> >>> possible to call MMU notifier update callbacks around the call to insert_pfn.
> >>
> >> Can't do that.
> >> First, insert_pfn() uses set_pte_at() which won't trigger the MMU notifier on
> >> the target VMA. It's also static, so I'll have to access it thru vmf_insert_pfn()
> >> or vmf_insert_mixed().
> >
> > Why would you need to target mmu notifier on target vma ?
>
> If the mapping of the source VMA changes, mirroring can update the
> target VMA via insert_pfn. But what ensures that KVM's MMU notifier
> dismantles its own existing page tables (so that they can be recreated
> with the new mapping from the source VMA)?
>

So just to make sure i follow we have:
    - qemu process on host with anonymous vma
        -> host cpu page table
    - kvm which maps host anonymous vma to guest
        -> kvm guest page table
    - kvm inspector process which mirrors the vma from the qemu process
        -> inspector process page table

AFAIK the KVM notifier will clear the kvm guest page table whenever
necessary (through kvm_mmu_notifier_invalidate_range_start). This is
what ensures that KVM dismantles its own mapping: it abides by the
mmu-notifier callbacks. If it did not you would have bugs (at least i
expect so). Am i wrong here ?

The mirroring kernel driver would also register a notifier against the
qemu process and would also abide by the notifier callbacks.

What you want to maintain at all times is that none of the actors above
ever look at a different page for the same virtual address (ie one looking
at the older page while another looks at the new page). This is where you
have helpers like HMM that make sure that you can not populate the
mirroring vma while a notifier is ongoing. Which means that everything is
serialized on the notifier.

Cheers,
Jérôme
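
To make the serialization argument concrete, here is a minimal userspace sketch of the snapshot / validate / retry pattern, assuming a plain sequence counter stands in for what HMM tracks. Every name in it (struct mirror, mirror_invalidate, mirror_try_fault, mirror_fault, race_remap) is hypothetical and not a kernel API; the lock is only marked by comments because the model is single-threaded. It shows that a snapshot taken before an invalidation is never installed in the mirror.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct mirror {
    unsigned long seq;     /* bumped by every invalidation of the source */
    unsigned long src_pfn; /* page the source page table maps right now */
    unsigned long dst_pfn; /* page the mirror maps, 0 = not populated */
};

/* Models the notifier callback: the source mapping changes and the
 * mirror entry is cleared (the real driver would do this under its lock). */
static void mirror_invalidate(struct mirror *m, unsigned long new_src_pfn)
{
    m->seq++;
    m->src_pfn = new_src_pfn;
    m->dst_pfn = 0;
}

/* One fault attempt. `race`, if non-NULL, runs between the snapshot and
 * the validity check, which is the window an invalidation can hit. */
static bool mirror_try_fault(struct mirror *m, void (*race)(struct mirror *))
{
    unsigned long seq = m->seq;     /* "register" the range */
    unsigned long pfn = m->src_pfn; /* snapshot the source mapping */

    if (race)
        race(m);

    /* the update lock would be taken here */
    if (seq != m->seq)
        return false;               /* snapshot is stale, caller retries */
    m->dst_pfn = pfn;               /* still valid: populate the mirror */
    return true;
}

/* Retry until a snapshot survives; returns the number of attempts.
 * The simulated race fires on the first attempt only. */
static int mirror_fault(struct mirror *m, void (*race_once)(struct mirror *))
{
    int attempts = 1;

    while (!mirror_try_fault(m, race_once)) {
        race_once = NULL;
        attempts++;
    }
    return attempts;
}

/* Example race: the source page moves to pfn 200 during the fault. */
static void race_remap(struct mirror *m)
{
    mirror_invalidate(m, 200);
}
```

Calling mirror_fault(&m, race_remap) on a mirror whose source maps pfn 100 takes two attempts and leaves pfn 200 in the mirror, never the stale 100.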
On 02/10/19 16:15, Jerome Glisse wrote:
>>> Why would you need to target mmu notifier on target vma ?
>> If the mapping of the source VMA changes, mirroring can update the
>> target VMA via insert_pfn. But what ensures that KVM's MMU notifier
>> dismantles its own existing page tables (so that they can be recreated
>> with the new mapping from the source VMA)?
>>
> So just to make sure i follow we have:
>     - qemu process on host with anonymous vma
>         -> host cpu page table
>     - kvm which maps host anonymous vma to guest
>         -> kvm guest page table
>     - kvm inspector process which mirrors the vma from the qemu process
>         -> inspector process page table
>
> AFAIK the KVM notifier will clear the kvm guest page table whenever
> necessary (through kvm_mmu_notifier_invalidate_range_start). This is
> what ensures that KVM dismantles its own mapping: it abides by the
> mmu-notifier callbacks. If it did not you would have bugs (at least i
> expect so). Am i wrong here ?

The KVM inspector process is also (or can be) a QEMU that will have to
create its own KVM guest page table.

So if a page in the source VMA is unmapped we want:

- the source KVM to invalidate its guest page table (done by the KVM MMU
  notifier)

- the target VMA to be invalidated (easy using mirroring)

- the target KVM to invalidate its guest page table, as a result of
  invalidation of the target VMA

Paolo
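
The three requirements above can be modelled in userspace as a chain of invalidation callbacks. This is only a sketch of the desired behaviour: all names (struct addr_space, as_subscribe, as_unmap, struct guest_mmu, guest_mmu_invalidate, mirror_invalidate) are hypothetical. It simply assumes that clearing the target VMA notifies the target's own listeners, which is exactly the property being asked for in this thread and not something the sketch demonstrates about the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 4

/* Hypothetical model types; none of these are kernel structures. */
typedef void (*invalidate_fn)(void *ctx);

struct addr_space {                 /* stands in for one process's VMA */
    bool mapped;
    size_t nr;
    invalidate_fn fn[MAX_LISTENERS];
    void *ctx[MAX_LISTENERS];
};

struct guest_mmu {                  /* stands in for a KVM guest page table */
    bool mapped;
};

/* Register a listener that is called before the mapping goes away. */
static bool as_subscribe(struct addr_space *as, invalidate_fn fn, void *ctx)
{
    if (as->nr == MAX_LISTENERS)
        return false;
    as->fn[as->nr] = fn;
    as->ctx[as->nr] = ctx;
    as->nr++;
    return true;
}

/* Unmap: notify every listener first, then drop the mapping. */
static void as_unmap(struct addr_space *as)
{
    for (size_t i = 0; i < as->nr; i++)
        as->fn[i](as->ctx[i]);
    as->mapped = false;
}

/* Step 1 and step 3: a KVM instance drops its guest page table entry. */
static void guest_mmu_invalidate(void *ctx)
{
    ((struct guest_mmu *)ctx)->mapped = false;
}

/* Step 2: the mirror clears the target VMA. ASSUMPTION: this goes through
 * as_unmap(), so the target's listeners (the target KVM) are notified too. */
static void mirror_invalidate(void *ctx)
{
    as_unmap((struct addr_space *)ctx);
}
```

With the source KVM and the mirror subscribed to the source, and the target KVM subscribed to the target, a single as_unmap() on the source clears all four mappings. If mirror_invalidate() only set the target's mapped flag, the target KVM would keep a stale entry, which is the concern raised above.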