Michael Kelley
2021-Mar-08 19:30 UTC
[RFC PATCH 08/18] virt/mshv: map and unmap guest memory
From: Nuno Das Neves <nunodasneves at linux.microsoft.com> Sent: Monday, March 8, 2021 11:14 AM> > On 2/8/2021 11:45 AM, Michael Kelley wrote: > > From: Nuno Das Neves <nunodasneves at linux.microsoft.com> Sent: Friday, November > 20, 2020 4:30 PM > >>[snip]> >> @@ -245,16 +249,318 @@ hv_call_delete_partition(u64 partition_id) > >> return -hv_status_to_errno(status); > >> } > >> > >> +static int > >> +hv_call_map_gpa_pages(u64 partition_id, > >> + u64 gpa_target, > >> + u64 page_count, u32 flags, > >> + struct page **pages) > >> +{ > >> + struct hv_map_gpa_pages *input_page; > >> + int status; > >> + int i; > >> + struct page **p; > >> + u32 completed = 0; > >> + u64 hypercall_status; > >> + unsigned long remaining = page_count; > >> + int rep_count; > >> + unsigned long irq_flags; > >> + int ret = 0; > >> + > >> + while (remaining) { > >> + > >> + rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE); > >> + > >> + local_irq_save(irq_flags); > >> + input_page = (struct hv_map_gpa_pages *)(*this_cpu_ptr( > >> + hyperv_pcpu_input_arg)); > >> + > >> + input_page->target_partition_id = partition_id; > >> + input_page->target_gpa_base = gpa_target; > >> + input_page->map_flags = flags; > >> + > >> + for (i = 0, p = pages; i < rep_count; i++, p++) > >> + input_page->source_gpa_page_list[i] > >> + page_to_pfn(*p) & HV_MAP_GPA_MASK; > > > > The masking seems a bit weird. The mask allows for up to 64G page frames, > > which is 256 Tbytes of total physical memory, which is probably the current > > Hyper-V limit on memory size (48 bit physical address space, though 52 bit > > physical address spaces are coming). So the masking shouldn't ever be doing > > anything. And if it was doing something, that probably should be treated as > > an error rather than simply dropping the high bits. > > Good point - It looks like the mask isn't needed. > > > > > Note that this code does not handle the case where PAGE_SIZE !> > HV_HYP_PAGE_SIZE. But maybe we'll never run the root partition with a > > page size other than 4K. > > > > For now on x86 it won't happen, but maybe on ARM? > It shouldn't be hard to support this case, especially since > PAGE_SIZE >= HV_HYP_PAGE_SIZE. Do you think we need it in this patch set?No, from my perspective, this case does not need to be handled in this patch set.> > >> + hypercall_status = hv_do_rep_hypercall( > >> + HVCALL_MAP_GPA_PAGES, rep_count, 0, input_page, NULL); > >> + local_irq_restore(irq_flags); > >> + > >> + status = hypercall_status & HV_HYPERCALL_RESULT_MASK; > >> + completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >> > >> + HV_HYPERCALL_REP_COMP_OFFSET; > >> + > >> + if (status == HV_STATUS_INSUFFICIENT_MEMORY) { > >> + ret = hv_call_deposit_pages(NUMA_NO_NODE, > >> + partition_id, 256); > > > > Why adding 256 pages? I'm just contrasting with other places that add > > 1 page at a time. Maybe a comment to explain .... > > > > Empirically determined. I'll add a #define and comment. > > >> + if (ret) > >> + break; > >> + } else if (status != HV_STATUS_SUCCESS) { > >> + pr_err("%s: completed %llu out of %llu, %s\n", > >> + __func__, > >> + page_count - remaining, page_count, > >> + hv_status_to_string(status)); > >> + ret = -hv_status_to_errno(status); > >> + break; > >> + } > >> + > >> + pages += completed; > >> + remaining -= completed; > >> + gpa_target += completed; > >> + } > >> + > >> + if (ret && completed) { > > > > Is the above the right test? Completed could be zero from the most > > recent iteration, but still could be partially succeeded based on a previous > > successful iteration. I think this needs to check whether remaining equals > > page_count. > > > > You're right; I'll change it to (ret && remaining < page_count) > > >> + pr_err("%s: Partially succeeded; mapped regions may be in invalid state", > >> + __func__); > >> + ret = -EBADFD; > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static int > >> +hv_call_unmap_gpa_pages(u64 partition_id, > >> + u64 gpa_target, > >> + u64 page_count, u32 flags) > >> +{ > >> + struct hv_unmap_gpa_pages *input_page; > >> + int status; > >> + int ret = 0; > >> + u32 completed = 0; > >> + u64 hypercall_status; > >> + unsigned long remaining = page_count; > >> + int rep_count; > >> + unsigned long irq_flags; > >> + > >> + local_irq_save(irq_flags); > >> + input_page = (struct hv_unmap_gpa_pages *)(*this_cpu_ptr( > >> + hyperv_pcpu_input_arg)); > >> + > >> + input_page->target_partition_id = partition_id; > >> + input_page->target_gpa_base = gpa_target; > >> + input_page->unmap_flags = flags; > >> + > >> + while (remaining) { > >> + rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE); > >> + hypercall_status = hv_do_rep_hypercall( > >> + HVCALL_UNMAP_GPA_PAGES, rep_count, 0, input_page, NULL); > > > > Similarly, this code doesn't handle PAGE_SIZE != HV_HYP_PAGE_SIZE. > > > > As above - do we need this for this patch set? This won't happen on x86.Again, not needed from my perspective.> > >> + status = hypercall_status & HV_HYPERCALL_RESULT_MASK; > >> + completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >> > >> + HV_HYPERCALL_REP_COMP_OFFSET; > >> + if (status != HV_STATUS_SUCCESS) { > >> + pr_err("%s: completed %llu out of %llu, %s\n", > >> + __func__, > >> + page_count - remaining, page_count, > >> + hv_status_to_string(status)); > >> + ret = -hv_status_to_errno(status); > >> + break; > >> + } > >> + > >> + remaining -= completed; > >> + gpa_target += completed; > >> + input_page->target_gpa_base = gpa_target; > >> + } > >> + local_irq_restore(irq_flags); > > > > I have some concern about holding interrupts disabled for this long. > > > > How about I move the interrupt enabling/disabling inside the loop? i.e.: > while (remaining) { > local_irq_save(irq_flags); > input_page = (struct hv_unmap_gpa_pages *)(*this_cpu_ptr( > hyperv_pcpu_input_arg)); > > input_page->target_partition_id = partition_id; > input_page->target_gpa_base = gpa_target; > input_page->unmap_flags = flags; > rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE); > status = hv_do_rep_hypercall( > HVCALL_UNMAP_GPA_PAGES, rep_count, 0, input_page, NULL); > local_irq_restore(irq_flags); > > completed = (status & HV_HYPERCALL_REP_COMP_MASK) >> > HV_HYPERCALL_REP_COMP_OFFSET; > status &= HV_HYPERCALL_RESULT_MASK; > if (status != HV_STATUS_SUCCESS) { > pr_err("%s: completed %llu out of %llu, %s\n", > __func__, > page_count - remaining, page_count, > hv_status_to_string(status)); > ret = hv_status_to_errno(status); > break; > } > > remaining -= completed; > gpa_target += completed; > } > >Yes, that would help.> >> + > >> + if (ret && completed) { > > > > Same comment as before. > > > > Ditto as above. > > >> + pr_err("%s: Partially succeeded; mapped regions may be in invalid state", > >> + __func__); > >> + ret = -EBADFD; > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static long > >> +mshv_partition_ioctl_map_memory(struct mshv_partition *partition, > >> + struct mshv_user_mem_region __user *user_mem) > >> +{ > >> + struct mshv_user_mem_region mem; > >> + struct mshv_mem_region *region; > >> + int completed; > >> + unsigned long remaining, batch_size; > >> + int i; > >> + struct page **pages; > >> + u64 page_count, user_start, user_end, gpfn_start, gpfn_end; > >> + u64 region_page_count, region_user_start, region_user_end; > >> + u64 region_gpfn_start, region_gpfn_end; > >> + long ret = 0; > >> + > >> + /* Check we have enough slots*/ > >> + if (partition->regions.count == MSHV_MAX_MEM_REGIONS) { > >> + pr_err("%s: not enough memory region slots\n", __func__); > >> + return -ENOSPC; > >> + } > >> + > >> + if (copy_from_user(&mem, user_mem, sizeof(mem))) > >> + return -EFAULT; > >> + > >> + if (!mem.size || > >> + mem.size & (PAGE_SIZE - 1) || > >> + mem.userspace_addr & (PAGE_SIZE - 1) || > > > > There's a PAGE_ALIGNED macro that expresses exactly what > > each of the previous two tests is doing. > > > > Since these need to be HV_HYP_PAGE_SIZE aligned, I will add a > HV_HYP_PAGE_ALIGNED macro for this.I was thinking that PAGE_SIZE and PAGE_ALIGNED are correct. If this code were running on an ARM64 system with a 64K page size, the 64K alignment would be fine and will make sense from the user space perspective. You don't want to be mapping part of a user space page. And 64K alignment will certainly satisfy Hyper-V's requirement for 4K alignment. The real requirement from Hyper-V's standpoint is that the alignment not be smaller than 4K. But maybe I'm misunderstanding. Michael