Dave Hansen
2016-Nov-30 19:15 UTC
[PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
On 11/30/2016 12:43 AM, Liang Li wrote:> +static void send_unused_pages_info(struct virtio_balloon *vb, > + unsigned long req_id) > +{ > + struct scatterlist sg_in; > + unsigned long pos = 0; > + struct virtqueue *vq = vb->req_vq; > + struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr; > + int ret, order; > + > + mutex_lock(&vb->balloon_lock); > + > + for (order = MAX_ORDER - 1; order >= 0; order--) {I scratched my head for a bit on this one. Why are you walking over orders, *then* zones. I *think* you're doing it because you can efficiently fill the bitmaps at a given order for all zones, then move to a new bitmap. But, it would be interesting to document this.> + pos = 0; > + ret = get_unused_pages(vb->resp_data, > + vb->resp_buf_size / sizeof(unsigned long), > + order, &pos);FWIW, get_unsued_pages() is a pretty bad name. "get" usually implies bumping reference counts or consuming something. You're just "recording" or "marking" them.> + if (ret == -ENOSPC) { > + void *new_resp_data; > + > + new_resp_data = kmalloc(2 * vb->resp_buf_size, > + GFP_KERNEL); > + if (new_resp_data) { > + kfree(vb->resp_data); > + vb->resp_data = new_resp_data; > + vb->resp_buf_size *= 2;What happens to the data in ->resp_data at this point? Doesn't this just throw it away? ...> +struct page_info_item { > + __le64 start_pfn : 52; /* start pfn for the bitmap */ > + __le64 page_shift : 6; /* page shift width, in bytes */ > + __le64 bmap_len : 6; /* bitmap length, in bytes */ > +};Is 'bmap_len' too short? a 64-byte buffer is a bit tiny. Right?> +static int mark_unused_pages(struct zone *zone, > + unsigned long *unused_pages, unsigned long size, > + int order, unsigned long *pos) > +{ > + unsigned long pfn, flags; > + unsigned int t; > + struct list_head *curr; > + struct page_info_item *info; > + > + if (zone_is_empty(zone)) > + return 0; > + > + spin_lock_irqsave(&zone->lock, flags); > + > + if (*pos + zone->free_area[order].nr_free > size) > + return -ENOSPC;Urg, so this won't partially fill? So, what the nr_free pages limit where we no longer fit in the kmalloc()'d buffer where this simply won't work?> + for (t = 0; t < MIGRATE_TYPES; t++) { > + list_for_each(curr, &zone->free_area[order].free_list[t]) { > + pfn = page_to_pfn(list_entry(curr, struct page, lru)); > + info = (struct page_info_item *)(unused_pages + *pos); > + info->start_pfn = pfn; > + info->page_shift = order + PAGE_SHIFT; > + *pos += 1; > + } > + }Do we need to fill in ->bmap_len here?
Li, Liang Z
2016-Dec-04 13:13 UTC
[PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
> On 11/30/2016 12:43 AM, Liang Li wrote: > > +static void send_unused_pages_info(struct virtio_balloon *vb, > > + unsigned long req_id) > > +{ > > + struct scatterlist sg_in; > > + unsigned long pos = 0; > > + struct virtqueue *vq = vb->req_vq; > > + struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr; > > + int ret, order; > > + > > + mutex_lock(&vb->balloon_lock); > > + > > + for (order = MAX_ORDER - 1; order >= 0; order--) { > > I scratched my head for a bit on this one. Why are you walking over orders, > *then* zones. I *think* you're doing it because you can efficiently fill the > bitmaps at a given order for all zones, then move to a new bitmap. But, it > would be interesting to document this. >Yes, use the order is somewhat strange, but it's helpful to keep the API simple. Do you think it's acceptable?> > + pos = 0; > > + ret = get_unused_pages(vb->resp_data, > > + vb->resp_buf_size / sizeof(unsigned long), > > + order, &pos); > > FWIW, get_unsued_pages() is a pretty bad name. "get" usually implies > bumping reference counts or consuming something. You're just "recording" > or "marking" them. >Will change to mark_unused_pages().> > + if (ret == -ENOSPC) { > > + void *new_resp_data; > > + > > + new_resp_data = kmalloc(2 * vb->resp_buf_size, > > + GFP_KERNEL); > > + if (new_resp_data) { > > + kfree(vb->resp_data); > > + vb->resp_data = new_resp_data; > > + vb->resp_buf_size *= 2; > > What happens to the data in ->resp_data at this point? Doesn't this just > throw it away? >Yes, so we should make sure the data in resp_data is not inuse.> ... > > +struct page_info_item { > > + __le64 start_pfn : 52; /* start pfn for the bitmap */ > > + __le64 page_shift : 6; /* page shift width, in bytes */ > > + __le64 bmap_len : 6; /* bitmap length, in bytes */ }; > > Is 'bmap_len' too short? a 64-byte buffer is a bit tiny. Right? >Currently, we just use the 8 bytes and 0 bytes bitmap, should we support more than 64 bytes?> > +static int mark_unused_pages(struct zone *zone, > > + unsigned long *unused_pages, unsigned long size, > > + int order, unsigned long *pos) > > +{ > > + unsigned long pfn, flags; > > + unsigned int t; > > + struct list_head *curr; > > + struct page_info_item *info; > > + > > + if (zone_is_empty(zone)) > > + return 0; > > + > > + spin_lock_irqsave(&zone->lock, flags); > > + > > + if (*pos + zone->free_area[order].nr_free > size) > > + return -ENOSPC; > > Urg, so this won't partially fill? So, what the nr_free pages limit where we no > longer fit in the kmalloc()'d buffer where this simply won't work? >Yes. My initial implementation is partially fill, it's better for the worst case. I thought the above code is more efficient for most case ... Do you think partially fill the bitmap is better?> > + for (t = 0; t < MIGRATE_TYPES; t++) { > > + list_for_each(curr, &zone->free_area[order].free_list[t]) { > > + pfn = page_to_pfn(list_entry(curr, struct page, lru)); > > + info = (struct page_info_item *)(unused_pages + > *pos); > > + info->start_pfn = pfn; > > + info->page_shift = order + PAGE_SHIFT; > > + *pos += 1; > > + } > > + } > > Do we need to fill in ->bmap_len here?For integrity, the bmap_len should be filled, will add. Omit this step just because QEMU assume the ->bmp_len is 0 and ignore this field. Thanks for your comment! Liang
Dave Hansen
2016-Dec-05 17:22 UTC
[PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
On 12/04/2016 05:13 AM, Li, Liang Z wrote:>> On 11/30/2016 12:43 AM, Liang Li wrote: >>> +static void send_unused_pages_info(struct virtio_balloon *vb, >>> + unsigned long req_id) >>> +{ >>> + struct scatterlist sg_in; >>> + unsigned long pos = 0; >>> + struct virtqueue *vq = vb->req_vq; >>> + struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr; >>> + int ret, order; >>> + >>> + mutex_lock(&vb->balloon_lock); >>> + >>> + for (order = MAX_ORDER - 1; order >= 0; order--) { >> >> I scratched my head for a bit on this one. Why are you walking over orders, >> *then* zones. I *think* you're doing it because you can efficiently fill the >> bitmaps at a given order for all zones, then move to a new bitmap. But, it >> would be interesting to document this. > > Yes, use the order is somewhat strange, but it's helpful to keep the API simple. > Do you think it's acceptable?Yeah, it's fine. Just comment it, please.>>> + if (ret == -ENOSPC) { >>> + void *new_resp_data; >>> + >>> + new_resp_data = kmalloc(2 * vb->resp_buf_size, >>> + GFP_KERNEL); >>> + if (new_resp_data) { >>> + kfree(vb->resp_data); >>> + vb->resp_data = new_resp_data; >>> + vb->resp_buf_size *= 2; >> >> What happens to the data in ->resp_data at this point? Doesn't this just >> throw it away? > > Yes, so we should make sure the data in resp_data is not inuse.But doesn't it have valid data that we just collected and haven't told the hypervisor about yet? Aren't we throwing away good data that cost us something to collect?>> ... >>> +struct page_info_item { >>> + __le64 start_pfn : 52; /* start pfn for the bitmap */ >>> + __le64 page_shift : 6; /* page shift width, in bytes */What does a page_shift "in bytes" mean? :)>>> + __le64 bmap_len : 6; /* bitmap length, in bytes */ }; >> >> Is 'bmap_len' too short? a 64-byte buffer is a bit tiny. Right? > > Currently, we just use the 8 bytes and 0 bytes bitmap, should we support more than 64 bytes?It just means that with this format, you end up wasting at least ~1/8th of the space with metadata. That's a bit unfortunate, but I guess it's not fatal. I'd definitely call it out in the patch description and make sure other folks take a look at it. There's a somewhat easy fix, but that would make the qemu implementation more complicated: You could just have bmap_len==0x3f imply that there's another field that contains an extended bitmap length for when you need long bitmaps. But, as you note, there's no need for it, so it's a matter of trading the extra complexity versus the desire to not habing to change the ABI again for longer (hopefully).>>> +static int mark_unused_pages(struct zone *zone, >>> + unsigned long *unused_pages, unsigned long size, >>> + int order, unsigned long *pos) >>> +{ >>> + unsigned long pfn, flags; >>> + unsigned int t; >>> + struct list_head *curr; >>> + struct page_info_item *info; >>> + >>> + if (zone_is_empty(zone)) >>> + return 0; >>> + >>> + spin_lock_irqsave(&zone->lock, flags); >>> + >>> + if (*pos + zone->free_area[order].nr_free > size) >>> + return -ENOSPC; >> >> Urg, so this won't partially fill? So, what the nr_free pages limit where we no >> longer fit in the kmalloc()'d buffer where this simply won't work? > > Yes. My initial implementation is partially fill, it's better for the worst case. > I thought the above code is more efficient for most case ... > Do you think partially fill the bitmap is better?Could you please answer the question I asked? Because if you don't get this right, it could mean that there are system that simply *fail* here.
Reasonably Related Threads
- [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
- [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
- [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
- [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
- [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration