Richard W.M. Jones
2021-Jan-26 21:51 UTC
[Libguestfs] [PATCH nbdkit 0/6] cow: Implement trimming and extents.
This patch series implements trimming and extents in nbdkit-cow-filter. The aim of this is to have feature parity with qcow2 zero clusters, which we can then use to replace the qcow2 overlays currently used to protect the source disk in virt-v2v. First 3 patches are some preparatory enhancements/clean-up. To implement extents: We iterate over the blocks in the filter, if the block is allocated (ie. contains a write which has been saved in the overlay) then the whole block is allocated data, else we have to ask the underlying plugin for the information about that block. (This ends up making lots of small extents calls into the plugin, which could be optimized in future.) To implement trim: We extend the overlay bitmap with an extra bit per block so we can store whether the block has been trimmed. A future optimization is to actually trim the overlay (it only saves space, it's not needed for correctness), but this patch series does not do this. Final patch implements a couple of tests. Rich.
Richard W.M. Jones
2021-Jan-26 21:51 UTC
[Libguestfs] [PATCH nbdkit 1/6] filters: Add a function to read full extents from a plugin.
This convenience function makes it a bit simpler to read a full set of extents covering a range from a plugin, especially for plugins which don't reply with a full set of extents in a single call. --- docs/nbdkit-filter.pod | 22 +++++++++++++++++ include/nbdkit-filter.h | 5 ++++ server/extents.c | 54 +++++++++++++++++++++++++++++++++++++++++ server/nbdkit.syms | 1 + 4 files changed, 82 insertions(+) diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod index 82b1ade1..98a08ca4 100644 --- a/docs/nbdkit-filter.pod +++ b/docs/nbdkit-filter.pod @@ -857,6 +857,28 @@ Returns the number of extents in the list. Returns a copy of the C<i>'th extent. +=head3 Reading the full extents from the plugin + +A convenience function is provided to filters only which makes one or +more requests to the underlying plugin until we have a full set of +extents covering the region C<[offset..offset+count-1]>. + + struct nbdkit_extents *nbdkit_extents_full ( + struct nbdkit_next_ops *next_ops, + nbdkit_backend *nxdata, + uint32_t count, uint64_t offset, + uint32_t flags, int *err); + +Note this allocates a new C<struct nbdkit_extents> which the caller +must free. C<flags> is passed through to the underlying plugin, but +C<NBDKIT_FLAG_REQ_ONE> is removed from the set of flags so that the +plugin returns as much information as possible (this is usually what +you want). + +On error this function can return C<NULL>. In this case it calls +C<nbdkit_error> and/or C<nbdkit_set_error> as required. C<*err> will +be set to a suitable value. + =head3 Enforcing alignment of an nbdkit_extents list A convenience function is provided to filters only which makes it diff --git a/include/nbdkit-filter.h b/include/nbdkit-filter.h index afe83e0b..0964c6e7 100644 --- a/include/nbdkit-filter.h +++ b/include/nbdkit-filter.h @@ -124,6 +124,11 @@ NBDKIT_EXTERN_DECL (size_t, nbdkit_extents_count, (const struct nbdkit_extents *)); NBDKIT_EXTERN_DECL (struct nbdkit_extent, nbdkit_get_extent, (const struct nbdkit_extents *, size_t)); +NBDKIT_EXTERN_DECL (struct nbdkit_extents *, nbdkit_extents_full, + (struct nbdkit_next_ops *next_ops, + nbdkit_backend *nxdata, + uint32_t count, uint64_t offset, + uint32_t flags, int *err)); NBDKIT_EXTERN_DECL (int, nbdkit_extents_aligned, (struct nbdkit_next_ops *next_ops, nbdkit_backend *nxdata, diff --git a/server/extents.c b/server/extents.c index 5f6f3e2e..a081156e 100644 --- a/server/extents.c +++ b/server/extents.c @@ -292,3 +292,57 @@ nbdkit_extents_aligned (struct nbdkit_next_ops *next_ops, /* Once we get here, all extents are aligned. */ return 0; } + +/* This is a convenient wrapper around next_ops->extents which can be + * used from filters where you want to get a complete set of extents + * covering the region [offset..offset+count-1]. + */ +struct nbdkit_extents * +nbdkit_extents_full (struct nbdkit_next_ops *next_ops, nbdkit_backend *nxdata, + uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + struct nbdkit_extents *ret; + + /* Clear REQ_ONE to ask the plugin for as much information as it is + * willing to return (the plugin may still truncate if it is too + * costly to provide everything). + */ + flags &= ~NBDKIT_FLAG_REQ_ONE; + + ret = nbdkit_extents_new (offset, offset+count); + if (ret == NULL) goto error0; + + while (count > 0) { + const uint64_t old_offset = offset; + size_t i; + + CLEANUP_EXTENTS_FREE struct nbdkit_extents *t + = nbdkit_extents_new (offset, offset+count); + if (t == NULL) goto error1; + + if (next_ops->extents (nxdata, count, offset, flags, t, err) == -1) + goto error0; + + for (i = 0; i < nbdkit_extents_count (t); ++i) { + const struct nbdkit_extent e = nbdkit_get_extent (t, i); + if (nbdkit_add_extent (ret, e.offset, e.length, e.type) == -1) + goto error1; + + assert (e.length <= count); + offset += e.length; + count -= e.length; + } + + /* If the plugin is behaving we must make forward progress. */ + assert (offset > old_offset); + } + + return ret; + + error1: + *err = errno; + error0: + if (ret) nbdkit_extents_free (ret); + return NULL; +} diff --git a/server/nbdkit.syms b/server/nbdkit.syms index 3d6b2235..20ee27f3 100644 --- a/server/nbdkit.syms +++ b/server/nbdkit.syms @@ -51,6 +51,7 @@ nbdkit_extents_aligned; nbdkit_extents_count; nbdkit_extents_free; + nbdkit_extents_full; nbdkit_extents_new; nbdkit_get_extent; nbdkit_is_tls; -- 2.29.0.rc2
Richard W.M. Jones
2021-Jan-26 21:51 UTC
[Libguestfs] [PATCH nbdkit 2/6] checkwrite: Simplify trim/zero using nbdkit_extents_full.
--- filters/checkwrite/checkwrite.c | 96 +++++++++++++-------------------- 1 file changed, 38 insertions(+), 58 deletions(-) diff --git a/filters/checkwrite/checkwrite.c b/filters/checkwrite/checkwrite.c index 68c57c37..2792d4d8 100644 --- a/filters/checkwrite/checkwrite.c +++ b/filters/checkwrite/checkwrite.c @@ -150,67 +150,47 @@ checkwrite_trim_zero (struct nbdkit_next_ops *next_ops, void *nxdata, { /* If the plugin supports extents, speed this up by using them. */ if (next_ops->can_extents (nxdata)) { - while (count > 0) { - struct nbdkit_extents *exts; - size_t i, n; - - exts = nbdkit_extents_new (offset, offset + count); - if (exts == NULL) - return -1; - if (next_ops->extents (nxdata, count, offset, 0, exts, err) == -1) - return -1; - - /* Ignore any extents or partial extents which are outside the - * offset/count that we are looking at. The plugin is required - * to return at least one relevant extent so we can assume this - * loop will make forward progress. - */ - n = nbdkit_extents_count (exts); - for (i = 0; i < n && count > 0; ++i) { - uint64_t next_extent_offset; - struct nbdkit_extent e; - - e = nbdkit_get_extent (exts, i); - - if (e.offset + e.length <= offset) - continue; - if (e.offset > offset) - break; - - next_extent_offset = e.offset + e.length; - - /* Anything that reads back as zero is good. */ - if ((e.type & NBDKIT_EXTENT_ZERO) != 0) { - const uint64_t zerolen = MIN (count, next_extent_offset - offset); - - offset += zerolen; - count -= zerolen; - continue; + size_t i, n; + CLEANUP_EXTENTS_FREE struct nbdkit_extents *exts + nbdkit_extents_full (next_ops, nxdata, count, offset, 0, err); + if (exts == NULL) + return -1; + + n = nbdkit_extents_count (exts); + for (i = 0; i < n && count > 0; ++i) { + const struct nbdkit_extent e = nbdkit_get_extent (exts, i); + const uint64_t next_extent_offset = e.offset + e.length; + + /* Anything that reads back as zero is good. */ + if ((e.type & NBDKIT_EXTENT_ZERO) != 0) { + const uint64_t zerolen = MIN (count, next_extent_offset - offset); + + offset += zerolen; + count -= zerolen; + continue; + } + + /* Otherwise we have to read the underlying data and check. */ + while (count > 0) { + size_t buflen = MIN (MAX_REQUEST_SIZE, count); + buflen = MIN (buflen, next_extent_offset - offset); + + CLEANUP_FREE char *buf = malloc (buflen); + if (buf == NULL) { + *err = errno; + nbdkit_error ("malloc: %m"); + return -1; } - /* Otherwise we have to read the underlying data and check. */ - while (count > 0) { - const size_t buflen - MIN3 (MAX_REQUEST_SIZE, count, next_extent_offset - offset); - CLEANUP_FREE char *buf = malloc (buflen); - if (buf == NULL) { - *err = errno; - nbdkit_error ("malloc: %m"); - return -1; - } - - if (next_ops->pread (nxdata, buf, buflen, offset, 0, err) == -1) - return -1; - if (! is_zero (buf, buflen)) - return data_does_not_match (err); - - count -= buflen; - offset += buflen; - } - } /* for extent */ + if (next_ops->pread (nxdata, buf, buflen, offset, 0, err) == -1) + return -1; + if (! is_zero (buf, buflen)) + return data_does_not_match (err); - nbdkit_extents_free (exts); - } /* while (count > 0) */ + count -= buflen; + offset += buflen; + } + } /* for extent */ } /* Otherwise the plugin does not support extents, so do this the -- 2.29.0.rc2
Richard W.M. Jones
2021-Jan-26 21:51 UTC
[Libguestfs] [PATCH nbdkit 3/6] common/include/minmax.h: Remove MIN3/MAX3 macros.
These were added to support the checkwrite filter, but the original change didn't work on platforms that lack __auto_type, resulting in an additional hack. However as part of changing checkwrite to use nbdkit_extents_full these macros (and hack) are no longer needed, so remove them. Updates: commit acaa3db5b39759d9e7e65f4e5ef10a5383e50b29 Updates: commit 2d0954dd6431cfd8d82e873a5b99285b0fddbeb8 --- common/include/minmax.h | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/common/include/minmax.h b/common/include/minmax.h index aa95207e..d3e667ea 100644 --- a/common/include/minmax.h +++ b/common/include/minmax.h @@ -76,18 +76,4 @@ #endif -/* Helpful to have MIN3, MAX3 and maybe more in future. */ -#ifdef HAVE_AUTO_TYPE -#define MIN3(x0, x1, x2) (MIN (MIN ((x0), (x1)), (x2))) -#define MAX3(x0, x1, x2) (MAX (MAX ((x0), (x1)), (x2))) -#else -/* Unfortunately without __auto_type nesting the macros is not - * possible so we must use this unclean macro instead. - */ -#define MIN_UNCLEAN(x,y) ((x) < (y) ? (x) : (y)) -#define MAX_UNCLEAN(x,y) ((x) > (y) ? (x) : (y)) -#define MIN3(x0, x1, x2) (MIN_UNCLEAN (MIN_UNCLEAN ((x0), (x1)), (x2))) -#define MAX3(x0, x1, x2) (MAX_UNCLEAN (MAX_UNCLEAN ((x0), (x1)), (x2))) -#endif - #endif /* NBDKIT_MINMAX_H */ -- 2.29.0.rc2
Richard W.M. Jones
2021-Jan-26 21:51 UTC
[Libguestfs] [PATCH nbdkit 4/6] cow: Implement extents.
We implement extents in the filter by checking block-by-block if the block is present in the overlay (in which case it's non-sparse) or if not calling through to the underlying plugin. --- filters/cow/blk.h | 3 ++ filters/cow/blk.c | 10 +++++ filters/cow/cow.c | 102 ++++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 111 insertions(+), 4 deletions(-) diff --git a/filters/cow/blk.h b/filters/cow/blk.h index 28ef3b4a..5eb30794 100644 --- a/filters/cow/blk.h +++ b/filters/cow/blk.h @@ -54,6 +54,9 @@ extern void blk_free (void); /* Allocate or resize the overlay and bitmap. */ extern int blk_set_size (uint64_t new_size); +/* Returns the status of the block in the overlay. */ +extern void blk_status (uint64_t blknum, bool *present, bool *trimmed); + /* Read a single block from the overlay or plugin. */ extern int blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, uint8_t *block, int *err) diff --git a/filters/cow/blk.c b/filters/cow/blk.c index 9e85920f..4a8adfb9 100644 --- a/filters/cow/blk.c +++ b/filters/cow/blk.c @@ -177,6 +177,16 @@ blk_set_allocated (uint64_t blknum) bitmap_set_blk (&bm, blknum, true); } +/* This is a bit of a hack since usually this information is hidden in + * the blk module. However it is needed when calculating extents. + */ +void +blk_status (uint64_t blknum, bool *present, bool *trimmed) +{ + *present = blk_is_allocated (blknum); + *trimmed = false; +} + /* These are the block operations. They always read or write a single * whole block of size ?blksize?. */ diff --git a/filters/cow/cow.c b/filters/cow/cow.c index 92358375..d12565e6 100644 --- a/filters/cow/cow.c +++ b/filters/cow/cow.c @@ -141,9 +141,6 @@ cow_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, return r >= 0 ? 0 : -1; } -/* Whatever the underlying plugin can or can't do, we can write, we - * cannot trim or detect extents, and we can flush. - */ static int cow_can_write (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) { @@ -159,7 +156,7 @@ cow_can_trim (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) static int cow_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) { - return 0; + return 1; } static int @@ -499,6 +496,102 @@ cow_cache (struct nbdkit_next_ops *next_ops, void *nxdata, return 0; } +/* Extents. */ +static int +cow_extents (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + struct nbdkit_extents *extents, int *err) +{ + const bool can_extents = next_ops->can_extents (nxdata); + const bool req_one = flags & NBDKIT_FLAG_REQ_ONE; + uint64_t end; + uint64_t blknum; + + /* To make this easier, align the requested extents to whole blocks. */ + end = offset + count; + offset = ROUND_DOWN (offset, BLKSIZE); + end = ROUND_UP (end, BLKSIZE); + count = end - offset; + blknum = offset / BLKSIZE; + + assert (IS_ALIGNED (offset, BLKSIZE)); + assert (IS_ALIGNED (count, BLKSIZE)); + assert (count > 0); /* We must make forward progress. */ + + /* We hold the lock for the whole time, even when requesting extents + * from the plugin, because we want to present an atomic picture of + * the current state. + */ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + + while (count > 0) { + bool present, trimmed; + struct nbdkit_extent e; + + blk_status (blknum, &present, &trimmed); + + /* Present in the overlay. */ + if (present) { + e.offset = offset; + e.length = BLKSIZE; + + if (trimmed) + e.type = NBDKIT_EXTENT_HOLE|NBDKIT_EXTENT_ZERO; + else + e.type = 0; + + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) { + *err = errno; + return -1; + } + } + + /* Not present in the overlay, but we can ask the plugin */ + else if (can_extents) { + size_t i; + + CLEANUP_EXTENTS_FREE struct nbdkit_extents *extents2 + nbdkit_extents_full (next_ops, nxdata, BLKSIZE, offset, flags, err); + if (extents2 == NULL) + return -1; + + for (i = 0; i < nbdkit_extents_count (extents2); ++i) { + e = nbdkit_get_extent (extents2, i); + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) { + *err = errno; + return -1; + } + } + } + + /* Otherwise assume the block is non-sparse. */ + else { + e.offset = offset; + e.length = BLKSIZE; + e.type = 0; + + if (nbdkit_add_extent (extents, e.offset, e.length, e.type) == -1) { + *err = errno; + return -1; + } + } + + blknum++; + offset += BLKSIZE; + count -= BLKSIZE; + + /* If the caller only wanted the first extent, and we've managed + * to add at least one extent to the list, then we can drop out + * now. (Note calling nbdkit_add_extent above does not mean the + * extent got added since it might be before the first offset.) + */ + if (req_one && nbdkit_extents_count (extents) > 0) + break; + } + + return 0; +} + static struct nbdkit_filter filter = { .name = "cow", .longname = "nbdkit copy-on-write (COW) filter", @@ -521,6 +614,7 @@ static struct nbdkit_filter filter = { .zero = cow_zero, .flush = cow_flush, .cache = cow_cache, + .extents = cow_extents, }; NBDKIT_REGISTER_FILTER(filter) -- 2.29.0.rc2
Richard W.M. Jones
2021-Jan-26 21:51 UTC
[Libguestfs] [PATCH nbdkit 5/6] cow: Implement efficient trimming support.
By storing an extra bit in the overlay bitmap we can mark when whole blocks have been trimmed. This allows us to match a feature of qcow2 (zero clusters). In theory we could punch holes in the overlay to save a bit of disk space. --- filters/cow/blk.h | 4 ++ filters/cow/blk.c | 106 +++++++++++++++++++++++++++++++--------------- filters/cow/cow.c | 75 +++++++++++++++++++++++++++++++- 3 files changed, 149 insertions(+), 36 deletions(-) diff --git a/filters/cow/blk.h b/filters/cow/blk.h index 5eb30794..71eeb7ad 100644 --- a/filters/cow/blk.h +++ b/filters/cow/blk.h @@ -80,4 +80,8 @@ extern int blk_cache (struct nbdkit_next_ops *next_ops, void *nxdata, extern int blk_write (uint64_t blknum, const uint8_t *block, int *err) __attribute__((__nonnull__ (2, 3))); +/* Trim a single block. */ +extern int blk_trim (uint64_t blknum, int *err) + __attribute__((__nonnull__ (2))); + #endif /* NBDKIT_BLK_H */ diff --git a/filters/cow/blk.c b/filters/cow/blk.c index 4a8adfb9..ae0db1fe 100644 --- a/filters/cow/blk.c +++ b/filters/cow/blk.c @@ -43,16 +43,27 @@ * plugin returns the same immutable data for each pread call we make, * and optimize on this basis. * - * A block bitmap is maintained in memory recording if each block in - * the temporary file is "allocated" (1) or "hole" (0). + * A 2-bit per block bitmap is maintained in memory recording if each + * block in the temporary file is: + * + * 00 = not allocated in the overlay (read through to the plugin) + * 01 = allocated in the overlay + * 10 = <unused> + * 11 = trimmed in the overlay * * When reading a block we first check the bitmap to see if that file - * block is allocated or a hole. If allocated, we return it from the - * temporary file. If a hole, we issue a pread to the underlying - * plugin. + * block is allocated, trimmed or not. If allocated, we return it + * from the temporary file. Trimmed returns zeroes. If not allocated + * we issue a pread to the underlying plugin. * * When writing a block we unconditionally write the data to the - * temporary file, setting the bit in the bitmap. + * temporary file, setting the bit in the bitmap. (Writing zeroes is + * handled the same way.) + * + * When trimming we set the trimmed flag in the bitmap for whole + * blocks, and handle the unaligned portions like writing zeroes + * above. We could punch holes in the overlay as an optimization, but + * for simplicity we do not do that yet. * * Since the overlay is a deleted temporary file, we can ignore FUA * and flush commands. @@ -92,9 +103,26 @@ /* The temporary overlay. */ static int fd = -1; -/* Bitmap. Bit = 1 => allocated, 0 => hole. */ +/* Bitmap. */ static struct bitmap bm; +enum bm_entry { + BLOCK_NOT_ALLOCATED = 0, + BLOCK_ALLOCATED = 1, + BLOCK_TRIMMED = 3, +}; + +static const char * +state_to_string (enum bm_entry state) +{ + switch (state) { + case BLOCK_NOT_ALLOCATED: return "not allocated"; + case BLOCK_ALLOCATED: return "allocated"; + case BLOCK_TRIMMED: return "trimmed"; + default: abort (); + } +} + int blk_init (void) { @@ -102,7 +130,7 @@ blk_init (void) size_t len; char *template; - bitmap_init (&bm, BLKSIZE, 1 /* bits per block */); + bitmap_init (&bm, BLKSIZE, 2 /* bits per block */); tmpdir = getenv ("TMPDIR"); if (!tmpdir) @@ -163,28 +191,16 @@ blk_set_size (uint64_t new_size) return 0; } -/* Return true if the block is allocated. Consults the bitmap. */ -static bool -blk_is_allocated (uint64_t blknum) -{ - return bitmap_get_blk (&bm, blknum, false); -} - -/* Mark a block as allocated. */ -static void -blk_set_allocated (uint64_t blknum) -{ - bitmap_set_blk (&bm, blknum, true); -} - /* This is a bit of a hack since usually this information is hidden in * the blk module. However it is needed when calculating extents. */ void blk_status (uint64_t blknum, bool *present, bool *trimmed) { - *present = blk_is_allocated (blknum); - *trimmed = false; + enum bm_entry state = bitmap_get_blk (&bm, blknum, BLOCK_NOT_ALLOCATED); + + *present = state != BLOCK_NOT_ALLOCATED; + *trimmed = state == BLOCK_TRIMMED; } /* These are the block operations. They always read or write a single @@ -195,15 +211,14 @@ blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, uint8_t *block, int *err) { off_t offset = blknum * BLKSIZE; - bool allocated = blk_is_allocated (blknum); + enum bm_entry state = bitmap_get_blk (&bm, blknum, BLOCK_NOT_ALLOCATED); nbdkit_debug ("cow: blk_read block %" PRIu64 " (offset %" PRIu64 ") is %s", - blknum, (uint64_t) offset, - !allocated ? "a hole" : "allocated"); + blknum, (uint64_t) offset, state_to_string (state)); - if (!allocated) /* Read underlying plugin. */ + if (state == BLOCK_NOT_ALLOCATED) /* Read underlying plugin. */ return next_ops->pread (nxdata, block, BLKSIZE, offset, 0, err); - else { /* Read overlay. */ + else if (state == BLOCK_ALLOCATED) { /* Read overlay. */ if (pread (fd, block, BLKSIZE, offset) == -1) { *err = errno; nbdkit_error ("pread: %m"); @@ -211,6 +226,10 @@ blk_read (struct nbdkit_next_ops *next_ops, void *nxdata, } return 0; } + else /* state == BLOCK_TRIMMED */ { + memset (block, 0, BLKSIZE); + return 0; + } } int @@ -218,13 +237,12 @@ blk_cache (struct nbdkit_next_ops *next_ops, void *nxdata, uint64_t blknum, uint8_t *block, enum cache_mode mode, int *err) { off_t offset = blknum * BLKSIZE; - bool allocated = blk_is_allocated (blknum); + enum bm_entry state = bitmap_get_blk (&bm, blknum, BLOCK_NOT_ALLOCATED); nbdkit_debug ("cow: blk_cache block %" PRIu64 " (offset %" PRIu64 ") is %s", - blknum, (uint64_t) offset, - !allocated ? "a hole" : "allocated"); + blknum, (uint64_t) offset, state_to_string (state)); - if (allocated) { + if (state == BLOCK_ALLOCATED) { #if HAVE_POSIX_FADVISE int r = posix_fadvise (fd, offset, BLKSIZE, POSIX_FADV_WILLNEED); if (r) { @@ -235,6 +253,8 @@ blk_cache (struct nbdkit_next_ops *next_ops, void *nxdata, #endif return 0; } + if (state == BLOCK_TRIMMED) + return 0; if (mode == BLK_CACHE_IGNORE) return 0; if (mode == BLK_CACHE_PASSTHROUGH) @@ -247,7 +267,7 @@ blk_cache (struct nbdkit_next_ops *next_ops, void *nxdata, nbdkit_error ("pwrite: %m"); return -1; } - blk_set_allocated (blknum); + bitmap_set_blk (&bm, blknum, BLOCK_ALLOCATED); } return 0; } @@ -265,7 +285,23 @@ blk_write (uint64_t blknum, const uint8_t *block, int *err) nbdkit_error ("pwrite: %m"); return -1; } - blk_set_allocated (blknum); + bitmap_set_blk (&bm, blknum, BLOCK_ALLOCATED); return 0; } + +int +blk_trim (uint64_t blknum, int *err) +{ + off_t offset = blknum * BLKSIZE; + + nbdkit_debug ("cow: blk_trim block %" PRIu64 " (offset %" PRIu64 ")", + blknum, (uint64_t) offset); + + /* XXX As an optimization we could punch a whole in the overlay + * here. However it's not trivial since BLKSIZE is unrelated to the + * overlay filesystem block size. + */ + bitmap_set_blk (&bm, blknum, BLOCK_TRIMMED); + return 0; +} diff --git a/filters/cow/cow.c b/filters/cow/cow.c index d12565e6..f3f44757 100644 --- a/filters/cow/cow.c +++ b/filters/cow/cow.c @@ -150,7 +150,7 @@ cow_can_write (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) static int cow_can_trim (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle) { - return 0; + return 1; } static int @@ -428,6 +428,78 @@ cow_zero (struct nbdkit_next_ops *next_ops, void *nxdata, return 0; } +/* Trim data. */ +static int +cow_trim (struct nbdkit_next_ops *next_ops, void *nxdata, + void *handle, uint32_t count, uint64_t offset, uint32_t flags, + int *err) +{ + CLEANUP_FREE uint8_t *block = NULL; + uint64_t blknum, blkoffs; + int r; + + if (!IS_ALIGNED (count | offset, BLKSIZE)) { + block = malloc (BLKSIZE); + if (block == NULL) { + *err = errno; + nbdkit_error ("malloc: %m"); + return -1; + } + } + + blknum = offset / BLKSIZE; /* block number */ + blkoffs = offset % BLKSIZE; /* offset within the block */ + + /* Unaligned head */ + if (blkoffs) { + uint64_t n = MIN (BLKSIZE - blkoffs, count); + + /* Do a read-modify-write operation on the current block. + * Hold the lock over the whole operation. + */ + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + r = blk_read (next_ops, nxdata, blknum, block, err); + if (r != -1) { + memset (&block[blkoffs], 0, n); + r = blk_write (blknum, block, err); + } + if (r == -1) + return -1; + + count -= n; + offset += n; + blknum++; + } + + /* Aligned body */ + while (count >= BLKSIZE) { + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + r = blk_trim (blknum, err); + if (r == -1) + return -1; + + count -= BLKSIZE; + offset += BLKSIZE; + blknum++; + } + + /* Unaligned tail */ + if (count) { + ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + r = blk_read (next_ops, nxdata, blknum, block, err); + if (r != -1) { + memset (&block[count], 0, BLKSIZE - count); + r = blk_write (blknum, block, err); + } + if (r == -1) + return -1; + } + + /* flags & NBDKIT_FLAG_FUA is deliberately ignored. */ + + return 0; +} + static int cow_flush (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle, uint32_t flags, int *err) @@ -612,6 +684,7 @@ static struct nbdkit_filter filter = { .pread = cow_pread, .pwrite = cow_pwrite, .zero = cow_zero, + .trim = cow_trim, .flush = cow_flush, .cache = cow_cache, .extents = cow_extents, -- 2.29.0.rc2
Richard W.M. Jones
2021-Jan-26 21:51 UTC
[Libguestfs] [PATCH nbdkit 6/6] cow: Add tests of trimming and extents.
--- tests/Makefile.am | 8 ++- tests/test-cow-extents1.sh | 102 +++++++++++++++++++++++++++++++++++++ tests/test-cow-extents2.sh | 78 ++++++++++++++++++++++++++++ 3 files changed, 187 insertions(+), 1 deletion(-) diff --git a/tests/Makefile.am b/tests/Makefile.am index 32021716..42e842b2 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -1376,11 +1376,17 @@ EXTRA_DIST += \ # cow filter test. if HAVE_MKE2FS_WITH_D -TESTS += test-cow.sh +TESTS += \ + test-cow.sh \ + test-cow-extents1.sh \ + test-cow-extents2.sh \ + $(NULL) endif TESTS += test-cow-null.sh EXTRA_DIST += \ test-cow.sh \ + test-cow-extents1.sh \ + test-cow-extents2.sh \ test-cow-null.sh \ $(NULL) diff --git a/tests/test-cow-extents1.sh b/tests/test-cow-extents1.sh new file mode 100755 index 00000000..87d7c231 --- /dev/null +++ b/tests/test-cow-extents1.sh @@ -0,0 +1,102 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018-2021 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +source ./functions.sh +set -e +set -x + +requires_filter cow +requires_plugin file + +requires test -r /dev/urandom + +requires dd --version +requires nbdinfo --version +requires nbdsh --version +requires tr --version +requires truncate --version + +base=cow-extents1-base.img +pid=cow-extents1.pid +out=cow-extents1.out +sock=$(mktemp -u /tmp/nbdkit-test-sock.XXXXXX) +files="$base $pid $out $sock" +rm -f $files +cleanup_fn rm -f $files + +# Create a base file which is half allocated, half sparse. +dd if=/dev/urandom of=$base count=128 bs=1K +truncate -s 256K $base +lastmod="$(stat -c "%y" $base)" + +# Run nbdkit with a COW overlay. +start_nbdkit -P $pid -U $sock --filter=cow file $base +uri="nbd+unix:///?socket=$sock" + +# The map should reflect the base image. +nbdinfo --map "$uri" > $out +cat $out +if [ "$(tr -s ' ' < $out)" != " 0 131072 0 allocated + 131072 131072 3 hole,zero" ]; then + echo "$0: unexpected initial file map" + exit 1 +fi + +# Punch some holes. +nbdsh -u "$uri" \ + -c 'h.trim(4096, 4096)' \ + -c 'h.trim(4098, 16383)' \ + -c 'h.pwrite(b"1"*4096, 65536)' \ + -c 'h.trim(8192, 131072)' \ + -c 'h.pwrite(b"2"*8192, 196608)' + +# The extents map should be fully allocated. +nbdinfo --map "$uri" > $out +cat $out +if [ "$(tr -s ' ' < $out)" != " 0 4096 0 allocated + 4096 4096 3 hole,zero + 8192 8192 0 allocated + 16384 4096 3 hole,zero + 20480 110592 0 allocated + 131072 65536 3 hole,zero + 196608 8192 0 allocated + 204800 57344 3 hole,zero" ]; then + echo "$0: unexpected trimmed file map" + exit 1 +fi + +# The original file must not be modified. +currmod="$(stat -c "%y" $base)" +if [ "$lastmod" != "$currmod" ]; then + echo "$0: FAILED last modified time of base file changed" + exit 1 +fi diff --git a/tests/test-cow-extents2.sh b/tests/test-cow-extents2.sh new file mode 100755 index 00000000..c2ffd0ec --- /dev/null +++ b/tests/test-cow-extents2.sh @@ -0,0 +1,78 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018-2021 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +source ./functions.sh +set -e +set -x + +requires_filter cow +requires_plugin file +requires test -f disk +requires cmp --version +requires dd --version +requires guestfish --version +requires nbdcopy --version + +copy=cow-extents2.img +sparse=cow-extents2-sparse.img +pid=cow-extents2.pid +sock=$(mktemp -u /tmp/nbdkit-test-sock.XXXXXX) +files="$copy $sparse $pid $sock" +rm -f $files +cleanup_fn rm -f $files + +# Copy the original disk using 'dd' so it becomes fully allocated. +dd if=disk of=$copy + +# Run nbdkit with the cow filter on top. +start_nbdkit -P $pid -U $sock --filter=cow file $copy + +guestfish -a "nbd:///?socket=$sock" <<EOF + run +# Sparsify it into the overlay. + mount /dev/sda1 / + fstrim / + umount /dev/sda1 +# After remounting check we can list the files and read the hello.txt file. + mount /dev/sda1 / + find / + cat /hello.txt +EOF + +# Copy out the disk to a local file. +# This should exercise extents information. +nbdcopy "nbd+unix:///?socket=$sock" $sparse + +# We can only visually check that the file has been sparsified. We do +# not expect the files to be bit for bit identical after running +# fstrim. +ls -lsh disk $copy $sparse -- 2.29.0.rc2
Richard W.M. Jones
2021-Feb-02 15:21 UTC
[Libguestfs] [PATCH nbdkit 0/6] cow: Implement trimming and extents.
I pushed this series, but I made an optimization: In the patch implementing extents, I optimized it so that it asks for the largest ranges possible when querying the plugin. The reason for this is that the VDDK plugin (and probably others too) are very inefficient when you ask them repeatedly for small ranges of extents. Note this patch series is still somewhat experimental (or at least, it's very complex), so if there are any problems with nbdkit-cow-filter please let me know. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Eric Blake
2021-Feb-03 20:47 UTC
[Libguestfs] [PATCH nbdkit 1/6] filters: Add a function to read full extents from a plugin.
On 1/26/21 3:51 PM, Richard W.M. Jones wrote:> This convenience function makes it a bit simpler to read a full set of > extents covering a range from a plugin, especially for plugins which > don't reply with a full set of extents in a single call. > --- > docs/nbdkit-filter.pod | 22 +++++++++++++++++ > include/nbdkit-filter.h | 5 ++++ > server/extents.c | 54 +++++++++++++++++++++++++++++++++++++++++ > server/nbdkit.syms | 1 + > 4 files changed, 82 insertions(+) >> +++ b/server/extents.c> +struct nbdkit_extents * > +nbdkit_extents_full (struct nbdkit_next_ops *next_ops, nbdkit_backend *nxdata, > + uint32_t count, uint64_t offset, uint32_t flags, > + int *err)> + /* If the plugin is behaving we must make forward progress. */ > + assert (offset > old_offset); > + } > + > + return ret; > + > + error1: > + *err = errno; > + error0: > + if (ret) nbdkit_extents_free (ret);nbdkit_extents_free(NULL) is a no-op, do we need the 'if' here?> + return NULL; > +}-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2021-Feb-03 20:50 UTC
[Libguestfs] [PATCH nbdkit 2/6] checkwrite: Simplify trim/zero using nbdkit_extents_full.
On 1/26/21 3:51 PM, Richard W.M. Jones wrote:> --- > filters/checkwrite/checkwrite.c | 96 +++++++++++++-------------------- > 1 file changed, 38 insertions(+), 58 deletions(-) > > diff --git a/filters/checkwrite/checkwrite.c b/filters/checkwrite/checkwrite.c > index 68c57c37..2792d4d8 100644 > --- a/filters/checkwrite/checkwrite.c > +++ b/filters/checkwrite/checkwrite.c > @@ -150,67 +150,47 @@ checkwrite_trim_zero (struct nbdkit_next_ops *next_ops, void *nxdata, > { > /* If the plugin supports extents, speed this up by using them. */ > if (next_ops->can_extents (nxdata)) {> + size_t i, n; > + CLEANUP_EXTENTS_FREE struct nbdkit_extents *exts > + nbdkit_extents_full (next_ops, nxdata, count, offset, 0, err); > + if (exts == NULL) > + return -1;Should we instead behave like the entire region is data, instead of reflecting back the error? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Eric Blake
2021-Feb-03 21:32 UTC
[Libguestfs] [PATCH nbdkit 5/6] cow: Implement efficient trimming support.
On 1/26/21 3:51 PM, Richard W.M. Jones wrote:> By storing an extra bit in the overlay bitmap we can mark when whole > blocks have been trimmed. This allows us to match a feature of qcow2 > (zero clusters). > > In theory we could punch holes in the overlay to save a bit of disk > space. > --- > filters/cow/blk.h | 4 ++ > filters/cow/blk.c | 106 +++++++++++++++++++++++++++++++--------------- > filters/cow/cow.c | 75 +++++++++++++++++++++++++++++++- > 3 files changed, 149 insertions(+), 36 deletions(-)> +++ b/filters/cow/cow.c > @@ -150,7 +150,7 @@ cow_can_write (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle)> +/* Trim data. */ > +static int > +cow_trim (struct nbdkit_next_ops *next_ops, void *nxdata, > + void *handle, uint32_t count, uint64_t offset, uint32_t flags, > + int *err) > +{ > + CLEANUP_FREE uint8_t *block = NULL; > + uint64_t blknum, blkoffs; > + int r; > + > + if (!IS_ALIGNED (count | offset, BLKSIZE)) { > + block = malloc (BLKSIZE); > + if (block == NULL) { > + *err = errno; > + nbdkit_error ("malloc: %m"); > + return -1; > + } > + }Technically, we could leave the unallocated portions unchanged rather than forcing it to read as zero. Maybe it's worth a command-line option - some users want security (no stale data can leak) while others want speed (no time wasted on read-modify-write). Overall, the series looks good (and yes, I know you pushed before my review, which turned out fine). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org