thr3ads.net - Libguestfs - [Libguestfs] [PATCH nbdkit 3/4] Add map filter. [Aug 2018]


Eric Blake

2018-Jul-31 22:22 UTC

Re: [Libguestfs] [PATCH nbdkit 3/4] Add map filter.

On 07/31/2018 02:55 PM, Richard W.M. Jones wrote:
> Serve an arbitrary map of regions of the underlying plugin.
> ---
> +#define map_config_help \
> +  "map=<FILENAME>      (required) Map file."
Maybe worth a "see man page for format"?

I have not closely read the code; at this point, it is just a quick 
review on the documentation/comments.
> +/* Notes on the implementation.
> + *
> + * Throughout the filter we use the following terminology:
> + *
> + * request / requested etc: The client requested range of bytes to
> + * read or update.
> + *
> + * plugin: The target after the client request is mapped.  This is
> + * what is passed along to the underlying plugin (or next filter in
> + * the chain).
> + *
> + * mappings: Single entries (lines) in the map file.  They are of the
> + * form (plugin, request), ie. the mapping is done backwards.
> + *
> + * interval: start-end or (start, length).
> + *
> + * Only one mapping can apply to each requested byte.  This fact is
Maybe: Only the final applicable mapping can apply to each requested byte?
> + * crucial as it allows us to store the mappings in a simple array
> + * with no overlapping intervals, and use an efficient binary search
> + * to map incoming requests to the plugin.
> + *
> + * When we read the map file we start with an empty array and add the
> + * intervals to it.  At all times we must maintain the invariant that
> + * no intervals in the array may overlap, and therefore we have to
> + * split existing intervals as required.  Earlier mappings are
> + * discarded where they overlap with later mappings.
(I suggest that wording because this paragraph is all about maintaining
the invariant in favor of each subsequent mapping, so the file format
itself permits overlaps as a shorthand for special-casing subregions.)
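
As a rough illustration of the lookup the quoted comment describes, here is a minimal sketch of a binary search over a sorted array of non-overlapping intervals. The struct layout and the names `find_mapping`, `request_to_plugin` and `swap_map` are assumptions made for this example, not code from the patch; the sample table encodes the "swap two 16K halves" map from the quoted man page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: requested bytes [req_start, req_end] (inclusive)
 * are served from the plugin starting at plugin_start. */
struct mapping {
  uint64_t req_start, req_end;
  uint64_t plugin_start;
};

/* Binary search a sorted, non-overlapping array for the entry that
 * covers OFFSET.  Returns NULL if the byte is unmapped. */
static const struct mapping *
find_mapping (const struct mapping *map, size_t nr_map, uint64_t offset)
{
  size_t lo = 0, hi = nr_map;

  assert (map != NULL || nr_map == 0);
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;

    if (offset < map[mid].req_start)
      hi = mid;
    else if (offset > map[mid].req_end)
      lo = mid + 1;
    else
      return &map[mid];
  }
  return NULL;
}

/* Translate a requested offset to a plugin offset, or -1 if unmapped. */
static int64_t
request_to_plugin (const struct mapping *map, size_t nr_map, uint64_t offset)
{
  const struct mapping *m = find_mapping (map, nr_map, offset);

  if (m == NULL)
    return -1;
  return (int64_t) (m->plugin_start + (offset - m->req_start));
}

/* The 16K swap example, sorted by requested offset. */
static const struct mapping swap_map[] = {
  { 0,     16383, 16384 },  /* client 0..16K-1   <- plugin 16K..32K-1 */
  { 16384, 32767, 0     },  /* client 16K..32K-1 <- plugin 0..16K-1   */
};
```

Because no two intervals overlap, at most one entry can match any requested byte, which is what makes the plain binary search sufficient.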

Is there a syntax for explicitly mentioning a subset is unmapped even 
after a larger mapping is applied first (perhaps useful for redacting a 
portion of a disk containing sensitive information)?
> +static int
> +insert_mapping (struct map *map, const struct mapping *new_mapping)
> +{
> +  size_t i;
> +
> +  /* Adjust existing mappings if they overlap with this mapping. */
> +  for (i = 0; i < map->nr_map; ++i) {
> +    if (mappings_overlap (&map->map[i], new_mapping)) {
> +      /* The four cases are:
> +       *
> +       * existing         +---+
> +       * new        +-------------------+
> +       *                       => erase existing mapping
> +       *
> +       * existing  +-------------------+
> +       * new            +---+
> +       *                       => split existing mapping into two
should that be 'two/three'?
> +       *
> +       * existing          +-----------+
> +       * new            +-----+
> +       *                       => adjust start of existing mapping
or is it really a case that you first split into two, then adjust one of 
the two
> +       *
> +       * existing  +-----------+
> +       * new                +-----+
> +       *                       => adjust end of existing mapping
> +       */
> +++ b/filters/map/nbdkit-map-filter.pod
> @@ -0,0 +1,173 @@
> +=head1 NAME
> +
> +nbdkit-map-filter - nbdkit map filter
> +
> +=head1 SYNOPSIS
> +
> + nbdkit --filter=map plugin map=FILENAME [plugin-args...]
> +
> +=head1 DESCRIPTION
> +
> +C<nbdkit-map-filter> is a filter that can serve an arbitrary map of
> +regions of the underlying plugin.
> +
> +It is driven by a map file that contains a list of regions from the
> +plugin and where they should be served in the output.
> +
> +For example this map would divide the plugin data into two 16K halves
> +and swap them over:
> +
> + # map file
> + 0,16K   16K   # aaaaa
> + 16K,16K 0     # bbbbb
> +
> +When visualised, this map file looks like:
> +
> +                   ┌──────────────┬──────────────┬─── ─ ─ ─
> + Plugin serves ... │ aaaaaaaaaaaa │ bbbbbbbbbbbb │ (extra data)
> +                   │    16K       │    16K       │
> +                   └──────────────┴──────────────┴─── ─ ─ ─
> +                         │              │
> + Filter                  │    ┌─────────┘
> + transforms ...          └──────────────┐
> +                              │         │
> +                   ┌──────────▼───┬─────▼────────┐
> + Client sees ...   │ bbbbbbbbbbbb │ aaaaaaaaaaaa │
> +                   └──────────────┴──────────────┘
> +
> +This is how to simulate L<nbdkit-offset-filter(1)> C<offset> and
> +C<range> parameters:
> +
> + # offset,range
> + 1M,32M         0
> +
> +                   ┌─────┬─────────────────────┬─── ─ ─ ─
> + Plugin serves ... │     │ ccccccccccccccccccc │ (extra data)
> +                   │ 1M  │        32M          │
> +                   └─────┴─────────────────────┴─── ─ ─ ─
> + Filter                            │
> + transforms ...              ┌─────┘
> +                             │
> +                   ┌─────────▼───────────┐
> + Client sees ...   │ ccccccccccccccccccc │
> +                   └─────────────────────┘
> +
> +You can also do obscure things like duplicating regions of the source:
> +
> + # map file
> + 0,16K  0
> + 0,16K  16K
> +
> +                   ┌──────────────┬─── ─ ─ ─
> + Plugin serves ... │ aaaaaaaaaaaa │ (extra data)
> +                   │    16K       │
> +                   └──────────────┴─── ─ ─ ─
> + Filter                  │
> + transforms ...          └───┬──────────┐
> +                             │          │
> +                   ┌─────────▼────┬─────▼────────┐
> + Client sees ...   │ aaaaaaaaaaaa │ aaaaaaaaaaaa │
> +                   └──────────────┴──────────────┘
> +
When duplicating things, do we want to document that a single 
transaction is carried out in the order seen by the client (where 
aliases at later bytes overwrite any data written into the earlier alias 
in a long transaction), or do we want to put in hedge wording that (in 
the future) a request might be split into smaller regions that get 
operated on in parallel (thereby making the end contents indeterminate 
when writing to two aliases of the same byte in one transaction)?

> +=head2 C<start-end>
> +
> + start-end     offset
> +
> +means that the source region starting at byte C<start> through to byte
> +C<end> (inclusive) is mapped to C<offset> through to
> +C<offset+(end-start)> in the output.
> +
> +For example:
> +
> + 1024-2047     2048
> +
> +maps the region starting at byte 1024 and ending at byte 2047
> +(inclusive) to bytes 2048-3071 in the output.
Since you already support '2k', '2m' and such as shorthands for the
start, is it worth creating a convenient shorthand for expressing '3M-1'
for an end rather than having to write out 3145727?

> +
> +=head2 C<start> to end of plugin
> +
> + start-        offset
> + start         offset
> +
> +If the C<end> field is omitted it means "up to the end of the
> +underlying plugin".
> +
> +=head2 Size modifiers
> +
> +You can use the usual power-of-2 size modifiers like C<K>, C<M> etc.
> +
> +=head2 Overlapping mappings
> +
> +If there are multiple mappings in the map file that may apply to a
> +particular byte of the filter output then it is the last one in the
> +file which applies.
> +
> +=head2 Virtual size
> +
> +The virtual size of the filter output finishes at the last byte of the
> +final mapped region.  Note this is usually different from the size of
> +the underlying plugin.
Is there a syntax for explicitly adding an unmapped tail, to make the 
filter's output longer than the underlying plugin's size?
> +
> +=head2 Unmapped regions
> +
> +Any unmapped region (followed by a mapped region and therefore not
> +beyond the virtual size) reads as zero and returns an error if
> +written.
> +
> +Any mapping or part of a mapping where the source region refers beyond
> +the end of the underlying plugin reads as zero and returns an error if
> +written.
Ah, so using the '(start,length) offset' entry does allow for an 
explicit unmapped tail at the end of the underlying plugin.
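
To make the "reads as zero, errors on write" rule concrete, here is a minimal sketch using an in-memory buffer as a stand-in for the underlying plugin. The function names are hypothetical, and rejecting the whole write when any part falls past the end is an assumption; the quoted documentation does not say how a write that straddles the end is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read COUNT bytes at PLUGIN_OFFSET.  Bytes past the end of the
 * plugin are returned as zeroes rather than causing an error. */
static void
read_or_zero (const unsigned char *plugin, uint64_t plugin_size,
              uint64_t plugin_offset, unsigned char *buf, size_t count)
{
  size_t avail = 0;

  assert (buf != NULL || count == 0);
  if (plugin_offset < plugin_size) {
    uint64_t left = plugin_size - plugin_offset;

    avail = left < count ? (size_t) left : count;
    memcpy (buf, plugin + plugin_offset, avail);
  }
  memset (buf + avail, 0, count - avail);
}

/* Write COUNT bytes at PLUGIN_OFFSET.  Returns 0 on success, or -1 if
 * any part of the range lies beyond the plugin (assumed policy). */
static int
write_checked (unsigned char *plugin, uint64_t plugin_size,
               uint64_t plugin_offset, const unsigned char *buf,
               size_t count)
{
  if (plugin_offset > plugin_size || count > plugin_size - plugin_offset)
    return -1;
  memcpy (plugin + plugin_offset, buf, count);
  return 0;
}
```

With a 4-byte plugin, reading 6 bytes from offset 2 yields the last two plugin bytes followed by four zeroes, while writing the same range fails.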

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Richard W.M. Jones

2018-Aug-01 10:38 UTC


[Libguestfs] [PATCH nbdkit 3/4] Add map filter.

On Tue, Jul 31, 2018 at 05:22:32PM -0500, Eric Blake wrote:
> Is there a syntax for explicitly mentioning a subset is unmapped
> even after a larger mapping is applied first (perhaps useful for
> redacting a portion of a disk containing sensitive information)?
It's a good idea for a TODO item, so I'll add it there.  At the moment
it's possible to express this in the map file, but only by positively
listing the regions you want to be mapped, not by negatively listing
regions you want unmapped.
> >+static int
> >+insert_mapping (struct map *map, const struct mapping *new_mapping)
> >+{
> >+  size_t i;
> >+
> >+  /* Adjust existing mappings if they overlap with this mapping. */
> >+  for (i = 0; i < map->nr_map; ++i) {
> >+    if (mappings_overlap (&map->map[i], new_mapping)) {
> >+      /* The four cases are:
> >+       *
> >+       * existing         +---+
> >+       * new        +-------------------+
> >+       *                       => erase existing mapping
> >+       *
> >+       * existing  +-------------------+
> >+       * new            +---+
> >+       *                       => split existing mapping into two
> 
> should that be 'two/three'?
The existing mapping is split into just two pieces, I think?  The
middle bit, hidden by the new mapping, gets discarded.
> >+       *
> >+       * existing          +-----------+
> >+       * new            +-----+
> >+       *                       => adjust start of existing mapping
> 
> or is it really a case that you first split into two, then adjust
> one of the two
I think the comment is correct, unless I'm misunderstanding what you
mean.  Note that where the new mapping overlaps the existing mapping,
the existing mapping is discarded (to maintain the invariant).
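
The four cases discussed above can be sketched as follows. This is only an illustration under stated assumptions: the fixed-capacity array, the field names and the `MAX_MAP` limit are invented for the example, and only the loop shape and the four overlap cases come from the quoted patch. Entries are left unsorted here; a lookup by binary search would need them sorted afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAP 64              /* arbitrary capacity for this sketch */

/* Hypothetical layout: requested bytes [start, end] (inclusive) come
 * from the plugin beginning at plugin_start. */
struct mapping { uint64_t start, end, plugin_start; };
struct map { struct mapping map[MAX_MAP]; size_t nr_map; };

/* Insert NEW_MAPPING, erasing, splitting or trimming older entries so
 * that no two entries overlap.  Returns 0, or -1 if the array is full
 * (in which case the map may already have been partly modified). */
static int
insert_mapping (struct map *map, const struct mapping *new_mapping)
{
  const uint64_t ns = new_mapping->start, ne = new_mapping->end;
  size_t i = 0;

  assert (ns <= ne);
  while (i < map->nr_map) {
    struct mapping *m = &map->map[i];

    if (m->end < ns || m->start > ne) {          /* no overlap */
      i++;
    }
    else if (ns <= m->start && m->end <= ne) {   /* erase existing */
      *m = map->map[--map->nr_map];              /* then re-check slot i */
    }
    else if (m->start < ns && ne < m->end) {     /* split into two */
      struct mapping tail;

      if (map->nr_map == MAX_MAP)
        return -1;
      tail.start = ne + 1;
      tail.end = m->end;
      tail.plugin_start = m->plugin_start + (ne + 1 - m->start);
      m->end = ns - 1;
      map->map[map->nr_map++] = tail;
      i++;
    }
    else if (ns <= m->start) {                   /* adjust start */
      m->plugin_start += ne + 1 - m->start;
      m->start = ne + 1;
      i++;
    }
    else {                                       /* adjust end */
      m->end = ns - 1;
      i++;
    }
  }
  if (map->nr_map == MAX_MAP)
    return -1;
  map->map[map->nr_map++] = *new_mapping;
  return 0;
}
```

In the split case the existing entry becomes two pieces and the new entry is appended separately, so the list grows from one entry to three, which matches both descriptions in the exchange above.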
> >+You can also do obscure things like duplicating regions of the source:
> >+
> >+ # map file
> >+ 0,16K  0
> >+ 0,16K  16K
> >+
> >+                   ┌──────────────┬─── ─ ─ ─
> >+ Plugin serves ... │ aaaaaaaaaaaa │ (extra data)
> >+                   │    16K       │
> >+                   └──────────────┴─── ─ ─ ─
> >+ Filter                  │
> >+ transforms ...          └───┬──────────┐
> >+                             │          │
> >+                   ┌─────────▼────┬─────▼────────┐
> >+ Client sees ...   │ aaaaaaaaaaaa │ aaaaaaaaaaaa │
> >+                   └──────────────┴──────────────┘
> >+
> 
> When duplicating things, do we want to document that a single
> transaction is carried out in the order seen by the client (where
> aliases at later bytes overwrite any data written into the earlier
> alias in a long transaction), or do we want to put in hedge wording
> that (in the future) a request might be split into smaller regions
> that get operated on in parallel (thereby making the end contents
> indeterminate when writing to two aliases of the same byte in one
> transaction)?
I think I'd rather leave it unspecified.  I'll add some caveat text to
the documentation.
> 
> >+=head2 C<start-end>
> >+
> >+ start-end     offset
> >+
> >+means that the source region starting at byte C<start> through to byte
> >+C<end> (inclusive) is mapped to C<offset> through to
> >+C<offset+(end-start)> in the output.
> >+
> >+For example:
> >+
> >+ 1024-2047     2048
> >+
> >+maps the region starting at byte 1024 and ending at byte 2047
> >+(inclusive) to bytes 2048-3071 in the output.
> 
> Since you already support '2k', '2m' and such as shorthands for the
> start, is it worth creating a convenient shorthand for expressing
> '3M-1' for an end rather than having to write out 3145727?
Yeah I thought about that.  Unfortunately you quickly get into
needing to write a parser.  (Hey, what about "2^20-1"?!)
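
To show where the line is currently drawn, here is a minimal sketch of a size parser that accepts a plain decimal number with an optional power-of-2 suffix and nothing more. The name `parse_size` and the exact set of suffixes are assumptions for this example and are not taken from the patch. An input such as "3M-1" is rejected, since accepting it would need a real expression parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "<digits>[K|M|G]" (case-insensitive suffix, powers of 2).
 * Returns the size in bytes, or -1 on any syntax or range error. */
static int64_t
parse_size (const char *str)
{
  char *end;
  unsigned long long n;
  int shift = 0;

  assert (str != NULL);
  if (*str < '0' || *str > '9')   /* strtoull alone would accept "-1" */
    return -1;
  errno = 0;
  n = strtoull (str, &end, 10);
  if (errno != 0)
    return -1;

  switch (*end) {
  case '\0': break;
  case 'k': case 'K': shift = 10; end++; break;
  case 'm': case 'M': shift = 20; end++; break;
  case 'g': case 'G': shift = 30; end++; break;
  default: return -1;
  }
  if (*end != '\0')               /* trailing text such as "-1" */
    return -1;
  if (n > (unsigned long long) (INT64_MAX >> shift))
    return -1;                    /* result would not fit in int64_t */
  return (int64_t) (n << shift);
}
```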
> >+
> >+=head2 C<start> to end of plugin
> >+
> >+ start-        offset
> >+ start         offset
> >+
> >+If the C<end> field is omitted it means "up to the end of the
> >+underlying plugin".
> >+
> >+=head2 Size modifiers
> >+
> >+You can use the usual power-of-2 size modifiers like C<K>, C<M> etc.
> >+
> >+=head2 Overlapping mappings
> >+
> >+If there are multiple mappings in the map file that may apply to a
> >+particular byte of the filter output then it is the last one in the
> >+file which applies.
> >+
> >+=head2 Virtual size
> >+
> >+The virtual size of the filter output finishes at the last byte of the
> >+final mapped region.  Note this is usually different from the size of
> >+the underlying plugin.
> 
> Is there a syntax for explicitly adding an unmapped tail, to make
> the filter's output longer than the underlying plugin's size?
In the later version of this filter, I documented that you can use the
truncate filter to do this.
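
For reference, the virtual size rule quoted above can be sketched as the end of the highest mapped output byte plus one. Reading "final mapped region" as "the region that ends at the highest output offset" is an interpretation, and the `out_interval` type and `virtual_size` name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: output bytes [start, end] (inclusive). */
struct out_interval { uint64_t start, end; };

/* Virtual size = one past the highest mapped output byte, or 0 if
 * nothing is mapped.  Gaps below that point count towards the size. */
static uint64_t
virtual_size (const struct out_interval *iv, size_t n)
{
  uint64_t size = 0;
  size_t i;

  for (i = 0; i < n; ++i) {
    assert (iv[i].start <= iv[i].end && iv[i].end < UINT64_MAX);
    if (iv[i].end + 1 > size)
      size = iv[i].end + 1;
  }
  return size;
}
```

For the "1M,32M 0" example the output covers bytes 0 to 32M-1, so the virtual size is 32M regardless of how large the underlying plugin is.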

As you've probably seen from the commit date, I've been working on the
map filter for nearly a month now.  It has been a frustrating
exercise!  I had it all working yesterday (including writes) and for
some reason this morning some change I have made has completely broken
it again :-(

Thanks for the feedback,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Eric Blake

2018-Aug-01 14:04 UTC


Re: [Libguestfs] [PATCH nbdkit 3/4] Add map filter.

On 08/01/2018 05:38 AM, Richard W.M. Jones wrote:
> On Tue, Jul 31, 2018 at 05:22:32PM -0500, Eric Blake wrote:
>> Is there a syntax for explicitly mentioning a subset is unmapped
>> even after a larger mapping is applied first (perhaps useful for
>> redacting a portion of a disk containing sensitive information)?
> 
> It's a good idea for a TODO item, so I'll add it there.  At the moment
> it's possible to express this in the map file, but only by positively
> listing the regions you want to be mapped, not by negatively listing
> regions you want unmapped.
> 
>>> +       * existing  +-------------------+
>>> +       * new            +---+
>>> +       *                       => split existing mapping into two
>>
>> should that be 'two/three'?
> 
> The existing mapping is split into just two pieces, I think?  The
> middle bit, hidden by the new mapping, gets discarded.
> 
>>> +       *
>>> +       * existing          +-----------+
>>> +       * new            +-----+
>>> +       *                       => adjust start of existing mapping
>>
>> or is it really a case that you first split into two, then adjust
>> one of the two
> 
> I think the comment is correct, unless I'm misunderstanding what you
> mean.  Note that where the new mapping overlaps the existing mapping,
> the existing mapping is discarded (to maintain the invariant).
I was thinking that 'one entry in the list becomes either two or three 
entries', rather than your 'one existing entry is split into two 
entries, then one of those two entries is further adjusted and/or 
replaced by the new entry being added'. Up to you how you word it, the 
list management itself makes sense.
>>
>> When duplicating things, do we want to document that a single
>> transaction is carried out in the order seen by the client (where
>> aliases at later bytes overwrite any data written into the earlier
>> alias in a long transaction), or do we want to put in hedge wording
>> that (in the future) a request might be split into smaller regions
>> that get operated on in parallel (thereby making the end contents
>> indeterminate when writing to two aliases of the same byte in one
>> transaction)?
> 
> I think I'd rather leave it unspecified.  I'll add some caveat text to
> the documentation.
Always better to be fuzzy first, where we can tighten later if needed ;)
>>> +maps the region starting at byte 1024 and ending at byte 2047
>>> +(inclusive) to bytes 2048-3071 in the output.
>>
>> Since you already support '2k', '2m' and such as shorthands for the
>> start, is it worth creating a convenient shorthand for expressing
>> '3M-1' for an end rather than having to write out 3145727?
> 
> Yeah I thought about that.  Unfortunately you quickly get into
> needing to write a parser.  (Hey, what about "2^20-1"?!)
And to some extent, "(2m, 1m)" solves that a bit more nicely than
"2m-3145727" (that is, start/length makes it easier to use round
numbers than start/end does).
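
The arithmetic behind that comparison is small enough to write out. The sketch below converts a "start,length" pair to the inclusive end used by the "start-end" form, and applies the offset+(end-start) rule from the quoted man page; the helper names and the `KiB`/`MiB` macros are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define KiB UINT64_C(1024)
#define MiB (KiB * KiB)

/* Inclusive end byte of a region written as "start,length". */
static uint64_t
region_end (uint64_t start, uint64_t length)
{
  assert (length > 0 && start <= UINT64_MAX - (length - 1));
  return start + (length - 1);
}

/* Last output byte for a "start-end offset" line, following the
 * quoted man page: offset+(end-start). */
static uint64_t
output_end (uint64_t start, uint64_t end, uint64_t offset)
{
  assert (start <= end);
  return offset + (end - start);
}
```

So `region_end (2 * MiB, 1 * MiB)` gives 3145727, the number that would otherwise have to be written out by hand, and `output_end (1024, 2047, 2048)` gives 3071 as in the man page example.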

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
