Shreyas Khare
2018-Jul-23 18:38 UTC
[Libguestfs] [hivex PATCH] Re-allocating unused blocks before assigning new blocks
Hello Richard As discussed in the IRC channel, when merging a moderately large reg file (~35MB) to a hiv file (~118 MB); hivex generates a huge hiv file (~580 MB). These changes address that by creating a list of unallocated blocks and reassigning unused blocks. I used https://github.com/msuhanov/regf/blob/master/Windows%20registry%20file%20format%20specification.md as a reference for the structure of the hiv file (in addition to the source code itself) Attaching the patch file. Shreyas -- NOTICE OF CONFIDENTIALITY: At Rapid7, the privacy of our customers, partners, and employees is paramount. If you received this email in error, please notify the sender and delete it from your inbox right away. Learn how Rapid7 handles privacy at rapid7.com/privacy-policy <https://www.rapid7.com/privacy-policy/>. To opt-out of Rapid7 marketing emails, please click here <https://information.rapid7.com/manage-subscription.html> or email privacy@rapid7.com <mailto:mailto:privacy@rapid7.com>.
Richard W.M. Jones
2018-Jul-25 11:32 UTC
Re: [Libguestfs] [hivex PATCH] Re-allocating unused blocks before assigning new blocks
On Mon, Jul 23, 2018 at 02:38:18PM -0400, Shreyas Khare wrote:> Hello Richard > > As discussed in the IRC channel, when merging a moderately large reg > file (~35MB) to a hiv file (~118 MB); hivex generates a huge hiv > file (~580 MB). These changes address that by creating a list of > unallocated blocks and reassigning unused blocks. I used https://github.com/msuhanov/regf/blob/master/Windows%20registry%20file%20format%20specification.md > as a reference for the structure of the hiv file (in addition to the > source code itself) > > Attaching the patch file.Just as a general comment, for the patch to go upstream at all it will need to be reformatted so it fits with the existing code style. [...]> /* Allocate a single block, first allocating an hbin (page) at the end > * of the current file if necessary. NB. To keep the implementation > * simple and more likely to be correct, we do not reuse existing freeThis comment is left in place, but it's now obviously wrong. I'm a bit confused why we need a separate free list. Can't we just use the existing bitmap? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Shreyas Khare
2018-Jul-25 14:01 UTC
Re: [Libguestfs] [hivex PATCH] Re-allocating unused blocks before assigning new blocks
Hi Richard I've added a new patch. See some in-line comments. That said, I made a couple of more changes to my patch, re-using the free_blocks for doing all allocations (i.e. not using h->endblocks at all). On 2018-07-25 07:32 AM, Richard W.M. Jones wrote:> On Mon, Jul 23, 2018 at 02:38:18PM -0400, Shreyas Khare wrote: >> Hello Richard >> >> As discussed in the IRC channel, when merging a moderately large reg >> file (~35MB) to a hiv file (~118 MB); hivex generates a huge hiv >> file (~580 MB). These changes address that by creating a list of >> unallocated blocks and reassigning unused blocks. I used https://github.com/msuhanov/regf/blob/master/Windows%20registry%20file%20format%20specification.md >> as a reference for the structure of the hiv file (in addition to the >> source code itself) >> >> Attaching the patch file. > Just as a general comment, for the patch to go upstream at all it will > need to be reformatted so it fits with the existing code style. > > [...]Will do.>> /* Allocate a single block, first allocating an hbin (page) at the end >> * of the current file if necessary. NB. To keep the implementation >> * simple and more likely to be correct, we do not reuse existing free > This comment is left in place, but it's now obviously wrong.Fixed.> I'm a bit confused why we need a separate free list. Can't we just > use the existing bitmap?I tried to use the existing bitmap, but wasn't successful initially with the 20 byte hbin headers on every page.> > Rich. >-- NOTICE OF CONFIDENTIALITY: At Rapid7, the privacy of our customers, partners, and employees is paramount. If you received this email in error, please notify the sender and delete it from your inbox right away. Learn how Rapid7 handles privacy at rapid7.com/privacy-policy <https://www.rapid7.com/privacy-policy/>. To opt-out of Rapid7 marketing emails, please click here <https://information.rapid7.com/manage-subscription.html> or email privacy@rapid7.com <mailto:mailto:privacy@rapid7.com>.
Richard W.M. Jones
2019-Jan-03 11:06 UTC
Re: [Libguestfs] [hivex PATCH] Re-allocating unused blocks before assigning new blocks
On Mon, Jul 23, 2018 at 02:38:18PM -0400, Shreyas Khare wrote:> >From 4a9f9b534ebd18254a07ce1de22c5de570cfa2f8 Mon Sep 17 00:00:00 2001 > From: Shreyas Khare <shreyas_khare@rapid7.com> > Date: Mon, 23 Jul 2018 14:28:29 -0400 > Subject: [PATCH] Re-allocating unused blocks before assigning new blocksIt would be good if you could post this next time using "git send-email" so that I can apply the patch with the commit message, date etc using git am. Otherwise I have to use patch and fake the commit message.> --- > lib/hivex-internal.h | 8 +++ > lib/write.c | 146 +++++++++++++++++++++++++++++++++++++------ > 2 files changed, 134 insertions(+), 20 deletions(-) > > diff --git a/lib/hivex-internal.h b/lib/hivex-internal.h > index a22e732..e7f5e92 100644 > --- a/lib/hivex-internal.h > +++ b/lib/hivex-internal.h > @@ -45,6 +45,13 @@ typedef enum { > nr_recode_types, > } recode_type; > > +typedef struct unallocated_block { > + size_t offset; > + size_t block_size; > + struct unallocated_block *next; > +} unallocated_block;This needs to be indented consistently with the other code. This also applies to the rest of the code.> struct hive_h { > char *filename; > int fd; > @@ -52,6 +59,7 @@ struct hive_h { > int msglvl; /* 1 = verbose, 2 or 3 = debug */ > int writable; > int unsafe; > + unallocated_block *free_blocks; > > /* Registry file, memory mapped if read-only, or malloc'd if writing. */ > union { > diff --git a/lib/write.c b/lib/write.c > index 70105c9..b72e2c8 100644 > --- a/lib/write.c > +++ b/lib/write.c > @@ -126,6 +126,97 @@ allocate_page (hive_h *h, size_t allocation_hint) > return offset + sizeof (struct ntreg_hbin_page); > } > > +/* > + * Get the first available unused block from free_blocks, using first-fit algorithm. > + * NB: Attempted to use the best-fit algorithm. This does not work too well because > + * allocating a block in the smallest possible available block causes fragmentation. > + */This needs to be wrapped so that lines are not longer than about 70-79 characters. Also make the comments consistent with existing code, so don't start with an empty line after /* .> +static size_t > +get_unused_block(hive_h *h, size_t seglen) > +{ > + unallocated_block *before = NULL, *found=NULL;Space around second "=".> + for (unallocated_block *current = h->free_blocks; current != NULL ; current = current->next) {Extra space before the second ";" and the line is too long.> + if (current->block_size >= seglen) { > + found = current; > + break; > + } > + before = current; > + } > + if (found) { > + size_t offset = found->offset; > + size_t unused = found->block_size - seglen;It looks like tabs and spaces are used inconsistently here. However if you reindent the code that should be fixed automatically.> + if (unused > 0) { > + // Need to write unused memory to hive (to preserve hiv integrity)hiv -> hive> + struct ntreg_hbin_block *blockhdr = (struct ntreg_hbin_block *) ((char *) h->addr + found->offset + seglen); > + blockhdr->seg_len = htole32 ((int32_t) unused); > + // Lets move and resize the found block > + found->offset = found->offset+seglen; > + found->block_size = unused;I don't think I understand what this block of code is doing.> + } else { > + //Full buffer was usedSpace after //> + if (before) { > + before->next = found->next; > + } else { > + h->free_blocks = found->next; > + } > + free(found);Space after free.> + } > + return offset; > + } else return -1;Probably more consistent to add an extra line break after }> +} > + > +/* > + * Add freed block to free_blocks. Block size is determined by size at the actual offset. > + * This list is ordered by offset. If the free'd block is preceded or succeded > + * by another free block, they are merged. > + */ > +static void > +set_unused_block(hive_h *h, size_t offset) > +{ > + size_t seg_len = block_len (h, offset, NULL); > + unallocated_block *current = malloc(sizeof(unallocated_block)); > + current->offset = offset; > + current->block_size = seg_len; > + current->next = NULL; > + > + unallocated_block *before = NULL, *after = NULL; > + // There are no free blocks > + if (h->free_blocks == NULL) { > + h->free_blocks = current; > + return; > + } > + > + // Find the first block _after_ this one > + for (after = h->free_blocks; after != NULL; after = after->next) { > + if (after->offset > current->offset) { > + // Can we merge with the next block? > + if (current->offset + current->block_size == after->offset) { > + current->block_size += after->block_size; > + current->next = after->next; > + free(after); > + } else { > + current->next = after; > + } > + break; > + } > + before = after; > + } > + // This was the block before this one > + if (before) { > + // Can we merge with the previous block? > + if (before->offset + before->block_size == current->offset) { > + before->block_size += current->block_size; > + before->next = current->next; > + free(current); //discard current block, lets reuse before block > + } else { > + before->next = current; > + } > + } else { > + h->free_blocks = current; > + } > +} > + > + > /* Allocate a single block, first allocating an hbin (page) at the end > * of the current file if necessary. NB. To keep the implementation > * simple and more likely to be correct, we do not reuse existing free > @@ -148,6 +239,9 @@ allocate_page (hive_h *h, size_t allocation_hint) > static size_t > allocate_block (hive_h *h, size_t seg_len, const char id[2]) > { > + bool new_block = true; > + size_t offset; > + > CHECK_WRITABLE (0); > > if (seg_len < 4) { > @@ -170,17 +264,26 @@ allocate_block (hive_h *h, size_t seg_len, const char id[2]) > */ > seg_len = (seg_len + 7) & ~7; > > - /* Allocate a new page if necessary. */ > - if (h->endblocks == 0 || h->endblocks + seg_len > h->endpages) { > - size_t newendblocks = allocate_page (h, seg_len); > - if (newendblocks == 0) > - return 0; > - h->endblocks = newendblocks; > + // Attempt to find a free block before getting a new block > + offset = get_unused_block(h, seg_len); > + if (offset != -1) { > + new_block = false; > + DEBUG(2, "Re-use offset found at 0x%zx, size %zu", offset, seg_len); > } > > - size_t offset = h->endblocks; > + if (new_block) { > + /* Allocate a new page if necessary. */ > + if (h->endblocks == 0 || h->endblocks + seg_len > h->endpages) { > + if(h->endblocks != 0) set_unused_block(h, h->endblocks); > + size_t newendblocks = allocate_page (h, seg_len); > + if (newendblocks == 0) > + return 0; > + h->endblocks = newendblocks; > + } > > - DEBUG (2, "new block at 0x%zx, size %zu", offset, seg_len); > + offset = h->endblocks; > + DEBUG (2, "new block at 0x%zx, size %zu", offset, seg_len); > + } > > struct ntreg_hbin_block *blockhdr > (struct ntreg_hbin_block *) ((char *) h->addr + offset); > @@ -195,21 +298,23 @@ allocate_block (hive_h *h, size_t seg_len, const char id[2]) > > BITMAP_SET (h->bitmap, offset); > > - h->endblocks += seg_len; > + if (new_block) { > + h->endblocks += seg_len; > > - /* If there is space after the last block in the last page, then we > - * have to put a dummy free block header here to mark the rest of > - * the page as free. > - */ > - ssize_t rem = h->endpages - h->endblocks; > - if (rem > 0) { > - DEBUG (2, "marking remainder of page free" > - " starting at 0x%zx, size %zd", h->endblocks, rem); > + /* If there is space after the last block in the last page, then we > + * have to put a dummy free block header here to mark the rest of > + * the page as free. > + */ > + ssize_t rem = h->endpages - h->endblocks; > + if (rem > 0) { > + DEBUG (2, "marking remainder of page free" > + " starting at 0x%zx, size %zd", h->endblocks, rem); > > - assert (rem >= 4); > + assert (rem >= 4); > > - blockhdr = (struct ntreg_hbin_block *) ((char *) h->addr + h->endblocks); > - blockhdr->seg_len = htole32 ((int32_t) rem); > + blockhdr = (struct ntreg_hbin_block *) ((char *) h->addr + h->endblocks); > + blockhdr->seg_len = htole32 ((int32_t) rem); > + }Seems like the patch contains unnecessary whitespace changes. Can you post a reformatted patch, then I can do a review of the code. Rich.> } > > return offset; > @@ -236,6 +341,7 @@ mark_block_unused (hive_h *h, size_t offset) > size_t seg_len = block_len (h, offset, NULL); > blockhdr->seg_len = htole32 (seg_len); > > + set_unused_block(h, offset); > BITMAP_CLR (h->bitmap, offset); > } > > -- > 2.17.0 >> _______________________________________________ > Libguestfs mailing list > Libguestfs@redhat.com > https://www.redhat.com/mailman/listinfo/libguestfs-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Possibly Parallel Threads
- [hivex PATCH] Re-allocating unused blocks before assigning new blocks
- Re: [libhivex] Undefined behavior when accessing invalid (too small) registry hives
- Re: [libhivex] Undefined behavior when accessing invalid (too small) registry hives
- [PATCH v3 0/2] hivex: handle corrupted hives better
- [PATCH 0/2] hivex: handle corrupted hives better