Patch ready for merging.

v4:

- check return code of tsk_fs_attr_walk
- pass TSK_FS_FILE_WALK_FLAG_NOSPARSE as additional flag to
  tsk_fs_attr_walk

After discussing with TSK authors the behaviour is clear. [1]

In case of COMPRESSED blocks, the callback will be called for all the
attributes no matter whether they are on disk or not (sparse). In
such cases, the block address will be 0. [2]

So we do not have to enforce the blocks to be RAW as we would be
missing COMPRESSED ones (NTFS only).

[1] https://github.com/sleuthkit/sleuthkit/pull/721
[2] http://www.sleuthkit.org/sleuthkit/docs/api-docs/4.2/group__fslib.html#ga3ce8349107b00e1b1502c86a5d6c0727

Matteo Cafasso (3):
  New API: internal_find_block
  New API: find_block
  find_block: added API tests

 daemon/tsk.c                 | 96 ++++++++++++++++++++++++++++++++++++++++++++
 generator/actions.ml         | 25 ++++++++++++
 src/MAX_PROC_NR              |  2 +-
 src/tsk.c                    | 17 ++++++++
 tests/tsk/Makefile.am        |  1 +
 tests/tsk/test-find-block.sh | 66 ++++++++++++++++++++++++++++++
 6 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100755 tests/tsk/test-find-block.sh

-- 
2.9.3
Matteo Cafasso
2016-Oct-08 15:27 UTC
[Libguestfs] [PATCH v4 1/3] New API: internal_find_block
The internal_find_block command searches all entries referring to the
given filesystem data block and returns a tsk_dirent structure
for each of them.

For filesystems such as NTFS which do not delete the block mapping
when removing files, it is possible to get multiple non-allocated
entries for the same block.

The gathered list of tsk_dirent structs is serialised into XDR format
and written to a file by the appliance.

Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
---
 daemon/tsk.c         | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 generator/actions.ml |  9 +++++
 src/MAX_PROC_NR      |  2 +-
 3 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/daemon/tsk.c b/daemon/tsk.c
index af803d7..afe2ed4 100644
--- a/daemon/tsk.c
+++ b/daemon/tsk.c
@@ -20,6 +20,7 @@
 
 #include <stdio.h>
 #include <stdlib.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <string.h>
 #include <unistd.h>
@@ -42,9 +43,16 @@ enum tsk_dirent_flags {
   DIRENT_COMPRESSED = 0x04
 };
 
+typedef struct {
+  bool found;
+  uint64_t block;
+} findblk_data;
+
 static int open_filesystem (const char *, TSK_IMG_INFO **, TSK_FS_INFO **);
 static TSK_WALK_RET_ENUM fswalk_callback (TSK_FS_FILE *, const char *, void *);
 static TSK_WALK_RET_ENUM findino_callback (TSK_FS_FILE *, const char *, void *);
+static TSK_WALK_RET_ENUM findblk_callback (TSK_FS_FILE *, const char *, void *);
+static TSK_WALK_RET_ENUM attrwalk_callback (TSK_FS_FILE *, TSK_OFF_T , TSK_DADDR_T , char *, size_t , TSK_FS_BLOCK_FLAG_ENUM , void *);
 static int send_dirent_info (TSK_FS_FILE *, const char *);
 static char file_type (TSK_FS_FILE *);
 static int file_flags (TSK_FS_FILE *fsfile);
@@ -109,6 +117,35 @@ do_internal_find_inode (const mountable_t *mountable, int64_t inode)
   return ret;
 }
 
+int
+do_internal_find_block (const mountable_t *mountable, int64_t block)
+{
+  int ret = -1;
+  TSK_FS_INFO *fs = NULL;
+  TSK_IMG_INFO *img = NULL;  /* Used internally by tsk_fs_dir_walk */
+  const int flags =
+    TSK_FS_DIR_WALK_FLAG_ALLOC | TSK_FS_DIR_WALK_FLAG_UNALLOC |
+    TSK_FS_DIR_WALK_FLAG_RECURSE | TSK_FS_DIR_WALK_FLAG_NOORPHAN;
+
+  ret = open_filesystem (mountable->device, &img, &fs);
+  if (ret < 0)
+    return ret;
+
+  reply (NULL, NULL);  /* Reply message. */
+
+  ret = tsk_fs_dir_walk (fs, fs->root_inum, flags,
+                         findblk_callback, (void *) &block);
+  if (ret == 0)
+    ret = send_file_end (0);  /* File transfer end. */
+  else
+    send_file_end (1);  /* Cancel file transfer. */
+
+  fs->close (fs);
+  img->close (img);
+
+  return ret;
+}
+
 /* Inspect the device and initialises the img and fs structures.
  * Return 0 on success, -1 on error.
  */
@@ -172,6 +209,65 @@ findino_callback (TSK_FS_FILE *fsfile, const char *path, void *data)
   return (ret == 0) ? TSK_WALK_CONT : TSK_WALK_ERROR;
 }
 
+/* Find block, it gets called on every FS node.
+ *
+ * Return TSK_WALK_CONT on success, TSK_WALK_ERROR on error.
+ */
+static TSK_WALK_RET_ENUM
+findblk_callback (TSK_FS_FILE *fsfile, const char *path, void *data)
+{
+  findblk_data blkdata;
+  const TSK_FS_ATTR *fsattr = NULL;
+  int ret = 0, count = 0, index = 0;
+  const int flags = TSK_FS_FILE_WALK_FLAG_AONLY | TSK_FS_FILE_WALK_FLAG_SLACK |
+    TSK_FS_FILE_WALK_FLAG_NOSPARSE;
+
+  if (entry_is_dot (fsfile))
+    return TSK_WALK_CONT;
+
+  blkdata.found = false;
+  blkdata.block = * (uint64_t *) data;
+
+  /* Retrieve block list */
+  count = tsk_fs_file_attr_getsize (fsfile);
+
+  for (index = 0; index < count; index++) {
+    fsattr = tsk_fs_file_attr_get_idx (fsfile, index);
+
+    if (fsattr != NULL && fsattr->flags & TSK_FS_ATTR_NONRES) {
+      ret = tsk_fs_attr_walk (fsattr, flags,
+                              attrwalk_callback, (void *) &blkdata);
+      if (ret != 0)
+        return TSK_WALK_ERROR;
+    }
+  }
+
+  if (blkdata.found)
+    ret = send_dirent_info (fsfile, path);
+
+  return (ret == 0) ? TSK_WALK_CONT : TSK_WALK_ERROR;
+}
+
+/* Attribute walk, searches the given block within the FS node attributes.
+ *
+ * Return TSK_WALK_CONT on success, TSK_WALK_ERROR on error.
+ */
+static TSK_WALK_RET_ENUM
+attrwalk_callback (TSK_FS_FILE *fsfile, TSK_OFF_T offset,
+                   TSK_DADDR_T blkaddr, char *buf, size_t size,
+                   TSK_FS_BLOCK_FLAG_ENUM flags, void *data)
+{
+  findblk_data *blkdata = (findblk_data *) data;
+
+  if (blkaddr == blkdata->block) {
+    blkdata->found = true;
+
+    return TSK_WALK_STOP;
+  }
+
+  return TSK_WALK_CONT;
+}
+
 /* Extract the information from the entry, serialize and send it out.
  * Return 0 on success, -1 on error.
  */
diff --git a/generator/actions.ml b/generator/actions.ml
index 91a1819..b38a30f 100644
--- a/generator/actions.ml
+++ b/generator/actions.ml
@@ -13253,6 +13253,15 @@ is removed." };
     shortdesc = "search the entries associated to the given inode";
     longdesc = "Internal function for find_inode." };
 
+  { defaults with
+    name = "internal_find_block"; added = (1, 35, 6);
+    style = RErr, [Mountable "device"; Int64 "block"; FileOut "filename";], [];
+    proc_nr = Some 471;
+    visibility = VInternal;
+    optional = Some "libtsk";
+    shortdesc = "search the entries associated to the given block";
+    longdesc = "Internal function for find_block." };
+
 ]
 
 (* Non-API meta-commands available only in guestfish.
diff --git a/src/MAX_PROC_NR b/src/MAX_PROC_NR
index 5f476b6..c305aa5 100644
--- a/src/MAX_PROC_NR
+++ b/src/MAX_PROC_NR
@@ -1 +1 @@
-470
+471
-- 
2.9.3
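The search pattern used by findblk_callback and attrwalk_callback (walk
every block, set a "found" flag, stop early on a match) can be sketched
without libtsk. This is only an illustration: walk_blocks, match_block,
refers_to_block and the WALK_* codes are hypothetical stand-ins, not TSK
or libguestfs functions; only the findblk_data layout is taken from the
patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TSK's walk return codes. */
typedef enum { WALK_CONT, WALK_STOP, WALK_ERROR } walk_ret;

/* Hypothetical callback type: called once per block address. */
typedef walk_ret (*block_cb) (uint64_t blkaddr, void *data);

/* Same layout as the findblk_data struct added by the patch. */
typedef struct {
  bool found;
  uint64_t block;
} findblk_data;

/* Hypothetical walker: calls cb for each block address and stops
 * early when the callback returns WALK_STOP.
 * Return 0 on success, 1 if the callback reported an error.
 */
static int
walk_blocks (const uint64_t *blocks, size_t n, block_cb cb, void *data)
{
  for (size_t i = 0; i < n; i++) {
    walk_ret r = cb (blocks[i], data);

    if (r == WALK_STOP)
      return 0;
    if (r == WALK_ERROR)
      return 1;
  }

  return 0;
}

/* Callback: record a match and stop the walk, otherwise continue. */
static walk_ret
match_block (uint64_t blkaddr, void *data)
{
  findblk_data *blkdata = data;

  if (blkaddr == blkdata->block) {
    blkdata->found = true;
    return WALK_STOP;
  }

  return WALK_CONT;
}

/* Return true if block is among the n addresses in blocks. */
static bool
refers_to_block (const uint64_t *blocks, size_t n, uint64_t block)
{
  findblk_data blkdata = { .found = false, .block = block };

  if (walk_blocks (blocks, n, match_block, &blkdata) != 0)
    return false;

  return blkdata.found;
}
```

Stopping on the first match keeps the walk short for large files, and
the walker's return value is checked separately from the "found" flag,
as v4 of the patch does for tsk_fs_attr_walk.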
Library's counterpart of the daemon's internal_find_block command.

It writes the daemon's command output on a temporary file and parses it,
deserialising the XDR formatted tsk_dirent structs.

It returns to the caller the list of tsk_dirent structs generated by the
internal_find_block command.

Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
---
 generator/actions.ml | 16 ++++++++++++++++
 src/tsk.c            | 17 +++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/generator/actions.ml b/generator/actions.ml
index b38a30f..8947551 100644
--- a/generator/actions.ml
+++ b/generator/actions.ml
@@ -3729,6 +3729,22 @@ Searches all the entries associated with the given inode.
 For each entry, a C<tsk_dirent> structure is returned.
 See C<filesystem_walk> for more information about C<tsk_dirent> structures." };
 
+  { defaults with
+    name = "find_block"; added = (1, 35, 6);
+    style = RStructList ("dirents", "tsk_dirent"), [Mountable "device"; Int64 "block";], [];
+    optional = Some "libtsk";
+    progress = true; cancellable = true;
+    shortdesc = "search the entries referring to the given data block";
+    longdesc = "\
+Searches all the entries referring to the given data block.
+
+Certain filesystems preserve the block mapping when deleting a file.
+Therefore, it is possible to see multiple deleted files referring
+to the same block.
+
+For each entry, a C<tsk_dirent> structure is returned.
+See C<filesystem_walk> for more information about C<tsk_dirent> structures." };
+
 ]
 
 (* daemon_functions are any functions which cause some action
diff --git a/src/tsk.c b/src/tsk.c
index 1def9c9..7db6f71 100644
--- a/src/tsk.c
+++ b/src/tsk.c
@@ -72,6 +72,23 @@ guestfs_impl_find_inode (guestfs_h *g, const char *mountable, int64_t inode)
   return parse_dirent_file (g, tmpfile);  /* caller frees */
 }
 
+struct guestfs_tsk_dirent_list *
+guestfs_impl_find_block (guestfs_h *g, const char *mountable, int64_t block)
+{
+  int ret = 0;
+  CLEANUP_UNLINK_FREE char *tmpfile = NULL;
+
+  tmpfile = make_temp_file (g, "find_block");
+  if (tmpfile == NULL)
+    return NULL;
+
+  ret = guestfs_internal_find_block (g, mountable, block, tmpfile);
+  if (ret < 0)
+    return NULL;
+
+  return parse_dirent_file (g, tmpfile);  /* caller frees */
+}
+
 /* Parse the file content and return dirents list.
  * Return a list of tsk_dirent on success, NULL on error.
  */
-- 
2.9.3
Matteo Cafasso
2016-Oct-08 15:27 UTC
[Libguestfs] [PATCH v4 3/3] find_block: added API tests
An NTFS file system always has the $Boot file at block 0.
This reliable fact makes it easy to test the API.

Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
---
 tests/tsk/Makefile.am        |  1 +
 tests/tsk/test-find-block.sh | 66 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)
 create mode 100755 tests/tsk/test-find-block.sh

diff --git a/tests/tsk/Makefile.am b/tests/tsk/Makefile.am
index 07c74f9..44a893e 100644
--- a/tests/tsk/Makefile.am
+++ b/tests/tsk/Makefile.am
@@ -21,6 +21,7 @@ TESTS = \
 	test-download-inode.sh \
 	test-download-blocks.sh \
 	test-filesystem-walk.sh \
+	test-find-block.sh \
 	test-find-inode.sh
 
 TESTS_ENVIRONMENT = $(top_builddir)/run --test
diff --git a/tests/tsk/test-find-block.sh b/tests/tsk/test-find-block.sh
new file mode 100755
index 0000000..984947d
--- /dev/null
+++ b/tests/tsk/test-find-block.sh
@@ -0,0 +1,66 @@
+#!/bin/bash -
+# libguestfs
+# Copyright (C) 2016 Red Hat Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+# Test the find-block command.
+
+if [ -n "$SKIP_TEST_FIND_BLOCK_SH" ]; then
+    echo "$0: test skipped because environment variable is set."
+    exit 77
+fi
+
+# Skip if TSK is not supported by the appliance.
+if ! guestfish add /dev/null : run : available "libtsk"; then
+    echo "$0: skipped because TSK is not available in the appliance"
+    exit 77
+fi
+
+if [ ! -s ../../test-data/phony-guests/windows.img ]; then
+    echo "$0: skipped because windows.img is zero-sized"
+    exit 77
+fi
+
+output=$(
+guestfish --ro -a ../../test-data/phony-guests/windows.img <<EOF
+run
+find-block /dev/sda2 0
+EOF
+)
+
+# test $Boot is in the list
+echo $output | grep -zq '{ tsk_inode: 7
+tsk_type: r
+tsk_size: .*
+tsk_name: \$Boot
+tsk_flags: 1
+tsk_atime_sec: .*
+tsk_atime_nsec: .*
+tsk_mtime_sec: .*
+tsk_mtime_nsec: .*
+tsk_ctime_sec: .*
+tsk_ctime_nsec: .*
+tsk_crtime_sec: .*
+tsk_crtime_nsec: .*
+tsk_nlink: 1
+tsk_link: 
+tsk_spare1: 0 }'
+if [ $? != 0 ]; then
+    echo "$0: \$Boot not found in files list."
+    echo "File list:"
+    echo $output
+    exit 1
+fi
-- 
2.9.3
On Saturday, 8 October 2016 18:27:21 CEST Matteo Cafasso wrote:
> Patch ready for merging.
>
> v4:
>
> - check return code of tsk_fs_attr_walk
> - pass TSK_FS_FILE_WALK_FLAG_NOSPARSE as additional flag to
>   tsk_fs_attr_walk
>
> After discussing with TSK authors the behaviour is clear. [1]

Thanks, this improves the situation a bit.

> In case of COMPRESSED blocks, the callback will be called for all the
> attributes no matter whether they are on disk or not (sparse). In
> such cases, the block address will be 0. [2]

Note that the API docs say:
  For compressed and sparse attributes, the address *may* be zero.
(emphasis is mine)

My concern is that, if the address in such cases is "unspecified", then
the comparisons in "attrwalk_callback" are done against a
random/uninitialized value (which would be bad).

Also, if the block address would be zero, what's the point of having it
among the blocks tsk_fs_attr_walk() iterates over?

Thanks,
-- 
Pino Toscano
2016-10-11 11:56 GMT+03:00 Pino Toscano <ptoscano@redhat.com>:
> On Saturday, 8 October 2016 18:27:21 CEST Matteo Cafasso wrote:
> > Patch ready for merging.
> >
> > v4:
> >
> > - check return code of tsk_fs_attr_walk
> > - pass TSK_FS_FILE_WALK_FLAG_NOSPARSE as additional flag to
> >   tsk_fs_attr_walk
> >
> > After discussing with TSK authors the behaviour is clear. [1]
>
> Thanks, this improves the situation a bit.
>
> > In case of COMPRESSED blocks, the callback will be called for all the
> > attributes no matter whether they are on disk or not (sparse). In
> > such cases, the block address will be 0. [2]
>
> Note that the API docs say:
>   For compressed and sparse attributes, the address *may* be zero.
> (emphasis is mine)
>
> My concern is that, if the address in such cases is "unspecified", then
> the comparisons in "attrwalk_callback" are done against a
> random/uninitialized value (which would be bad).

I understand your concerns. The data will not be wrong; it is the API
documentation that is misleading. The address *will* be 0 for SPARSE
blocks and *might or might not* be 0 for compressed blocks, depending on
certain criteria. See below.

> Also, if the block address would be zero, what's the point of having it
> among the blocks tsk_fs_attr_walk() iterates over?

This is due to the way NTFS organizes information and handles
compression, and to the way the API loops over the blocks.

For each file or directory there is an MFT (Master File Table) record,
typically 1 KB in size, which is a linear sequence of attributes.
Attributes can be resident within the MFT record or non-resident,
according to their size. The $DATA attribute, which stores the actual
file content, is an example of a typically non-resident one.

Non-resident attributes are stored on disk in what are referred to as
data runs (contiguous blocks), which are then mapped within the
attribute itself. A typical file greater than 800 bytes has a $DATA
attribute containing a map of data runs with their location on the disk.
If the map itself is too big for the $DATA attribute (this can happen if
the actual content is too fragmented), then extra records are created
and their mapping is placed in a special attribute called
$ATTRIBUTE_LIST. [1]

When the given file is compressed (native NTFS compression, not
application-level compression), the algorithm visits each data run
within the attribute and: [2]

 1. if the data run is zero-filled, marks the corresponding blocks as
    sparse and sets their address to 0;
 2. if compressing the data run does not save any disk block, leaves it
    as is;
 3. if compressing the data run saves one or more blocks, marks the
    saved blocks as sparse and sets their address to 0.

Note that the entire attribute will be marked as compressed no matter
what happened to the clusters on disk.

The logic loops through all non-resident attributes (which is what we
want: we want all the disk blocks allocated for that file). For each
attribute, it loops over all the blocks which that attribute maps and
calls the callback.

Our issue is where the information behind the sparse flag comes from:
it might come from the block (BAD/ALLOC/UNALLOC), or from the file
metadata (RAW, SPARSE, COMPRESSED, CONT, META). [3]

tsk_fs_attr_walk() walks over the given attribute's blocks. When we are
inspecting attributes of compressed files, the flag reports the *file*
status (COMPRESSED) yet is not able to tell us what the compression
algorithm did (1, 2 or 3) to that block. It still correctly gives us the
address: 0 if sparse (case 1 or 3) or the correct number otherwise
(case 2 or 3).
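The three cases above can be put into a small toy model. This is a
sketch under stated assumptions, not TSK code: model_block and
attr_has_block are hypothetical names, and skip_sparse only models the
effect assumed here for a NOSPARSE-style flag, namely that sparse
entries never reach the comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one block reported by an attribute walk. */
typedef struct {
  uint64_t addr;  /* on-disk address; 0 when the block is sparse */
  bool sparse;    /* true for cases 1 and 3 (no block on disk) */
} model_block;

/* Return true if target matches one of the n blocks of the attribute.
 * With skip_sparse set, sparse entries are never compared (assumed
 * effect of a NOSPARSE-style flag in this toy model).
 */
static bool
attr_has_block (const model_block *blocks, size_t n,
                uint64_t target, bool skip_sparse)
{
  for (size_t i = 0; i < n; i++) {
    if (skip_sparse && blocks[i].sparse)
      continue;

    if (blocks[i].addr == target)
      return true;
  }

  return false;
}
```

In this model a compressed attribute with two real blocks and two sparse
ones matches a search for block 0 only when sparse entries are compared;
skipping them leaves just the addresses that are actually on disk.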
[1] https://en.wikipedia.org/wiki/NTFS#Attribute_lists.2C_attributes.2C_and_streams
[2] http://www.digital-evidence.org/fsfa/ - Chapter 11
[3] http://www.sleuthkit.org/sleuthkit/docs/api-docs/4.2/tsk__fs_8h.html#a1e6bf157f5d258191bf5d8ae31ee7148
    - "Note that some of these are set only by file_walk because they are
    file-level details, such as compression and sparse."

> Thanks,
> --
> Pino Toscano