Richard W.M. Jones
2016-Feb-02 19:35 UTC
Re: [Libguestfs] extract NTFS Master File Table for analysis
On Tue, Feb 02, 2016 at 07:40:12PM +0200, noxdafox wrote:
> Greetings,
>
> I'm playing around an idea and I'd like to ask you some questions.
>
> I'd like to extract the MFT table from a disk image file. The idea
> is to employ it to build a sort of reverse lookup table which, given
> a cluster, could retrieve the corresponding file with the related
> metadata.
>
> Such table could be used to optimize the analysis of disk snapshots
> in order to collect the changes which happened on the disk. As the
> disk snapshots contains only the new or modified clusters, I could
> avoid exploring the whole FS content and focus on what has really
> changed on disk.
>
> Did you explore the concept anyhow?

No.

> Is there a way I can use libguestfs to locate and extract the MFT
> table from a disk image?

If there's an ntfsprogs command that does this (ntfsinfo --mft maybe?)
then it's really easy to extract the output from that command. You
could hack it together using `debug sh', search this page:

  http://libguestfs.org/guestfs-faq.1.html

... but if you wanted to do it "properly" then you could add an API
modelled on one of the `FileOut' APIs, eg:

  https://github.com/libguestfs/libguestfs/blob/master/daemon/base64.c#L100

For information on adding APIs, see:

  http://libguestfs.org/guestfs-hacking.1.html#adding-a-new-api

This question of how do you find which disk block is associated with a
particular file comes up often enough that I have looked at it various
times on my blog:

  https://rwmj.wordpress.com/2014/02/21/use-guestfish-and-nbdkit-to-examine-physical-disk-locations/
  https://rwmj.wordpress.com/2014/11/23/mapping-files-to-disk/

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
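The reverse lookup table described in the question (given a cluster, find the file that owns it) could be sketched roughly as below. This is only an illustration in Python: the file paths and extents are made up, and it assumes the extents (start cluster, length) for each file have already been obtained by parsing the MFT with some other tool and that they do not overlap.

```python
from bisect import bisect_right


def build_cluster_index(extents_by_file):
    """Flatten {path: [(start_cluster, length), ...]} into a sorted index.

    Returns (starts, runs), where runs is a list of (start, end, path)
    sorted by start and starts holds just the start clusters, so that a
    binary search can be used for lookups.  Extents must not overlap.
    """
    runs = sorted(
        (start, start + length, path)
        for path, extents in extents_by_file.items()
        for start, length in extents
    )
    starts = [run[0] for run in runs]
    return starts, runs


def file_for_cluster(index, cluster):
    """Return the path owning `cluster`, or None if it is unallocated."""
    starts, runs = index
    pos = bisect_right(starts, cluster) - 1
    if pos >= 0:
        start, end, path = runs[pos]
        if start <= cluster < end:
            return path
    return None


# Hypothetical extents, standing in for data parsed from a real MFT.
EXAMPLE_EXTENTS = {
    "/Windows/notepad.exe": [(100, 8), (300, 4)],
    "/Users/a/report.docx": [(108, 16)],
}

index = build_cluster_index(EXAMPLE_EXTENTS)
```

With such an index, each changed cluster found in a snapshot can be mapped back to a file in logarithmic time, for example `file_for_cluster(index, 110)`.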
noxdafox
2016-Feb-18 19:41 UTC
Re: [Libguestfs] extract NTFS Master File Table for analysis
On 02/02/16 21:35, Richard W.M. Jones wrote:
> On Tue, Feb 02, 2016 at 07:40:12PM +0200, noxdafox wrote:
>> Greetings,
>>
>> I'm playing around an idea and I'd like to ask you some questions.
>>
>> I'd like to extract the MFT table from a disk image file. The idea
>> is to employ it to build a sort of reverse lookup table which, given
>> a cluster, could retrieve the corresponding file with the related
>> metadata.
>>
>> Such table could be used to optimize the analysis of disk snapshots
>> in order to collect the changes which happened on the disk. As the
>> disk snapshots contains only the new or modified clusters, I could
>> avoid exploring the whole FS content and focus on what has really
>> changed on disk.
>>
>> Did you explore the concept anyhow?
> No.
>
>> Is there a way I can use libguestfs to locate and extract the MFT
>> table from a disk image?
> If there's an ntfsprogs command that does this (ntfsinfo --mft maybe?)
> then it's really easy to extract the output from that command. You
> could hack it together using `debug sh', search this page:
>
> http://libguestfs.org/guestfs-faq.1.html
>
> ... but if you wanted to do it "properly" then you could add an API
> modelled on one of the `FileOut' APIs, eg:
>
> https://github.com/libguestfs/libguestfs/blob/master/daemon/base64.c#L100
>
> For information on adding APIs, see:
>
> http://libguestfs.org/guestfs-hacking.1.html#adding-a-new-api

I played around a bit and I must confess I am impressed by how easy it
is to add functionality to libguestfs.

I could easily extract the Master File Table using the download API
and parse it with third-party tools.

I'd also like to extract the Update Sequence Number Journal ($UsnJrnl),
but it seems to be inaccessible via its path (C:\$Extend\$UsnJrnl). I
tried on a real disk and it seems to be a limitation of the NTFS-3g
driver: it can extract C:\$MFT and C:\$LogFile, and it can list the
content of C:\$Extend, but it cannot access the files inside it.

Curiously enough, the stat() syscall on C:\$Extend\$UsnJrnl seems to
work and returns the correct inode number. Yet the size is wrong: it
reports 0, while the real size is more than 9 MB.

The next step I tried was to use the ntfscat command in the following
manner:

ntfscat -i <UsnJrnl inode number> /dev/sdXX

and it worked flawlessly. So I went ahead and added such an API to
libguestfs, and I could extract the journal without any issue.

The UsnJrnl file is very handy for checking what changes were made on
disk. Not only is it faster than using virt-diff on two different
snapshots, it also shows much more relevant information. I could, for
example, track down temporary files created and deleted between the
two snapshots.

All of this to say I'd like to add the possibility of extracting
files via their inode. This functionality has the advantage of not
requiring the FS to be mounted. Would libguestfs benefit from this?

If so how should I proceed? Which API names to use?

Most straightforward would be something like:

ntfsicat(device, inode)

or

ntfsidownload(device, inode)

I guess also linux guest disks would benefit from this but this
requires a bit more research.

>
> This question of how do you find which disk block is associated with a
> particular file comes up often enough that I have looked at it various
> times on my blog:
>
> https://rwmj.wordpress.com/2014/02/21/use-guestfish-and-nbdkit-to-examine-physical-disk-locations/
>
> https://rwmj.wordpress.com/2014/11/23/mapping-files-to-disk/
>
> Rich.
>
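For reference, the `ntfscat -i <inode> <device>` invocation mentioned in this message could be wrapped along these lines. This is a minimal Python sketch, not the code that was added to libguestfs: the helper names are hypothetical, and it assumes `ntfscat` from ntfsprogs is on the PATH and that the caller may read the device.

```python
import subprocess


def ntfscat_inode_argv(device, inode):
    """Build the argv for `ntfscat -i <inode> <device>`."""
    if inode < 0:
        raise ValueError("inode number must be non-negative")
    return ["ntfscat", "-i", str(inode), device]


def dump_inode(device, inode, out_path):
    """Write the content of the given inode to out_path.

    ntfscat prints the file content on stdout, so stdout is redirected
    straight into the output file.  Raises CalledProcessError if
    ntfscat exits with a non-zero status.
    """
    with open(out_path, "wb") as out:
        subprocess.run(ntfscat_inode_argv(device, inode), stdout=out, check=True)
```

A call such as `dump_inode("/dev/sdXX", inode, "UsnJrnl.bin")` would then take the inode number reported by stat() on C:\$Extend\$UsnJrnl; the output file name here is just an example.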
Richard W.M. Jones
2016-Feb-19 10:51 UTC
Re: [Libguestfs] extract NTFS Master File Table for analysis
On Thu, Feb 18, 2016 at 09:41:51PM +0200, noxdafox wrote:
> All of this to say I'd like to add the possibility of extracting
> files via their inode. This functionality has the advantage of not
> requiring the FS to be mounted. Would libguestfs benefit from this?
>
> If so how should I proceed? Which API names to use?

We generally tend to stick to API names which are the same as the
underlying utility, so "ntfscat". In this case however ntfscat has
lots of different modes, so we'd use a name like "ntfscat_i" for this
API.

> Most straightforward would be something like:
>
> ntfsicat(device, inode)

  { defaults with
    name = "ntfscat_i";
    style = RErr, [Mountable "device"; Int64 "inode"; FileOut "filename"], [];
    ...
  }

seems like the right sort of API to use.

> I guess also linux guest disks would benefit from this but this
> requires a bit more research.

Not sure if there is any way to download a file by inode from a Linux
filesystem. But it doesn't matter for this case.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
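To illustrate how a caller might use an API of the shape proposed above, here is a rough Python sketch. It assumes the API ends up exposed to the Python bindings as `ntfscat_i(device, inode, filename)`, mirroring the `Mountable "device"; Int64 "inode"; FileOut "filename"` style; the wrapper name, the stub handle and the example values are all hypothetical, and the stub only exists so the sketch runs without libguestfs installed.

```python
INT64_MAX = 2**63 - 1


def extract_by_inode(g, device, inode, local_path):
    """Download the file with the given inode number to local_path.

    `g` is expected to behave like a guestfs handle offering an
    ntfscat_i(device, inode, filename) method (an assumption based on
    the proposed generator entry).
    """
    # The proposed API takes an Int64, so reject values outside that range.
    if not 0 <= inode <= INT64_MAX:
        raise ValueError("inode must fit in a non-negative signed 64-bit integer")
    g.ntfscat_i(device, inode, local_path)
    return local_path


class RecordingHandle:
    """Stand-in for a real guestfs handle; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def ntfscat_i(self, device, inode, filename):
        self.calls.append((device, inode, filename))


# Made-up device name and inode number, for illustration only.
handle = RecordingHandle()
extract_by_inode(handle, "/dev/sda2", 12345, "UsnJrnl.bin")
```

With a real handle in place of the stub, the device would not need to be mounted first, which is the advantage pointed out earlier in the thread.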