Hi,

for a backup program I have to find all differing files
(including metadata) in two snapshots taken from the same
subvolume.

Having looked at the find-new command I thought about this
process:

1. Get the two transids when the two snapshots were created.

2. Query modifications to the original subvolume between the two
   transids.

Is the general process correct, or have I overlooked something?

AFAIS the btrfs tool does not provide the required
information/commands. Would it be possible to add those?

Thanks in advance,
Arvin

--
Arvin Schnell, <aschnell@suse.de>
Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Hello,

Please note that my experience with btrfs is both recent and, above
all, very limited. However, I've been wondering about the same issue
for a different purpose, and your question intrigues me.

I may be off-base here, but I don't think this would be trivial to
achieve.

Even if one were able to diff the metadata changes between the two
snapshots, the problem of finding the changed data would remain. It
would be possible to check for changed extents, at least by comparing
extent checksums, but I don't think it would be trivial to discover
where (exactly) an extent was modified.

I would recommend using the generation fields, whenever applicable,
but I believe these are private to each subvolume/snapshot.

Anyway, I wonder whether keeping a data structure (I would go with a
tree) within the file system, containing metadata about the changed
files, could be a plausible solution, but I'm in no position
(btrfs-knowledge-wise) to make such a statement.

Cheers.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu

On 02/25/2011 10:59 AM, Arvin Schnell wrote:
> Having looked at the find-new command I thought about this
> process:
>
> 1. Get the two transids when the two snapshots were created.
>
> 2. Query modifications to the original subvolume between the two
>    transids.

I suppose that you are thinking of something like:

- record the last trans-id (trans-id1)
- update the file-system
- [...]
- record the last trans-id (trans-id2)
- update the file-system
- [...]
- back up all the objects which have a trans-id in the range
  (trans-id1, trans-id2]

This may miss two kinds of "operations":

1) a file deletion
2) a file changed twice, the first time after the first "snapshot"
   and the second time after the second snapshot.

In the first case you would not be able to find any key update between
the two trans-ids, because the keys simply no longer exist. In the
second case the trans-id associated with the object is later than
trans-id2.

To solve point two, you must change "Query modifications to the
original subvolume" into "Query modifications to the second snapshot".
This means that the second snapshot must exist (it is not sufficient
to know the trans-id).

To solve point one, it is necessary to

a) track changes not only to the files but also to the directories
   (if you remove a file, the timestamp of the directory inode is
   updated);
b) compare the updated directories with the original ones. This means
   that the first snapshot must exist (it is not sufficient to know
   the trans-id).

I have to point out that for backup purposes it would be sufficient to
track the changed files (and not the deleted ones).

I started to develop a tool for comparing two snapshots, but I stopped
when I discovered that the ioctl BTRFS_IOC_TREE_SEARCH was not robust
enough for that: when I tried to find the changed inodes, attributes,
extended attributes... I found that it doesn't work well in some
corner cases [*]. I even proposed a patch to mitigate the problem, but
at the time the development effort was (and is) directed at other
issues, and the patch was not merged.

However, if you want to start developing something, I can go deeper
into the problem.

[*] see the thread "Bug in the design of the tree search ioctl API ?",
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg07523.html
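
The two pitfalls above (deletions, and files modified again after the second snapshot) can be illustrated with a toy model. This is only a sketch: each tree is represented as a plain Python dict mapping a path to the trans-id of its last modification, and the names `modified_in_range` and `compare_snapshots` are hypothetical helpers, not part of any btrfs tool or API.

```python
# Toy model only: a "tree" is a dict {path: trans-id of last modification}.
# Real snapshots would have to be read from the filesystem.

def modified_in_range(tree, lo, hi):
    """Paths whose trans-id lies in the half-open range (lo, hi]."""
    return sorted(p for p, gen in tree.items() if lo < gen <= hi)


def compare_snapshots(first, second, transid1):
    """Return (changed, deleted) between two existing snapshots.

    changed: paths in `second` modified after transid1.
    deleted: paths present in `first` but missing from `second`.
    """
    changed = sorted(p for p, gen in second.items() if gen > transid1)
    deleted = sorted(p for p in first if p not in second)
    return changed, deleted


# First snapshot taken at trans-id 10, second at trans-id 20.
first = {"a.txt": 5, "b.txt": 7, "c.txt": 9}
second = {"a.txt": 5, "b.txt": 15, "d.txt": 18}
# The original subvolume kept changing: b.txt was modified again at 25.
live = {"a.txt": 5, "b.txt": 25, "d.txt": 18}

# Querying the original subvolume misses b.txt (25 > 20) and the
# deletion of c.txt (no key is left to find).
naive = modified_in_range(live, 10, 20)
print(naive)    # ['d.txt']

# Querying the second snapshot and comparing against the first one
# recovers both.
changed, deleted = compare_snapshots(first, second, transid1=10)
print(changed)  # ['b.txt', 'd.txt']
print(deleted)  # ['c.txt']
```

The model ignores directories, extents and extended attributes; it only shows why both snapshots have to exist for the comparison to be complete.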

On 02/25/2011 08:32 PM, João Eduardo Luís wrote:
> Even if one would be able to differ the metadata changes between both
> snapshots, the problem would still be present regarding finding the
> changed data. It would be possible to check for changed extents, at
> least by comparing extent checksums, but I don't think it would be
> trivial to discover where (exactly) the extent was modified.

Look at the find-new command. It also reports which part of the file
has changed. I don't remember the details very well, but the data is
also stored in a tree, like the metadata. Using the same strategy of
comparing the keys and revid lets you discover which part of the file
has changed with minimal effort (no checksum comparison is needed).
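
As a rough illustration of the find-new suggestion, the sketch below wraps `btrfs subvolume find-new <subvol> <last_gen>` and extracts the changed byte ranges per file. The per-line output format in the regular expression is an assumption based on how btrfs-progs has printed extents, so check it against the installed version; `parse_find_new` and `find_new` are hypothetical helper names.

```python
import re
import subprocess

# ASSUMED line format (verify against your btrfs-progs version), e.g.:
# inode 257 file offset 0 len 4096 disk start 12582912 offset 0 gen 8 flags NONE file.txt
LINE_RE = re.compile(
    r"^inode (?P<inode>\d+) file offset (?P<offset>\d+) len (?P<len>\d+) "
    r"disk start \d+ offset \d+ gen (?P<gen>\d+) flags (?P<flags>\S+) "
    r"(?P<path>.+)$"
)


def parse_find_new(output):
    """Return a list of (path, file_offset, length, generation) tuples.

    Lines that do not match the assumed format (such as the trailing
    'transid marker' line) are skipped.
    """
    ranges = []
    for line in output.splitlines():
        m = LINE_RE.match(line)
        if m:
            ranges.append(
                (m["path"], int(m["offset"]), int(m["len"]), int(m["gen"]))
            )
    return ranges


def find_new(subvol, last_gen):
    """Run find-new on `subvol` and parse its output (usually needs root)."""
    result = subprocess.run(
        ["btrfs", "subvolume", "find-new", subvol, str(last_gen)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_find_new(result.stdout)


# Parsing a hand-written sample instead of calling the real command.
sample = (
    "inode 257 file offset 0 len 4096 disk start 12582912 "
    "offset 0 gen 8 flags NONE file.txt\n"
    "transid marker was 9\n"
)
parsed = parse_find_new(sample)
print(parsed)
```

Pointing `find_new` at the second snapshot, with the generation recorded when the first snapshot was taken, would follow the advice earlier in the thread about querying the snapshot instead of the original subvolume.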

I had rich text enabled by default, and the mailing list bounced the
email. Apparently, HTML equals spam and/or virus. :-)

Here is the plain-text version.

---------- Forwarded message ----------
From: João Eduardo Luís <jecluis@gmail.com>
Date: 2011/2/25
Subject: Re: Comparing snapshots?
To: kreijack@inwind.it
Cc: linux-btrfs@vger.kernel.org

On Feb 25, 2011, at 8:08 PM, Goffredo Baroncelli wrote:
> Look at the find-new command. It returns also which part of the file is
> changed. I don't remember very well the details, but also the data is
> stored in a tree like the metadata. Using the same strategies of
> comparing the keys and revid leads to discover which part of the file is
> changed, with minimum effort (no checksums comparing is needed).

You are right. I just took a peek at the code, and it seems the
generation id (which IIRC is the same as the id of the last modifying
transaction) is shared file-system-wide, instead of being snapshot- or
subvolume-specific. I should have checked the code before replying.

Cheers.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu