Hello.

Can anyone think of a simple way to copy a set of pages from a given file (which may or may not be scattered across multiple extents) in one snapshot to the correct pages of another file in another snapshot?

This might sound silly, but the whole purpose is to create some sort of reconciliation method between divergent snapshots taken from the same original subvolume.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu

2011/6/28 João Eduardo Luís <jecluis@gmail.com>:
> Hello.
>
> Can anyone think of a simple way to copy a set of pages from a given file (which may or may not be scattered throughout multiple extents) from a snapshot to correct pages within another file on another snapshot?
>
> This might sound silly, but the whole purpose is to create some sort of reconciliation method between divergent snapshots taken from the same original subvolume.

Generic deduplication?

Josef posted some patches back in January:

http://www.spinics.net/lists/linux-btrfs/msg07818.html
http://www.spinics.net/lists/linux-btrfs/msg07819.html
(etc.)

... if that's what you're looking for. I don't know what all is needed to make it work at this point, i.e. whether you only need the patch to btrfs-progs or some combination. There could be more recent patches, but I don't recall anyone talking about any.

C Anthony
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Jun 28, 2011, at 4:07 PM, C Anthony Risinger wrote:
> generic deduplication?

I'm not sure if deduplication is what I'm looking for.

What I actually want to achieve is to reconstruct a file's data from two diverging files. I.e., two snapshots are taken from the same subvolume and, in each snapshot, a given file A is written to. Assuming different blocks were written in each snapshot, and no expected semantics are violated, what I aim to achieve is the correct reconciliation of file A in one of the snapshots.

Maybe this could be achieved by using deduplication. I'll look into those patches. Even if they are not completely useful, they may very well contain some neat concept that can be used to solve this little puzzle of mine. :-)

Thanks.

---
João Eduardo Luís

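The reconciliation described above rests on the assumption that the two snapshots wrote to different blocks of file A. A minimal sketch of that check, assuming each snapshot's writes are available as a list of page indices (the function name and the list representation are hypothetical, not part of btrfs):

```python
def plan_reconciliation(pages_written_in_a, pages_written_in_b):
    """Return the sorted page indices to bring from snapshot B into snapshot A.

    Both arguments are iterables of page indices written since the snapshots
    diverged (a hypothetical representation of a per-snapshot log).
    """
    a = set(pages_written_in_a)
    b = set(pages_written_in_b)

    # A page written on both sides cannot be merged blindly: taking either
    # version would silently drop the other snapshot's update.
    conflicts = sorted(a & b)
    if conflicts:
        raise ValueError(f"pages written in both snapshots: {conflicts}")

    # With no overlap, A already holds its own changes, so only B's pages
    # have to be transferred.
    return sorted(b)
```

Raising on any overlap is the conservative choice; an application with its own semantics for conflicting writes could resolve them instead of failing.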
João Eduardo Luís wrote:
> Hello.
>
> Can anyone think of a simple way to copy a set of pages from a given file (which may or may not be scattered throughout multiple extents) from a snapshot to correct pages within another file on another snapshot?
>
> This might sound silly, but the whole purpose is to create some sort of reconciliation method between divergent snapshots taken from the same original subvolume.

How about the file clone ioctl? It won't copy data, but it makes the destination file point to the same extents as the source file.

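For reference, a hedged sketch of how the clone ioctls can be invoked from user space. The request numbers are derived here with the `_IOW()` encoding used on common Linux ABIs such as x86 and ARM (an assumption; some architectures encode ioctl numbers differently), and the helper names are made up for illustration. Whether the kernel accepts a clone between two different snapshots, and the requirement that range offsets be aligned to the filesystem block size, should be checked against the kernel actually in use.

```python
import fcntl
import struct


def _iow(ioc_type, nr, size):
    # _IOW() as encoded on common Linux ABIs (x86, ARM): direction "write"
    # in the top bits, then argument size, ioctl type and number.
    # Assumption: other architectures may lay these fields out differently.
    return (1 << 30) | (size << 16) | (ioc_type << 8) | nr


# Layout of the clone-range argument struct:
# s64 src_fd, u64 src_offset, u64 src_length, u64 dest_offset
CLONE_RANGE_ARGS = struct.Struct("=qQQQ")

BTRFS_IOCTL_MAGIC = 0x94
FICLONE = _iow(BTRFS_IOCTL_MAGIC, 9, 4)  # whole-file clone, takes an int fd
FICLONERANGE = _iow(BTRFS_IOCTL_MAGIC, 13, CLONE_RANGE_ARGS.size)


def clone_file(src_fd, dst_fd):
    """Make dst_fd share all of src_fd's extents (no data is copied)."""
    fcntl.ioctl(dst_fd, FICLONE, src_fd)


def clone_range(src_fd, dst_fd, src_offset, length, dst_offset):
    """Make a byte range of dst_fd share the matching extents of src_fd."""
    args = CLONE_RANGE_ARGS.pack(src_fd, src_offset, length, dst_offset)
    fcntl.ioctl(dst_fd, FICLONERANGE, args)
```

The range variant is the one that maps onto the original question, since it can target individual block-aligned regions rather than the whole file.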
On Jun 28, 2011, at 8:06 PM, Hugo Mills wrote:
> You would need to enumerate the extents on each representation of
> the file, picking the ones with the latest transid in each case. You
> would then need to work out what the extents on the reconstructed file
> would look like, and glue them all together into a new file.

In my case, I don't need to search for the latest transid, since I keep an in-memory log of the changes made within each snapshot. As these snapshots are ephemeral and created/destroyed on demand by a user-level application, the cost of keeping such a per-snapshot log doesn't seem to have much impact on performance.

However, I log operations on a per-page basis. Gluing together the modified extents from each snapshot doesn't seem viable without deduplicating them first, or I may end up losing updates I did not intend to lose. On the other hand, I'm afraid the deduplication will lead to severe disk fragmentation when performed on a per-page basis (e.g., if changes are made to several non-contiguous pages within several extents, in the same file on different snapshots, I would end up with several smaller extents scattered throughout the disk).

This is pretty much why I expected to be able to, literally, copy the changed pages from one snapshot to another, without deduplicating the extents. However, after spending the last couple of days looking for a simple way to do it, I now believe achieving this is far more complicated and error-prone (unless I missed something) than deduplicating the extents based on my logged information.

Any thoughts would be helpful.

---
João Eduardo Luís
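
As a point of comparison, the literal "copy the changed pages" approach can be sketched from user space with plain positional reads and writes. This is only an illustration under stated assumptions: the page log is taken to be a list of page indices, the page size is assumed to be 4096 bytes, and `coalesce` and `copy_pages` are hypothetical helper names. Adjacent pages are merged into runs first so that each contiguous region is written in one call; on a copy-on-write filesystem the writes still allocate new extents, so this does not by itself address the fragmentation concern.

```python
import os

PAGE_SIZE = 4096  # assumed page size; not queried from the system


def coalesce(pages):
    """Merge page indices into sorted (first_page, page_count) runs."""
    runs = []
    for page in sorted(set(pages)):
        if runs and runs[-1][0] + runs[-1][1] == page:
            runs[-1][1] += 1          # extends the current run
        else:
            runs.append([page, 1])    # starts a new run
    return [(start, count) for start, count in runs]


def copy_pages(src_path, dst_path, pages, page_size=PAGE_SIZE):
    """Copy the logged pages from src_path into the same offsets of dst_path."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        for start, count in coalesce(pages):
            offset = start * page_size
            # pread returns fewer bytes only at end of file for regular files.
            data = memoryview(os.pread(src, count * page_size, offset))
            written = 0
            while written < len(data):
                # pwrite may write less than requested, so loop until done.
                written += os.pwrite(dst, data[written:], offset + written)
    finally:
        os.close(src)
        os.close(dst)
```

For example, `copy_pages("/snap_b/A", "/snap_a/A", [3, 4, 9])` (paths hypothetical) would issue one write covering pages 3 and 4 and another for page 9.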