I have been thinking about a new feature to start work on that I am
interested in and I was hoping people could give me some feedback and
ideas of how to tackle it. Anyways, I want to create a data
deduplication system that can work in two different modes. One mode is
that when the system is idle or not beyond a set load point a background
process would scan the volume for duplicate blocks. The other mode
would be used for systems that are nearline or backup systems that don't
really care about the performance and it would do the deduplication
during block allocation.

One of the ways I was thinking of to find the duplicate blocks would be
to use the checksums as a quick compare. If the checksums match then do
a complete compare before adjusting the nodes on the files. However, I
believe that I will need to create a tree based on the checksum values.

So any other ideas and thoughts about this?

Thanks,
Morey
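A rough userspace sketch of the scan described above: index blocks by a
cheap checksum, and only do the full byte-for-byte compare when two
checksums agree. The FNV-1a hash, the flat in-memory table, and the
single-file scan here are illustrative stand-ins rather than btrfs code;
a real pass would reuse the filesystem's own per-block checksums and a
proper checksum tree, and would re-point extents instead of just
counting matches.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define TABLE_SIZE (1 << 20)

struct entry {
	uint32_t csum;
	long offset;	/* offset of the first block seen with this checksum */
	int used;
};

/* Toy FNV-1a hash standing in for the filesystem's per-block checksums. */
static uint32_t fnv1a(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;

	while (len--)
		h = (h ^ *buf++) * 16777619u;
	return h;
}

int main(int argc, char **argv)
{
	unsigned char buf[BLOCK_SIZE], other[BLOCK_SIZE];
	struct entry *table;
	FILE *scan, *verify;
	long offset = 0, dups = 0;

	if (argc < 2)
		return 1;
	scan = fopen(argv[1], "rb");
	verify = fopen(argv[1], "rb");	/* second handle for full compares */
	table = calloc(TABLE_SIZE, sizeof(*table));
	if (!scan || !verify || !table)
		return 1;

	while (fread(buf, 1, BLOCK_SIZE, scan) == BLOCK_SIZE) {
		uint32_t csum = fnv1a(buf, BLOCK_SIZE);
		struct entry *e = &table[csum % TABLE_SIZE];

		if (e->used && e->csum == csum) {
			/* Checksums match: do the complete compare before
			 * treating the blocks as duplicates. */
			if (fseek(verify, e->offset, SEEK_SET) == 0 &&
			    fread(other, 1, BLOCK_SIZE, verify) == BLOCK_SIZE &&
			    memcmp(buf, other, BLOCK_SIZE) == 0)
				dups++;	/* real dedup would re-point the extent here */
		} else if (!e->used) {
			e->used = 1;
			e->csum = csum;
			e->offset = offset;
		}
		/* Slots already taken by a different checksum are simply skipped. */
		offset += BLOCK_SIZE;
	}
	printf("duplicate 4K blocks found: %ld\n", dups);
	return 0;
}

The checksum index is what makes the quick compare cheap; the expensive
memcmp() only runs on candidate pairs whose checksums already match.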
Morey Roof wrote:
> I have been thinking about a new feature to start work on that I am
> interested in and I was hoping people could give me some feedback and
> ideas of how to tackle it. Anyways, I want to create a data
> deduplication system that can work in two different modes. One mode is
> that when the system is idle or not beyond a set load point a background
> process would scan the volume for duplicate blocks. The other mode
> would be used for systems that are nearline or backup systems that don't
> really care about the performance and it would do the deduplication
> during block allocation.
>
> One of the ways I was thinking of to find the duplicate blocks would be
> to use the checksums as a quick compare. If the checksums match then do
> a complete compare before adjusting the nodes on the files. However, I
> believe that I will need to create a tree based on the checksum values.
>
> So any other ideas and thoughts about this?

This is something that I'm very interested in myself. Mainly for backup
purposes but the background deduplication scheme is also interesting and
something I had not thought of.

Jeff
Morey Roof wrote:
> I have been thinking about a new feature to start work on that I am
> interested in and I was hoping people could give me some feedback and
> ideas of how to tackle it. Anyways, I want to create a data
> deduplication system that can work in two different modes. One mode is
> that when the system is idle or not beyond a set load point a background
> process would scan the volume for duplicate blocks. The other mode
> would be used for systems that are nearline or backup systems that don't
> really care about the performance and it would do the deduplication
> during block allocation.
>
> One of the ways I was thinking of to find the duplicate blocks would be
> to use the checksums as a quick compare. If the checksums match then do
> a complete compare before adjusting the nodes on the files. However, I
> believe that I will need to create a tree based on the checksum values.
>
> So any other ideas and thoughts about this?

Don't do it!!!

OK, I know Chris has described some block sharing. But I hate it.

If I copy "resume" to "resume.save", it is because I want 2 copies
for safety. I don't want the fs to reduce it to 1 copy. And
reducing the duplicates is exactly opposite to Chris's paranoid
make-multiple-copies-by-default.

Now feel free to tell me I'm an idiot (other people do) :)

jim
> Don't do it!!!
>
> OK, I know Chris has described some block sharing. But I hate it.
>
> If I copy "resume" to "resume.save", it is because I want 2 copies
> for safety. I don't want the fs to reduce it to 1 copy. And
> reducing the duplicates is exactly opposite to Chris's paranoid
> make-multiple-copies-by-default.
>
> Now feel free to tell me I'm an idiot (other people do) :)

I would imagine that it would be optional and at worst, a specialized
fork of btrfs that follows the mainline but adds the deduplication bits.

Jeff
jim owens wrote:
> Don't do it!!!
>
> OK, I know Chris has described some block sharing. But I hate it.
>
> If I copy "resume" to "resume.save", it is because I want 2 copies
> for safety. I don't want the fs to reduce it to 1 copy. And
> reducing the duplicates is exactly opposite to Chris's paranoid
> make-multiple-copies-by-default.

Hi Jim,

My thoughts on what you are saying is that it is not generally a good
idea to assume any filesystem will lay things out in any specific way,
including whether it has one-to-one mapping of files to blocks. In
other words, making a copy of a file on the same filesystem for safety
reasons (unless you are modifying a file and want a backup of its old
state, like emacs' ~ files) is probably not a great habit to get into.

The implementation details of how a filesystem makes things safer should
be behind-the-scenes (like checksums, multiple-copies-by-default,
mirroring, etc.). That way, you can simply rely on the filesystem to
manage protection of your data rather than going to the effort of
managing multiple copies of files yourself for that reason.

-Joe
I was hoping to have it specified as a set of mount options or defaults
controlled in the super block so that if you want to use it you can;
otherwise it doesn't exist.

-Morey

Jeff Fisher wrote:
>> Don't do it!!!
>>
>> OK, I know Chris has described some block sharing. But I hate it.
>>
>> If I copy "resume" to "resume.save", it is because I want 2 copies
>> for safety. I don't want the fs to reduce it to 1 copy. And
>> reducing the duplicates is exactly opposite to Chris's paranoid
>> make-multiple-copies-by-default.
>>
>> Now feel free to tell me I'm an idiot (other people do) :)
>
> I would imagine that it would be optional and at worst, a specialized
> fork of btrfs that follows the mainline but adds the deduplication bits.
>
> Jeff
>
> Don't do it!!!
>
> OK, I know Chris has described some block sharing. But I hate it.
>
> If I copy "resume" to "resume.save", it is because I want 2 copies
> for safety. I don't want the fs to reduce it to 1 copy. And
> reducing the duplicates is exactly opposite to Chris's paranoid
> make-multiple-copies-by-default.
>
> Now feel free to tell me I'm an idiot (other people do) :)

In situations where there's non-trivial benefits to some workloads, but
also non-trivial drawbacks, it strikes me as something that could be
enabled and disabled as a mount option, like data=ordered.
Joe Peterson wrote:
> My thoughts on what you are saying is that it is not generally a good
> idea to assume any filesystem will lay things out in any specific way,
> including whether it has one-to-one mapping of files to blocks. In
> other words, making a copy of a file on the same filesystem for safety
> reasons (unless you are modifying a file and want a backup of its old
> state, like emacs' ~ files) is probably not a great habit to get into.
>
> The implementation details of how a filesystem makes things safer should
> be behind-the-scenes (like checksums, multiple-copies-by-default,
> mirroring, etc.). That way, you can simply rely on the filesystem to
> manage protection of your data rather than going to the effort of
> managing multiple copies of files yourself for that reason.

I'm a filesystem guy so I only use ones I know do what I want,
I never trust ones I don't know about :)

I agree with you about the danger of assuming what a filesystem
will or won't do on local copies. I also fear that 99% of normal
users have an expectation that making a copy makes a new physical
instance (which of course is not safe if the device crashes either).

I hate dealing with customers that have lost their data because
of the filesystem.


Morey Roof wrote:
> I was hoping to have it specified as a set of mount options or defaults
> controlled in the super block so that if you want to use it you can;
> otherwise it doesn't exist.

I don't have veto power in btrfs so my aversion means nothing.

As you say, there are a number of good ways to control it.

If you pursue this, I might suggest having a dedup limit
so they can say "keep at least 2 copies" or "just one".

jim
This would be a kind of filesystem block level compression, right?

On Wed, Aug 13, 2008 at 12:28 PM, <btrfs-devel@arbitraryconstant.com> wrote:
>> Don't do it!!!
>>
>> OK, I know Chris has described some block sharing. But I hate it.
>>
>> If I copy "resume" to "resume.save", it is because I want 2 copies
>> for safety. I don't want the fs to reduce it to 1 copy. And
>> reducing the duplicates is exactly opposite to Chris's paranoid
>> make-multiple-copies-by-default.
>>
>> Now feel free to tell me I'm an idiot (other people do) :)
>
> In situations where there's non-trivial benefits to some workloads, but
> also non-trivial drawbacks, it strikes me as something that could be
> enabled and disabled as a mount option, like data=ordered.
On Wed, 2008-08-13 at 15:28 -0400, jim owens wrote:
> Joe Peterson wrote:
> > My thoughts on what you are saying is that it is not generally a good
> > idea to assume any filesystem will lay things out in any specific way,
> > including whether it has one-to-one mapping of files to blocks. In
> > other words, making a copy of a file on the same filesystem for safety
> > reasons (unless you are modifying a file and want a backup of its old
> > state, like emacs' ~ files) is probably not a great habit to get into.
> >
> > The implementation details of how a filesystem makes things safer should
> > be behind-the-scenes (like checksums, multiple-copies-by-default,
> > mirroring, etc.). That way, you can simply rely on the filesystem to
> > manage protection of your data rather than going to the effort of
> > managing multiple copies of files yourself for that reason.
>
> I'm a filesystem guy so I only use ones I know do what I want,
> I never trust ones I don't know about :)
>
> I agree with you about the danger of assuming what a filesystem
> will or won't do on local copies. I also fear that 99% of normal
> users have an expectation that making a copy makes a new physical
> instance (which of course is not safe if the device crashes either).
>
> I hate dealing with customers that have lost their data because
> of the filesystem.
>
>
> Morey Roof wrote:
> > I was hoping to have it specified as a set of mount options or defaults
> > controlled in the super block so that if you want to use it you can;
> > otherwise it doesn't exist.
>
> I don't have veto power in btrfs so my aversion means nothing.
>
> As you say, there are a number of good ways to control it.
>
> If you pursue this, I might suggest having a dedup limit
> so they can say "keep at least 2 copies" or "just one".
>
> jim

This is an interesting idea. Perhaps we could tie this in a little
closer to the protection systems in btrfs so that, depending on the type
of protection chosen, the dedup will try to match it. However, I need to
start figuring out a prototype for this, which is what I started this
thread for.

-Morey
It is the same idea as the way space efficient snapshots work currently
in btrfs. I'm just planning out a way to make a process go find them and
link them together and increase the reference counts, and another method
for having the allocator do it while writing.

-Morey

On Wed, 2008-08-13 at 12:35 -0700, Kevin Cantu wrote:
> This would be a kind of filesystem block level compression, right?
>
> On Wed, Aug 13, 2008 at 12:28 PM, <btrfs-devel@arbitraryconstant.com> wrote:
> >> Don't do it!!!
> >>
> >> OK, I know Chris has described some block sharing. But I hate it.
> >>
> >> If I copy "resume" to "resume.save", it is because I want 2 copies
> >> for safety. I don't want the fs to reduce it to 1 copy. And
> >> reducing the duplicates is exactly opposite to Chris's paranoid
> >> make-multiple-copies-by-default.
> >>
> >> Now feel free to tell me I'm an idiot (other people do) :)
> >
> > In situations where there's non-trivial benefits to some workloads, but
> > also non-trivial drawbacks, it strikes me as something that could be
> > enabled and disabled as a mount option, like data=ordered.
Morey Roof <moreyroof@gmail.com> writes:

> I have been thinking about a new feature to start work on that I am
> interested in and I was hoping people could give me some feedback and
> ideas of how to tackle it. Anyways, I want to create a data
> deduplication system that can work in two different modes. One mode
> is that when the system is idle or not beyond a set load point a
> background process would scan the volume for duplicate blocks. The
> other mode would be used for systems that are nearline or backup
> systems that don't really care about the performance and it would do
> the deduplication during block allocation.

Seems like a special case of compression? Perhaps compression would help
more?

> One of the ways I was thinking of to find the duplicate blocks would
> be to use the checksums as a quick compare. If the checksums match
> then do a complete compare before adjusting the nodes on the files.
> However, I believe that I will need to create a tree based on the
> checksum values.

If you really want to do deduplication: It might be advantageous to do
this on larger units.

If you assume that data is usually shared between similar files (which
is a reasonable assumption) and do the deduplication on whole files
you can also use the size as an index and avoid checksumming all files
with a unique size. I wrote a user level duplicated file checker some
time ago that used this trick successfully.

-Andi
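A minimal sketch of the size-first trick Andi describes, assuming
whole-file deduplication over a list of paths given on the command line:
files are sorted by size, and only files whose size is shared with at
least one other file ever get read and hashed, since a file with a
unique size cannot have a whole-file duplicate. The cheap whole-file
checksum is a placeholder; a real tool would use a stronger digest and a
full compare on matches.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/stat.h>

struct finfo {
	const char *name;
	long size;
	uint32_t csum;
};

/* Cheap whole-file FNV-1a checksum, only run on size-collision groups. */
static uint32_t csum_file(const char *name)
{
	FILE *f = fopen(name, "rb");
	uint32_t h = 2166136261u;
	int c;

	if (!f)
		return 0;
	while ((c = fgetc(f)) != EOF)
		h = (h ^ (unsigned char)c) * 16777619u;
	fclose(f);
	return h;
}

static int by_size(const void *a, const void *b)
{
	const struct finfo *x = a, *y = b;

	return (x->size > y->size) - (x->size < y->size);
}

int main(int argc, char **argv)
{
	struct finfo *files = calloc(argc, sizeof(*files));
	struct stat st;
	int i, n = 0;

	for (i = 1; i < argc; i++) {
		if (stat(argv[i], &st) == 0) {
			files[n].name = argv[i];
			files[n].size = st.st_size;
			n++;
		}
	}
	qsort(files, n, sizeof(*files), by_size);

	/* Walk runs of equal size; files with a unique size are skipped. */
	for (i = 0; i < n; ) {
		int j = i, a, b;

		while (j < n && files[j].size == files[i].size)
			j++;
		if (j - i > 1) {
			for (a = i; a < j; a++)
				files[a].csum = csum_file(files[a].name);
			for (a = i; a < j; a++)
				for (b = a + 1; b < j; b++)
					if (files[a].csum == files[b].csum)
						printf("possible duplicate: %s %s\n",
						       files[a].name, files[b].name);
		}
		i = j;
	}
	free(files);
	return 0;
}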
On Wed, 2008-08-13 at 22:00 +0200, Andi Kleen wrote:
> Morey Roof <moreyroof@gmail.com> writes:
>
> > I have been thinking about a new feature to start work on that I am
> > interested in and I was hoping people could give me some feedback and
> > ideas of how to tackle it. Anyways, I want to create a data
> > deduplication system that can work in two different modes. One mode
> > is that when the system is idle or not beyond a set load point a
> > background process would scan the volume for duplicate blocks. The
> > other mode would be used for systems that are nearline or backup
> > systems that don't really care about the performance and it would do
> > the deduplication during block allocation.
>
> Seems like a special case of compression? Perhaps compression would help
> more?
>
> > One of the ways I was thinking of to find the duplicate blocks would
> > be to use the checksums as a quick compare. If the checksums match
> > then do a complete compare before adjusting the nodes on the files.
> > However, I believe that I will need to create a tree based on the
> > checksum values.
>
> If you really want to do deduplication: It might be advantageous to do
> this on larger units.
>
> If you assume that data is usually shared between similar files (which
> is a reasonable assumption) and do the deduplication on whole files
> you can also use the size as an index and avoid checksumming all files
> with a unique size. I wrote a user level duplicated file checker some
> time ago that used this trick successfully.
>
> -Andi

I would like to use the tree in a similar fashion as the way snapshots
are handled. Also, I want to catch blocks that exist in different files.
Say, you have several Virtual Machine Disk files on the volume. Those
virtual machine disk files may not be the same in terms of files, but if
they are virtual machines that are both running the same operating
system, then some of the blocks/extents are going to be the same and I
want to be able to dedup them.

-Morey
On Wed, 2008-08-13 at 14:54 -0400, jim owens wrote:
> Morey Roof wrote:
> > I have been thinking about a new feature to start work on that I am
> > interested in and I was hoping people could give me some feedback and
> > ideas of how to tackle it. Anyways, I want to create a data
> > deduplication system that can work in two different modes. One mode is
> > that when the system is idle or not beyond a set load point a background
> > process would scan the volume for duplicate blocks. The other mode
> > would be used for systems that are nearline or backup systems that don't
> > really care about the performance and it would do the deduplication
> > during block allocation.
> >
> > One of the ways I was thinking of to find the duplicate blocks would be
> > to use the checksums as a quick compare. If the checksums match then do
> > a complete compare before adjusting the nodes on the files. However, I
> > believe that I will need to create a tree based on the checksum values.
> >
> > So any other ideas and thoughts about this?
>
> Don't do it!!!
>
> OK, I know Chris has described some block sharing. But I hate it.
>
> If I copy "resume" to "resume.save", it is because I want 2 copies
> for safety. I don't want the fs to reduce it to 1 copy. And
> reducing the duplicates is exactly opposite to Chris's paranoid
> make-multiple-copies-by-default.
>
> Now feel free to tell me I'm an idiot (other people do) :)

Grin, the C in cow does stand for something after all. It is pretty darn
hard to overwrite existing bytes in a file in btrfs without mount -o
nodatacow. There isn't any difference between dedup and a snapshot from
a data protection point of view.

With that said, maintaining all the machinery for dedup is definitely
non-trivial, and I haven't yet convinced myself it wouldn't be better
done at higher layers. We already have the cow-single-file ioctl, why
not have a userland process go around and create cow links between
identical files.

File granularity is not well suited to dedup when files differ by only a
few blocks, but I'd want to see some numbers on how often that happens
before carrying around the disk format needed to do block level dedup.

-chris
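A sketch of the userland approach Chris suggests: once two files are
known to be identical, replace one copy's blocks with a cow reference to
the other. The cow-single-file ioctl he refers to is not shown here;
this uses the much later FICLONE reflink ioctl as a stand-in, and it
skips the verification step and the atomicity concern raised in the next
message, so treat it as an outline rather than the actual interface.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Make dst share src's extents instead of holding its own copy. */
static int cow_link(const char *src, const char *dst)
{
	int sfd = open(src, O_RDONLY);
	int dfd = open(dst, O_WRONLY);
	int ret = -1;

	if (sfd >= 0 && dfd >= 0)
		ret = ioctl(dfd, FICLONE, sfd);	/* reflink: dst now shares src */

	if (sfd >= 0)
		close(sfd);
	if (dfd >= 0)
		close(dfd);
	return ret;
}

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <source> <identical-copy>\n", argv[0]);
		return 1;
	}
	if (cow_link(argv[1], argv[2]) != 0) {
		perror("FICLONE");
		return 1;
	}
	return 0;
}

A real dedup daemon would first confirm the two files really are
identical (checksum, then full compare) and would need the kernel's help
to keep them from changing between the compare and the clone.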
> With that said, maintaining all the machinery for dedup is definitely
> non-trivial, and I haven't yet convinced myself it wouldn't be better
> done at higher layers. We already have the cow-single-file ioctl, why
> not have a userland process go around and create cow links between
> identical files.

Can verification + cow-single-file be done atomically with the existing
ioctl? Pushing this out into userspace seems reasonable enough, but not
if it gives a malicious user a window where they can alter part of their
file and have it CoW'd into everyone else's copy.

Is this a 1.0 feature?
> File granularity is not well suited to dedup when files differ by only a
> few blocks, but I'd want to see some numbers on how often that happens
> before carrying around the disk format needed to do block level dedup.

I was imagining that one could easily make a flag to debug-tree which
caused it to just dump the file block checksums from the extent items,
maybe restricted to a given subvol. Pipe that through sort and uniq -c
and you have a pretty easy path to a rough histogram of checksum values.

But I sort of wonder if the point isn't to dedup systems that were
deployed on previous-generation file systems. If people knew that dedup
worked, they might be able to more easily deploy simpler systems that
didn't have to be so careful at, say, maintaining hard link farms.

I dunno, just a thought.

- z
On Thu, Aug 14, 2008 at 12:49 PM, Zach Brown <zach.brown@oracle.com> wrote:
>
>> File granularity is not well suited to dedup when files differ by only a
>> few blocks, but I'd want to see some numbers on how often that happens
>> before carrying around the disk format needed to do block level dedup.
>
> I was imagining that one could easily make a flag to debug-tree which
> caused it to just dump the file block checksums from the extent items,
> maybe restricted to a given subvol. Pipe that through sort and uniq -c
> and you have a pretty easy path to a rough histogram of checksum values.
>
> But I sort of wonder if the point isn't to dedup systems that were
> deployed on previous-generation file systems. If people knew that dedup
> worked, they might be able to more easily deploy simpler systems that
> didn't have to be so careful at, say, maintaining hard link farms.
>
> I dunno, just a thought.
>
> - z
>

Well, if we look at NetApp's claims, dedup can be pretty useful. Also,
it really depends on what kinds of data workloads the volume is being
used for. People claim that they often see a size reduction of about
40%-80% of the space used when they have volumes that store virtual
machine disk files.

Chris, I would like to go ahead and make a small simple prototype for
this and see how it works before just ruling it out of the code base.

-Morey
On Thu, 2008-08-14 at 11:49 -0700, Zach Brown wrote:
> > File granularity is not well suited to dedup when files differ by only a
> > few blocks, but I'd want to see some numbers on how often that happens
> > before carrying around the disk format needed to do block level dedup.
>
> I was imagining that one could easily make a flag to debug-tree which
> caused it to just dump the file block checksums from the extent items,
> maybe restricted to a given subvol. Pipe that through sort and uniq -c
> and you have a pretty easy path to a rough histogram of checksum values.
>
> But I sort of wonder if the point isn't to dedup systems that were
> deployed on previous-generation file systems. If people knew that dedup
> worked, they might be able to more easily deploy simpler systems that
> didn't have to be so careful at, say, maintaining hard link farms.
>
> I dunno, just a thought.

The backup and virtualization use cases are why I've still got it on the
table for consideration at least. Especially virtualization, because
there you'll tend to have large disk image files that have tiny changes
between each other.

-chris