valrhona at gmail.com
2010-Mar-02 04:48 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
One of the most useful things I've found with ZFS dedup (way to go, Jeff Bonwick and Co.!) is the ability to consolidate backups. I had six different complete backups of all of my files spread out over various hard drives, and dedup allowed me to consolidate them into something that took less than twice the space of the original. I was thrilled when I saw this the first time.

This led me to another idea: I have been using DVDs for small backups here and there for a decade now, and have a huge pile of several hundred. They have a lot of overlapping content, so I was thinking of feeding the entire stack into some sort of DVD autoloader, which would read each disk and write its contents to a ZFS filesystem with dedup enabled. Even if the autoloader had to run on Windows or Linux, I could just use a mounted drive to achieve the same end. That would allow me to consolidate a few hundred CDs and DVDs onto probably a terabyte or so, which could then be kept conveniently on a hard drive and archived to tape.

Does anyone know of a DVD autoloader that would allow me to do this easily, and whether someone might be willing to rent one to me (I'm in the Boston area)? I only need to do this once.
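A minimal sketch of the ingest step described above, assuming a pool named "tank" on a ZFS version recent enough to support dedup; the dataset name, mount point, and disc numbering are illustrative, not from the thread:

  # Create a dedup-enabled dataset to hold the consolidated archive.
  zfs create -o dedup=on -o compression=on tank/dvdarchive

  # For each disc: copy its contents into a subdirectory, then eject.
  # /media/dvd stands in for wherever the disc is mounted.
  rsync -a /media/dvd/ /tank/dvdarchive/disc001/
  eject /media/dvd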
Thomas Burgess
2010-Mar-02 05:03 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On Mon, Mar 1, 2010 at 11:48 PM, valrhona at gmail.com <valrhona at gmail.com> wrote:

> One of the most useful things I've found with ZFS dedup (way to go Jeff Bonwick and Co.!) is the ability to consolidate backups. I had six different complete backups of all of my files spread out over various hard drives, and dedup allowed me to consolidate them into something that took less than twice the space of the original. I was thrilled when I saw this the first time.
>
> This led me to another idea: I have been using DVDs for small backups here and there for a decade now, and have a huge pile of several hundred. They have a lot of overlapping content, so I was thinking of feeding the entire stack into some sort of DVD autoloader, which would just read each disk, and write its contents to a ZFS filesystem with dedup enabled. Even if the autoloader had to run on Windows or Linux, I could just use a mounted drive to achieve the same ends. That would allow me to consolidate a few hundred CDs and DVDs onto probably a terabyte or so, which could then be kept conveniently on a hard drive and archived to tape. Does anyone know of a DVD autoloader that would allow me to do this easily, and if someone might be willing to rent one to me (I'm in the Boston area)? I only need to do this once.

This would be a kick ass project to try to make with spare parts. I might even try it now that you bring it up.
Kjetil Torgrim Homme
2010-Mar-02 15:15 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
"valrhona at gmail.com" <valrhona at gmail.com> writes:> I have been using DVDs for small backups here and there for a decade > now, and have a huge pile of several hundred. They have a lot of > overlapping content, so I was thinking of feeding the entire stack > into some sort of DVD autoloader, which would just read each disk, and > write its contents to a ZFS filesystem with dedup enabled. [...] That > would allow me to consolidate a few hundred CDs and DVDs onto probably > a terabyte or so, which could then be kept conveniently on a hard > drive and archived to tape.it would be inconvenient to make a dedup copy on harddisk or tape, you could only do it as a ZFS filesystem or ZFS send stream. it''s better to use a generic tool like hardlink(1), and just delete files afterwards with find . -type f -links +1 -exec rm {} \; (untested! notice that using xargs or -exec rm {} + will wipe out all copies of your duplicate files, so don''t do that!) http://linux.die.net/man/1/hardlink perhaps this is more convenient: http://netdial.caribe.net/~adrian2/fdupes.html -- Kjetil T. Homme Redpill Linpro AS - Changing the game
Freddie Cash
2010-Mar-02 18:48 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On Tue, Mar 2, 2010 at 7:15 AM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:

> "valrhona at gmail.com" <valrhona at gmail.com> writes:
>
>> I have been using DVDs for small backups here and there for a decade now, and have a huge pile of several hundred. They have a lot of overlapping content, so I was thinking of feeding the entire stack into some sort of DVD autoloader, which would just read each disk, and write its contents to a ZFS filesystem with dedup enabled. [...] That would allow me to consolidate a few hundred CDs and DVDs onto probably a terabyte or so, which could then be kept conveniently on a hard drive and archived to tape.
>
> it would be inconvenient to make a dedup copy on hard disk or tape, you could only do it as a ZFS filesystem or ZFS send stream. it's better to use a generic tool like hardlink(1), and just delete files afterwards with

Why would it be inconvenient? This is pretty much exactly what ZFS + dedupe is perfect for.

Since dedupe is pool-wide, you could create individual filesystems for each DVD. Or use just one filesystem with sub-directories. Or just one filesystem with snapshots taken after each DVD is copied over top.

The data would be dedupe'd on write, so you would only have one copy of unique data.

To save it to tape, just "zfs send" it, and save the stream file.

ZFS dedupe would also work better than hardlinking files, as it works at the block layer and will be able to dedupe partial files.

-- 
Freddie Cash
fjwcash at gmail.com
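A sketch of the filesystem-per-DVD variant Freddie describes, with illustrative names (the dedup property is inherited from the parent dataset created earlier in the thread):

  # One child filesystem per disc, snapshotted after each copy so a
  # bad read can be rolled back.
  zfs create tank/dvdarchive/disc001
  rsync -a /media/dvd/ /tank/dvdarchive/disc001/
  zfs snapshot tank/dvdarchive/disc001@loaded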
Kjetil Torgrim Homme
2010-Mar-02 19:13 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
Freddie Cash <fjwcash at gmail.com> writes:

> Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
>> it would be inconvenient to make a dedup copy on hard disk or tape, you could only do it as a ZFS filesystem or ZFS send stream. it's better to use a generic tool like hardlink(1), and just delete files afterwards with
>
> Why would it be inconvenient? This is pretty much exactly what ZFS + dedupe is perfect for.

the duplication is not visible, so it's still a wilderness of duplicates when you navigate the files.

> Since dedupe is pool-wide, you could create individual filesystems for each DVD. Or use just one filesystem with sub-directories. Or just one filesystem with snapshots after each DVD is copied over top.
>
> The data would be dedupe'd on write, so you would only have one copy of unique data.

for this application, I don't think the OP *wants* COW if he changes one file. he'll want the duplicates to be kept in sync, not diverging (in contrast to storage for VMs, for instance). with hardlinks, it is easier to identify duplicates and handle them however you like. if there is a reason for the duplicate access paths to your data, you can keep them. I would want to straighten the mess out, though, rather than keep it intact as closely as possible.

> To save it to tape, just "zfs send" it, and save the stream file.

the zfs stream format is not recommended for archiving.

> ZFS dedupe would also work better than hardlinking files, as it works at the block layer, and will be able to dedupe partial files.

yes, but for the most part this will be negligible. copies of growing files, like log files, or perhaps your novel written as a stream of consciousness, will benefit. unrelated partially identical files are rare.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game
valrhona at gmail.com
2010-Mar-02 21:31 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
Freddie: I think you understand my intent correctly.

This is not about a perfect backup system. The point is that I have hundreds of DVDs that I don't particularly want to sort out, but they are pretty useless from a management standpoint in their current form. ZFS + dedup would be the way to at least get them all in one place, where at least I can search, etc.---which is pretty much impossible on a stack of disks.

I also don't want file-level dedup, as a lot of these disks are of the "oh, it's the end of the day; I'm going to burn what I worked on today, so if my computer dies I won't be completely stuck on this project..." variety. File-level dedup would be a nightmare to sort out, because of lots of incremental changes---exactly the point of block-level dedup.

This is not an organized archive at all; I just want to consolidate a bunch of old disks, on the small chance they could be useful, and do it without investing much time.

So does anyone know of an autoloader solution that would do this?
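Once the discs are loaded, a quick way to see how much the pool-wide dedup is actually saving ("tank" is an illustrative pool name):

  # The DEDUP column of zpool list shows the pool-wide dedup ratio.
  zpool list tank
  zpool get dedupratio tank

  # More detail: the dedup table (DDT) histogram.
  zdb -DD tank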
Lori Alt
2010-Mar-02 22:42 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On 03/02/10 11:48 AM, Freddie Cash wrote:

> On Tue, Mar 2, 2010 at 7:15 AM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
>> "valrhona at gmail.com" <valrhona at gmail.com> writes:
>>
>>> I have been using DVDs for small backups here and there for a decade now, and have a huge pile of several hundred. They have a lot of overlapping content, so I was thinking of feeding the entire stack into some sort of DVD autoloader, which would just read each disk, and write its contents to a ZFS filesystem with dedup enabled. [...] That would allow me to consolidate a few hundred CDs and DVDs onto probably a terabyte or so, which could then be kept conveniently on a hard drive and archived to tape.
>>
>> it would be inconvenient to make a dedup copy on hard disk or tape, you could only do it as a ZFS filesystem or ZFS send stream. it's better to use a generic tool like hardlink(1), and just delete files afterwards with
>
> Why would it be inconvenient? This is pretty much exactly what ZFS + dedupe is perfect for.
>
> Since dedupe is pool-wide, you could create individual filesystems for each DVD. Or use just one filesystem with sub-directories. Or just one filesystem with snapshots after each DVD is copied over top.
>
> The data would be dedupe'd on write, so you would only have one copy of unique data.
>
> To save it to tape, just "zfs send" it, and save the stream file.

Stream dedup is largely independent of on-disk dedup. If the content is dedup'ed on disk, but you don't specify the -D option to 'zfs send', the dedup'ed data will be re-expanded. Even if the content is NOT dedup'ed on disk, the -D option will cause the blocks to be dedup'ed in the stream. One advantage to using them both is that the 'zfs send -D' processing doesn't need to recalculate the block checksums if they already exist on disk. This speeds up the send stream generation code by a lot.

Also, in response to another comment about the send stream format not being recommended for archiving: that all depends on how you intend to use the send stream in the future. The format IS supported going forward, and future versions of zfs will continue to be capable of reading older send stream formats (the zfs(1M) man page has been modified to clarify this now).

Lori
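A sketch of the two send variants Lori contrasts, with illustrative dataset, snapshot, and file names:

  # Plain stream: blocks that are dedup'ed on disk are re-expanded.
  zfs send tank/dvdarchive@archive > /backup/dvdarchive-full.zfs

  # Deduplicated stream: -D keeps only one copy of each unique block
  # inside the stream itself.
  zfs send -D tank/dvdarchive@archive > /backup/dvdarchive-dedup.zfs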
Toby Thain
2010-Mar-03 00:18 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On 2-Mar-10, at 4:31 PM, valrhona at gmail.com wrote:

> Freddie: I think you understand my intent correctly.
>
> This is not about a perfect backup system. The point is that I have hundreds of DVDs that I don't particularly want to sort out, but they are pretty useless from a management standpoint in their current form. ZFS + dedup would be the way to at least get them all in one place, where at least I can search, etc.---which is pretty much impossible on a stack of disks.
>
> I also don't want file-level dedup, as a lot of these disks are a "oh, it's the end of the day; I'm going to burn what I worked on today, so if my computer dies I won't be completely stuck on this project..."

Wow, you are going to like snapshots and redundancy a whole lot better, as a solution to that.

--Toby

> File-level dedup would be a nightmare to sort out, because of lots of incremental changes---exactly the point of block-level dedup.
>
> This is not an organized archive at all; I just want to consolidate a bunch of old disks, in the small case they could be useful, and do it without investing much time.
>
> So does anyone know of an autoloader solution that would do this?
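For the end-of-day use case Toby is pointing at, a hedged sketch of what that looks like with snapshots (the dataset name is illustrative):

  # A snapshot is cheap and nearly instant at the end of the day...
  zfs snapshot tank/projects@$(date +%Y-%m-%d)

  # ...and can be listed or rolled back to later.
  zfs list -t snapshot -r tank/projects
  zfs rollback tank/projects@2010-03-02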
R.G. Keen
2010-Mar-03 01:35 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
This is meant with the sincerest of urges to help. I have a similar situation, and pondered much the same issues. However, I'm extremely short of time as it is. I decided that my needs would be best served by leaving the data on those backup DVDs and CDs in case I needed it. The "in case I need it" is something that hasn't happened to me for over fifteen years now, largely because I'm careful with what I'm working on at the moment and back it up to other disks.

You might first want to install DVDisaster to scan those old disks and see if they're still self-consistent. Some of them may not be readable at all, or only partially readable.

And as to automation for reading: I recently ripped and archived my entire CD collection, some 500 titles. Not the same issue in terms of data, but much the same in terms of needing to load/unload the disks. I went as far as to think of getting/renting an autoloader, but I found that it was much more efficient to keep a stack by my desk and swap disks when the ripper beeped at me. This was a very low-priority task in my personal stack, but over a few weeks, there were enough beeps and minutes to swap the disks out.

It's very tempting to use a neato tool - and zfs is a major neat one! - when there's a task to be done. However, sometimes just scratching away at a task a little at a time is almost as fast and much cheaper.

Now, how did you say you set up dedup? 8-)
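If DVDisaster isn't handy, a generic readability check with plain dd (the device path is illustrative; on Solaris it would be something like /dev/rdsk/c1t0d0s2) will at least flag discs with unreadable sectors:

  # Read the whole disc and throw the data away; dd stops with an I/O
  # error if a sector cannot be read.
  dd if=/dev/sr0 of=/dev/null bs=2048 && echo "disc reads cleanly"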
Dan Pritts
2010-Mar-04 15:02 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On Tue, Mar 02, 2010 at 05:35:07PM -0800, R.G. Keen wrote:

> And as to automation for reading: I recently ripped and archived my entire CD collection, some 500 titles. Not the same issue in terms of data, but much the same in terms of needing to load/unload the disks. I went as far as to think of getting/renting an autoloader, but I found that it was much more efficient to keep a stack by my desk and swap disks when the ripper beeped at me. This was a very low priority task in my personal stack, but over a few weeks, there were enough beeps and minutes to swap the disks out.

I did something very similar but with over 1000 CDs.

If you can scare up an external DVD drive, use it too - that way you'll have to change half as many times.

danno
-- 
Dan Pritts, Sr. Systems Engineer
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Internet2 Spring Member Meeting
April 26-28, 2010 - Arlington, Virginia
http://events.internet2.edu/2010/spring-mm/
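A hedged sketch of the swap-when-it-beeps workflow the last two posters describe, as a small shell loop; it assumes discs auto-mount under /media and that the archive dataset from earlier in the thread exists - adjust the device and eject handling for your system:

  #!/bin/sh
  # Semi-automatic ingest loop: copy whatever disc is mounted under
  # /media into the next discNNN directory, eject it, and beep.
  ARCHIVE=/tank/dvdarchive
  n=1
  while true; do
      disc=$(ls /media 2>/dev/null | head -1)
      if [ -n "$disc" ]; then
          dest=$(printf '%s/disc%03d' "$ARCHIVE" "$n")
          mkdir -p "$dest"
          rsync -a "/media/$disc/" "$dest/" && n=$((n + 1))
          eject "/media/$disc"
          printf '\a'        # beep: time to swap in the next disc
      fi
      sleep 10
  done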
Kyle McDonald
2010-May-04 13:29 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On 3/2/2010 10:15 AM, Kjetil Torgrim Homme wrote:

> "valrhona at gmail.com" <valrhona at gmail.com> writes:
>
>> I have been using DVDs for small backups here and there for a decade now, and have a huge pile of several hundred. They have a lot of overlapping content, so I was thinking of feeding the entire stack into some sort of DVD autoloader, which would just read each disk, and write its contents to a ZFS filesystem with dedup enabled. [...] That would allow me to consolidate a few hundred CDs and DVDs onto probably a terabyte or so, which could then be kept conveniently on a hard drive and archived to tape.
>
> it would be inconvenient to make a dedup copy on hard disk or tape, you could only do it as a ZFS filesystem or ZFS send stream. it's better to use a generic tool like hardlink(1), and just delete files afterwards with

There is a perl script that has been floating around the internet for years that will convert copies of files on the same FS to hardlinks (sorry, I don't have the name handy), so you don't need ZFS for that. Once this is done you can even recreate an ISO and burn it back to DVD (possibly merging hundreds of CDs onto one DVD, or even a BD!). The script can also delete the duplicates, but there isn't much control over which one it keeps - for backups you may really want to keep the earliest (or latest?) backup the file appeared in.

Using ZFS dedup is an interesting way of doing this. However, archiving the result may be hard. If you use different datasets (FSes) for each backup, can you only send one dataset at a time (since you can only snapshot at the dataset level)? Won't that 'undo' the deduping?

If you instead put all the backups in one dataset, then the snapshot can theoretically contain the deduped data. I'm not clear on whether 'send'ing it will preserve the deduping or not - or if it's up to the receiving dataset to recognize matching blocks? If the dedup is in the stream, then you may be able to write the stream to a DVD or BD.

Still, if you save enough space that you can add the required level of redundancy, you could just leave it on disk and chuck the DVDs. Not sure I'd do that, but it might let me put the media in the basement, instead of the closet or on the desk next to me.

-Kyle
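On the question of sending one dataset at a time: a hedged sketch of a recursive, deduplicated send (illustrative names; see Lori's note above for what -D does to the stream):

  # Snapshot every per-DVD child filesystem at once...
  zfs snapshot -r tank/dvdarchive@archive

  # ...then send them all as a single replication stream; -R includes
  # all descendant datasets, -D dedups blocks within the stream.
  zfs send -R -D tank/dvdarchive@archive > /backup/dvdarchive.zfs

  # Restore later with:  zfs receive -d tank2 < /backup/dvdarchive.zfs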
Scott Steagall
2010-May-04 13:39 UTC
[zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?
On 05/04/2010 09:29 AM, Kyle McDonald wrote:

> On 3/2/2010 10:15 AM, Kjetil Torgrim Homme wrote:
>> "valrhona at gmail.com" <valrhona at gmail.com> writes:
>>
>>> I have been using DVDs for small backups here and there for a decade now, and have a huge pile of several hundred. They have a lot of overlapping content, so I was thinking of feeding the entire stack into some sort of DVD autoloader, which would just read each disk, and write its contents to a ZFS filesystem with dedup enabled. [...] That would allow me to consolidate a few hundred CDs and DVDs onto probably a terabyte or so, which could then be kept conveniently on a hard drive and archived to tape.
>>
>> it would be inconvenient to make a dedup copy on hard disk or tape, you could only do it as a ZFS filesystem or ZFS send stream. it's better to use a generic tool like hardlink(1), and just delete files afterwards with
>
> There is a perl script that has been floating around the internet for years that will convert copies of files on the same FS to hardlinks (sorry, I don't have the name handy), so you don't need ZFS for that. Once this is done you can even recreate an ISO and burn it back to DVD (possibly merging hundreds of CDs onto one DVD, or even a BD!). The script can also delete the duplicates, but there isn't much control over which one it keeps - for backups you may really want to keep the earliest (or latest?) backup the file appeared in.

I've used "Dirvish" (http://www.dirvish.org/) and rsync to do just that... worked great!

Scott

> Using ZFS dedup is an interesting way of doing this. However, archiving the result may be hard. If you use different datasets (FSes) for each backup, can you only send one dataset at a time (since you can only snapshot at the dataset level)? Won't that 'undo' the deduping?
>
> If you instead put all the backups in one dataset, then the snapshot can theoretically contain the deduped data. I'm not clear on whether 'send'ing it will preserve the deduping or not - or if it's up to the receiving dataset to recognize matching blocks? If the dedup is in the stream, then you may be able to write the stream to a DVD or BD.
>
> Still, if you save enough space that you can add the required level of redundancy, you could just leave it on disk and chuck the DVDs. Not sure I'd do that, but it might let me put the media in the basement, instead of the closet or on the desk next to me.
>
> -Kyle
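For reference, the rsync mechanism behind tools like Dirvish is --link-dest: each new backup hardlinks unchanged files against the previous one, so only changed files consume new space. A minimal sketch with illustrative paths:

  # Day-two backup: files identical to day one become hardlinks.
  rsync -a --link-dest=/backups/2010-03-01 \
      /home/me/project/ /backups/2010-03-02/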