I am setting up a backup server for the garage, to back up my HTPC in case of theft or fire. The HTPC has a 4TB RAID10 array (mdadm, JFS), and will be connected to the backup server using GB ethernet. The backup server will have a 4TB BTRFS RAID0 array. Debian Testing running on both.

I want to keep a duplicate copy of the HTPC data on the backup server, and I think a regular full file copy is not optimal and may take days to do. So I'm looking for a way to sync the arrays at some interval. Ideally the sync would scan the HTPC with a CRC check to look for differences, copy over the differences, then email me on success.

Is there a BTRFS tool that would do this?

Also with this system, I'm concerned that if there is corruption on the HTPC, it could be propagated to the backup server. Is there some way to address this? Longer intervals to sync, so I have a chance to discover?

How about migrating to -all- BTRFS? Would going all BTRFS give any advantages as to syncing over GB ethernet? If I fail out one drive in my RAID10, add another and set up these two as a BTRFS RAID0 array, could I then copy over the data remaining on the mdadm array?

Any gotchas in setting up the BTRFS RAID0 array?
On Thu, Jan 6, 2011 at 9:35 AM, Carl Cook <CACook@quantum-sci.com> wrote:
> I want to keep a duplicate copy of the HTPC data on the backup server, and I think a regular full file copy is not optimal and may take days to do. So I'm looking for a way to sync the arrays at some interval. Ideally the sync would scan the HTPC with a CRC check to look for differences, copy over the differences, then email me on success.
>
> Is there a BTRFS tool that would do this?

No, but there's a great tool called rsync that does exactly what you want. :)

This is (basically) the same setup we use at work to back up all our remote Linux/FreeBSD systems to a central backups server (although our server runs FreeBSD+ZFS).

Just run rsync on the backup server, tell it to connect via ssh to the remote server, and rsync / (root filesystem) into /backups/htpc/ (or whatever directory you want). Use an exclude file to exclude the directories you don't want backed up (like /proc, /sys, /dev).

If you are comfortable compiling software, then you should look into adding the HPN patches to OpenSSH and enabling the None cipher. That will give you a 30-40% increase in network throughput.

After the rsync completes, snapshot the filesystem on the backup server, using the current date for the name.

Then repeat the rsync process the next day, into the exact same directory. Only files that have changed will be transferred. Then snapshot the filesystem using the current date.

And repeat ad nauseam. :)

Some useful rsync options to read up on:
  --hard-links
  --numeric-ids
  --delete-during
  --delete-excluded
  --archive

The first time you run the rsync command it will take a while, as it transfers every file on the HTPC to the backups server. However, you can stop and restart this process as many times as you like; rsync will just pick up where it left off.

> Also with this system, I'm concerned that if there is corruption on the HTPC, it could be propagated to the backup server. Is there some way to address this? Longer intervals to sync, so I have a chance to discover?

Using snapshots on the backup server allows you to go back in time to recover files that may have been accidentally deleted, or to recover files that have been corrupted.

Be sure to use rsync 3.x, as it starts transferring data a *lot* sooner, shortening the overall time needed for the sync. rsync 2.x scans the entire remote filesystem first, builds a list of files, then compares that list to the files on the backup server. rsync 3.x scans a couple of directories, then starts transferring data while scanning ahead.

Once you have a working command line for rsync, adding it to a script and then using cron to schedule it completes the setup.

Works beautifully. :) Saved our bacon several times over the past 2 years.
--
Freddie Cash
fjwcash@gmail.com
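[A minimal sketch of the daily cycle described above, assuming the backup target /backups/htpc is a btrfs subvolume (a plain directory cannot be snapshotted); the exclude-file path, hostnames, and mail command are illustrative placeholders, not part of the original setup:]

  rsync --archive --hard-links --numeric-ids \
        --delete-during --delete-excluded \
        --exclude-from=/etc/backup/htpc.exclude \
        root@htpc:/ /backups/htpc/ \
    && btrfs subvolume snapshot /backups/htpc \
           /backups/htpc-$(date +%Y-%m-%d) \
    && echo "HTPC sync OK" | mail -s "backup: htpc" you@example.com

Run from cron on the backup server, this transfers only changed files, freezes the result as a dated snapshot, and sends the success mail the original poster asked about.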
2011/1/6 Freddie Cash <fjwcash@gmail.com>:
> No, but there's a great tool called rsync that does exactly what you want. :)

Rsync is good, but not for all cases.

Be aware of database files - you should snapshot the filesystem before rsyncing.
On Thu, Jan 6, 2011 at 11:33 AM, Marcin Kuk <marcin.kuk@gmail.com> wrote:
> Rsync is good, but not for all cases. Be aware of database files -
> you should snapshot the filesystem before rsyncing.

We script a dump of all databases before the rsync runs, so we get both text and binary backups. If restoring the binary files doesn't work, then we just suck in the text dumps.

If the remote system supports snapshots, doing a snapshot before the rsync runs is a good idea, though. It'll be nice when more filesystems support in-line snapshots. The LVM method is pure crap.
--
Freddie Cash
fjwcash@gmail.com
On Thu, Jan 6, 2011 at 1:47 PM, Freddie Cash <fjwcash@gmail.com> wrote:
> We script a dump of all databases before the rsync runs, so we get
> both text and binary backups. If restoring the binary files doesn't
> work, then we just suck in the text dumps.
>
> If the remote system supports snapshots, doing a snapshot before the
> rsync runs is a good idea, though. It'll be nice when more
> filesystems support in-line snapshots. The LVM method is pure crap.

Do you also use the --inplace option for rsync? I would think this is critical to getting the most out of "btrfs folding backups", i.e. the most reuse between snapshots. I'm able to set this exact method up for my home network, that's why I ask... I have a central server that runs everything, and I want to sync a couple of laptops and netbooks nightly, and a few specific directories whenever they change. btrfs on both ends.

Better yet, any chance you'd share some scripts? :-)

As for the DB stuff, you definitely need to snapshot _before_ rsync. Roughly:

 ) read lock and flush tables
 ) snapshot
 ) unlock tables
 ) mount snapshot
 ) rsync from snapshot

i.e. the same as what's needed for LVM:

http://blog.dbadojo.com/2007/09/mysql-backups-using-lvm-snapshots.html

to get the DB files on disk consistent prior to archiving.

C Anthony
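[A rough sketch of that lock/snapshot/unlock sequence for MySQL on a btrfs source, assuming the data directory /var/lib/mysql is itself a btrfs subvolume; paths and the snapshot name are illustrative, and the pattern mirrors the LVM recipe linked above with lvcreate replaced by a btrfs snapshot:]

  mysql> FLUSH TABLES WITH READ LOCK;
  mysql> system btrfs subvolume snapshot /var/lib/mysql /var/lib/mysql-snap
  mysql> UNLOCK TABLES;
  mysql> quit

  # back up from the consistent snapshot (or point the backup server's
  # rsync at this path), then drop the snapshot when done
  rsync -aH --inplace /var/lib/mysql-snap/ /backups/htpc/var/lib/mysql/
  btrfs subvolume delete /var/lib/mysql-snap

The lock is held only for the instant the snapshot takes, so the database stays available while the copy runs against the frozen view.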
On Fri, Jan 7, 2011 at 12:35 AM, Carl Cook <CACook@quantum-sci.com> wrote:
> I want to keep a duplicate copy of the HTPC data, on the backup server
> Is there a BTRFS tool that would do this?

AFAIK zfs is the only opensource filesystem today that can transfer a block-level delta between two snapshots, making it ideal for backup purposes.

With other filesystems, something like rsync + LVM snapshot is probably your best bet, and it doesn't really care what filesystem you use.
On Thu, Jan 6, 2011 at 12:07 PM, C Anthony Risinger <anthony@extof.me> wrote:
> Do you also use the --inplace option for rsync? I would think this
> is critical to getting the most out of "btrfs folding backups", i.e.
> the most reuse between snapshots.

Yes, we do use --inplace; forgot about that one.

Full rsync command used:

  ${rsync} ${rsync_options} \
    --exclude-from="${defaultsdir}/${rsync_exclude}" ${rsync_exclude_server} \
    --rsync-path="${rsync_path}" \
    --rsh="${ssh} -p ${rsync_port} -i ${defaultsdir}/${rsync_key}" \
    --log-file="${logdir}/${rsync_server}.log" \
    ${rsync_user}@${rsync_server}:${basedir}/ \
    ${backupdir}/${sitedir}/${serverdir}/${basedir}/

Where rsync_options is:

  --archive --delete-during --delete-excluded --hard-links --inplace --numeric-ids --stats

> Better yet, any chance you'd share some scripts? :-)

A description of what we use, including all scripts, is here:
http://forums.freebsd.org/showthread.php?t=11971

> As for the DB stuff, you definitely need to snapshot _before_ rsync.

Unfortunately, we don't use btrfs or LVM on remote servers, so there's no snapshotting available during the backup run. In a perfect world, btrfs would be production-ready, ZFS would be available on Linux, and we'd no longer need the abomination called LVM. :)

Until then, DB text dumps are our fall-back. :)
--
Freddie Cash
fjwcash@gmail.com
On Thu, Jan 6, 2011 at 2:13 PM, Freddie Cash <fjwcash@gmail.com> wrote:
> A description of what we use, including all scripts, is here:
> http://forums.freebsd.org/showthread.php?t=11971

Ah, nice. I was hoping I didn't have to write it all myself; thanks!

> Unfortunately, we don't use btrfs or LVM on remote servers, so there's
> no snapshotting available during the backup run. In a perfect world,
> btrfs would be production-ready, ZFS would be available on Linux, and
> we'd no longer need the abomination called LVM. :)

Heh, ain't 'dat the truth.

> Until then, DB text dumps are our fall-back. :)

Always good to have that contingency plan. :-)

Thanks again,

C Anthony
>> Unfortunately, we don't use btrfs or LVM on remote servers, so there's
>> no snapshotting available during the backup run. In a perfect world,
>> btrfs would be production-ready, ZFS would be available on Linux, and
>> we'd no longer need the abomination called LVM. :)

As a matter of fact, ZFS _IS_ available on Linux:
http://zfs.kqinfotech.com/

Gordan
On Thu, Jan 6, 2011 at 1:06 PM, Gordan Bobic <gordan@bobich.net> wrote:
> As a matter of fact, ZFS _IS_ available on Linux:
> http://zfs.kqinfotech.com/

"Available", "usable", and "production-ready" are not synonymous. :) ZFS on Linux is not even in the experimental/testing stage right now.

ZFS-fuse is good for proof-of-concept stuff, but chokes on heavy usage, especially with dedupe enabled. We tried it for a couple of weeks to see what was available in ZFS versions above 14, but couldn't keep it running for more than a day or two at a time. Supposedly things are better now, but I wouldn't trust 15 TB of backups to it. :)

The Lawrence Livermore ZFS module for Linux doesn't support ZFS filesystems yet, only ZFS volumes. It should be usable as an LVM replacement, though, or as an iSCSI target box. Haven't tried it yet.

The KQ Infotech ZFS module for Linux (linked above) is in the private beta stage, but only available for a few distros and kernel versions, and is significantly slower than ZFS on FreeBSD. Hopefully it will enter public beta this year; it sounds promising. Don't think I'd trust 15 TB of backups to it for at least another year, though.

If btrfs gets dedupe, "nicer" disk management (it's hard to use non-pooled storage now), a working fsck (or similar), and integration into Debian, then we may look at that as well. :)
--
Freddie Cash
fjwcash@gmail.com
On Thu 06 January 2011 11:16:49 Freddie Cash wrote:
> Just run rsync on the backup server, tell it to connect via ssh to the
> remote server, and rsync / (root filesystem) into /backups/htpc/ (or
> whatever directory you want). Use an exclude file to exclude the
> directories you don't want backed up (like /proc, /sys, /dev).
>
> Then repeat the rsync process the next day, into the exact same
> directory. Only files that have changed will be transferred. Then
> snapshot the filesystem using the current date.

Kool.

> > Also with this system, I'm concerned that if there is corruption on the HTPC, it could be propagated to the backup server. Is there some way to address this? Longer intervals to sync, so I have a chance to discover?
>
> Using snapshots on the backup server allows you to go back in time to
> recover files that may have been accidentally deleted, or to recover
> files that have been corrupted.

How? I can see that rsync will not transfer the files that have not changed, but I assume it transfers the changed ones. How can you go back in time? Is there like a snapshot file that records the state of all files there?
On 01/06/2011 06:35 PM, Carl Cook wrote:
> Ideally the sync would scan the HTPC with a CRC check to
> look for differences, copy over the differences, then email me on
> success.
>
> Is there a BTRFS tool that would do this?

There is the command

  btrfs subvolume find-new

which lists the files whose data (but not metadata) has changed. But it is a very low-level tool. I tried to enhance this command (see my post titled "[RFC] Improve btrfs subvolume find-new command"), but I never finished this work.

Regards
G. Baroncelli
On Thu 06 January 2011 12:12:13 Fajar A. Nugraha wrote:
> With other filesystems, something like rsync + LVM snapshot is
> probably your best bet, and it doesn't really care what filesystem you
> use.

I'm not running LVM though. Is this where the snapshotting ability comes from?
On Thu 06 January 2011 12:07:17 C Anthony Risinger wrote:
> As for the DB stuff, you definitely need to snapshot _before_ rsync. Roughly:
>
>  ) read lock and flush tables
>  ) snapshot
>  ) unlock tables
>  ) mount snapshot
>  ) rsync from snapshot
>
> i.e. the same as what's needed for LVM:
>
> http://blog.dbadojo.com/2007/09/mysql-backups-using-lvm-snapshots.html
>
> to get the DB files on disk consistent prior to archiving.

I'm a little alarmed by this. I'm running a mysql server for the MythTV database. Do these operations need to somehow be done before rsync? Or else?

I don't understand what you're saying.
On Thu, Jan 6, 2011 at 1:42 PM, Carl Cook <CACook@quantum-sci.com> wrote:
> How? I can see that rsync will not transfer the files that have not changed, but I assume it transfers the changed ones. How can you go back in time? Is there like a snapshot file that records the state of all files there?

I don't know the specifics of how it works in btrfs, but it should be similar to how ZFS does it. The gist of it is:

Each snapshot gives you a point-in-time view of the entire filesystem. Each snapshot can be mounted (ZFS snapshots are read-only; btrfs snapshots are read-only or read-write). So, you mount the snapshot for 2010-12-15 onto /mnt, then cd to the directory you want (/mnt/htpc/home/fcash/videos/) and copy out the file that you want to restore (cp coolvid.avi ~/).

With ZFS, things are nice and simple:
 - each filesystem has a .zfs/snapshot directory
 - in there are sub-directories, each named after the snapshot name
 - cd into the snapshot name, the OS auto-mounts the snapshot, and off you go

Btrfs should be similar? Don't know the specifics.

How it works internally is some of the magic and the beauty of copy-on-write filesystems. :)
--
Freddie Cash
fjwcash@gmail.com
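[For btrfs specifically, a snapshot taken with "btrfs subvolume snapshot" simply appears as a directory inside the mounted filesystem, so recovery is a plain copy. A small sketch, assuming dated snapshots were created under /backups as in the earlier example; the snapshot name and file path are illustrative:]

  # list the subvolumes/snapshots known to the filesystem
  btrfs subvolume list /backups

  # the snapshot is already visible as a directory; copy the file back out
  cp /backups/htpc-2010-12-15/home/fcash/videos/coolvid.avi ~/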
On 01/06/2011 09:44 PM, Carl Cook wrote:
> I'm a little alarmed by this. I'm running a mysql server for the MythTV database. Do these operations need to somehow be done before rsync? Or else?
>
> I don't understand what you're saying.

If you take a snapshot and back that up, the consistency of the data will be the same as you would expect if you just yanked the power plug on the machine. Some databases tolerate this better than others, depending on how you have them configured. The data will be recoverable, but you will likely lose no more than the transactions that were in flight when it happened.

If you just rsync the data without taking a snapshot first, the data in your backup will likely be completely hosed and unusable, unless you're lucky and the machine was idle at the time.

Gordan
On Thu, Jan 6, 2011 at 1:44 PM, Carl Cook <CACook@quantum-sci.com> wrote:
> I'm a little alarmed by this. I'm running a mysql server for the MythTV database. Do these operations need to somehow be done before rsync? Or else?

Simplest solution is to write a script to create a mysqldump of all databases into a directory, and add that to cron so that it runs at the same time every day, 10-15 minutes before the rsync run is done. That way, the rsync to the backup server picks up both the text dump of the database(s) and the binary files under /var/lib/mysql/* (the actual running database).

When you need to restore the HTPC due to a failed harddrive or whatnot, you just rsync everything back to the new harddrive and try to run MythTV. If things work, great, done. If something is wonky, then delete all the MySQL tables/databases and use the dump file to recreate things.

Something like this:

#!/bin/bash
# Backup mysql databases.
#
# Take a list of databases, and dump each one to a separate file.

debug=0

while getopts "hv" OPTION; do
    case "${OPTION}" in
        h)  echo "Usage: $0 [-h] [-v]"
            echo ""
            echo "-h   show this help blurb"
            echo "-v   be verbose about what's happening"
            exit 0
            ;;
        v)  debug=1
            ;;
    esac
done

for I in $( mysql -u root --password=blahblahblah -Bse "show databases" ); do
    OUTFILE=/var/backups/$I.sql

    if [ $debug = 1 ]; then
        echo -n "Doing backup for $I:"
    fi

    /usr/bin/mysqldump -u root --password=blahblahblah --opt $I > "$OUTFILE"
    /bin/chmod 600 $OUTFILE

    if [ $debug = 1 ]; then
        echo " done."
    fi
done

exit 0

That will create a text dump of everything in each database, creating a separate file per database. It can be used via the "mysql" command to recreate the database at a later date.
--
Freddie Cash
fjwcash@gmail.com
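[For reference, restoring one of those per-database dumps later would look roughly like this; the database name (MythTV's is typically mythconverg) and credentials are placeholders:]

  mysql -u root -p mythconverg < /var/backups/mythconverg.sql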
On Thu 06 January 2011 13:58:41 Freddie Cash wrote:
> Simplest solution is to write a script to create a mysqldump of all
> databases into a directory, and add that to cron so that it runs at the
> same time every day, 10-15 minutes before the rsync run is done. That
> way, the rsync to the backup server picks up both the text dump of the
> database(s) and the binary files under /var/lib/mysql/* (the
> actual running database).

I am sure glad you guys mentioned database backup in relation to rsync. I would never have guessed.

When I do my regular backups I back up the export dump and binary of the database.

So overall I do the export dump of the database 15 minutes before rsync.
Then snapshot the destination array.
Then do the rsync.
Right?

But how does merely backing up the database prevent it from being hosed in the rsync? Or does snapshot do that? Or does snapshot prevent other data on the disk from getting hosed?

I'm about to install the two new 2TB drives in the HTPC to make a BTRFS Raid0 array. Hope it goes According To Doyle...
On 01/06/2011 10:26 PM, Carl Cook wrote:
> So overall I do the export dump of the database 15 minutes before rsync.
> Then snapshot the destination array.
> Then do the rsync.
> Right?

Yes, that should be fine. Not sure there's much point in backing up the binary if you're backing up the dump. Note that you should be locking all tables before doing a full dump. Otherwise, the dumped tables may be inconsistent with each other (orphaned records).

> But how does merely backing up the database prevent it from being
> hosed in the rsync? Or does snapshot do that? Or does snapshot
> prevent other data on the disk from getting hosed?

The data on the disk is only being read, so it won't be damaged. The snapshot ensures that the image you get of the DB is consistent with itself (i.e. no records got written to table A while you were backing up table B). As I said, the consistency with a snapshot is equivalent to the degree of consistency you would get if you just yanked the power.

Gordan
On Thu 06 January 2011 14:26:30 Carl Cook wrote:
> According To Doyle...

Er, Hoyle...

I am trying to create a multi-device BTRFS system using two identical drives. I want them to be raid 0 for no redundancy, and a total of 4TB. But the wiki says nothing about using fdisk to set up the drives first. It just basically says for me to:

  mkfs.btrfs -m raid0 /dev/sdc /dev/sdd

Seems to me that for mdadm I had to set each drive as a raid member, assemble the array, then format. Is this not the case with BTRFS?

Also the wiki says "After a reboot or reloading the btrfs module, you'll need to use btrfs device scan to discover all multi-device filesystems on the machine". Is this not done automatically? Do I have to set up some script to do this?
On Fri, Jan 7, 2011 at 5:26 AM, Carl Cook <CACook@quantum-sci.com> wrote:
> When I do my regular backups I back up the export dump and binary of the database.

When dealing with a database, a binary backup is only usable if all the files backed up are from the same point in time. That means you need to either:

 - tell the database server you're going to do a backup, so it doesn't change the datafiles and stores changes temporarily elsewhere (Oracle DB can do this), or
 - snapshot the storage, whether at block level (e.g. using LVM) or filesystem level (e.g. btrfs and zfs have snapshot capability), or
 - shut down the database before the backup.

The first two options will require some kind of log replay during the restore operation, but they don't need downtime on the source and are much faster than restoring from an export dump.

> So overall I do the export dump of the database 15 minutes before rsync.

If you're talking about MySQL, add a snapshot of the source before the rsync. Otherwise your binary backup will be useless.

> Then snapshot the destination array.
> Then do the rsync.
> Right?

Don't forget --inplace. Very important if you're using snapshots on the destination; otherwise disk usage will skyrocket.

> But how does merely backing up the database prevent it from being hosed in the rsync? Or does snapshot do that? Or does snapshot prevent other data on the disk from getting hosed?

What do you mean "being hosed in the rsync"? Rsync shouldn't destroy anything. The snapshot on the source is necessary to have a consistent point-in-time view of the database files.

> I'm about to install the two new 2TB drives in the HTPC to make a BTRFS Raid0 array. Hope it goes According To Doyle...

Generally I'd not recommend using raid0; it's asking for trouble. Use btrfs raid10, or use Linux md raid.
--
Fajar
On Friday, January 07, 2011 00:07:37 Carl Cook wrote:
> I am trying to create a multi-device BTRFS system using two identical
> drives. I want them to be raid 0 for no redundancy, and a total of 4TB.
> But the wiki says nothing about using fdisk to set up the drives
> first. It just basically says for me to: mkfs.btrfs -m raid0 /dev/sdc
> /dev/sdd

I'd suggest at least

  mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd

if you really want raid0.
--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
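[For reference, a sketch of what the whole sequence implied by the question and the suggestion above looks like. mkfs.btrfs can be pointed at whole devices, so no fdisk/partitioning or per-member setup is needed; the mount point is illustrative, and whether the device scan is run automatically at boot depends on the distribution's initramfs/udev setup:]

  # create the filesystem across both whole drives
  # (metadata mirrored, data striped, per Hubert's suggestion)
  mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd

  # after a reboot or reloading the btrfs module, before mounting
  btrfs device scan

  # then mount either member device; both belong to the same filesystem
  mount /dev/sdc /mnt/backup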
On Thursday, January 06, 2011 22:52:25 Freddie Cash wrote:
> Each snapshot gives you a point-in-time view of the entire filesystem.
> Each snapshot can be mounted (ZFS snapshots are read-only; btrfs
> snapshots are read-only or read-write).
>
> Btrfs should be similar? Don't know the specifics.

I usually create subvolumes in the btrfs root volume:

/mnt/btrfs/
|- server-a
|- server-b
\- server-c

then create snapshots of these directories:

/mnt/btrfs/
|- server-a
|- server-b
|- server-c
|- snapshots-server-a
|  |- @GMT-2010.12.21-16.48.09
|  \- @GMT-2010.12.22-16.45.14
|- snapshots-server-b
\- snapshots-server-c

This way I can use the shadow_copy module for Samba to publish the snapshots to Windows clients.
--
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl
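[The commands behind a layout like that would look roughly as follows; a sketch assuming the filesystem is mounted at /mnt/btrfs, with the snapshot name simply following the shadow_copy-style timestamp format shown above:]

  # one subvolume per machine being backed up, plus a plain directory
  # to hold its dated snapshots
  btrfs subvolume create /mnt/btrfs/server-a
  mkdir /mnt/btrfs/snapshots-server-a

  # after each rsync run into server-a, freeze it with a timestamped snapshot
  btrfs subvolume snapshot /mnt/btrfs/server-a \
      /mnt/btrfs/snapshots-server-a/@GMT-$(date -u +%Y.%m.%d-%H.%M.%S)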
On 07/01/11 16:20, Hubert Kario wrote:
> I usually create subvolumes in the btrfs root volume:
>
> /mnt/btrfs/
> |- server-a
> |- server-b
> \- server-c
>
> then create snapshots of these directories:
> [...]
> This way I can use the shadow_copy module for Samba to publish the snapshots
> to Windows clients.

Can you post some actual commands to do this part? I am extremely confused about btrfs subvolumes v the root filesystem and mounting, particularly in relation to the default subvolume.

For instance, if I create the initial filesystem using mkfs.btrfs and then mount it on /mnt/btrfs, is there already a default subvolume, or do I have to make one? What happens when you unmount the whole filesystem and then come back?

The wiki also makes the following statement:

"Note: to be mounted the subvolume or snapshot have to be in the root of the btrfs filesystem."

but you seem to have snapshots at one layer down from the root.

I am trying to use this method for my offsite backups - to a large spare sata disk loaded via a usb port. I want to create the main filesystem (and possibly a subvolume - this is where I start to get confused) and rsync my current daily backup files to it. I would then also take a snapshot with a time label (just so I get the correct time, rather than doing it at the next cycle, as explained below). I would transport this disk offsite. I would repeat this in a month's time with a totally different disk.

In a couple of months' time - when I come to recycle the first disk for my offsite backup - I would mount the retrieved disk (and again I am confused - mount the complete filesystem or the subvolume?), rsync the various backup files from my server again (--inplace? - is this necessary?), and take another snapshot.

I am hoping that this would effectively allow me to leave the snapshot I took last time in place; because not everything will have changed, it won't have used much space, so effectively I can keep quite a long stream of backup snapshots offsite. Eventually of course the disk will start to become full, but I assume I can reclaim the space by deleting some of the old snapshots.
--
Alan Chandler
http://www.chandlerfamily.org.uk
On Sun, Jan 9, 2011 at 6:46 PM, Alan Chandler <alan@chandlerfamily.org.uk> wrote:
> For instance, if I create the initial filesystem using mkfs.btrfs and then
> mount it on /mnt/btrfs, is there already a default subvolume, or do I have
> to make one?

From the btrfs FAQ:

"A subvolume is like a directory - it has a name, there's nothing on it when it is created, and it can hold files and other directories. There's at least one subvolume in every Btrfs filesystem, the "default" subvolume. The equivalent in Ext4 would be a filesystem. Each subvolume behaves as an individual filesystem."

> What happens when you unmount the whole filesystem and then
> come back?

Whatever subvolumes and snapshots you already have will still be there.

> The wiki also makes the following statement:
>
> "Note: to be mounted the subvolume or snapshot have to be in the root of
> the btrfs filesystem."
>
> but you seem to have snapshots at one layer down from the root.

By default, when you do something like

  mount /dev/sdb1 /mnt/btrfs

the default subvolume will be mounted under /mnt/btrfs. Snapshots and subvolumes will be visible as subdirectories under it, regardless of whether they're in the root or several directories down. Most likely this is enough for what you need; there's no need to mess with mounting subvolumes.

Mounting subvolumes allows you to see a particular subvolume directly WITHOUT having to see the default subvolume or other subvolumes. This is particularly useful when you use btrfs as "/" or "/home" and want to "rollback" to a previous snapshot. So assuming "snapshots-server-b" above is a snapshot, you can run

  mount /dev/sdb1 /mnt/btrfs -o subvol=snapshots-server-b

and what previously was in /mnt/btrfs/snapshots-server-b will now be accessible under /mnt/btrfs directly, and you can NOT see what was previously under /mnt/btrfs/snapshots-server-c.

Also, on a side note, you CAN mount subvolumes not located in the root of the btrfs filesystem using "subvolid" instead of "subvol". It might require a newer kernel/btrfs-progs version, though (works fine in Ubuntu maverick).
--
Fajar
On 09/01/11 13:54, Fajar A. Nugraha wrote:
> By default, when you do something like
>
>   mount /dev/sdb1 /mnt/btrfs
>
> the default subvolume will be mounted under /mnt/btrfs. Snapshots and
> subvolumes will be visible as subdirectories under it, regardless of
> whether they're in the root or several directories down. Most likely
> this is enough for what you need; there's no need to mess with
> mounting subvolumes.

I think I start to get it now. It's the fact that subvolumes can be snapshotted etc. without mounting them that is the difference. I guess I am too used to thinking like LVM, and I was thinking subvolumes were like an LV. They are, but not quite the same.
--
Alan Chandler
http://www.chandlerfamily.org.uk
On Sun, Jan 9, 2011 at 7:32 AM, Alan Chandler <alan@chandlerfamily.org.uk> wrote:
> I think I start to get it now. It's the fact that subvolumes can be
> snapshotted etc. without mounting them that is the difference. I guess I am
> too used to thinking like LVM, and I was thinking subvolumes were like an
> LV. They are, but not quite the same.

Let's see if I can match up the terminology and layers a bit:

  LVM Physical Volume == Btrfs disk                      == ZFS disk / vdevs
  LVM Volume Group    == Btrfs "filesystem"              == ZFS storage pool
  LVM Logical Volume  == Btrfs subvolume                 == ZFS volume
  'normal' filesystem == Btrfs subvolume (when mounted)  == ZFS filesystem

Does that look about right?

LVM: A physical volume is the lowest layer in LVM; physical volumes are combined into a volume group, which is then split up into logical volumes and formatted with a filesystem.

Btrfs: A bunch of disks are "formatted" into a btrfs "filesystem", which is then split up into sub-volumes (sub-volumes are auto-formatted with a btrfs filesystem).

ZFS: A bunch of disks are combined into virtual devices, then combined into a ZFS storage pool, which can be split up into either volumes formatted with any filesystem, or ZFS filesystems.

Just curious, why all the new terminology in btrfs for things that already existed? And why are old terms overloaded with new meanings? I don't think I've seen a write-up about that anywhere (or I don't remember it if I have).

Perhaps it's time to start looking at separating the btrfs pool creation tools out of mkfs (or renaming mkfs.btrfs), since you're really building a storage pool, and not a filesystem. It would prevent a lot of confusion with new users. It's great that there's a separate btrfs tool for manipulating btrfs setups, but "mkfs.btrfs" is just wrong for creating the btrfs setup.
--
Freddie Cash
fjwcash@gmail.com
On Sun, Jan 09, 2011 at 09:59:46AM -0800, Freddie Cash wrote:
> Let's see if I can match up the terminology and layers a bit:
>
>   LVM Physical Volume == Btrfs disk                      == ZFS disk / vdevs
>   LVM Volume Group    == Btrfs "filesystem"              == ZFS storage pool
>   LVM Logical Volume  == Btrfs subvolume                 == ZFS volume
>   'normal' filesystem == Btrfs subvolume (when mounted)  == ZFS filesystem
>
> Does that look about right?

Kind of. The thing is that the way that btrfs works is massively different to the way that LVM works (and probably massively different to the way that ZFS works, but I don't know much about ZFS, so I can't comment there). I think that trying to think of btrfs in LVM terms is going to lead you to a large number of incorrect conclusions. It's just not a good model to use.

> Btrfs: A bunch of disks are "formatted" into a btrfs "filesystem",
> which is then split up into sub-volumes (sub-volumes are
> auto-formatted with a btrfs filesystem).

No, subvolumes are a part of the whole filesystem. In btrfs, there is only one filesystem. There are 6 main B-trees that store metadata in btrfs (plus a couple of others). One of those is the "filesystem tree" (or FS tree), which contains all the metadata associated with the normal POSIX directory/file namespace (basically all the inode and xattr data). When you create a subvolume, a new FS tree is created, but it shares *all* of the other btrfs B-trees.

There is only one filesystem, but there may be distinct namespaces within that filesystem that can be mounted as if they were filesystems. Think of it more like NFSv4, where there's one overall namespace exported per server, but clients can mount subsections of it.

> ZFS: A bunch of disks are combined into virtual devices, then combined
> into a ZFS storage pool, which can be split up into either volumes
> formatted with any filesystem, or ZFS filesystems.

OK, this is _definitely_ not the way that btrfs works. As I said above, a btrfs subvolume is just a namespace that is mountable in its own right. It's *not* a block device, and can't be formatted with any other filesystem.

> Just curious, why all the new terminology in btrfs for things that
> already existed? And why are old terms overloaded with new meanings?
> I don't think I've seen a write-up about that anywhere (or I don't
> remember it if I have).

The main awkward piece of btrfs terminology is the use of "RAID" to describe btrfs's replication strategies. It's not RAID, and thinking of it in RAID terms is causing lots of confusion. Most of the other things in btrfs are, I think, named relatively sanely.

> Perhaps it's time to start looking at separating the btrfs pool
> creation tools out of mkfs (or renaming mkfs.btrfs), since you're
> really building a storage pool, and not a filesystem. It would
> prevent a lot of confusion with new users.

I think this is the wrong thing to do. I hope my explanation above helps.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
       --- Try everything once, except incest and folk-dancing. ---
On 09/01/11 18:30, Hugo Mills wrote:
> There is only one filesystem, but there may be distinct namespaces
> within that filesystem that can be mounted as if they were
> filesystems. Think of it more like NFSv4, where there's one overall
> namespace exported per server, but clients can mount subsections of
> it.

I think this explanation is still missing the key piece that has confused me despite trying very hard to understand it by reading the wiki. You talk about "distinct namespaces", but what I learnt from further up the thread is that this "namespace" is also inside the "namespace" that makes up the whole filesystem. I mount the whole filesystem, and all my subvolumes are automatically there (at least that is what I find in practice). It's this duality of namespace that is the difficult concept. I am still not sure whether there is a default subvolume with the other subvolumes defined within its namespace, or whether there is an overall filesystem namespace with subvolumes defined within it, such that if you mounted the default subvolume you would then lose the overall filesystem namespace and hence no longer see the subvolumes.

I find the wiki also confusing because it talks about subvolumes having to be at the first level of the filesystem, but again further up this thread there is a real-world example of one not being at the first level, but one level down inside a directory.

What it means is that I don't have a mental picture of how this all works, from which all the use cases could then be worked out. I think it would be helpful if the wiki contained some of the use cases that we have been talking about in this thread - but with more detailed information, like the actual commands used to mount the filesystems like this, and information as to in what circumstances you would perform each action.

> The main awkward piece of btrfs terminology is the use of "RAID" to
> describe btrfs's replication strategies. It's not RAID, and thinking
> of it in RAID terms is causing lots of confusion. Most of the other
> things in btrfs are, I think, named relatively sanely.

I don't find this AS confusing, although there is still information missing which I asked about in another post that wasn't answered. I still can't understand whether it's possible to initialise a filesystem in degraded mode. If you create the filesystem with -m raid1 and -d raid1 but only one device, it implies that it writes two copies of both metadata and data to that one device. However, if you successfully create the filesystem on two devices and then fail one and mount it -o degraded, it appears to suggest it will only write the one copy.

I was considering how to migrate from an existing mdadm RAID1 / LVM arrangement. I suppose I could fail one device of the mdadm pair and initialise the btrfs filesystem with this one device as the first half of a raid1 mirror and the other as a usb stick, then remove the usb stick and mount the filesystem -o degraded. Copy data to it from the still-working half (the available LV), then dispose of the mdadm device completely and add in the freed-up device using btrfs device add.
--
Alan Chandler
http://www.chandlerfamily.org.uk
On Sun, Jan 09, 2011 at 08:57:12PM +0000, Alan Chandler wrote:
> I am still not sure whether there is a default subvolume with the
> other subvolumes defined within its namespace, or whether there is an
> overall filesystem namespace with subvolumes defined within it, such
> that if you mounted the default subvolume you would then lose the
> overall filesystem namespace and hence no longer see the subvolumes.

There is a root subvolume namespace (subvolid=0), which may contain files, directories, and other subvolumes. This root subvolume is what you see when you mount a newly-created btrfs filesystem.

The default subvolume is simply what you get when you mount the filesystem without a subvol or subvolid parameter to mount. Initially, the default subvolume is set to be the root subvolume. If another subvolume is set to be the default, then the root subvolume can only be mounted with the subvolid=0 mount option.

> I find the wiki also confusing because it talks about subvolumes
> having to be at the first level of the filesystem, but again further
> up this thread there is a real-world example of one not being at the
> first level, but one level down inside a directory.

Try it, see what happens, and fix the wiki where it's wrong? :)

Or at least say what page this is on, and I can try the experiment and fix it later...

> What it means is that I don't have a mental picture of how this all
> works, from which all the use cases could then be worked out. I think
> it would be helpful if the wiki contained some of the use cases that
> we have been talking about in this thread - but with more detailed
> information, like the actual commands used to mount the filesystems
> like this, and information as to in what circumstances you would
> perform each action.

I've written a chunk of text about how btrfs's storage, RAID and subvolumes work. At the moment, though, the wiki is somewhat broken and I can't actually create the page to put it on... There's also a page of recipes[1], which is probably the place that the examples you mentioned should go.

> I still can't understand whether it's possible to initialise a
> filesystem in degraded mode.

From trying it a while ago, I don't think it is possible to create a filesystem in degraded mode. Again, I'll try it when I have the time to do some experimentation and see what actually happens.

   Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/UseCases
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
        --- A clear conscience. Where did you get this taste ---
                         for luxuries, Bernard?
On 09/01/11 22:01, Hugo Mills wrote:>> I find the wiki >> also confusing because it talks about subvolumes having to be at the >> first level of the filesystem, but again further up this thread >> there is an example which is used for real of it not being at the >> first level, but at one level down inside a directory. > > Try it, see what happens, and fix the wiki where it''s wrong? :) > > Or at least say what page this is on, and I can try the experiment > and fix it later...I don''t have an account right now, but the page its on is here. https://btrfs.wiki.kernel.org/index.php/Getting_started#Basic_Filesystem_Commands ...> From trying it a while ago, I don''t think it is possible to create > a filesystem in degraded mode. Again, I''ll try it again when I have > the time to do some experimentation and see what actually happens.As I wondered before it might be possible to fake it by using something like a USB stick initially and then failing it, and replacing it with the real device when ready. If thats possible, then perhaps functionality to do it without faking it could be added to the "to do" list. It sure would be useful in migrating from mdmadm/lvm setup. -- Alan Chandler http://www.chandlerfamily.org.uk -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
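For what it's worth, here is how that faking could be tried with a loop device standing in for the USB stick (a sketch only; device names and sizes are made up, it assumes the installed btrfs tool understands "device delete missing", and whether the degraded mount then behaves sanely is exactly the thing that needs testing):

# small throwaway device to act as the second member
dd if=/dev/zero of=/tmp/fake.img bs=1M count=2048
losetup /dev/loop0 /tmp/fake.img

# two-device RAID1 filesystem: the real drive plus the fake one
mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/loop0

# "fail" the fake member and mount degraded on the real drive
losetup -d /dev/loop0
mount -o degraded /dev/sdb /mnt/btrfs

# later, when the real second drive is freed up, add it and drop the ghost
btrfs device add /dev/sdc /mnt/btrfs
btrfs device delete missing /mnt/btrfs
btrfs filesystem balance /mnt/btrfs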
On Mon, Jan 10, 2011 at 5:01 AM, Hugo Mills <hugo-lkml@carfax.org.uk> wrote:> There is a root subvolume namespace (subvolid=0), which may contain > files, directories, and other subvolumes. This root subvolume is what > you see when you mount a newly-created btrfs filesystem.Is there a detailed explanation in the wiki about subvolid=0? What does "top level 5" in the output of "btrfs subvolume list" mean (I thought "5" was subvolid for root subvolume)? # btrfs subvolume list / ID 256 top level 5 path maverick-base ID 257 top level 5 path kernel-2.6.37> > The default subvolume is simply what you get when you mount the > filesystem without a subvol or subvolid parameter to mount. Initially, > the default subvolume is set to be the root subvolume. If another > subvolume is set to be the default, then the root subvolume can only > be mounted with the subvolid=0 mount option.... and mounting with either subvolid=5 and subvolid=0 gives the same result in my case. -- Fajar -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sunday 09 of January 2011 12:46:59 Alan Chandler wrote:> On 07/01/11 16:20, Hubert Kario wrote: > > I usually create subvolumes in btrfs root volume: > > > > /mnt/btrfs/ > > > > |- server-a > > |- server-b > > > > \- server-c > > > > then create snapshots of these directories: > > > > /mnt/btrfs/ > > > > |- server-a > > |- server-b > > |- server-c > > |- snapshots-server-a > > | > > |- @GMT-2010.12.21-16.48.09 > > > > \- @GMT-2010.12.22-16.45.14 > > | > > |- snapshots-server-b > > > > \- snapshots-server-c > > > > This way I can use the shadow_copy module for samba to publish the > > snapshots to windows clients. > > Can you post some actual commands to do this part# create the default subvolume and mount it mkfs.btrfs /dev/sdx mount /dev/sdx /mnt/btrfs # to be able to snapshot individual servers we have to put them to individual # subvolumes btrfs subvolume create /mnt/btrfs/server-a btrfs subvolume create /mnt/btrfs/server-b btrfs subvolume create /mnt/btrfs/server-c # copy data over rsync --exclude /proc [...] root@server-a:/ /mnt/btrfs/server-a rsync --exclude /proc [...] root@server-b:/ /mnt/btrfs/server-b rsync --exclude /proc [...] root@server-c:/ /mnt/btrfs/server-c # create snapshot directories (in the default subvolume) mkdir /mnt/btrfs/{snapshots-server-a,snapshots-server-b,snapshots-server-c} # create snapshot from the synced data: btrfs subvolume snapshot /mnt/btrfs/server-a /mnt/btrfs/snapshots-server- a/@GMT-2010.12.21-16.48.09 # copy new data over: rsync --inplace --exclude /proc [...] root@server-a:/ /mnt/btrfs/server-a # make a new snapshot btrfs subvolume snapshot /mnt/btrfs/server-a /mnt/btrfs/snapshots-server- a/@GMT-2010.12.22-16.45.14 in the end we have 5 subvolumes, 2 of witch are snapshots of the server-a> > I am extremely confused about btrfs subvolumes v the root filesystem and > mounting, particularly in relation to the default subvolume. > > For instance, if I create the initial file system using mkfs.btrfs and > then mount it on /mnt/btrfs is there already a default subvolume? or do > I have to make one? What happens when you unmount the whole filesystem > and then come back > > The wiki also makes the following statement > > *"Note:* to be mounted the subvolume or snapshot have to be in the root > of the btrfs filesystem." > > > but you seems to have snapshots at one layer down from the root. > > > I am trying to use this method for my offsite backups - to a large spare > sata disk loaded via a usb port. > > I want to create the main filesystem (and possibly a subvolume - this is > where I start to get confused) and rsync my current daily backup files > to it. I would then also (just so I get the correct time - rather than > do it at the next cycle, as explained below) take a snapshot with a time > label. I would transport this disk offsite. > > I would repeat this in a months time with a totally different disk > > In a couple of months time - when I come to recycle the first disk for > my offsite backup, I would mount the retrieved disk (and again I am > confused - mount the complete filesystem or the subvolume?) rsync > (--inplace ? 
- is this necessary) again the various backup files from my > server and take another snapshot.you mount the default, this way you have access to all the data on the HDD, -- inplace is necessary> > I am hoping that this would effectively allow me to leave the snapshot I > took last time in place, as because not everything will have changed it > won''t have used much space - so effectively I can keep quite a long > stream of backup snapshots in place offsite.yes> > Eventually of course the disk will start to become full, but I assume I > can reclaim the space by deleting some of the old snapshots.yes, of course: btrfs subvolume delete /mnt/btrfs/snapshots-server-a/@GMT-2010.12.21-16.48.09 will reclaim the space used up by the deltas -- Hubert Kario QBS - Quality Business Software 02-656 Warszawa, ul. Ksawerów 30/85 tel. +48 (22) 646-61-51, 646-74-24 www.qbs.com.pl -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
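To tie those pieces together, a sketch of a nightly script built from the commands above (paths, host name and the retention count are illustrative; the @GMT- naming is the format samba's shadow_copy module expects):

#!/bin/sh
SRC=root@server-a:/
DST=/mnt/btrfs/server-a
SNAPDIR=/mnt/btrfs/snapshots-server-a

# re-sync into the same subvolume; --inplace keeps snapshot deltas small,
# since only the blocks that actually changed diverge from older snapshots
rsync -a --inplace --exclude /proc --exclude /sys --exclude /dev \
    "$SRC" "$DST" || exit 1

# snapshot the freshly synced state, labelled with the current time (UTC)
STAMP=$(date -u +@GMT-%Y.%m.%d-%H.%M.%S)
btrfs subvolume snapshot "$DST" "$SNAPDIR/$STAMP"

# reclaim space by dropping all but the newest 60 snapshots
# (the names sort chronologically, so plain ls ordering is enough)
ls -d "$SNAPDIR"/@GMT-* | head -n -60 | while read old; do
    btrfs subvolume delete "$old"
done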
On Sun, Jan 09, 2011 at 11:32:20PM +0000, Alan Chandler wrote:> On 09/01/11 22:01, Hugo Mills wrote: > >> I find the wiki > >>also confusing because it talks about subvolumes having to be at the > >>first level of the filesystem, but again further up this thread > >>there is an example which is used for real of it not being at the > >>first level, but at one level down inside a directory. > > > > Try it, see what happens, and fix the wiki where it''s wrong? :) > > > > Or at least say what page this is on, and I can try the experiment > >and fix it later... > I don''t have an account right now, but the page its on is here. > > https://btrfs.wiki.kernel.org/index.php/Getting_started#Basic_Filesystem_CommandsOK, I''ve just tried this. The page is actually accurate, but doesn''t tell the whole story. You can *create* subvolumes and snapshots anywhere: hrm@molinar:~ $ sudo btrfs sub list /mnt ID 256 top level 5 path snap1 ID 257 top level 5 path foo/snap1 ID 258 top level 5 path snap1/snap2 However, you can''t *mount* one by name unless it''s at the top level: hrm@molinar:~ $ sudo mount /dev/vdb /media/btr1 -o subvol=snap1 hrm@molinar:~ $ sudo mount /dev/vdb /media/btr2 -o subvol=foo/snap1 mount: block device /dev/vdb is write-protected, mounting read-only mount: /dev/vdb already mounted or /media/btr2 busy mount: according to mtab, /dev/vdb is mounted on /mnt Mounting by ID works, though: hrm@molinar:~ $ sudo mount /dev/vdb /media/btr2 -o subvolid=257 hrm@molinar:~ $ ls /media/btr2/ foo linux-image.deb snap1 subdir This would seem to imply that the limitation is in mount, rather than in the btrfs kernel implementation. I''ve clarified that particular piece of text on the wiki.> ... > > From trying it a while ago, I don''t think it is possible to create > >a filesystem in degraded mode. Again, I''ll try it again when I have > >the time to do some experimentation and see what actually happens. > > As I wondered before it might be possible to fake it by using > something like a USB stick initially and then failing it, and > replacing it with the real device when ready. > > If thats possible, then perhaps functionality to do it without > faking it could be added to the "to do" list. It sure would be > useful in migrating from mdmadm/lvm setup.I haven''t got to this one yet -- someone on IRC just asked the same thing, though, so I''ll probably have a stab at it tomorrow. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk == PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- But people have always eaten people, / what else is there to --- eat? / If the Juju had meant us not to eat people / he wouldn''t have made us of meat.
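Given that mounting by ID works, a possible workaround for nested subvolumes (untested here, and relying only on ordinary bind mounts) would be:

# mount the top-level namespace once
mount /dev/vdb /mnt

# expose the nested snapshot at its own mount point
mount --bind /mnt/foo/snap1 /media/btr2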
On Mon, Jan 10, 2011 at 09:22:49AM +0700, Fajar A. Nugraha wrote:> On Mon, Jan 10, 2011 at 5:01 AM, Hugo Mills <hugo-lkml@carfax.org.uk> wrote: > > There is a root subvolume namespace (subvolid=0), which may contain > > files, directories, and other subvolumes. This root subvolume is what > > you see when you mount a newly-created btrfs filesystem. > > Is there a detailed explanation in the wiki about subvolid=0? What > does "top level 5" in the output of "btrfs subvolume list" mean (I > thought "5" was subvolid for root subvolume)? > > # btrfs subvolume list / > ID 256 top level 5 path maverick-base > ID 257 top level 5 path kernel-2.6.37 > > > The default subvolume is simply what you get when you mount the > > filesystem without a subvol or subvolid parameter to mount. Initially, > > the default subvolume is set to be the root subvolume. If another > > subvolume is set to be the default, then the root subvolume can only > > be mounted with the subvolid=0 mount option. > > ... and mounting with either subvolid=5 and subvolid=0 gives the same > result in my case.OK, having read through some of the code, it looks like the "5" comes from it being the root FS tree object ID. So, it''s probably quite hard to change that number without making incompatible filesystems. However, the _documented_ (and official) way to mount the root subvolume is to use subvolid=0... :) Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk == PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- But people have always eaten people, / what else is there to --- eat? / If the Juju had meant us not to eat people / he wouldn''t have made us of meat.
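For anyone following along, the behaviour described above can be seen with a short sequence like this (device name and the subvolume ID are illustrative; the ID comes from "btrfs subvolume list"):

mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt            # a plain mount: this is the root subvolume
btrfs subvolume create /mnt/rootvol
btrfs subvolume list /mnt      # note the ID of 'rootvol', say 256

# make 'rootvol' the default subvolume
btrfs subvolume set-default 256 /mnt
umount /mnt
mount /dev/sdb /mnt            # a plain mount now lands in 'rootvol'

# the top-level namespace is still reachable explicitly
mount -o subvolid=0 /dev/sdb /media/btrfs-top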
On Sun, Jan 9, 2011 at 10:30 AM, Hugo Mills <hugo-lkml@carfax.org.uk> wrote:> On Sun, Jan 09, 2011 at 09:59:46AM -0800, Freddie Cash wrote: >> Let see if I can match up the terminology and layers a bit: >> >> LVM Physical Volume == Btrfs disk == ZFS disk / vdevs >> LVM Volume Group == Btrfs "filesystem" == ZFS storage pool >> LVM Logical Volume == Btrfs subvolume == ZFS volume >> ''normal'' filesysm == Btrfs subvolume (when mounted) == ZFS filesystem >> >> Does that look about right? > > Kind of. The thing is that the way that btrfs works is massively > different to the way that LVM works (and probably massively different > to the way that ZFS works, but I don''t know much about ZFS, so I can''t > comment there). I think that trying to think of btrfs in LVM terms is > going to lead you to a large number of incorrect conclusions. It''s > just not a good model to use.My biggest issue trying to understand Btrfs is figuring out the layers involved. With ZFS, it''s extremely easy: disks --> vdev --> pool --> filesystems With LVM, it''s fairly easy: disks -> volume group --> volumes --> filesystems But, Btrfs doesn''t make sense to me: disks --> filesystem --> sub-volumes??? So, is Btrfs pooled storage or not? Do you throw 24 disks into a single Btrfs filesystem, and then split that up into separate sub-volumes as needed? From the looks of things, you don''t have to partition disks or worry about sizes before formatting (if the space is available, Btrfs will use it). But it also looks like you still have to manage disks. Or, maybe it''s just that the initial creation is done via mkfs (as in, formatting a partition with a filesystem) that''s tripping me up after using ZFS for so long (zpool creates the storage pool, manages the disks, sets up redundancy levels, etc; zfs creates filesystems and volumes, and sets properties; no newfs/mkfs involved). It looks like ZFS, Btrfs, and LVM should work in similar manners, but the overloaded terminology (pool, volume, sub-volume, filesystem are different in all three) and new terminology that''s only in Btrfs is confusing.>> Just curious, why all the new terminology in btrfs for things that >> already existed? And why are old terms overloaded with new meanings? >> I don''t think I''ve seen a write-up about that anywhere (or I don''t >> remember it if I have). > > The main awkward piece of btrfs terminology is the use of "RAID" to > describe btrfs''s replication strategies. It''s not RAID, and thinking > of it in RAID terms is causing lots of confusion. Most of the other > things in btrfs are, I think, named relatively sanely.No, the main awkward piece of btrfs terminology is overloading "filesystem" to mean "collection of disks" and creating "sub-volume" to mean "filesystem". At least, that''s how it looks from way over here. :)>> Perhaps it''s time to start looking at separating the btrfs pool >> creation tools out of mkfs (or renaming mkfs.btrfs), since you''re >> really building a a storage pool, and not a filesystem. It would >> prevent a lot of confusion with new users. It''s great that there''s a >> separate btrfs tool for manipulating btrfs setups, but "mkfs.btrfs" is >> just wrong for creating the btrfs setup. > > I think this is the wrong thing to do. I hope my explanation above > helps.As I understand it, the mkfs.btrfs is used to create the initial filesystem across X disks with Y redundancy. For everthing else afterward, the btrfs tool is used to add disks, create snapshots, delete snapshots, change redundancy settings, create sub-volumes, etc. 
Why not just add a "create" option to btrfs and retire mkfs.btrfs completely. Or rework mkfs.btrfs to create sub-volumes of an existing btrfs setup? What would be great is if there was an image that showed the layers in Btrfs and how they interacted with the userspace tools. Having a set of graphics that compared the layers in Btrfs with the layers in the "normal" Linux disk/filesystem partitioning scheme, and the LVM layering, would be best. There''s lots of info in the wiki, but no images, ASCII-art, graphics, etc. Trying to picture this mentally is not working. :) -- Freddie Cash fjwcash@gmail.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 21, 2011 at 11:28:19AM -0800, Freddie Cash wrote:> On Sun, Jan 9, 2011 at 10:30 AM, Hugo Mills <hugo-lkml@carfax.org.uk> wrote: > > On Sun, Jan 09, 2011 at 09:59:46AM -0800, Freddie Cash wrote: > >> Let see if I can match up the terminology and layers a bit: > >> > >> LVM Physical Volume == Btrfs disk == ZFS disk / vdevs > >> LVM Volume Group == Btrfs "filesystem" == ZFS storage pool > >> LVM Logical Volume == Btrfs subvolume == ZFS volume > >> ''normal'' filesysm == Btrfs subvolume (when mounted) == ZFS filesystem > >> > >> Does that look about right? > > > > Kind of. The thing is that the way that btrfs works is massively > > different to the way that LVM works (and probably massively different > > to the way that ZFS works, but I don''t know much about ZFS, so I can''t > > comment there). I think that trying to think of btrfs in LVM terms is > > going to lead you to a large number of incorrect conclusions. It''s > > just not a good model to use. > > My biggest issue trying to understand Btrfs is figuring out the layers involved. > > With ZFS, it''s extremely easy: > > disks --> vdev --> pool --> filesystems > > With LVM, it''s fairly easy: > > disks -> volume group --> volumes --> filesystems > > But, Btrfs doesn''t make sense to me: > > disks --> filesystem --> sub-volumes??? > > So, is Btrfs pooled storage or not? Do you throw 24 disks into a > single Btrfs filesystem, and then split that up into separate > sub-volumes as needed?Yes, except that the subvolumes aren''t quite as separate as you seem to think that they are. There''s no preallocation of storage to a subvolume (in the way that LVM works), so you''re only limited by the amount of free space in the whole pool. Also, data stored in the pool is actually free for use by any subvolume, and can be shared (see the deeper explanation below).> From the looks of things, you don''t have to > partition disks or worry about sizes before formatting (if the space > is available, Btrfs will use it). But it also looks like you still > have to manage disks. > > Or, maybe it''s just that the initial creation is done via mkfs (as in, > formatting a partition with a filesystem) that''s tripping me up after > using ZFS for so long (zpool creates the storage pool, manages the > disks, sets up redundancy levels, etc; zfs creates filesystems and > volumes, and sets properties; no newfs/mkfs involved).So potentially zpool -> mkfs.btrfs, and zfs -> btrfs. However, I don''t know enough about ZFS internals to know whether this is a reasonable analogy to make or not.> It looks like ZFS, Btrfs, and LVM should work in similar manners, but > the overloaded terminology (pool, volume, sub-volume, filesystem are > different in all three) and new terminology that''s only in Btrfs is > confusing. > > >> Just curious, why all the new terminology in btrfs for things that > >> already existed? And why are old terms overloaded with new meanings? > >> I don''t think I''ve seen a write-up about that anywhere (or I don''t > >> remember it if I have). > > > > The main awkward piece of btrfs terminology is the use of "RAID" to > > describe btrfs''s replication strategies. It''s not RAID, and thinking > > of it in RAID terms is causing lots of confusion. Most of the other > > things in btrfs are, I think, named relatively sanely. > > No, the main awkward piece of btrfs terminology is overloading > "filesystem" to mean "collection of disks" and creating "sub-volume" > to mean "filesystem". At least, that''s how it looks from way over > here. 
:)As I''ve tried to explain, that''s the wrong way of looking at it. Let me have another go in more detail. There''s *one* filesystem. It contains: - *One* set of metadata about the underlying disks (the dev tree). - *One* set of metadata about the distribution of the storage pool on those disks (the chunk tree) - *One* set of metadata about extents within that storage pool (the extent tree). - *One* set of metadata about checksums for each 4k chunk of data within an extent (the checksum tree). - *One* set of metadata about where to find all the other metadata (the root tree). Note that an extent is a sequence of blocks which is both contiguous on disk, and contiguous within one *or more* files. In addition to the above globally-shared metadata, there are multiple metadata sets, each representing a mountable namespace -- these are the subvolumes. Each of these subvolumes holds a directory structure, and all of the POSIX information for each file name within that structure. For each file within a subvolume, there''s a sequence of pointers to the shared extent pool, indicating what blocks on disk are actually holding the data for that file. Note that the actual file data, and the management of its location on the disk (and its replication), is completely shared across subvolumes. The same extent may be used multiple times by different files, and those files may be in any subvolumes on the filesystem. In theory, the same extent could even appear several times in the same file. This sharing is how snapshots and COW copies are implemented. It''s also the basis for Josef''s dedup implementation. Each subvolume (barring the root subvolume) is rooted in some other subvolume, and appears within the namespace of its parent.> >> Perhaps it''s time to start looking at separating the btrfs pool > >> creation tools out of mkfs (or renaming mkfs.btrfs), since you''re > >> really building a a storage pool, and not a filesystem. It would > >> prevent a lot of confusion with new users. It''s great that there''s a > >> separate btrfs tool for manipulating btrfs setups, but "mkfs.btrfs" is > >> just wrong for creating the btrfs setup. > > > > I think this is the wrong thing to do. I hope my explanation above > > helps. > > As I understand it, the mkfs.btrfs is used to create the initial > filesystem across X disks with Y redundancy. For everthing else > afterward, the btrfs tool is used to add disks, create snapshots, > delete snapshots, change redundancy settings, create sub-volumes, etc. > Why not just add a "create" option to btrfs and retire mkfs.btrfs > completely. Or rework mkfs.btrfs to create sub-volumes of an existing > btrfs setup?Because creating a subvolume isn''t making a btrfs filesystem. It''s simply creating a new namespace tree.> What would be great is if there was an image that showed the layers in > Btrfs and how they interacted with the userspace tools. > > Having a set of graphics that compared the layers in Btrfs with the > layers in the "normal" Linux disk/filesystem partitioning scheme, and > the LVM layering, would be best.There''s a diagram at [1], which shows all of the on-disk data structures. It''s somewhat too detailed for this discussion, but in conjunction with the above explanation, it might make more sense to you. If it does, I''ll have a go at putting together a simpler version.> There''s lots of info in the wiki, but no images, ASCII-art, graphics, > etc. Trying to picture this mentally is not working. :)Understood. :) Hugo. 
[1] https://btrfs.wiki.kernel.org/index.php/Data_Structures -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk == PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- If the first-ever performance is the première, is the --- last-ever performance the derrière?
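Put in command terms, the distinction may be easier to see like this (devices illustrative):

# creates the filesystem proper: the dev, chunk, extent, checksum and
# root trees, i.e. the shared storage pool, spread over two disks
mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc
mount /dev/sdb /mnt

# creates only a new FS tree (a new mountable namespace); nothing is
# partitioned off or reserved for it, it draws from the same shared pool
btrfs subvolume create /mnt/home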
On Friday 21 of January 2011 20:28:19 Freddie Cash wrote:> On Sun, Jan 9, 2011 at 10:30 AM, Hugo Mills <hugo-lkml@carfax.org.uk> wrote: > > On Sun, Jan 09, 2011 at 09:59:46AM -0800, Freddie Cash wrote: > >> Let see if I can match up the terminology and layers a bit: > >> > >> LVM Physical Volume == Btrfs disk == ZFS disk / vdevs > >> LVM Volume Group == Btrfs "filesystem" == ZFS storage pool > >> LVM Logical Volume == Btrfs subvolume == ZFS volume > >> ''normal'' filesysm == Btrfs subvolume (when mounted) == ZFS filesystem > >> > >> Does that look about right? > > > > Kind of. The thing is that the way that btrfs works is massively > > different to the way that LVM works (and probably massively different > > to the way that ZFS works, but I don''t know much about ZFS, so I can''t > > comment there). I think that trying to think of btrfs in LVM terms is > > going to lead you to a large number of incorrect conclusions. It''s > > just not a good model to use. > > My biggest issue trying to understand Btrfs is figuring out the layers > involved. > > With ZFS, it''s extremely easy: > > disks --> vdev --> pool --> filesystems > > With LVM, it''s fairly easy: > > disks -> volume group --> volumes --> filesystems > > But, Btrfs doesn''t make sense to me: > > disks --> filesystem --> sub-volumes??? > > So, is Btrfs pooled storage or not? Do you throw 24 disks into a > single Btrfs filesystem, and then split that up into separate > sub-volumes as needed? From the looks of things, you don''t have to > partition disks or worry about sizes before formatting (if the space > is available, Btrfs will use it). But it also looks like you still > have to manage disks. > > Or, maybe it''s just that the initial creation is done via mkfs (as in, > formatting a partition with a filesystem) that''s tripping me up after > using ZFS for so long (zpool creates the storage pool, manages the > disks, sets up redundancy levels, etc; zfs creates filesystems and > volumes, and sets properties; no newfs/mkfs involved). > > It looks like ZFS, Btrfs, and LVM should work in similar manners, but > the overloaded terminology (pool, volume, sub-volume, filesystem are > different in all three) and new terminology that''s only in Btrfs is > confusing.With btrfs you need to have *a* filesystem, once you have it, you can add and remove disks/partitions from it, no need to use ''mkfs.btrfs'', just ''btrfs''. As for managing storage space: you don''t. There''s one single pool of storage that you can''t divide. Quota support is also absent. The only thing you can do with storage is add more or remove some.> >> Just curious, why all the new terminology in btrfs for things that > >> already existed? And why are old terms overloaded with new meanings? > >> I don''t think I''ve seen a write-up about that anywhere (or I don''t > >> remember it if I have). > > > > The main awkward piece of btrfs terminology is the use of "RAID" to > > describe btrfs''s replication strategies. It''s not RAID, and thinking > > of it in RAID terms is causing lots of confusion. Most of the other > > things in btrfs are, I think, named relatively sanely. > > No, the main awkward piece of btrfs terminology is overloading > "filesystem" to mean "collection of disks" and creating "sub-volume" > to mean "filesystem". At least, that''s how it looks from way over > here. 
:)subvolumes are made to be able to snapshot only part of files residing on a filesystem, that''s their only feature right now> > >> Perhaps it''s time to start looking at separating the btrfs pool > >> creation tools out of mkfs (or renaming mkfs.btrfs), since you''re > >> really building a a storage pool, and not a filesystem. It would > >> prevent a lot of confusion with new users. It''s great that there''s a > >> separate btrfs tool for manipulating btrfs setups, but "mkfs.btrfs" is > >> just wrong for creating the btrfs setup. > > > > I think this is the wrong thing to do. I hope my explanation above > > helps. > > As I understand it, the mkfs.btrfs is used to create the initial > filesystem across X disks with Y redundancy. For everthing else > afterward, the btrfs tool is used to add disks, create snapshots, > delete snapshots, change redundancy settings, create sub-volumes, etc. > Why not just add a "create" option to btrfs and retire mkfs.btrfs > completely. Or rework mkfs.btrfs to create sub-volumes of an existing > btrfs setup?all linux file systems use mkfs.<fs name>, there''s no reason why btrfs shouldn''t. For creation of FS you use one command, for management you use other command. I''d say that''s a pretty sane division.> > What would be great is if there was an image that showed the layers in > Btrfs and how they interacted with the userspace tools.It would either be * very complicated (if it included different allocation groups and how they interact) and useless for users * very simple (you put one fs on many disks, snapshotable part of FS is called subvolume) and pointless...> Having a set of graphics that compared the layers in Btrfs with the > layers in the "normal" Linux disk/filesystem partitioning scheme, and > the LVM layering, would be best.btrfs doesn''t have layers to compare so it''s rather hard to make such graph.> There''s lots of info in the wiki, but no images, ASCII-art, graphics, > etc. Trying to picture this mentally is not working. :)-- Hubert Kario QBS - Quality Business Software 02-656 Warszawa, ul. Ksawerów 30/85 tel. +48 (22) 646-61-51, 646-74-24 www.qbs.com.pl -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
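As a concrete illustration of that create-once, manage-afterwards split (device names illustrative):

# creation, once
mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt

# everything afterwards goes through the btrfs tool
btrfs device add /dev/sdc /mnt     # grow the pool
btrfs filesystem balance /mnt      # spread existing data over both disks
btrfs device delete /dev/sdb /mnt  # shrink it again; data migrates off first
btrfs filesystem show              # list devices per filesystem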
On Sat, Jan 22, 2011 at 5:45 AM, Hugo Mills <hugo-lkml@carfax.org.uk> wrote:> On Fri, Jan 21, 2011 at 11:28:19AM -0800, Freddie Cash wrote: >> So, is Btrfs pooled storage or not? Do you throw 24 disks into a >> single Btrfs filesystem, and then split that up into separate >> sub-volumes as needed? > > Yes, except that the subvolumes aren''t quite as separate as you > seem to think that they are. There''s no preallocation of storage to a > subvolume (in the way that LVM works), so you''re only limited by the > amount of free space in the whole pool. Also, data stored in the pool > is actually free for use by any subvolume, and can be shared (see the > deeper explanation below).Ah, perfect, that I understand. :) It''s the same with ZFS: you add storage to a pool, filesystems in the pool are free to use as much as there is available, you don''t have to pre-allocate or partition or anything that. ZFS supports quotas and reservations, though, so you can (if you want/need) allocate bytes to specific filesystems.>> From the looks of things, you don''t have to >> partition disks or worry about sizes before formatting (if the space >> is available, Btrfs will use it). But it also looks like you still >> have to manage disks. >> >> Or, maybe it''s just that the initial creation is done via mkfs (as in, >> formatting a partition with a filesystem) that''s tripping me up after >> using ZFS for so long (zpool creates the storage pool, manages the >> disks, sets up redundancy levels, etc; zfs creates filesystems and >> volumes, and sets properties; no newfs/mkfs involved). > > So potentially zpool -> mkfs.btrfs, and zfs -> btrfs. However, I > don''t know enough about ZFS internals to know whether this is a > reasonable analogy to make or not.That''s what I figured. It''s not a perfect analogue, but it''s close enough. Clears things up a bit. The big different is that ZFS separates storage management (the pool) from filesystem management; while btrfs "creates a pool" underneath one filesystem, and allows you to split it up via sub-volumes. I think I''m figuring this out. :)> Note that the actual file data, and the management of its location > on the disk (and its replication), is completely shared across > subvolumes. The same extent may be used multiple times by different > files, and those files may be in any subvolumes on the filesystem. In > theory, the same extent could even appear several times in the same > file. This sharing is how snapshots and COW copies are implemented. > It''s also the basis for Josef''s dedup implementation.That''s similar to how ZFS works, only they use "blocks" instead of "extents", but it works in a similar manner. I think I''ve got this mostly figured out. Now, to just wait for multiple parity redundancy (RAID5/6/+) support to hit the tree, so I can start playing around with it. :) Thanks for taking the time to explain some things. Sorry if I came across as being harsh or whatnot. -- Freddie Cash fjwcash@gmail.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
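Spelling the analogy out with commands (very rough, and only meant to line the two tool sets up side by side; device names illustrative):

# ZFS: create the pool first, then carve filesystems out of it
zpool create tank mirror da0 da1
zfs create tank/home

# btrfs: mkfs creates the pool and root subvolume in one step,
# further subvolumes are created inside the mounted filesystem
mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc
mount /dev/sdb /mnt
btrfs subvolume create /mnt/home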
On 01/22/2011 02:55 PM, Hubert Kario wrote:>> It looks like ZFS, Btrfs, and LVM should work in similar manners, but >> the overloaded terminology (pool, volume, sub-volume, filesystem are >> different in all three) and new terminology that''s only in Btrfs is >> confusing. > > With btrfs you need to have *a* filesystem, once you have it, you can add and > remove disks/partitions from it, no need to use ''mkfs.btrfs'', just ''btrfs''.That''s just a design decision, right? There''s no need for a "default" or "root" subvolume. It should be rather easy to change btrfs so that you first have to create a "storage pool" which combines disks for btrfs, and on top of that you can create "filesystems" which are just subvolumes. The creation of a "storage pool" could be very similar to the current mkfs, just without the creation of a root subvolume. A new, simpler mkfs would then just create a subvolume on top of the "storage pool" that can be mounted. Regards, Kaspar -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday, January 25, 2011 18:29:35 Kaspar Schleiser wrote:> On 01/22/2011 02:55 PM, Hubert Kario wrote: > >> It looks like ZFS, Btrfs, and LVM should work in similar manners, but > >> the overloaded terminology (pool, volume, sub-volume, filesystem are > >> different in all three) and new terminology that''s only in Btrfs is > >> confusing. > > > > With btrfs you need to have *a* filesystem, once you have it, you can add > > and remove disks/partitions from it, no need to use ''mkfs.btrfs'', just > > ''btrfs''. > > That''s just a design decision, right? There''s no need for a "default" or > "root" subvolume. > > It should be rather easy to change btrfs so that you first have to > create a "storage pool" which combines disks for btrfs, and on top of > that you can create "filesystems" which are just subvolumes. > > The creation of a "storage pool" could be very similar to the current > mkfs, just without the creation of a root subvolume. > > A new, simpler mkfs would then just create a subvolume on top of the > "storage pool" that can be mounted. > > Regards, > KasparI''m not sure, but for btrfs to support storage pools the way ZFS does would require change in disk layout. Besides, I don''t see *why* this should be done... And as far as I know ZFS doesn''t support different reduncancy levels for different files residing in the same directory. You can have ~/1billion$-project.tar.gz with triple redundancy and ~/temp.video.mkv with no reduncancy with btrfs... Regards, -- Hubert Kario QBS - Quality Business Software 02-656 Warszawa, ul. Ksawerów 30/85 tel. +48 (22) 646-61-51, 646-74-24 www.qbs.com.pl -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 25, 2011 at 9:43 AM, Hubert Kario <hka@qbs.com.pl> wrote:> Besides, I don''t see *why* this should be done... > > And as far as I know ZFS doesn''t support different reduncancy levels for > different files residing in the same directory. You can have > ~/1billion$-project.tar.gz with triple redundancy and ~/temp.video.mkv with no > reduncancy with btrfs...With ZFS, redundancy (mirror, raidz1, raidz2, raidz3) is done at the storage pool layer, and affects the entire pool. You can mix and match redundancy levels (combine mirror vdevs and raidz vdevs in the same pool), but there''s no way to control what data blocks go to which vdev, as it''s all just one giant pool of storage. However, there is a "copies" property for each filesystem that affects how many copies of data blocks are stored, to increase the redundancy for that filesystem. For example, you can create a storage pool using 2 mirror vdevs (4 drives; equivalent to a RAID10 setup); then create a filesystem with copies=2. Thus, any blocks written to that filesystem will be stored twice, each of which is then striped across the two vdevs, and then mirrored to each disk in the vdevs, potentially leading to 4 (or more) blocks of data written to disk. This is similar to using Linux md to create RAID arrays underneath LVM volume groups. The redundancy is managed via md; the filesystems just see a collection of blocks to write to. The big difference (from what I understand) between ZFS and Btrfs is the layering. ZFS separate storage management from filesystem management, so redundancy happens at lower layers and the filesystem just sends blocks to the pool. Whereas Btrfs combines them into one, so that redundancy is managed at the filesystem level and can be changed on a per-directory (or per-sub-volume?) basis, with the filesystem handling the writes and the redundancy. I don''t pretend to understand all the intricacies of how Btrfs works (I''m working on it), but the layering in ZFS is very nice and easy to work with in comparison. Interesting how ZFS is considered the "rampant layering violation", though. ;) :) :D -- Freddie Cash fjwcash@gmail.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
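In command form, that ZFS behaviour looks roughly like this (FreeBSD-style device names, values illustrative):

# pool built from two mirror vdevs (the RAID10-like layout)
zpool create tank mirror da0 da1 mirror da2 da3

# per-filesystem redundancy on top: every block of this dataset stored twice
zfs create -o copies=2 tank/important
zfs get copies tank/important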
On Tuesday, January 25, 2011 18:59:39 Freddie Cash wrote:> On Tue, Jan 25, 2011 at 9:43 AM, Hubert Kario <hka@qbs.com.pl> wrote: > > Besides, I don''t see *why* this should be done... > > > > And as far as I know ZFS doesn''t support different reduncancy levels for > > different files residing in the same directory. You can have > > ~/1billion$-project.tar.gz with triple redundancy and ~/temp.video.mkv > > with no reduncancy with btrfs... > > With ZFS, redundancy (mirror, raidz1, raidz2, raidz3) is done at the > storage pool layer, and affects the entire pool. You can mix and > match redundancy levels (combine mirror vdevs and raidz vdevs in the > same pool), but there''s no way to control what data blocks go to which > vdev, as it''s all just one giant pool of storage. > > However, there is a "copies" property for each filesystem that affects > how many copies of data blocks are stored, to increase the redundancy > for that filesystem. For example, you can create a storage pool using > 2 mirror vdevs (4 drives; equivalent to a RAID10 setup); then create a > filesystem with copies=2. Thus, any blocks written to that filesystem > will be stored twice, each of which is then striped across the two > vdevs, and then mirrored to each disk in the vdevs, potentially > leading to 4 (or more) blocks of data written to disk. > > This is similar to using Linux md to create RAID arrays underneath LVM > volume groups. The redundancy is managed via md; the filesystems just > see a collection of blocks to write to. > > The big difference (from what I understand) between ZFS and Btrfs is > the layering. ZFS separate storage management from filesystem > management, so redundancy happens at lower layers and the filesystem > just sends blocks to the pool. Whereas Btrfs combines them into one, > so that redundancy is managed at the filesystem level and can be > changed on a per-directory (or per-sub-volume?) basis, with the > filesystem handling the writes and the redundancy.Right now you can''t change the raid level at all but there are hooks planned to enable selecting raid level on a per file basis. btrfs allows for better management of space ond less over provisioning. So I''d say that management of storage space with btrfs is even easier than with ZFS: admin sets the default redundancy level for whole file system (let''s say that it''s a 4 disk system) to a RAID1 with two copies. After seting up the system sets the redundancy level in directories with databases to RAID10 Users storing big files use RAID5 for some files. one of the drives fails, admin removes the drive from set, schedules reballance. the set is smaller but all reduncancy is preserved New drives arrive, they are added to fs. FS is reballanced for the second time to achive better performance (the space would be usable even without it).> I don''t pretend to understand all the intricacies of how Btrfs works > (I''m working on it), but the layering in ZFS is very nice and easy to > work with in comparison. Interesting how ZFS is considered the > "rampant layering violation", though. ;) :) :Dbtrfs is much simpler from user point of view :) as for rampant layering violation: most of the code that deals with stored data isn''t concerned with raid level, in contrast with zfs. In other words, its in the code, not interface. -- Hubert Kario QBS - Quality Business Software 02-656 Warszawa, ul. Ksawerów 30/85 tel. 
+48 (22) 646-61-51, 646-74-24 www.qbs.com.pl -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
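The per-file raid selection is still only planned, but the drive-failure half of that workflow can be sketched with today's tools (device names illustrative, and it assumes the btrfs tool in use understands "device delete missing"):

# a member has died: mount degraded, then drop the missing device;
# the removal re-creates the lost copies on the remaining drives
mount -o degraded /dev/sdb /mnt
btrfs device delete missing /mnt

# replacement drives arrive: add them, then rebalance to spread the data
btrfs device add /dev/sde /dev/sdf /mnt
btrfs filesystem balance /mnt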