On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
> I'd suggest at least
> mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
> if you really want raid0

I don't fully understand -m or -d. Why would this make a truer raid0 than with no options?

Is it necessary to use fdisk on new drives in creating a BTRFS multi-drive array? Or is this all that's needed:
# mkfs.btrfs /dev/sdb /dev/sdc
# btrfs filesystem show

Is this related to 'subvolumes'? The FAQ implies that a subvolume is like a directory, but also like a partition. What's the rationale for being able to create a subvolume under a subvolume, as Hubert says so he can "use the shadow_copy module for samba to publish the snapshots to windows clients." I don't have any windows clients, but what difference does his structure make?

I know that when using SATA+LVM you should turn off the writeback cache on the drive, as it doesn't do cache flushing, and ensure NCQ is on. But does this also apply to a BTRFS array? If so, is this done in rc.local with
hdparm -I /dev/sdb
hdparm -I /dev/sdc

How do you know what options to rsync are on by default? I can't find this anywhere. For example, it seems to me that --perms -ogE --hard-links and --delete-excluded should be on by default, for a true sync?

If using the --numeric-ids switch for rsync, do you just have to manually make sure the IDs and usernames are the same on source and destination machines?

For files that fail to transfer, wouldn't it be wise to use --partial-dir=DIR to at least recover part of lost files?

The rsync man page says that rsync uses ssh by default, but is that the case? I think -e may be related to engaging ssh, but don't understand the explanation.

So for my system where there is a backup server, I guess I run the rsync daemon on the backup server which presents a port, then when the other systems decide it's time for a backup (cron) they:
- stop mysql, dump the database somewhere, start mysql;
- connect to the backup server's rsync port and dump their data to (hopefully) some specific place there.
Right?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jan 7, 2011 at 11:15 AM, Carl Cook <CACook@quantum-sci.com> wrote:
> On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
>> I'd suggest at least
>> mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
>> if you really want raid0
>
> I don't fully understand -m or -d. Why would this make a truer raid0 than with no options?

this will give you RAID0 for your data, but RAID1 for your metadata, making it less likely that the FS itself gets corrupted, even though you will lose some data in crash cases, if i understand correctly.

> Is it necessary to use fdisk on new drives in creating a BTRFS multi-drive array? Or is this all that's needed:
> # mkfs.btrfs /dev/sdb /dev/sdc
> # btrfs filesystem show

depends on whether you need /boot partitions or other partitions. what you have works fine though.

> Is this related to 'subvolumes'? The FAQ implies that a subvolume is like a directory, but also like a partition. What's the rationale for being able to create a subvolume under a subvolume, as Hubert says so he can "use the shadow_copy module for samba to publish the snapshots to windows clients." I don't have any windows clients, but what difference does his structure make?

just his preference to put it there... the snapshot of a snapshot can go anywhere. it doesn't have to reside under its "parent"; the parent was just used as a base, it's not bound to it in any way AFAIK.

> How do you know what options to rsync are on by default? I can't find this anywhere. For example, it seems to me that --perms -ogE --hard-links and --delete-excluded should be on by default, for a true sync?

the links and command Freddie Cash posted are a really good base to work from.

> So for my system where there is a backup server, I guess I run the rsync daemon on the backup server which presents a port, then when the other systems decide it's time for a backup (cron) they:
> - stop mysql, dump the database somewhere, start mysql;
> - connect to the backup server's rsync port and dump their data to (hopefully) some specific place there.
> Right?

you don't have to stop mysql, you just need to "freeze" any new, incoming writes, and flush (ie. let finish) whatever is happening right now. this ensures mysql is _internally_ consistent on the disk. see comment by Lloyd Standish here:

http://dev.mysql.com/doc/refman/5.1/en/backup-methods.html

C Anthony
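One way to get the "freeze writes and flush" behaviour described above without stopping mysqld is to let mysqldump hold a global read lock while it dumps. The following is only a sketch under assumptions: credentials are expected to come from something like ~/.my.cnf, and the helper name dump_all_databases, the DRY_RUN switch and the output path are made up for illustration.

```shell
# dump_all_databases OUT_FILE
# Hypothetical helper: prints the command when DRY_RUN is set, otherwise runs it.
dump_all_databases() {
    out_file=$1
    # --lock-all-tables takes a global read lock for the whole dump, so no
    # write can land halfway through; mysqld itself keeps running.
    set -- mysqldump --all-databases --lock-all-tables
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$* > $out_file"
    else
        "$@" > "$out_file"
    fi
}
```

Calling `dump_all_databases /var/backups/all.sql` from cron just before the rsync run would leave a consistent dump on disk. For InnoDB-only setups, `--single-transaction` is a commonly used alternative that avoids blocking writers; which one fits depends on the storage engines in use.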
On Fri, Jan 7, 2011 at 9:15 AM, Carl Cook <CACook@quantum-sci.com> wrote:
> How do you know what options to rsync are on by default? I can't find this anywhere. For example, it seems to me that --perms -ogE --hard-links and --delete-excluded should be on by default, for a true sync?

Who cares which ones are on by default? List the ones you want to use on the command-line, every time. That way, if the defaults change, your setup won't.

> If using the --numeric-ids switch for rsync, do you just have to manually make sure the IDs and usernames are the same on source and destination machines?

You use the --numeric-ids switch so that it *doesn't* matter if the IDs/usernames are the same. It just sends the ID number on the wire. Sure, if you do an ls on the backup box, the username will appear to be messed up. But if you compare the user ID assigned to the file, and the user ID in the backed up etc/passwd file, they are correct. Then, if you ever need to restore the HTPC from backups, the etc/passwd file is transferred over, the user IDs are transferred over, and when you do an ls on the HTPC, everything matches up correctly.

> For files that fail to transfer, wouldn't it be wise to use --partial-dir=DIR to at least recover part of lost files?

Or, just run rsync again, if the connection is dropped.

> The rsync man page says that rsync uses ssh by default, but is that the case? I think -e may be related to engaging ssh, but don't understand the explanation.

Does it matter what the default is, if you specify exactly how you want it to work on the command-line?

> So for my system where there is a backup server, I guess I run the rsync daemon on the backup server which presents a port, then when the other systems decide it's time for a backup (cron) they:
> - stop mysql, dump the database somewhere, start mysql;
> - connect to the backup server's rsync port and dump their data to (hopefully) some specific place there.
> Right?

That's one way (push backups). It works ok for small numbers of systems being backed up. But get above a handful of machines, and it gets very hard to time everything so that you don't hammer the disks on the backup server.

Pull backups (backups server does everything) work better, in my experience. Then you just script things up once, run 1 script, worry about 1 schedule, and everything is stored on the backups server. No need to run rsync daemons everywhere, just run the rsync client, using -e ssh, and let it do everything.

If you need it to run a script on the remote machine first, that's easy enough to do:
- ssh to remote system, run script to stop DBs, dump DBs, snapshot FS, whatever
- then run rsync
- ssh to remote system, run script to start DBs, delete snapshot, whatever

You're starting to over-think things. Keep it simple, don't worry about defaults, specify everything you want to do, and do it all from the backups box.

--
Freddie Cash
fjwcash@gmail.com
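The three-step pull sequence above (remote pre-script, rsync, remote post-script) could be sketched roughly as follows. This is an illustration, not Freddie's actual script: the function names run and pull_backup, the DRY_RUN switch, the pre-backup.sh/post-backup.sh paths and the /media/backups layout are all assumptions.

```shell
# run CMD...: echo the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pull_backup HOST: everything is driven from the backup server.
pull_backup() {
    host=$1
    dest=/media/backups/$host
    # 1. Quiesce the remote side (stop or dump DBs, snapshot FS, whatever).
    run ssh "$host" /usr/local/bin/pre-backup.sh || return 1
    # 2. Pull the data; rsync runs as a plain client over ssh, no daemon.
    run rsync --archive --hard-links --numeric-ids --delete -e ssh "$host:/home/" "$dest/"
    status=$?
    # 3. Always undo step 1, even if rsync failed.
    run ssh "$host" /usr/local/bin/post-backup.sh
    return "$status"
}
```

Running it with `DRY_RUN=1` set first prints the three commands without touching anything, which is a cheap way to check the option list before the first real run. A single cron entry on the backup server can then call `pull_backup` once per host, one after the other, so the backup disks are never hit by two transfers at once.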
Wow, this rsync and backup system is pretty amazing. I've always just tarred each directory manually, but now find I can RELIABLY automate backups, and have SOLID versioning to boot. Thanks to everyone who advised, especially Freddie and Anthony.

I am still waiting for hardware for my backup server, but have been preparing. On the backup server I'll be doing pull backups for everything except my phone (which is connected intermittently). I'm going to set up a cron script on the backup server to pull backups once a week (as opposed to once/mo which I've done for 12 years). I am at a loss how to lock the database on the HTPC while exporting the dump, as per Lloyd Standish, but will study it. (Freddie gave a nice script, but it doesn't seem to lock/flush first.) I also don't know how to email results/success/fail on completion, as I'm not a very good coder.

But here is my proposed cron:
btrfs subvolume snapshot hex:///home /media/backups/snapshots/hex-{DATE}
rsync --archive --hard-links --delete-during --delete-excluded --inplace --numeric-ids -e ssh --exclude-from=/media/backups/exclude-hex hex:///home /media/backups/hex
btrfs subvolume snapshot droog:///home /media/backups/snapshots/droog-{DATE}
rsync --archive --hard-links --delete-during --delete-excluded --inplace --numeric-ids -e ssh --exclude-from=/media/backups/exclude-droog droog:///home /media/backups/droog

My root filesystems are ext4, so I guess they cannot be snapshotted before backup. My home directories are/will be BTRFS though.

On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
>> I'd suggest at least
>> mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
>> if you really want raid0
>
> I don't fully understand -m or -d. Why would this make a truer raid0 than with no options?

I am beginning to suspect that this is the -default- behavior, as described in the wiki:
"# Create a filesystem across four drives (metadata mirrored, data striped)"

Should I turn off the writeback cache on each drive when running BTRFS?
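A note on the proposed cron lines, offered as a sketch rather than a tested recipe: btrfs subvolume snapshot operates on a subvolume that is local to the machine running it, so it cannot take a host:path source the way rsync does. One arrangement that keeps the same intent is to rsync into a local subvolume on the backup server first and snapshot that afterwards. The example assumes /media/backups/hex and /media/backups/droog are btrfs subvolumes, that /media/backups/snapshots exists on the same filesystem, and that a btrfs-progs version with the read-only -r option is installed; the function names are hypothetical.

```shell
# snapshot_path HOST [DATE]: hypothetical naming scheme, e.g. hex-2011-01-07.
snapshot_path() {
    printf '/media/backups/snapshots/%s-%s\n' "$1" "${2:-$(date +%F)}"
}

# backup_then_snapshot HOST: pull over ssh, then freeze the result locally.
backup_then_snapshot() {
    host=$1
    # Pull first, into a local btrfs subvolume (assumed to exist already).
    rsync --archive --hard-links --delete-during --delete-excluded --inplace \
        --numeric-ids -e ssh \
        --exclude-from="/media/backups/exclude-$host" \
        "$host:/home/" "/media/backups/$host/" || return 1
    # Then snapshot the local copy; the source must live on this machine.
    btrfs subvolume snapshot -r "/media/backups/$host" "$(snapshot_path "$host")"
}
```

A weekly cron job could then run `backup_then_snapshot hex` followed by `backup_then_snapshot droog`. On the email question: many cron implementations mail whatever a job prints to the address set in a MAILTO= line of the crontab, so letting rsync errors reach stderr may already be enough for a failure report.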
In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:

# rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
sending incremental file list

sent 658660 bytes  received 2433 bytes  1322186.00 bytes/sec
total size is 1355368091626  speedup is 2050192.77

# df /media/disk
Filesystem     1K-blocks       Used  Available Use% Mounted on
/dev/md2      1868468340 1315408384  553059956  71% /media/disk
# df /home
Filesystem     1K-blocks       Used  Available Use% Mounted on
/dev/sdb      3907029168 1325491836 2581537332  34% /home

On Fri 07 January 2011 10:55:43 Carl Cook wrote:
> Wow, this rsync and backup system is pretty amazing. I've always just tarred each directory manually, but now find I can RELIABLY automate backups, and have SOLID versioning to boot. Thanks to everyone who advised, especially Freddie and Anthony.
>
> I am still waiting for hardware for my backup server, but have been preparing. On the backup server I'll be doing pull backups for everything except my phone (which is connected intermittently). I'm going to set up a cron script on the backup server to pull backups once a week (as opposed to once/mo which I've done for 12 years). I am at a loss how to lock the database on the HTPC while exporting the dump, as per Lloyd Standish, but will study it. (Freddie gave a nice script, but it doesn't seem to lock/flush first.) I also don't know how to email results/success/fail on completion, as I'm not a very good coder.
>
> But here is my proposed cron:
> btrfs subvolume snapshot hex:///home /media/backups/snapshots/hex-{DATE}
> rsync --archive --hard-links --delete-during --delete-excluded --inplace --numeric-ids -e ssh --exclude-from=/media/backups/exclude-hex hex:///home /media/backups/hex
> btrfs subvolume snapshot droog:///home /media/backups/snapshots/droog-{DATE}
> rsync --archive --hard-links --delete-during --delete-excluded --inplace --numeric-ids -e ssh --exclude-from=/media/backups/exclude-droog droog:///home /media/backups/droog
>
> My root filesystems are ext4, so I guess they cannot be snapshotted before backup. My home directories are/will be BTRFS though.
>
> On Fri 07 January 2011 08:14:17 Hubert Kario wrote:
> >> I'd suggest at least
> >> mkfs.btrfs -m raid1 -d raid0 /dev/sdc /dev/sdd
> >> if you really want raid0
> >
> > I don't fully understand -m or -d. Why would this make a truer raid0 than with no options?
>
> I am beginning to suspect that this is the -default- behavior, as described in the wiki:
> "# Create a filesystem across four drives (metadata mirrored, data striped)"
>
> Should I turn off the writeback cache on each drive when running BTRFS?
On Sat, Jan 08, 2011 at 05:25:19AM -0800, Carl Cook wrote:
> In addition to the questions below, if anyone has a chance could you
> advise on why my destination drive has more data than the source after
> this command:
> # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
> sending incremental file list
> sent 658660 bytes  received 2433 bytes  1322186.00 bytes/sec
> total size is 1355368091626  speedup is 2050192.77
>
> # df /media/disk
> Filesystem     1K-blocks       Used  Available Use% Mounted on
> /dev/md2      1868468340 1315408384  553059956  71% /media/disk
> # df /home
> Filesystem     1K-blocks       Used  Available Use% Mounted on
> /dev/sdb      3907029168 1325491836 2581537332  34% /home

This has little to do with btrfs; it happens with many file systems due to file system infrastructure details such as directory sizes, sparse file handling, file fragmentation, etc.

For example: If you have a directory with a huge number of file names in it, the actual directory disk space used will be large and will not be reclaimed when you delete all the file names from the directory. You would have to remove the directory itself and recreate it to reclaim that space.

Also, using rsync without --sparse (which can't work with --inplace), sparse files on the source may get expanded to take real disk blocks on the destination.

Unless you use "dd" to copy a partition exactly, including all the file system infrastructure details, any copy you make will be subject to the vagaries of how the file system decides to lay out the data.

--
| Ian! D. Allen  -  idallen@idallen.ca  -  Ottawa, Ontario, Canada
| Home Page: http://idallen.com/   Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/  and have fun: http://fools.ca/
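The sparse-file effect mentioned above is easy to see on a scratch file. The sketch below assumes GNU coreutils (truncate and du --apparent-size); the variable names are arbitrary. It creates a file that is one large hole and compares its length with the disk blocks it actually occupies.

```shell
work=$(mktemp -d)
# A 10 MiB file consisting of a single hole: no data is ever written.
truncate -s 10M "$work/sparse.img"
# Apparent size in KiB: the length of the file, as ls -l would report it.
apparent_kib=$(du --apparent-size -k "$work/sparse.img" | cut -f1)
# Allocated size in KiB: the disk blocks the file really uses.
allocated_kib=$(du -k "$work/sparse.img" | cut -f1)
echo "apparent=${apparent_kib}K allocated=${allocated_kib}K"
rm -rf "$work"
```

On file systems that support holes the allocated figure is typically close to zero. A copy made without hole handling writes the zeros out for real, so the same file can cost its full apparent size on the destination, which is one way df totals drift apart between source and copy.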
On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook <CACook@quantum-sci.com> wrote:
> In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:
> # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
> sending incremental file list

What happens if you delete /home, then run the command again, but without the *? You generally don't use wildcards for the source or destination when using rsync. You just tell it which directory to start in.

If you do an "ls /home" and "ls /media/disk" are they different?

--
Freddie Cash
fjwcash@gmail.com
I'd rather not do the copy again unless necessary, as it took a day.

Directories look identical, but who knows? I'm going to try and figure out how to do a file-by-file crc check, for peace of mind.

On Sat 08 January 2011 17:26:25 Freddie Cash wrote:
> On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook <CACook@quantum-sci.com> wrote:
> > In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:
> > # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
> > sending incremental file list
>
> What happens if you delete /home, then run the command again, but
> without the *? You generally don't use wildcards for the source or
> destination when using rsync. You just tell it which directory to
> start in.
>
> If you do an "ls /home" and "ls /media/disk" are they different?
On Sun, Jan 9, 2011 at 8:16 PM, Carl Cook <CACook@quantum-sci.com> wrote:
> I'd rather not do the copy again unless necessary, as it took a day.
>
> Directories look identical, but who knows? I'm going to try and figure out how to do a file-by-file crc check, for peace of mind.

try "du --apparent-size -slh"
It should rule out any differences caused by sparse files and hardlinks.

> On Sat 08 January 2011 17:26:25 Freddie Cash wrote:
>> On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook <CACook@quantum-sci.com> wrote:
>> > In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:
>> > # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home

Are you SURE you don't get the command mixed up? The last argument to rsync should be the destination. Your command looks like you're copying things to /home.

--
Fajar
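For the file-by-file check Carl mentions, one possible approach is to checksum every regular file in both trees and diff the two sorted lists. This is a sketch only: it assumes sha256sum from GNU coreutils, and the function names checksum_tree and trees_match are invented for the example. It reads every byte on both sides, so on a terabyte-sized tree it will take hours.

```shell
# checksum_tree DIR: print "hash  ./relative/path" for each regular file,
# sorted by path so two trees can be compared line by line.
checksum_tree() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# trees_match SRC DST: exit status 0 only if both trees hold the same
# files with the same contents; differences are printed by diff.
trees_match() {
    a=$(mktemp) b=$(mktemp)
    checksum_tree "$1" > "$a"
    checksum_tree "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

Usage would be along the lines of `trees_match /media/disk /home && echo identical`. An alternative that stays within rsync is a dry run with checksums, for example `rsync --archive --checksum --dry-run --itemize-changes /media/disk/ /home/`, which lists files whose contents differ without copying anything.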
On 09/01/11 13:37, Fajar A. Nugraha wrote:
> On Sun, Jan 9, 2011 at 8:16 PM, Carl Cook <CACook@quantum-sci.com> wrote:
>> I'd rather not do the copy again unless necessary, as it took a day.
>>
>> Directories look identical, but who knows? I'm going to try and figure out how to do a file-by-file crc check, for peace of mind.
>
> try "du --apparent-size -slh"
> It should rule out any differences caused by sparse files and hardlinks.
>
>> On Sat 08 January 2011 17:26:25 Freddie Cash wrote:
>>> On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook <CACook@quantum-sci.com> wrote:
>>>> In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:
>>>> # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
>
> Are you SURE you don't get the command mixed up? The last argument to
> rsync should be the destination. Your command looks like you're
> copying things to /home.

What is also important is the use of *: it means all the dot files at the top level are NOT being copied.

rsync is clever enough to look at whether you have a / at the end of the source to decide whether you want the directory itself put into the destination or just the contents of the directory. The / at the end of the source means copy the contents.

This could be the reason why the destination has more data than the source (I am not sure of the exact scope of --delete): --delete may not be deleting /home/.* files (if there are any there).

--
Alan Chandler
http://www.chandlerfamily.org.uk
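The point about * and dot files can be checked without rsync at all, because the shell expands the wildcard before rsync ever sees the arguments. A small demonstration, with made-up file names and a hypothetical count_args helper, assuming default shell globbing settings:

```shell
# count_args ARGS...: print how many arguments the shell passed in.
count_args() {
    echo "$#"
}

src=$(mktemp -d)
touch "$src/visible.txt" "$src/.hidden"

# By default * does not match names that start with a dot,
# so only visible.txt is handed to the command.
glob_count=$(count_args "$src"/*)

# Listing the directory itself (ls -A) sees both entries.
all_count=$(( $(ls -A "$src" | wc -l) ))

echo "matched by *: $glob_count, actually present: $all_count"
rm -rf "$src"
```

Writing the source as a directory with a trailing slash, as in `rsync --archive --delete /media/disk/ /home/`, hands rsync the whole directory, so dot files are included. The rsync man page also notes that --delete only takes effect for directories rsync is asked to send whole, not for a wildcard list of their contents, which fits the explanation above for stray top-level files surviving in /home.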