I want to write a differential backup tool for btrfs snapshots.

The new "btrfs subvol find-new" command sounds great on first encounter, but I'm missing information about updated directories. I would need a list of updated directories to scan for deleted files.

I had a look at find_updated_files() in btrfs-list.c. To me it seems as if the ioctl only returns the extents of regular files.

The function find_root_gen() in btrfs-list.c seems to return the newest generation in a given snapshot. It would be nice to have this exported as a user command (e.g. "btrfs subvol newest-gen"); then one could use the output of

    btrfs subvol newest-gen <old snapshot>

(plus 1) as the input generation number to

    btrfs subvol find-new <new snapshot> <gen+1>

(I'm using kernel 2.6.32.10 with the most current btrfs kernel modules and userland tools as of last Saturday.)

Greetings, Michael

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Fri, Mar 26, 2010 at 11:18:07AM +0100, Michael Niederle wrote:
> I want to write a differential backup tool for btrfs snapshots.
>
> The new "btrfs subvol find-new"-command sounds great on first encounter, but I'm
> missing information about updated directories. I would need a list of updated
> directories to scan for deleted files.
>
> I had a look at find_updated_files() in btrfs-list.c. To me it seems as if
> the ioctl would only return the extents of regular files.

Well, the ioctl is actually returning all the updated inodes, but the command ignores them. Every piece of metadata in the btrfs btree has a key, and every key has a type field. It's the type field that makes keys for inodes different from keys for file extents or directory items.

In find_updated_files, it does this:

    sk->min_type = 0;
    sk->max_type = BTRFS_EXTENT_DATA_KEY;

This means the search ioctl in the kernel won't return anything with a key type bigger than BTRFS_EXTENT_DATA_KEY. If you look in ctree.h, you'll see that BTRFS_EXTENT_DATA_KEY is actually bigger than the key types for inodes and directory items, so we're getting most of the file and directory metadata with this search.

In the loop in find_updated_files, it does this:

    if (sh->type == BTRFS_EXTENT_DATA_KEY &&

which limits the output to only extent data keys.

> The function find_root_gen() in btrfs-list.c seems to return the newest
> generation in a given snapshot. It would be nice to have this exported as a
> user command (e.g. "btrfs subvol newest-gen") then one could use the output of
>
> btrfs subvol newest-gen <old snapshot>

That was definitely the plan. If you're interested in coding this, please remember that you have to record the generation before you start to back up, so that you catch everything that changed during the backup next time around.

When we find an inode in the output, it doesn't mean that inode has changed. It just means the btree block holding that inode has changed. So we'll want to add limiting based on the ctime/mtime of the inode as well.

Inodes have type BTRFS_INODE_ITEM_KEY; the same inode format is used for both files and directories.

Inside a directory we have the files listed twice, once under items of type BTRFS_DIR_ITEM_KEY, and once under items of type BTRFS_DIR_INDEX_KEY. The duplicate index helps with NFS and helps us do sequential directory reads. You'll want to pick the BTRFS_DIR_INDEX_KEY items because they are in a better order for backing up.

> (plus 1) as the input generation number to
>
> btrfs subvol find-new <new snapshot> <gen+1>

To be on the safe side (not miss any updates) we want to use gen, not gen+1. We'll get some duplicates, but it is the only way to be sure we don't miss anything.

-chris

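To illustrate the filtering described above, here is a minimal sketch in C of how items returned by the search could be sorted into inodes, directory entries and extents, with a ctime/mtime check on top. The type struct item_header and the functions classify_item() and inode_changed_since() are hypothetical names invented for this example; they are not part of btrfs-list.c. The key type values mirror the ones in ctree.h, and real code should take them from that header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Key type values as found in ctree.h; real code should include the header. */
#define BTRFS_INODE_ITEM_KEY   1
#define BTRFS_DIR_ITEM_KEY     84
#define BTRFS_DIR_INDEX_KEY    96
#define BTRFS_EXTENT_DATA_KEY  108

/* Hypothetical, simplified stand-in for the header of one search result. */
struct item_header {
    uint64_t transid;   /* generation of the btree block holding the item */
    uint64_t objectid;  /* inode number the item belongs to */
    uint32_t type;      /* key type, one of the BTRFS_*_KEY values */
};

enum item_class { ITEM_SKIP, ITEM_INODE, ITEM_DIR_ENTRY, ITEM_EXTENT };

/*
 * Decide what to do with one item. Items from blocks older than min_gen are
 * skipped. The comparison is ">= min_gen" (gen, not gen+1), which may yield
 * duplicates but does not miss updates.
 */
static enum item_class classify_item(const struct item_header *sh,
                                     uint64_t min_gen)
{
    assert(sh != NULL);
    if (sh->transid < min_gen)
        return ITEM_SKIP;

    switch (sh->type) {
    case BTRFS_INODE_ITEM_KEY:
        return ITEM_INODE;      /* file or directory inode */
    case BTRFS_DIR_INDEX_KEY:
        return ITEM_DIR_ENTRY;  /* preferred directory listing */
    case BTRFS_EXTENT_DATA_KEY:
        return ITEM_EXTENT;     /* what find-new prints today */
    default:
        /* Includes BTRFS_DIR_ITEM_KEY, which duplicates the index items. */
        return ITEM_SKIP;
    }
}

/*
 * A changed btree block does not prove the inode itself changed, so a
 * second filter on ctime/mtime (seconds since the epoch) is applied.
 */
static bool inode_changed_since(int64_t ctime_sec, int64_t mtime_sec,
                                int64_t since_sec)
{
    return ctime_sec >= since_sec || mtime_sec >= since_sec;
}
```

A caller would run classify_item() on every search result and, for ITEM_INODE results, decode the inode and pass its timestamps to inode_changed_since() before reporting it.
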
I have added a command

    btrfs subvolume find-modified <path> <last_gen>
        List the recently modified files and directories in a filesystem.

It's similar to find-new with the following differences:

* in addition to modified files it will also display modified directories
* it lists only the paths of the modified files and directories (no extent information)

Directories "." and ".." are filtered.

I will do extensive testing this weekend and then post the patch to this list if wanted - if I'm able to master git until then ... ^^

Greetings, Michael

On Fri, Mar 26, 2010 at 07:51:47PM +0100, Dipl.-Ing. Michael Niederle wrote:
> I have added a command
>
> btrfs subvolume find-modified <path> <last_gen>
> List the recently modified files and directories in a filesystem.
>
> It's similar to find-new with the following differences:

Ok, I'd suggest two changes. Add an optional timestamp field to filter files that have changed since a given timestamp. Also make it take -e that prints the extents.

-chris

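One way the suggested command line could be parsed is sketched below in C. Only the -e switch and the <path> and <last_gen> arguments come from the thread. Treating the optional timestamp as a third positional argument in seconds since the epoch is an assumption made for this example, and struct find_opts, parse_u64() and parse_find_args() are hypothetical names, not part of the btrfs tools.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set for "find-modified [-e] <path> <last_gen> [since]". */
struct find_opts {
    bool        print_extents;  /* -e given */
    const char *path;
    uint64_t    last_gen;
    bool        have_since;     /* optional timestamp present */
    uint64_t    since;          /* assumed: seconds since the epoch */
};

/* Parse a non-negative decimal number; reject signs, junk and overflow. */
static int parse_u64(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    if (!isdigit((unsigned char)s[0]))
        return -1;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Returns 0 on success, -1 on a usage error. argv[0] is the command name. */
static int parse_find_args(int argc, char **argv, struct find_opts *o)
{
    const char *pos[3];
    int npos = 0;

    assert(o != NULL);
    memset(o, 0, sizeof(*o));

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-e") == 0)
            o->print_extents = true;
        else if (npos < 3)
            pos[npos++] = argv[i];
        else
            return -1;          /* too many positional arguments */
    }
    if (npos < 2)
        return -1;              /* <path> and <last_gen> are required */

    o->path = pos[0];
    if (parse_u64(pos[1], &o->last_gen) != 0)
        return -1;
    if (npos == 3) {
        if (parse_u64(pos[2], &o->since) != 0)
            return -1;
        o->have_since = true;
    }
    return 0;
}
```

Keeping the timestamp optional means existing invocations with only <path> and <last_gen> keep working unchanged.
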
Hi, Chris!

> Add an optional timestamp field to filter
> files that have changed since a given timestamp.

Is there a possibility to derive the timestamp directly from the generation number?

If we have a "-e" switch for printing extent information we could also have another switch to decide whether to print directory information or not and combine find-new and find-modified into a single command.

Meanwhile I have implemented the (very simple) command

    btrfs subvolume max-gen <path>
        Print the highest generation number in a filesystem.

Greetings, Michael

On Fri, Mar 26, 2010 at 09:22:07PM +0100, Dipl.-Ing. Michael Niederle wrote:
> Hi, Chris!
>
> > Add an optional timestamp field to filter
> > files that have changed since a given timestamp.
>
> Is there a possibility to derive the timestamp directly from the generation
> number?

I'm afraid not.

> If we have a "-e"-switch for printing extent-information we could also have
> another switch to decide whether to print directory-information or not and
> combine find-new and find-modified into a single command.

Yes, that's the direction I'd like to see.

> Meanwhile I have implemented the (very simple) command
>
> btrfs subvolume max-gen <path>
> Print the highest generation number in a filesystem.

Great.

-chris

On Friday 26 March 2010, Chris Mason wrote:
> On Fri, Mar 26, 2010 at 09:22:07PM +0100, Dipl.-Ing. Michael Niederle wrote:
> > Hi, Chris!
> >
> > > Add an optional timestamp field to filter
> > > files that have changed since a given timestamp.
> >
> > Is there a possibility to derive the timestamp directly from the generation
> > number?
>
> I'm afraid not.
>
> > If we have a "-e"-switch for printing extent-information we could also have
> > another switch to decide whether to print directory-information or not and
> > combine find-new and find-modified into a single command.
>
> Yes, that's the direction I'd like to see.
>
> > Meanwhile I have implemented the (very simple) command
> >
> > btrfs subvolume max-gen <path>
> > Print the highest generation number in a filesystem.

Is it possible to combine the commands max-gen and find-new? Something like:

    $ btrfs subvol find-new subvol1 snap1

I think that the generation number is useful only from a developer's point of view. From a user's point of view, a command that can compare two snapshots is more useful.

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijackATinwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512

On Fri, Mar 26, 2010 at 09:36:31PM +0100, Goffredo Baroncelli wrote:
> It is possible to combine the commands max-gen and find-new ? Something like:
>
> $ btrfs subvol find-new subvol1 snap1
>
> I think that the generation number is useful only from a developer point of
> view. But from an user point of view a command which is able to compare two
> snapshot if more useful.

In general, the end goal is backing up the changes from a point in time to right now. We don't actually need a snapshot to do this, we just need the generation number and (optionally) a timestamp.

So, we could store these things into a state file that gets fed into the next backup, but I'd like to keep a command that can print them as well.

-chris

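The state file idea could look roughly like the following C sketch: one line of text holding the generation and a timestamp, written when a backup starts and read back by the next run. The file layout, struct backup_state and the functions write_state() and read_state() are assumptions made up for this example; nothing in the thread fixes a format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical contents of the state file carried from one backup to the next. */
struct backup_state {
    uint64_t generation;  /* generation recorded before the backup started */
    int64_t  timestamp;   /* optional wall-clock time, seconds since the epoch */
};

/* Write the state as a single "<generation> <timestamp>" line. */
static int write_state(FILE *f, const struct backup_state *st)
{
    assert(f != NULL && st != NULL);
    if (fprintf(f, "%" PRIu64 " %" PRId64 "\n",
                st->generation, st->timestamp) < 0)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the state back; returns -1 if the file is empty or malformed. */
static int read_state(FILE *f, struct backup_state *st)
{
    assert(f != NULL && st != NULL);
    if (fscanf(f, "%" SCNu64 " %" SCNd64,
               &st->generation, &st->timestamp) != 2)
        return -1;
    return 0;
}
```

The generation stored here would be the one recorded before the backup began, and the next run would search from that same value rather than from generation + 1, accepting some duplicates in exchange for not missing changes.
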
> It is possible to combine the commands max-gen and find-new ? Something like:
>
> $ btrfs subvol find-new subvol1 snap1

I had very similar thoughts myself. If we compare two snapshots (of the same subvolume) we wouldn't need timestamps either, e.g.:

    btrfs subvol diff <old_snapshot> <new_snapshot>

The output could be a list of files (and directories), with each line prefixed with a plus sign (for new or modified files) or a minus sign (for deleted files).

The output could be easily postprocessed using grep and cut.

Greetings, Michael

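The proposed plus/minus output can be sketched in C as a merge over two name lists. This is only an illustration under stated assumptions: struct entry, its gen field (used here to decide whether an entry counts as modified) and diff_entries() are hypothetical, and the lists are sorted with qsort() first because the sketch makes no assumption about the order in which entries arrive.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical directory entry: a name plus the generation it was last changed in. */
struct entry {
    const char *name;
    uint64_t    gen;
};

static int cmp_entry(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    return strcmp(ea->name, eb->name);
}

/*
 * Print "-name" for entries only in the old list, "+name" for entries that
 * are new or whose generation differs. Both arrays are sorted in place.
 * Returns the number of lines written.
 */
static size_t diff_entries(struct entry *old, size_t n_old,
                           struct entry *cur, size_t n_cur, FILE *out)
{
    size_t i = 0, j = 0, lines = 0;

    assert(out != NULL);
    qsort(old, n_old, sizeof(*old), cmp_entry);
    qsort(cur, n_cur, sizeof(*cur), cmp_entry);

    while (i < n_old || j < n_cur) {
        int c;

        if (i == n_old)
            c = 1;              /* only new entries remain */
        else if (j == n_cur)
            c = -1;             /* only old entries remain */
        else
            c = strcmp(old[i].name, cur[j].name);

        if (c < 0) {
            fprintf(out, "-%s\n", old[i++].name);   /* deleted */
            lines++;
        } else if (c > 0) {
            fprintf(out, "+%s\n", cur[j++].name);   /* new */
            lines++;
        } else {
            if (old[i].gen != cur[j].gen) {         /* modified */
                fprintf(out, "+%s\n", cur[j].name);
                lines++;
            }
            i++;
            j++;
        }
    }
    return lines;
}
```

With one entry per line and the sign in the first column, deleted paths could then be extracted with grep '^-' followed by cut -c2-.
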
On Friday 26 March 2010, Chris Mason wrote:
> In general, the end goal is backing up the changes from a
> point in time to right now. We don't actually need a snapshot to do
> this, we just need the generation number and (optionally) a timestamp.

I think that backing up the difference between two snapshots has a big advantage: a snapshot is a coherent state. For example, what if we are doing a backup during a package installation or while a database is running? The risk is taking some files from an old "state" and other files from a new "state"...

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijackATinwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512

Hi, Chris!

I'm writing the btrfs snapshot diff tool and I would like to know whether the entries in a btrfs directory are ordered in some way.

I want to find missing entries in a new snapshot's directory. Can I do a "linear compare" of the old and new directories or do I have to sort the entries first (or do some kind of hashing)?

Greetings, Michael