Hi Jan, Alex,

I have seen some discussions about the btrfs send/receive functionality you are developing. I have also been interested in this. I spent some time coding a prototype that does something like what Alex described in http://www.spinics.net/lists/linux-btrfs/msg16175.html, i.e., walking over the FS tree and pulling those items whose transid/generation is larger than a particular value. I realized, though, that there are many issues with that approach, and probably also many issues I am not aware of. Some of the issues I realized:

# How does one track changes in generic INODE_ITEM properties, like "mode" or "uid/gid"? Whenever such a property gets changed, the INODE_ITEM gets stamped with a new transid, but do we need to compare it with the previous version on the receive side to realize what has changed?
# File size: is it required, again, to compare against the previous size to detect file truncation? (File growth can perhaps be detected via new EXTENT_DATAs.)
# What should be done if INODE_ITEM::flags change (e.g., the inode gets the nodatacow/nodatasum flags set)? What should be done on the receive side?
# How does one track deletion of INODE_ITEMs? Or deletion and re-creation of an INODE_ITEM with the same inode number? (I saw that the inode_cache mount option allows inode numbers to be reused, so I think it can happen.) Does this mean that on the receive side, it is required to compare the contents of each directory against the previous version?
# What should be done with INODE_ITEMs like block/char devices, FIFOs or sockets?
# XATTR_ITEMs: although they have a transid stamp, we again need to track their deletion/re-creation. Again by comparing?
# INODE_REFs: these seem the toughest to me, because they don't have transid stamps. How can the following scenario be handled: an INODE_ITEM had two INODE_REFs with names N1 and N2, but now on the send side both of those INODE_REFs were deleted and INODE_REFs N3 and N4 were created. Does that mean we always need to compare all INODE_REFs for each INODE_ITEM, or can we perhaps use the DIR_ITEMs/DIR_INDEXs of the parent INODE_ITEM to detect changes in INODE_REFs?

All in all, it looks like the approach of navigating the FS tree and trying to *understand* specifically which modifications were performed is quite error-prone. And I am sure there are modifications I am not aware of.

I was wondering what state your work is in. Is it possible to look at some code or a prototype, to understand what approach you have taken, or perhaps an overall description of the approach?

Jan, I saw that you provided some new code for backref resolving. Can you give a hint of how that is related to the send/receive functionality?

Thanks,
Alex.
-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04.06.2012 14:39, Alex Lyakas wrote:
> # How does one track changes in generic INODE_ITEM properties, like "mode" or "uid/gid"? Whenever such a property gets changed, the INODE_ITEM gets stamped with a new transid, but do we need to compare it with the previous version on the receive side to realize what has changed?
> # File size: is it required, again, to compare against the previous size to detect file truncation? (File growth can perhaps be detected via new EXTENT_DATAs.)

The basic idea of send/receive is not to find everything that has changed since a given transid, but to find the differences between two snapshots. This way you always have access to the old values.

> # What should be done if INODE_ITEM::flags change (e.g., the inode gets the nodatacow/nodatasum flags set)? What should be done on the receive side?
> # How does one track deletion of INODE_ITEMs? Or deletion and re-creation of an INODE_ITEM with the same inode number? (I saw that the inode_cache mount option allows inode numbers to be reused, so I think it can happen.) Does this mean that on the receive side, it is required to compare the contents of each directory against the previous version?

A re-created inode gets a new inode generation number. That's needed for NFS; otherwise NFS could not detect this case either.

> # What should be done with INODE_ITEMs like block/char devices, FIFOs or sockets?

Everything that can be created on the destination side, like device files, should be created.

> # XATTR_ITEMs: although they have a transid stamp, we again need to track their deletion/re-creation. Again by comparing?

As long as they end up identical on the destination, delete/re-create shouldn't matter.

I leave the rest of the questions to Jan and Alexander :)

-Arne
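Arne's point that comparing two snapshots keeps the old values at hand can be illustrated with a small sketch. Assuming both versions of an inode are available and refer to the same inode (same inode number and generation), the receive-side operations for mode, ownership and size changes follow from a field-by-field comparison. The `InodeItem` class, its fields and the operation tuples are hypothetical simplifications for illustration, not the btrfs on-disk format or any real send-stream format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InodeItem:
    # Hypothetical, simplified view of an INODE_ITEM; not the on-disk layout.
    ino: int
    mode: int
    uid: int
    gid: int
    size: int


def diff_inode(old: InodeItem, new: InodeItem) -> List[Tuple[object, ...]]:
    """Derive receive-side operations by comparing the old and new snapshot versions.

    Assumes `old` and `new` describe the same inode (same number and generation).
    """
    ops: List[Tuple[object, ...]] = []
    if old.mode != new.mode:
        ops.append(("chmod", new.ino, new.mode))
    if (old.uid, old.gid) != (new.uid, new.gid):
        ops.append(("chown", new.ino, new.uid, new.gid))
    if old.size != new.size:
        # truncate(2)-style semantics: shrink, or zero-extend, to the new size.
        ops.append(("truncate", new.ino, new.size))
    return ops


old = InodeItem(ino=257, mode=0o100644, uid=1000, gid=1000, size=4096)
new = InodeItem(ino=257, mode=0o100600, uid=1000, gid=1000, size=100)
print(diff_inode(old, new))
```

This sketch emits a size change whether the file shrank or grew. A fuller implementation might reconstruct growth from new EXTENT_DATA items instead, as the original question suggests.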
On Mon, Jun 4, 2012 at 2:39 PM, Alex Lyakas <alex.bolshoy.btrfs@gmail.com> wrote:
> [...] I spent some time coding a prototype that does something like what Alex described in http://www.spinics.net/lists/linux-btrfs/msg16175.html, i.e., walking over the FS tree and pulling those items whose transid/generation is larger than a particular value. I realized, though, that there are many issues with that approach, and probably also many issues I am not aware of. Some of the issues I realized:

Well, like me in the beginning, you are on the wrong track ;) Relying on the transid alone is not how send/receive will be implemented once it's done.

> [...]
> All in all, it looks like the approach of navigating the FS tree and trying to *understand* specifically which modifications were performed is quite error-prone. And I am sure there are modifications I am not aware of.

The problem with the transid-only approach is that there is no way to find out "what" has changed in an inode. You only know which inode has changed. You could probably determine which extents have changed, but this is unreliable, as you've already read in my older mail. There is also absolutely no way to detect deleted/moved files/dirs. When send/receive is released and working, I may try to implement a mode that relies only on the transid, but this has low priority for me and also needs some changes to other parts of btrfs. Even if I implement that, it would still be unable to detect what has changed and would also miss deleted/moved dirs. You could compare it to rsync without the --delete option (which I use frequently to transfer VM images).

> I was wondering what state your work is in. Is it possible to look at some code or a prototype, to understand what approach you have taken, or perhaps an overall description of the approach?

Currently, most things work as expected, but the code is not yet in a state to be released. Jan, Arne, David and Stefan are currently reviewing my code, and I have a lot of TODOs due to the suggestions they all made.

> Jan, I saw that you provided some new code for backref resolving. Can you give a hint of how that is related to the send/receive functionality?

It is not only related to the send/receive code. Currently it is mainly related to the upcoming qgroups patches that Jan is preparing. It may also be related to other parts of btrfs (as far as I know, scrub also uses the backref resolving code). His patches are, however, a requirement for send/receive to work properly when I release my first patches.
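The limitation described above, that a transid-only scan cannot see deletions, shows up even in a toy model. The sketch below assumes a tree is a plain dict mapping a key to a `(transid, payload)` pair. Real btrfs keys are (objectid, type, offset) triples, and the helper names here are hypothetical. A deleted item leaves no entry behind in the new tree, so no transid filter on that tree can report it. Comparing against the old snapshot does report it.

```python
from typing import Dict, Set, Tuple

# Toy model of an FS tree: key -> (transid of last modification, payload).
Tree = Dict[str, Tuple[int, str]]


def changed_since(tree: Tree, since: int) -> Set[str]:
    """Transid-only scan: keys of items written after transaction `since`."""
    return {key for key, (transid, _) in tree.items() if transid > since}


def deleted_between(old: Tree, new: Tree) -> Set[str]:
    """Snapshot comparison: keys present in the old tree but gone from the new one."""
    return set(old) - set(new)


old_snap: Tree = {"inode 257": (10, "mode 644"), "inode 258": (10, "mode 644")}
# In transaction 12, inode 257 was chmod'ed and inode 258 was deleted.
new_snap: Tree = {"inode 257": (12, "mode 600")}

print(changed_since(new_snap, 10))          # only inode 257 is reported
print(deleted_between(old_snap, new_snap))  # inode 258 is reported
```

Even for inode 257, the transid scan only says *that* the item changed. It does not say that the mode went from 644 to 600.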
Hi Alex, Jan,

I was also interested in send/receive semantics and was thinking that if we adhere to the semantics as in http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg07482.html of:

it is impossible to track the "deleted" items (files, dirs, eXtended attributes). I can develop a command which compares two subvolumes and extracts all of this kind of information. But this command would return correct information *only if*
A) a subvolume is a snapshot of the other one
B.1) the reference snapshot is not touched OR
B.2) I have the lastgen "when" the snapshot is taken

then the transid/generation number could be made to work to determine the differences between two subvolumes, including the deletes that were involved. I agree that file-clone is tricky, as you pointed out, but we don't use it in our environment.

Do you still see issues with it that I am missing?

Also, out of curiosity, would you mind sharing how you avoided looking at the transid? Would it be that all metadata changes are logged in the tree-modification-log or equivalent?

Thanks.

PS: Alex/Jan, sorry for the multiple mails; the linux-btrfs list refused my HTML messages.

--Shyam
On Mon, Jun 4, 2012 at 5:10 PM, shyam btrfs <shyam.btrfs@gmail.com> wrote:
> Hi Alex, Jan,
>
> I was also interested in send/receive semantics and was thinking that if we adhere to the semantics as in http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg07482.html of:
>
> it is impossible to track the "deleted" items (files, dirs, eXtended attributes). I can develop a command which compares two subvolumes and extracts all of this kind of information. But this command would return correct information *only if*
> A) a subvolume is a snapshot of the other one
> B.1) the reference snapshot is not touched OR
> B.2) I have the lastgen "when" the snapshot is taken

What I implement is basically the same, but without any transid involved. However, the comparison takes place inside the kernel, not in user space.

> then the transid/generation number could be made to work to determine the differences between two subvolumes, including the deletes that were involved. I agree that file-clone is tricky, as you pointed out, but we don't use it in our environment.
>
> Do you still see issues with it that I am missing?
>
> Also, out of curiosity, would you mind sharing how you avoided looking at the transid? Would it be that all metadata changes are logged in the tree-modification-log or equivalent?

The btrees in btrfs have a nice feature: if you compare two trees and encounter a tree block that is shared with the other tree, the whole subtree below it is shared and can thus be regarded as unchanged. Checking the transids does not guarantee that you skip those unchanged subtrees, because the transids may also change when none of the items change; btrfs balance may do this, for example. So in many cases you would unnecessarily iterate over and compare the whole tree, which gives bad performance.
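The shared-block shortcut can be sketched as a recursive comparison. This is a simplified model, not the kernel implementation. Each hypothetical `Node` carries a block address (`bytenr`), the lowest key in its subtree, and either children or leaf items. The sketch assumes that two blocks with the same address are the same shared block, so it skips the whole subtree without reading it. When the children of two differing internal nodes are not aligned on the same keys, the sketch falls back to flattening both subtrees and comparing their items.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Change = Tuple[str, int]


@dataclass
class Node:
    bytenr: int                                           # equal address => shared block
    key: int                                              # lowest key in this subtree
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    items: Dict[int, str] = field(default_factory=dict)   # leaf items: key -> payload


def flatten(node: Node) -> Dict[int, str]:
    """Slow path: collect every item below `node`."""
    if not node.children:
        return dict(node.items)
    out: Dict[int, str] = {}
    for child in node.children:
        out.update(flatten(child))
    return out


def diff_items(old: Dict[int, str], new: Dict[int, str], changes: List[Change]) -> None:
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            changes.append(("deleted", key))
        elif key not in old:
            changes.append(("new", key))
        elif old[key] != new[key]:
            changes.append(("changed", key))


def compare(old: Node, new: Node, changes: List[Change], stats: Dict[str, int]) -> None:
    if old.bytenr == new.bytenr:
        stats["skipped"] += 1  # shared block: nothing below it can differ
        return
    stats["visited"] += 1
    aligned = (bool(old.children) and bool(new.children)
               and [c.key for c in old.children] == [c.key for c in new.children])
    if aligned:
        for o, n in zip(old.children, new.children):
            compare(o, n, changes, stats)
    else:
        diff_items(flatten(old), flatten(new), changes)


def compare_trees(old_root: Node, new_root: Node) -> Tuple[List[Change], Dict[str, int]]:
    changes: List[Change] = []
    stats = {"visited": 0, "skipped": 0}
    compare(old_root, new_root, changes, stats)
    return changes, stats


shared = Node(bytenr=100, key=1, items={1: "inode 257", 2: "ref foo"})
old_leaf = Node(bytenr=101, key=10, items={10: "inode 258", 11: "ref bar"})
new_leaf = Node(bytenr=201, key=10, items={10: "inode 258 (chmod)", 12: "ref baz"})
old_root = Node(bytenr=1, key=1, children=[shared, old_leaf])
new_root = Node(bytenr=2, key=1, children=[shared, new_leaf])
changes, stats = compare_trees(old_root, new_root)
print(changes, stats)
```

In this example the shared leaf is skipped on its address alone, whatever transid might be stamped on it. Only the two copies of the modified leaf are read and compared.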
Hi Arne,

On Mon, Jun 4, 2012 at 4:01 PM, Arne Jansen <sensille@gmx.net> wrote:
> On 04.06.2012 14:39, Alex Lyakas wrote:
>> # How does one track changes in generic INODE_ITEM properties, like "mode" or "uid/gid"? Whenever such a property gets changed, the INODE_ITEM gets stamped with a new transid, but do we need to compare it with the previous version on the receive side to realize what has changed?
>> # File size: is it required, again, to compare against the previous size to detect file truncation? (File growth can perhaps be detected via new EXTENT_DATAs.)
>
> The basic idea of send/receive is not to find everything that has changed since a given transid, but to find the differences between two snapshots. This way you always have access to the old values.
>
>> # What should be done if INODE_ITEM::flags change (e.g., the inode gets the nodatacow/nodatasum flags set)? What should be done on the receive side?
>> # How does one track deletion of INODE_ITEMs? Or deletion and re-creation of an INODE_ITEM with the same inode number? (I saw that the inode_cache mount option allows inode numbers to be reused, so I think it can happen.) Does this mean that on the receive side, it is required to compare the contents of each directory against the previous version?
>
> A re-created inode gets a new inode generation number. That's needed for NFS; otherwise NFS could not detect this case either.

So what you are saying is that the tuple (inode number, generation) is unique within a subvolume. That's a good thing to keep in mind!

Thanks,
Alex.
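Treating the (inode number, generation) tuple as an inode's identity makes deletion and re-creation easy to spot when comparing two snapshots, even when inode_cache hands out the same number again. Here is a minimal sketch, assuming each snapshot has already been reduced to a collection of such tuples. The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple

InodeId = Tuple[int, int]  # (inode number, generation)


def classify_inodes(
    old: Iterable[InodeId], new: Iterable[InodeId]
) -> Tuple[Set[InodeId], Set[InodeId], Set[int]]:
    """Return (deleted, created, reused inode numbers) between two snapshots."""
    old_set, new_set = set(old), set(new)
    deleted = old_set - new_set
    created = new_set - old_set
    # A number that is both deleted and created belongs to a re-created inode,
    # so the receiver should replace the old file rather than patch it.
    reused = {ino for ino, _ in deleted} & {ino for ino, _ in created}
    return deleted, created, reused


deleted, created, reused = classify_inodes(
    old=[(257, 5), (258, 5)],
    new=[(257, 5), (258, 9), (259, 9)],
)
print(deleted, created, reused)
```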
On Mon, Jun 4, 2012 at 6:33 PM, Alexander Block <ablock84@googlemail.com> wrote:
> [...]
> The btrees in btrfs have a nice feature: if you compare two trees and encounter a tree block that is shared with the other tree, the whole subtree below it is shared and can thus be regarded as unchanged. Checking the transids does not guarantee that you skip those unchanged subtrees, because the transids may also change when none of the items change; btrfs balance may do this, for example. So in many cases you would unnecessarily iterate over and compare the whole tree, which gives bad performance.

Yes, I also noticed that sometimes the transid gets bumped up although there is no actual change.

So let's say you identify that a particular part of the tree is no longer shared, and eventually you get to a particular leaf within that part of the tree. How would you detect that, say, a particular INODE_ITEM (or, more difficult, an INODE_REF) is missing from that leaf with respect to the previous tree?

The property you described perhaps suggests another method to find leaves in which *something* has changed. (Although, within a leaf, does it make sense to decode all items and look at their transids, for those that have them, to filter out even more?) And yes, looking at the transid alone will perhaps bring more such potential leaves into consideration.

However, how does this property help to determine *what* has actually changed between the two trees? For example, can it tell over which range of keys there was possibly a change, so that we iterate only within that range?

Alex.
On Mon, Jun 4, 2012 at 7:33 PM, Alex Lyakas <alex.bolshoy.btrfs@gmail.com> wrote:
> [...]
> So let's say you identify that a particular part of the tree is no longer shared, and eventually you get to a particular leaf within that part of the tree. How would you detect that, say, a particular INODE_ITEM (or, more difficult, an INODE_REF) is missing from that leaf with respect to the previous tree?
> [...]
> However, how does this property help to determine *what* has actually changed between the two trees? For example, can it tell over which range of keys there was possibly a change, so that we iterate only within that range?

When doing incremental sends, we always have two trees at hand: one that we know is already on the receiving side (we already sent it) and the one that we want to send now. To find the changes, we simply compare those trees. If an item is missing from one tree, we know it's either new or deleted (depending on which tree the item lies in).

I would suggest that you not put too much work into send. As already said, the btrfs send/receive patches are close to being posted to the mailing list. They are currently being reviewed, and when I get a "looks good now", I'll post them.
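The rule above, that an item missing from one tree is either new or deleted depending on which tree still has it, amounts to a lockstep walk over two key-sorted item sequences, such as the items of two corresponding leaves. The sketch below applies it to the INODE_REF scenario from the first mail, where names N1 and N2 are replaced by N3 and N4. The `(inode, parent, name)` key is a hypothetical simplification. Real INODE_REF keys are (inode, INODE_REF, parent directory), with the names stored inside the item.

```python
from typing import Any, Iterator, List, Tuple

Item = Tuple[Any, Any]  # (sortable key, payload)


def merge_diff(old: List[Item], new: List[Item]) -> Iterator[Tuple[str, Any]]:
    """Walk two key-sorted item lists in lockstep and report the differences."""
    i = j = 0
    while i < len(old) or j < len(new):
        if j == len(new) or (i < len(old) and old[i][0] < new[j][0]):
            yield ("deleted", old[i][0])  # key exists only in the old tree
            i += 1
        elif i == len(old) or new[j][0] < old[i][0]:
            yield ("new", new[j][0])  # key exists only in the new tree
            j += 1
        else:
            if old[i][1] != new[j][1]:
                yield ("changed", new[j][0])  # same key, different payload
            i += 1
            j += 1


old_refs = [((300, 256, "N1"), b""), ((300, 256, "N2"), b"")]
new_refs = [((300, 256, "N3"), b""), ((300, 256, "N4"), b"")]
for change in merge_diff(old_refs, new_refs):
    print(change)
```

Each list is traversed once, so the walk is linear in the number of items compared. With the shared-subtree skipping described earlier in the thread, only leaves that are no longer shared need to be walked this way.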
On Tue, Jun 5, 2012 at 12:16 PM, Alexander Block <ablock84@googlemail.com> wrote:
> [...]
> When doing incremental sends, we always have two trees at hand: one that we know is already on the receiving side (we already sent it) and the one that we want to send now. To find the changes, we simply compare those trees. If an item is missing from one tree, we know it's either new or deleted (depending on which tree the item lies in).
>
> I would suggest that you not put too much work into send. As already said, the btrfs send/receive patches are close to being posted to the mailing list. They are currently being reviewed, and when I get a "looks good now", I'll post them.

Thanks for the update, Alex. Efficiently comparing the trees is the crux of what we were looking into.

Thanks,
Alex.