Hello all,

in IRC we had a discussion on how we could solve sending live subvolumes and how to send subvolumes without the need to administrate/keep old snapshots for incremental sends. One of the ideas was to introduce "sendshots", which are basically snapshots where no refs are counted for file data. This means that when file data is changed in the sendshot origin, we do not consume extra space for two copies of the data. We would only have the metadata duplicated.

For the initial btrfs send we could do this:
1. Create a hidden read-only snapshot of the subvolume to send. Hidden means that it's not referenced by any subvolume. It is however still a normal snapshot (not a sendshot!). Hidden snapshots are not possible atm so we would have to implement that. This step allows us to send read-write subvolumes, because we have a frozen version of it.
2. Send this new snapshot.
3. When we're done with sending, create a "sendshot" from the snapshot and delete the invisible snapshot. As an alternative, we could convert the invisible snapshot to a sendshot...but not sure if that would be easy to implement.

When we later do an incremental send we can do this:
1. Do the same as point 1. from above.
2. Determine which of the previous sendshots is the correct one for the incremental send. We could use some magic auto detection here or the user has to specify it himself.
3. Use the hidden snapshot from 1. and the determined sendshot from 2. to find the incremental changes and do the send.
4. Do the same as point 3. from above.

Every incremental send will add a new sendshot for later use. To avoid having millions of such sendshots after some time, btrfs-progs would need to delete old ones. That's something the user needs 100% control of, as only he knows which ones can be deleted. He could either delete them by hand or let btrfs send do that automatically with a parameter that for example says how many sendshots to keep.
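
The "how many sendshots to keep" idea can be sketched roughly as follows. This is only an illustration in Python, not btrfs-progs code: the `Sendshot` record, its `generation` field and the `prune_sendshots` helper are all hypothetical names, and the sketch assumes that a higher generation means a newer sendshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sendshot:
    """Hypothetical record of one sendshot; not a real btrfs structure."""
    name: str
    generation: int  # assumed to grow with every new sendshot


def prune_sendshots(sendshots, keep):
    """Split sendshots into (kept, to_delete), keeping the `keep` newest."""
    if keep < 0:
        raise ValueError("keep must not be negative")
    # Newest first, so the head of the list is what survives.
    ordered = sorted(sendshots, key=lambda s: s.generation, reverse=True)
    return ordered[:keep], ordered[keep:]
```

A caller would delete everything in the second list. Nothing here decides whether a sendshot is still needed by some receiver; that judgement stays with the user, as described above.
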
The above steps would already make the use of btrfs send/receive a bit easier. The next step would be to implement a network protocol that allows on-the-fly sending/receiving without piping to a file in-between. The protocol would allow the sending and receiving side to agree on the sendshot to use for the incremental send. It would also allow the sending side to do all the sendshot cleanups on its own, because it would know which state is present on the receiving side.

What do you guys think? The problem is, I probably won't be able to implement this due to lack of time for the rest of this year...going on a world trip and I don't know when I'm back :)
So, if anyone wants to take this idea and implement it, feel free to do so :)

Alex.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
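
As a rough illustration of the negotiation step in the proposed protocol, the following Python sketch picks the newest sendshot that both sides know about and lists what the sender could then clean up. Everything in it is an assumption made for illustration: the function names, the use of string ids, the generation numbers, and the simplification that there is exactly one receiver.

```python
def negotiate_base(sender_sendshots, receiver_states):
    """Return the newest sendshot id known to both sides, or None.

    sender_sendshots: dict of id -> generation on the sending side
    receiver_states:  iterable of ids the receiver reports having
    """
    common = set(sender_sendshots) & set(receiver_states)
    if not common:
        return None  # no shared state, so a full send would be needed
    return max(common, key=lambda sid: sender_sendshots[sid])


def obsolete_sendshots(sender_sendshots, base):
    """Ids older than the agreed base (assumes a single receiver)."""
    if base is None:
        return []
    base_gen = sender_sendshots[base]
    return sorted(sid for sid, gen in sender_sendshots.items() if gen < base_gen)
```

With more than one receiver, the sender would have to keep every sendshot that any receiver might still use as a base, which this sketch does not model.
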
Goffredo Baroncelli
2012-Jul-05 22:34 UTC
Re: [RFC] Btrfs "sendshots" and hidden snapshots
On 07/05/2012 06:51 PM, Alexander Block wrote:
> [...]
> For the initial btrfs send we could do this:
> 1. Create a hidden read-only snapshot of the subvolume to send. Hidden
> means that it's not referenced by any subvolume. It is however still a
> normal snapshot (not a sendshot!). Hidden snapshots are not possible
> atm so we would have to implement that. This step allows us to send
> read-write subvolumes, because we have a freezed version of it.

Why would we want/need a hidden snapshot? We could put this kind of hidden snapshot under a dot-prefixed directory (like /.hidden-subvolumes).

> [...]
> When we later do an incremental send we can do this:
> 1. Do the same as point 1. from above.
> 2. Determine which of the previous sendshots is the correct one for
> the incremental send. We could use some magic auto detection here or
> the user has to specify it by himself.
> 3. Use the hidden snapshot from 1. and the determined sendshot from 2.
> to find the incremental changes and do the send.

I can understand how a sendshot could be used to compute the metadata delta. But how do we compute the data delta?

> [...]
On Fri, Jul 6, 2012 at 12:34 AM, Goffredo Baroncelli <kreijack@libero.it> wrote:
> On 07/05/2012 06:51 PM, Alexander Block wrote:
>> [...]
>> 1. Create a hidden read-only snapshot of the subvolume to send. Hidden
>> means that it's not referenced by any subvolume. It is however still a
>> normal snapshot (not a sendshot!). Hidden snapshots are not possible
>> atm so we would have to implement that. This step allows us to send
>> read-write subvolumes, because we have a freezed version of it.
>
> Why we should want/need an hidden snapshot ? We could put this kind of
> hidden snapshot under a directory dot-prefixed (like /.hidden-subvolumes)

That would have the problem that the user may modify the subvolume in-between (by removing the ro flag). Or he could simply cd into it and we would later fail to delete it.

>> [...]
>> 3. Use the hidden snapshot from 1. and the determined sendshot from 2.
>> to find the incremental changes and do the send.
>
> I can understand how a sendshot could be used to compute the metadata
> delta. But how compute the data delta ?

We would still have the file extent data found in the metadata. When we see that logical addresses or generations have changed, we know the data has changed. This may however be problematic in case a defrag or balance was performed; for this we should probably introduce a data-only transid or something like that which is preserved across such operations.

>> [...]
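
The extent comparison described in this reply could look roughly like the Python sketch below. It is a simplified model, not the btrfs on-disk format: the `Extent` tuple, its two fields and the `changed_offsets` helper are assumptions chosen to mirror the "logical address or generation changed" rule.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Simplified stand-in for a file extent item (fields are assumptions)."""
    logical: int     # logical address of the data
    generation: int  # transaction id in which the extent was written


def changed_offsets(old, new):
    """Return the sorted file offsets whose data would have to be sent.

    old: {file_offset: Extent} as recorded in the sendshot
    new: {file_offset: Extent} as found in the fresh snapshot
    """
    changed = []
    for offset, extent in new.items():
        # A missing or different extent means new or rewritten data.
        if old.get(offset) != extent:
            changed.append(offset)
    return sorted(changed)
```

The sketch also shows the defrag/balance concern: an extent that was only relocated gets a new logical address, so it is reported as changed even though its contents are identical. Offsets that exist only in `old` (for example after a truncate) are not handled here.
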
On Fri, Jul 06, 2012 at 02:51:43AM -0600, Alexander Block wrote:
> On Fri, Jul 6, 2012 at 12:34 AM, Goffredo Baroncelli <kreijack@libero.it> wrote:
> > [...]
> > Why we should want/need an hidden snapshot ? We could put this kind of
> > hidden snapshot under a directory dot-prefixed (like /.hidden-subvolumes)
> That would have the problem that the user may modify the subvolume
> in-between (by removing the ro flag). Or he could simple cd into it
> and we would later fail to delete it.

I prefer to make this more explicit. We could add a hard-readonly flag that cannot be cleared. Having the snapshot show in the FS lets the admin know what things are really using space.

-chris
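
One way to picture a flag that "cannot be cleared" is the small Python model below. It is purely illustrative: `SubvolFlag`, `HARD_RDONLY` and the `Subvolume` class are hypothetical names, `RDONLY` merely stands in for the ordinary read-only flag, and the rule that the hard flag implies the normal one is an assumption of this sketch.

```python
import enum


class SubvolFlag(enum.Flag):
    NONE = 0
    RDONLY = 1       # stand-in for the ordinary read-only flag
    HARD_RDONLY = 2  # hypothetical: once set, it can never be cleared


class Subvolume:
    def __init__(self):
        self.flags = SubvolFlag.NONE

    def set_flags(self, flags):
        # Assumption: hard read-only implies ordinary read-only.
        if flags & SubvolFlag.HARD_RDONLY:
            flags |= SubvolFlag.RDONLY
        self.flags |= flags

    def clear_flags(self, flags):
        # Refuse every clear request once the hard flag is present.
        if self.flags & SubvolFlag.HARD_RDONLY:
            raise PermissionError("subvolume is hard read-only")
        self.flags &= ~flags
```

In this model an ordinary read-only snapshot can still be made writable again, while one marked hard read-only stays frozen and visible until it is deleted.
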
Goffredo Baroncelli
2012-Jul-06 12:00 UTC
Re: [RFC] Btrfs "sendshots" and hidden snapshots
On 07/06/2012 10:51 AM, Alexander Block wrote:
> On Fri, Jul 6, 2012 at 12:34 AM, Goffredo Baroncelli <kreijack@libero.it> wrote:
>> On 07/05/2012 06:51 PM, Alexander Block wrote:
>>> Hello all,
>>>
[....]
>>> When we later do an incremental send we can do this:
>>> 1. Do the same as point 1. from above.
>>> 2. Determine which of the previous sendshots is the correct one for
>>> the incremental send. We could use some magic auto detection here or
>>> the user has to specify it by himself.
>>> 3. Use the hidden snapshot from 1. and the determined sendshot from 2.
>>> to find the incremental changes and do the send.
>>
>> I can understand how a sendshot could be used to compute the metadata
>> delta. But how compute the data delta ?
> We still would have the file extent data found in the metadata. When
> we see that logical addresses or generations have changed, we know the
> data has changed. This may however be problematic in case a defrag or
> balance was performed, for this we should probably introduce a data
> only transid or something like that which is preserved on such
> operations.

Yes, this makes sense. Only the new data has to be collected: if I can track which data has changed (by comparing the extent data), then I need only the new data.
Goffredo Baroncelli
2012-Jul-06 12:03 UTC
Re: [RFC] Btrfs "sendshots" and hidden snapshots
On 07/06/2012 01:55 PM, Chris Mason wrote:
> On Fri, Jul 06, 2012 at 02:51:43AM -0600, Alexander Block wrote:
>> [...]
>> That would have the problem that the user may modify the subvolume
>> in-between (by removing the ro flag). Or he could simple cd into it
>> and we would later fail to delete it.
>
> I prefer to make this more explicit. We could add a hard-readonly flag
> that cannot be cleared. Having the snapshot show in the FS lets the
> admin know what things are really using space.

Me too, but I am wondering what should happen when a user tries to read old data (I am talking about sendshots). If I understood correctly, the old data isn't tracked by the sendshot.

GB
On Fri, Jul 6, 2012 at 2:03 PM, Goffredo Baroncelli <kreijack@libero.it> wrote:
> On 07/06/2012 01:55 PM, Chris Mason wrote:
>> [...]
>> I prefer to make this more explicit. We could add a hard-readonly flag
>> that cannot be cleared. Having the snapshot show in the FS lets the
>> admin know what things are really using space.

Yep, sounds like a better solution than hidden snapshots. Or, we could protect against RO flag changes while performing the send.

> Me too, but I am guessing what should happens when the users try to read
> an old data ? (I am talking about sendshot ). If I understood correctly
> the old data isn't tracked by the sendshot.

Two possible solutions that I see:
1. Hidden sendshots :P
2. Reading files from a sendshot will always give dummy data (e.g. all zero). But I really can't estimate how hard this is to implement.
Goffredo Baroncelli
2012-Jul-09 15:51 UTC
Re: [RFC] Btrfs "sendshots" and hidden snapshots
On 07/06/2012 02:45 PM, Alexander Block wrote:
> On Fri, Jul 6, 2012 at 2:03 PM, Goffredo Baroncelli <kreijack@libero.it> wrote:
>> [...]
>> Me too, but I am guessing what should happens when the users try to read
>> an old data ? (I am talking about sendshot ). If I understood correctly
>> the old data isn't tracked by the sendshot.
> Two possible solutions that I see:
> 1. Hidden sendshots :P
> 2. Reading files from a sendshot will always give dummy data (e.g. all
> zero). But I really can't estimate how hard this is to implement.

From a user's point of view, this would be a nightmare: two similar filesystems with no obvious differences...

I suggest that the sendshot appears as an empty read-only subvolume. That way all the btrfs subvolume semantics could be applied (btrfs subvolume delete, btrfs subvolume list, btrfs send, btrfs receive) and the user cannot read false data. We could mark this with a specific inode number: currently all subvolumes have inode number 256, so we could use 255 or similar (I don't know if this could raise some problems, however). Otherwise we need to create a separate namespace for this kind of subvolume (which could be a solution for other kinds of problems), for example under /sys.

GB