Rsync looks like a good tool for keeping files synchronized between two computers, such as a desktop and a laptop. However, Mac OS uses a forked file system: the second fork stores data critical to some files (such as applications), as well as the type and creator information that matches files up with the appropriate apps and file types. Mac OS X comes with a command-line utility called ditto that can copy files while preserving the second fork's information, and there's also a C function to do the same thing.

So my questions are:

1. Would it be possible (and how difficult?) to either use or modify rsync so that it used ditto or the relevant C call?

2. Would it be worth it over writing a simple app or script that runs on one machine? rsync would lose the ability to copy only the differences between files, and would retain only whatever advantages running on both machines gives it.

Thanks,

--Dave
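For reference, the resource fork is reachable from the BSD layer with plain POSIX calls, so a patched rsync would not strictly need ditto or Carbon. A minimal C sketch, assuming the `..namedfork/rsrc` path convention Mac OS X uses to name a file's resource fork (whether a fork-less file opens empty or fails can depend on the volume format; buffer sizes are illustrative only):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read the start of a file's resource fork through the BSD layer.
     * "<path>/..namedfork/rsrc" names the resource fork of <path>, so
     * this is an ordinary open()/read() -- no Carbon calls involved. */
    int main(int argc, char **argv)
    {
        char rsrc_path[4096];
        unsigned char buf[512];
        ssize_t n;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        snprintf(rsrc_path, sizeof rsrc_path, "%s/..namedfork/rsrc", argv[1]);

        fd = open(rsrc_path, O_RDONLY);
        if (fd < 0) {
            perror("open resource fork");
            return 1;
        }
        n = read(fd, buf, sizeof buf);
        printf("read %ld bytes of resource fork\n", (long) n);
        close(fd);
        return 0;
    }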
As David Feldman wrote recently, rsync looks like it would be very useful for Mac OS X systems, where there is currently a dearth of options for backup. I am looking into using rsync to back up/mirror a few systems, but there are two changes that I will need to make first, based on two file system features:

- Mac OS X systems use HFS+, which supports files with one or two forks.

- HFS+ also supports some "metadata" for all files and directories.

There are a few ways to add support for these FS features:

1) Convert (on the fly) all files to MacBinary before comparing/sending them to the destination. MacBinary is a well-documented way to package an HFS file into a single data file. The benefit of this method is compatibility with existing rsync versions that are not MacBinary-aware; the drawbacks are speed, maintainability, and that directory metadata is not addressed at all.

2) Treat the two forks and the metadata as three separate files for the purposes of comparison/sending, and then reassemble them on the destination. Same drawbacks and benefits as the MacBinary route. This would also take more memory (potentially three times the number of files in the flist).

3) Change the protocol and implementation to handle arbitrary metadata and multiple forks. This could be made sort-of compatible with existing rsyncs by using various tricks, but the most efficient way would be to alter the protocol. The benefit is that this would make the protocol extensible: metadata can be "tagged" so that you could add any values needed and ignore those tags that are not understood or supported. Any number of forks could be supported, which is a step up for supporting NTFS, where a file can have any number of "data streams". In fact, forks and metadata could all be handled the same way in the protocol.

So, my question is: has anyone else done work in the areas of protocol enhancements and "rich" FS support? I have lots of experience on the Mac and have the code needed to access HFS+ metadata and forks from the BSD layer. I'm just looking for suggestions and news of anyone else working on stuff that might dovetail with this.

Also, I'm a bit concerned about the current behavior of reading the entire tree into memory, especially the effects that would have on large file sets. Any work being done on this front?

Regards,

Mark.
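To make option 3 concrete, here is one possible shape for the "tagged" metadata described above: a hypothetical type-length-value framing in which a receiver handles the tags it knows and skips the rest by length. The tag numbers and struct are invented for illustration, not part of any rsync protocol, and a real protocol would also pin down byte order:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical tag numbers -- illustration only. */
    enum {
        TAG_DATA_FORK   = 1,  /* file contents                  */
        TAG_RSRC_FORK   = 2,  /* HFS+ resource fork             */
        TAG_FINDER_INFO = 3,  /* type/creator/flags metadata    */
        TAG_NTFS_STREAM = 4   /* a named NTFS data stream       */
    };

    struct tag_header {
        uint32_t tag;  /* which fork/attribute follows */
        uint32_t len;  /* payload length in bytes      */
    };

    /* A receiver that tolerates unknown tags: handle what it knows,
     * skip past what it doesn't.  buf/buflen stand in for the wire. */
    static void read_tagged(const unsigned char *buf, size_t buflen)
    {
        size_t off = 0;
        while (off + sizeof(struct tag_header) <= buflen) {
            struct tag_header h;
            memcpy(&h, buf + off, sizeof h);
            off += sizeof h;
            if (h.len > buflen - off)
                break;                      /* truncated stream */
            switch (h.tag) {
            case TAG_DATA_FORK:
            case TAG_RSRC_FORK:
            case TAG_FINDER_INFO:
                printf("tag %u: %u bytes\n", h.tag, h.len);
                break;
            default:
                break;                      /* unknown tag: ignore */
            }
            off += h.len;
        }
    }

    int main(void)
    {
        unsigned char msg[64];
        struct tag_header h = { TAG_FINDER_INFO, 4 };

        memcpy(msg, &h, sizeof h);
        memcpy(msg + sizeof h, "ABCD", 4);
        read_tagged(msg, sizeof h + 4);
        return 0;
    }

The point is that a new fork type or metadata tag costs old receivers nothing: they skip it by length, which is what would make the protocol extensible.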
I'm not familiar with netatalk, but along a similar line, Mac OS X stores resource forks and metadata differently on HFS+ and single-fork volumes (such as UFS or NFS). If you copy a file from an HFS+ volume over to a single-fork volume using the Finder, it'll split the pieces apart and save the resource fork and metadata under variations of the original filename. I don't remember the exact names, but I think they're in the Mac OS X System Overview document... something like ._<original filename>.

If there's a way I can help with the porting effort, please let me know. I don't know a lot about the lower-level details, but I do know C, C++, Cocoa, etc., and would be interested in looking at the BSD-level info you have on transferring OS X files.

As I stated in my earlier message, my primary interest is synchronization of desktop and laptop, though backup would be terrific too. I'm pretty sure there are a lot of OS X users out there in need of both. I'm currently synchronizing with a shell script that uses ditto.

--Dave

On Monday, December 17, 2001, Chris Garrigues <cwg-dated-55c191e81afae8e9@deepeddy.com> wrote:

>> [Mark Valence's three options, quoted in full above]
>
> A quick thought about implementation details: It would be nice if this
> were done in such a way that if I were to rsync from a non-OSX netatalk
> system onto an OSX system, the .AppleDouble directories would be merged
> back into the files, and conversely, if I were to rsync from an OSX
> system to a netatalk system, the resource forks would be split into
> .AppleDouble directories.
>
> I guess this would be simplest with scheme 2 above.
>
> Chris
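Those ._<original filename> companions are AppleDouble files; the format is documented in RFC 1740. A sketch of walking the header to find the resource fork and Finder info entries, assuming the big-endian layout given there (error handling trimmed for brevity):

    #include <arpa/inet.h>   /* ntohl, ntohs */
    #include <stdint.h>
    #include <stdio.h>

    /* AppleDouble per RFC 1740 (big-endian fields): magic 0x00051607,
     * version, 16 filler bytes, an entry count, then (id, offset,
     * length) triples.  Entry 2 is the resource fork, entry 9 the
     * 32-byte Finder info. */
    int main(int argc, char **argv)
    {
        FILE *f;
        uint32_t magic, version;
        uint16_t count, i;
        unsigned char filler[16];

        if (argc != 2 || !(f = fopen(argv[1], "rb"))) {
            fprintf(stderr, "usage: %s ._file\n", argv[0]);
            return 1;
        }
        fread(&magic, 4, 1, f);
        fread(&version, 4, 1, f);
        fread(filler, 16, 1, f);
        fread(&count, 2, 1, f);
        if (ntohl(magic) != 0x00051607) {
            fprintf(stderr, "not an AppleDouble file\n");
            return 1;
        }
        for (i = 0; i < ntohs(count); i++) {
            uint32_t id, off, len;
            fread(&id, 4, 1, f);
            fread(&off, 4, 1, f);
            fread(&len, 4, 1, f);
            printf("entry %u: offset %u, length %u%s\n",
                   ntohl(id), ntohl(off), ntohl(len),
                   ntohl(id) == 2 ? " (resource fork)" :
                   ntohl(id) == 9 ? " (Finder info)"   : "");
        }
        fclose(f);
        return 0;
    }

A merge/split pass of the kind Chris describes would read these entries on one side and write them back as a real fork plus catalog metadata on the other.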
I would lean toward option "1" for several reasons. Primarily, it could probably interoperate safely with non-HFS systems or older versions.

How about a flag that changes the mode to detect named forks and encode them in-line? These encoded files could be safely synced to non-forked storage destinations or to tape. A simple tag passed at the beginning of a session could notify the destination that MacBinary decoding could be attempted if available (see the sketch below).

I also understand the need for named resource files for systems like netatalk. The problem with this is that every named-fork system is different: netatalk, Xinet, Helios, OS X Finder. This is a lot to chew. I would rather the user post-process files to get them into the named-fork format if they must. If you are going between two systems using the named-fork technique, this whole process is unneeded.

Option "3" might be the best, but it seems to me that it could end up requiring a lot of changes to the protocol.

It should also be noted that a project like this should be done at the Darwin level. There were also discussions on the darwin-development list in June '01. No one really started anything, but they did discuss at length how access to resource forks might be done while staying inside POSIX calls.

-Chris

At 8:25 AM -0800 12/17/01, Mark Valence <kurash@sassafras.com> wrote:
> [Mark's full message, quoted above]
--
Chris Irvine, Information Systems Manager, Dark Horse Comics, Inc.
http://www.darkhorse.com/
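The session tag Chris suggests could be as small as one greeting line in each direction, with MacBinary used only when both ends advertise it. A hypothetical sketch -- the CAP_MACBINARY flag and the greeting format are invented for illustration:

    #include <stdio.h>

    /* Hypothetical capability negotiation: each side sends a greeting,
     * and MacBinary encoding is used only if both ends advertise it.
     * Flag name and greeting format are invented for illustration. */
    #define CAP_MACBINARY 0x0001u

    static unsigned parse_caps(const char *greeting)
    {
        unsigned version = 0, caps = 0;
        /* e.g. "RSYNCD-MAC 1 CAPS 1" -> capability bitmask 1 */
        if (sscanf(greeting, "RSYNCD-MAC %u CAPS %x", &version, &caps) != 2)
            return 0;               /* old peer: no extensions */
        return caps;
    }

    int main(void)
    {
        unsigned mine   = CAP_MACBINARY;
        unsigned theirs = parse_caps("RSYNCD-MAC 1 CAPS 1");
        unsigned agreed = mine & theirs;

        if (agreed & CAP_MACBINARY)
            puts("both ends speak MacBinary: encode forked files in-line");
        else
            puts("plain transfer: data forks only");
        return 0;
    }

The nice property is that an old rsync which sends no such greeting simply parses as "no capabilities", and the transfer falls back to plain data forks.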
OK, I'm brand new to this group, brand new to rsync, brand new to Unix in general. I'm trying to play catch-up with this discussion, so there are likely many misconceptions that I have about these issues.

My goal is to create a tool that does backup and restore while transferring only changes. It will connect from Mac OS X to a server running Linux and preserve all metadata without the user ever knowing there is an issue. I've found the rsync algorithm is a good start, and it sounds like you all have the same idea.

I don't think I like the idea of the MacBinary solution, in that I can see some configuration of the tool that the user will have to worry about. We obviously don't want the overhead of flattening files without forks, or files whose FileInfo can be determined from other metadata strategies. The user might have to maintain a list of files they use... how do I handle this file or that (à la Mac CVS tools)?

I see another user-experience issue with both the MacBinary solution and the protocol change: what do the files look like when they get backed up? If I connect to the server via the Finder, am I going to see a bunch of files that are 'archived', or do I get the real deal? As a user, I wouldn't use rsync if I couldn't just go and grab the files that got backed up. Not that running the file through StuffIt is a big deal, but it's going to seem a bit clunky even if the solution is in fact much more extensible. What format is this new protocol going to produce? Will the only way to get to the files be to use the rsync client? Sorry, that's just not acceptable.

The only solution left is to pre-process the file by splitting it before creating the change lists, so that comparisons can be made if the file is split on the server. There will have to be some intelligence about which method of splitting is used on the server, but I'm positive that couldn't be too hard to determine. Directory metadata just has to be handled in another file as well; isn't that what .DSInfo files are? I'm starting to think that what I'm proposing is more of a combination of 2) and 3). Wouldn't it be great if we could support ACLs as well? Please tell me if I'm way off base here.

One other question that I'm sure will show my ignorance of Darwin development: what is the issue with using the high-level APIs if the output is compatible with the other platforms running rsync? What is the advantage of trying for POSIX purity, or code at the "Darwin level", if the code is only going to be used on Macs running the higher-level stuff anyway? If you don't have a forked file system, why would you care if you don't know how to handle forks?

I'm planning on taking this project on full time, and we would all benefit if we can all agree on a direction. Let's get this thing going,

Terrence Geernaert

Mark Valence wrote:

> So, that's one vote each for options 1, 2, and 3 ;-)
>
> I agree that the ideal implementation would support HFS+ as well as
> netatalk's .AppleDouble scheme, Mac OS X's ._<filename> scheme, and
> MacBinary for all the rest. This can certainly be a goal of the
> implementation, but personally I am interested in the HFS+ on Mac
> OS X part of the problem.
>
> My implementation, whether it is MacBinary-based or a change to the
> protocol, will leave room for these alternative schemes. Right now,
> I am thinking that MacBinary is the way to go. This doesn't give
> the flexibility and extensibility that a protocol change would give,
> but it does have the benefit of supporting existing rsync versions.
>
> Chris I., I'm not sure what you mean by "done at the Darwin level".
> If you mean that it should be done based on Darwin/BSD APIs and not
> Carbon/Cocoa APIs, then I am in full agreement with you. The calls
> that I'd use to access the resource fork are POSIX calls
> (essentially, it's just an open() call), although the calls to get
> HFS metadata are Mac OS X-specific (but not Carbon calls).
>
> Anyway, I'm still mulling all this over, so any suggestions are
> more than welcome. Once a path is chosen and code is written,
> things will be harder to change ;-)
>
> [Chris Garrigues's, David Feldman's, and Chris Irvine's messages,
> quoted in full above]
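The "Mac OS X-specific (but not Carbon)" metadata calls Mark mentions are presumably the getattrlist() family in the Darwin BSD layer. A minimal, Darwin-only sketch fetching the 32 bytes of Finder info, whose first eight bytes hold the type and creator codes of a regular file; which attribute set an rsync patch would actually carry is of course the open question:

    #include <stdio.h>
    #include <string.h>
    #include <sys/attr.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Fetch the 32 bytes of Finder info (type, creator, flags, ...)
     * through the Darwin BSD layer -- no Carbon required. */
    struct finder_attr_buf {
        u_int32_t length;               /* filled in by the kernel   */
        unsigned char finder_info[32];  /* ATTR_CMN_FNDRINFO payload */
    };

    int main(int argc, char **argv)
    {
        struct attrlist alist;
        struct finder_attr_buf buf;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        memset(&alist, 0, sizeof alist);
        alist.bitmapcount = ATTR_BIT_MAP_COUNT;
        alist.commonattr  = ATTR_CMN_FNDRINFO;

        if (getattrlist(argv[1], &alist, &buf, sizeof buf, 0) != 0) {
            perror("getattrlist");
            return 1;
        }
        /* For a regular file the first 8 bytes are type and creator. */
        printf("type='%.4s' creator='%.4s'\n",
               (const char *) buf.finder_info,
               (const char *) buf.finder_info + 4);
        return 0;
    }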
Seemingly Similar Threads
- Feature request: Sync Mac OS resource forks and metadata on Mac OS X
- Mac OS X HFS+ metadata patch, take 2
- HFS+ resource forks: WIP patch included
- Aw: Re: Re: rsync not copy all information for font file