Rsync looks like a good tool for keeping files synchronized between two computers, such as a desktop and a laptop. However, Mac OS uses a forked file system: the second fork stores data critical to some files (such as applications), as well as the type and creator information that matches files up with the appropriate apps and file types. Mac OS X comes with a command-line utility called ditto that can copy files while preserving the second fork's information, and there's also a C function to do the same thing.

So my questions are:

1. Would it be possible (and how difficult?) to either use or modify rsync so that it used ditto or the relevant C call?

2. Would it be worth it over writing a simple app or script that runs on one machine? rsync would lose the ability to copy only the differences between files, and would retain only whatever advantages running on both machines gives it.

Thanks,

--Dave
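For reference, the resource fork is reachable from the BSD layer with plain POSIX calls, so a patched rsync would not strictly need ditto or Carbon. A minimal C sketch, assuming the `..namedfork/rsrc` path convention Mac OS X uses to name a file's resource fork (whether a fork-less file opens empty or fails can depend on the volume format; buffer sizes are illustrative only):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read the start of a file's resource fork through the BSD layer.
     * "<path>/..namedfork/rsrc" names the resource fork of <path>, so
     * this is an ordinary open()/read() -- no Carbon calls involved. */
    int main(int argc, char **argv)
    {
        char rsrc_path[4096];
        unsigned char buf[512];
        ssize_t n;
        int fd;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        snprintf(rsrc_path, sizeof rsrc_path, "%s/..namedfork/rsrc", argv[1]);

        fd = open(rsrc_path, O_RDONLY);
        if (fd < 0) {
            perror("open resource fork");
            return 1;
        }
        n = read(fd, buf, sizeof buf);
        printf("read %ld bytes of resource fork\n", (long) n);
        close(fd);
        return 0;
    }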
As David Feldman wrote recently, rsync looks like it would be very useful for Mac OS X systems, where there is currently a dearth of options for backup. I am looking into using rsync to back up/mirror a few systems, but there are two changes that I will need to make first, based on two file system features:

- Mac OS X systems use HFS+, which supports files with one or two forks.

- HFS+ also supports some "metadata" for all files and directories.

There are a few ways to add support for these FS features:

1) Convert (on the fly) all files to MacBinary before comparing/sending them to the destination. MacBinary is a well-documented way to package an HFS file into a single data file. The benefit of this method is compatibility with existing rsync versions that are not MacBinary-aware; the drawbacks are speed, maintainability, and that directory metadata is not addressed at all.

2) Treat the two forks and the metadata as three separate files for the purposes of comparison/sending, and then reassemble them on the destination. Same drawbacks and benefits as the MacBinary route. This would also take more memory (potentially three times the number of files in the flist).

3) Change the protocol and implementation to handle arbitrary metadata and multiple forks. This could be made sort-of compatible with existing rsyncs by using various tricks, but the most efficient way would be to alter the protocol. The benefit is that this would make the protocol extensible: metadata can be "tagged" so that you could add any values needed and ignore those tags that are not understood or supported. Any number of forks could be supported, which is a step up for supporting NTFS, where a file can have any number of "data streams". In fact, forks and metadata could all be handled the same way in the protocol.

So, my question is: has anyone else done work in the areas of protocol enhancements and "rich" FS support? I have lots of experience on the Mac and have the code needed to access HFS+ metadata and forks from the BSD layer. I'm just looking for suggestions and news of anyone else working on stuff that might dovetail with this.

Also, I'm a bit concerned about the current behavior of reading the entire tree into memory, especially the effects that would have on large file sets. Any work being done on this front?

Regards,

Mark.
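To make option 3 concrete, here is one possible shape for the "tagged" metadata described above: a hypothetical type-length-value framing in which a receiver handles the tags it knows and skips the rest by length. The tag numbers and struct are invented for illustration, not part of any rsync protocol, and a real protocol would also pin down byte order:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical tag numbers -- illustration only. */
    enum {
        TAG_DATA_FORK   = 1,  /* file contents                  */
        TAG_RSRC_FORK   = 2,  /* HFS+ resource fork             */
        TAG_FINDER_INFO = 3,  /* type/creator/flags metadata    */
        TAG_NTFS_STREAM = 4   /* a named NTFS data stream       */
    };

    struct tag_header {
        uint32_t tag;  /* which fork/attribute follows */
        uint32_t len;  /* payload length in bytes      */
    };

    /* A receiver that tolerates unknown tags: handle what it knows,
     * skip past what it doesn't.  buf/buflen stand in for the wire. */
    static void read_tagged(const unsigned char *buf, size_t buflen)
    {
        size_t off = 0;
        while (off + sizeof(struct tag_header) <= buflen) {
            struct tag_header h;
            memcpy(&h, buf + off, sizeof h);
            off += sizeof h;
            if (h.len > buflen - off)
                break;                      /* truncated stream */
            switch (h.tag) {
            case TAG_DATA_FORK:
            case TAG_RSRC_FORK:
            case TAG_FINDER_INFO:
                printf("tag %u: %u bytes\n", h.tag, h.len);
                break;
            default:
                break;                      /* unknown tag: ignore */
            }
            off += h.len;
        }
    }

    int main(void)
    {
        unsigned char msg[64];
        struct tag_header h = { TAG_FINDER_INFO, 4 };

        memcpy(msg, &h, sizeof h);
        memcpy(msg + sizeof h, "ABCD", 4);
        read_tagged(msg, sizeof h + 4);
        return 0;
    }

The point is that a new fork type or metadata tag costs old receivers nothing: they skip it by length, which is what would make the protocol extensible.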
I'm not familiar with netatalk, but along a similar line, Mac OS X stores resource forks and metadata differently on HFS+ and single-fork volumes (such as UFS or NFS). If you copy a file from an HFS+ volume over to a single-fork volume using the Finder, it'll split the pieces apart and save the resource fork and metadata under variations of the original filename. I don't remember the exact names, but I think they're in the Mac OS X System Overview document... something like ._<original filename>.

If there's a way I can help with the porting effort, please let me know. I don't know a lot about the lower-level details, but I do know C, C++, Cocoa, etc., and would be interested in looking at the BSD-level info you have on transferring OS X files.

As I stated in my earlier message, my primary interest is synchronization of desktop and laptop, though backup would be terrific too. I'm pretty sure there are a lot of OS X users out there in need of both. I'm currently synchronizing with a shell script that uses ditto.

--Dave

On Monday, December 17, 2001, Chris Garrigues <cwg-dated-55c191e81afae8e9@deepeddy.com> wrote:

>> [Mark Valence's three options, quoted in full above]
>
> A quick thought about implementation details: It would be nice if this
> were done in such a way that if I were to rsync from a non-OSX netatalk
> system onto an OSX system, the .AppleDouble directories would be merged
> back into the files, and conversely, if I were to rsync from an OSX
> system to a netatalk system, the resource forks would be split into
> .AppleDouble directories.
>
> I guess this would be simplest with scheme 2 above.
>
> Chris
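Those ._<original filename> companions are AppleDouble files; the format is documented in RFC 1740. A sketch of walking the header to find the resource fork and Finder info entries, assuming the big-endian layout given there (error handling trimmed for brevity):

    #include <arpa/inet.h>   /* ntohl, ntohs */
    #include <stdint.h>
    #include <stdio.h>

    /* AppleDouble per RFC 1740 (big-endian fields): magic 0x00051607,
     * version, 16 filler bytes, an entry count, then (id, offset,
     * length) triples.  Entry 2 is the resource fork, entry 9 the
     * 32-byte Finder info. */
    int main(int argc, char **argv)
    {
        FILE *f;
        uint32_t magic, version;
        uint16_t count, i;
        unsigned char filler[16];

        if (argc != 2 || !(f = fopen(argv[1], "rb"))) {
            fprintf(stderr, "usage: %s ._file\n", argv[0]);
            return 1;
        }
        fread(&magic, 4, 1, f);
        fread(&version, 4, 1, f);
        fread(filler, 16, 1, f);
        fread(&count, 2, 1, f);
        if (ntohl(magic) != 0x00051607) {
            fprintf(stderr, "not an AppleDouble file\n");
            return 1;
        }
        for (i = 0; i < ntohs(count); i++) {
            uint32_t id, off, len;
            fread(&id, 4, 1, f);
            fread(&off, 4, 1, f);
            fread(&len, 4, 1, f);
            printf("entry %u: offset %u, length %u%s\n",
                   ntohl(id), ntohl(off), ntohl(len),
                   ntohl(id) == 2 ? " (resource fork)" :
                   ntohl(id) == 9 ? " (Finder info)"   : "");
        }
        fclose(f);
        return 0;
    }

A merge/split pass of the kind Chris describes would read these entries on one side and write them back as a real fork plus catalog metadata on the other.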
I would lean toward option "1" for several reasons. Primarily, it could probably interoperate safely with non-HFS systems or older versions.

How about a flag that changes the mode to detect named forks and encode them in-line? These encoded files could be safely synced to non-forked storage destinations or to tape. A simple tag passed at the beginning of a session could notify the destination that MacBinary decoding could be attempted if available (see the sketch below).

I also understand the need for named resource files for systems like netatalk. The problem with this is that every named-fork system is different: netatalk, Xinet, Helios, OS X Finder. This is a lot to chew. I would rather the user post-process files to get them into the named-fork format if they must. If you are going between two systems using the named-fork technique, this whole process is unneeded.

Option "3" might be the best, but it seems to me that it could end up requiring a lot of changes to the protocol.

It should also be noted that a project like this should be done at the Darwin level. There were also discussions on the darwin-development list in June '01. No one really started anything, but they did discuss at length how access to resource forks might be done while staying inside POSIX calls.

-Chris

At 8:25 AM -0800 12/17/01, Mark Valence <kurash@sassafras.com> wrote:
> [Mark's full message, quoted above]
--
Chris Irvine, Information Systems Manager, Dark Horse Comics, Inc.
http://www.darkhorse.com/
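The session tag Chris suggests could be as small as one greeting line in each direction, with MacBinary used only when both ends advertise it. A hypothetical sketch -- the CAP_MACBINARY flag and the greeting format are invented for illustration:

    #include <stdio.h>

    /* Hypothetical capability negotiation: each side sends a greeting,
     * and MacBinary encoding is used only if both ends advertise it.
     * Flag name and greeting format are invented for illustration. */
    #define CAP_MACBINARY 0x0001u

    static unsigned parse_caps(const char *greeting)
    {
        unsigned version = 0, caps = 0;
        /* e.g. "RSYNCD-MAC 1 CAPS 1" -> capability bitmask 1 */
        if (sscanf(greeting, "RSYNCD-MAC %u CAPS %x", &version, &caps) != 2)
            return 0;               /* old peer: no extensions */
        return caps;
    }

    int main(void)
    {
        unsigned mine   = CAP_MACBINARY;
        unsigned theirs = parse_caps("RSYNCD-MAC 1 CAPS 1");
        unsigned agreed = mine & theirs;

        if (agreed & CAP_MACBINARY)
            puts("both ends speak MacBinary: encode forked files in-line");
        else
            puts("plain transfer: data forks only");
        return 0;
    }

The nice property is that an old rsync which sends no such greeting simply parses as "no capabilities", and the transfer falls back to plain data forks.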
OK, I'm brand new to this group, brand new to rsync, brand new to Unix in general. I'm trying to play catch-up with this discussion, so there are likely many misconceptions that I have about these issues.

My goal is to create a tool that does backup and restore while transferring only changes. It will connect from Mac OS X to a server running Linux and preserve all metadata without the user ever knowing there is an issue. I've found the rsync algorithm is a good start, and it sounds like you all have the same idea.

I don't think I like the idea of the MacBinary solution, in that I can see some configuration of the tool that the user will have to worry about. We obviously don't want the overhead of flattening files without forks, or files whose FileInfo can be determined from other metadata strategies. The user might have to maintain a list of files they use... how do I handle this file or that (à la Mac CVS tools)?

I see another user-experience issue with both the MacBinary solution and the protocol change: what do the files look like when they get backed up? If I connect to the server via the Finder, am I going to see a bunch of files that are 'archived', or do I get the real deal? As a user, I wouldn't use rsync if I couldn't just go and grab the files that got backed up. Not that running the file through StuffIt is a big deal, but it's going to seem a bit clunky even if the solution is in fact much more extensible. What format is this new protocol going to produce? Will the only way to get to the files be to use the rsync client? Sorry, that's just not acceptable.

The only solution left is to pre-process the file by splitting it before creating the change lists, so that comparisons can be made if the file is split on the server. There will have to be some intelligence about which method of splitting is used on the server, but I'm positive that couldn't be too hard to determine. Directory metadata just has to be handled in another file as well; isn't that what .DSInfo files are? I'm starting to think that what I'm proposing is more of a combination of 2) and 3). Wouldn't it be great if we could support ACLs as well? Please tell me if I'm way off base here.

One other question that I'm sure will show my ignorance of Darwin development: what is the issue with using the high-level APIs if the output is compatible with the other platforms running rsync? What is the advantage of trying for POSIX purity, or code at the "Darwin level", if the code is only going to be used on Macs running the higher-level stuff anyway? If you don't have a forked file system, why would you care if you don't know how to handle forks?

I'm planning on taking this project on full time, and we would all benefit if we can all agree on a direction. Let's get this thing going,

Terrence Geernaert

Mark Valence wrote:

> So, that's one vote each for options 1, 2, and 3 ;-)
>
> I agree that the ideal implementation would support HFS+ as well as
> netatalk's .AppleDouble scheme, Mac OS X's ._<filename> scheme, and
> MacBinary for all the rest. This can certainly be a goal of the
> implementation, but personally I am interested in the HFS+ on Mac
> OS X part of the problem.
>
> My implementation, whether it is MacBinary-based or a change to the
> protocol, will leave room for these alternative schemes. Right now,
> I am thinking that MacBinary is the way to go. This doesn't give
> the flexibility and extensibility that a protocol change would give,
> but it does have the benefit of supporting existing rsync versions.
>
> Chris I., I'm not sure what you mean by "done at the Darwin level".
> If you mean that it should be done based on Darwin/BSD APIs and not
> Carbon/Cocoa APIs, then I am in full agreement with you. The calls
> that I'd use to access the resource fork are POSIX calls
> (essentially, it's just an open() call), although the calls to get
> HFS metadata are Mac OS X-specific (but not Carbon calls).
>
> Anyway, I'm still mulling all this over, so any suggestions are
> more than welcome. Once a path is chosen and code is written,
> things will be harder to change ;-)
>
> [Chris Garrigues's, David Feldman's, and Chris Irvine's messages,
> quoted in full above]
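The "Mac OS X-specific (but not Carbon)" metadata calls Mark mentions are presumably the getattrlist() family in the Darwin BSD layer. A minimal, Darwin-only sketch fetching the 32 bytes of Finder info, whose first eight bytes hold the type and creator codes of a regular file; which attribute set an rsync patch would actually carry is of course the open question:

    #include <stdio.h>
    #include <string.h>
    #include <sys/attr.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Fetch the 32 bytes of Finder info (type, creator, flags, ...)
     * through the Darwin BSD layer -- no Carbon required. */
    struct finder_attr_buf {
        u_int32_t length;               /* filled in by the kernel   */
        unsigned char finder_info[32];  /* ATTR_CMN_FNDRINFO payload */
    };

    int main(int argc, char **argv)
    {
        struct attrlist alist;
        struct finder_attr_buf buf;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        memset(&alist, 0, sizeof alist);
        alist.bitmapcount = ATTR_BIT_MAP_COUNT;
        alist.commonattr  = ATTR_CMN_FNDRINFO;

        if (getattrlist(argv[1], &alist, &buf, sizeof buf, 0) != 0) {
            perror("getattrlist");
            return 1;
        }
        /* For a regular file the first 8 bytes are type and creator. */
        printf("type='%.4s' creator='%.4s'\n",
               (const char *) buf.finder_info,
               (const char *) buf.finder_info + 4);
        return 0;
    }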
Seemingly Similar Threads
- Feature request: Sync Mac OS resource forks and metadata on Mac OS X
- Mac OS X HFS+ metadata patch, take 2
- HFS+ resource forks: WIP patch included
- Aw: Re: Re: rsync not copy all information for font file