> Qualities
>
> 1. Be reasonably portable: at least in principle, it should be
> possible to port to Windows, OS X, and various Unixes without major
> changes.

In general, I would like to see OpenVMS in that list.

> Principles
>
> 1. Clean design rather than micro-optimization.

A clean design allows optimization to be done by the compiler, and
tight optimization should be driven by profiling tools.

> 4. Keep the socket open until the client gets bored. (Avoids startup
> time; good for on-line mirroring; good for interactive clients.)

I am afraid I do not quite understand this one. Are you referring to a
server waiting for a reconnect for a while instead of reconnecting? If
so, that seems to be standard behavior for network daemons.

> 5. Similarly, no silly tricks with forking, threads, or nonblocking
> IO: one process, one IO.

Forking or multiple processes can be high cost on some platforms. I am
not experienced enough with POSIX threads to judge their portability.

But as long as it is done right, non-blocking I/O is not a problem for
me. If you structure the protocol processing so that no subroutine ever
posts a write and then waits for a read, you can set up a library that
can be used either blocking or non-blocking. The same goes for file
access.

On OpenVMS, I can do all I/O in a non-blocking manner. The problem is
that I must use native I/O calls to do so. If the structure is that
after any I/O, control returns to a common point for the next step in
the protocol, then it is easy to move from a blocking implementation to
a non-blocking one. Macros can probably be used to allow common code to
be used for blocking or non-blocking implementations. Two systems using
non-blocking mode can push a higher data rate through the same time
period. This is an area where I can offer help to produce a clean
implementation.

One of the obstacles to my cleanly implementing rsync as a single
process is when a subroutine is waiting for a response to a command
that it sent. If that subroutine is called as an asynchronous event, it
blocks all other execution in that process. That same practice hurts in
Samba.

> 8. Design for testability. For example: don't rely on global
> resources that may not be available when testing; do make behaviours
> deterministic to ease testing.

Test programs that internally fork() are very troublesome for me.
Starting a few hundred individually by a script is not. I can only read
UNIX shell scripts of minor complexity.

> 10. Have a design that is as simple as possible.
>
> 11. "Smart clients, dumb servers." This is claimed to be a good
> design pattern for internet software. rsync at the moment does not
> really adhere to it. Part of the point of rsync is that having a
> smarter server can make things much more efficient. A strength of
> this approach is that to add features, you (often) only need to add
> them to the client.

It should be a case of who can do the job easier.

> 12. Try to keep the TCP pipe full in both directions at all times.
> Pursuing this intently has worked well in rsync, but has also led to
> a complicated design prone to deadlocks.

Deadlocks can be avoided. Make sure that if an I/O is initiated, the
next step is to return to the protocol dispatching routine.

> General design ideas
>
> 9. Model files as composed of a stream of bytes, plus an optional
> table of key-value attributes. Some of these can be distinguished to
> model ownership, ACLs, resource forks, etc.

Not portable. This will effectively either exclude all non-UNIX
platforms or make it very difficult to port to them.

Binary files are a stream of bytes. Text files are a stream of records.
Many systems do not store text files as a stream of bytes, and they may
or may not even be ASCII.

If you are going to maintain meta files for ACLs and resource forks,
then there should be some provision to supply attributes for an entire
directory or for individual files.

Binary files are no real problem. The binary is either meaningful on
the client or server or it is not. However, file attributes may need to
be maintained. If the file attributes are maintained, it would be
possible for me to have an OpenVMS indexed file moved up to a UNIX
server, and then back to another OpenVMS system, and still be usable.
Currently, in order to do so, I must encapsulate them in a .ZIP
archive. That is .ZIP, not gzip or bzip2; on OpenVMS those are only
useful for transferring source and a limited subset of binaries.

Text files are much different from binary files, except on UNIX. A text
file needs to be processed by records, and on many systems the records
cannot be updated randomly, or if they can, it is not very efficient.

If a target use for this program is to assist in cross-platform open
source synchronization, then it really needs to properly address text
files. A server should know how to represent a text file in a portable
format to the client. Stream records in ASCII, delimited by line feeds,
is probably the most convenient. The client would be responsible for
making sure that a text file is in a local format.

Additional note #1: I recall seeing a comment somewhere in this thread
about timestamps being left at 16 bits. File timestamps for OpenVMS and
for Windows NT are 64 bits, but use different base dates. Using 16 bits
for timestamps will result in a loss of data for these platforms. For
applications like open source distribution, the data loss is probably
not significant. For BACKUP type applications, it can be significant.

Additional note #2: File attributes need to be stored somewhere, so a
reserved directory or filename convention will need to be used. I
assume that there will be provisions for a server to be marked as a
master reference.

Additional note #3: For flexibility, a client may need to provide
filename translation, so the original filename (the one that will be
used on the wire) should be stored as a file attribute. It also follows
that it is probably a good idea to store the translated filename as an
attribute as well.

-John
wb8tyw@qsl.network
Personal Opinion Only
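The structure described above (no subroutine ever posts a write and
then waits for a read; after any I/O, control returns to a common
dispatch point) can be sketched as a small state machine. This is a
minimal illustration only; every name in it (FileRequestProtocol,
LoopbackIO, dispatch) is invented for the example and is not rsync or
librsync code.

```python
class FileRequestProtocol:
    """Toy protocol: send "GET <name>", then consume one reply line.
    Each step does at most one I/O and then returns to the caller,
    so the same code works over blocking or non-blocking transports."""

    def __init__(self, name):
        self.state = "send_request"
        self.name = name
        self.reply = None

    def step(self, io):
        if self.state == "send_request":
            # Post the write, then RETURN; never sit waiting for the read.
            io.write(b"GET " + self.name.encode() + b"\n")
            self.state = "await_reply"
        elif self.state == "await_reply":
            # The dispatcher only calls this step when input is ready.
            line = io.read_line()
            self.reply = line.decode().rstrip("\n")
            self.state = "done"
        return self.state


class LoopbackIO:
    """Stand-in transport that answers every GET with "OK <name>"."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)

    def read_line(self):
        return b"OK " + self.sent[-1].split()[1] + b"\n"


def dispatch(protocols, io):
    """Common dispatch point: drive every protocol until all are done.
    Because no step blocks after writing, many exchanges can be
    interleaved in one process."""
    while any(p.state != "done" for p in protocols):
        for p in protocols:
            if p.state != "done":
                p.step(io)
```

With this shape, swapping the transport for true non-blocking native
I/O only changes when `step()` is called, not the protocol code itself.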
On 22 Jul 2002, "John E. Malmberg" <wb8tyw@qsl.net> wrote:

> > Qualities
> >
> > 1. Be reasonably portable: at least in principle, it should be
> > possible to port to Windows, OS X, and various Unixes without major
> > changes.
>
> In general, I would like to see OpenVMS in that list.

Yes, OpenVMS, perhaps also QNX and some other TCP/IP-capable RTOSs.

Having a portable protocol is a bit more important than a portable
implementation. I would hope that with a new system, even if the
implementation was unix-bound, you would at least be able to write a
new client, reusing some of the code, that worked well on ITS.

> A clean design allows optimization to be done by the compiler, and
> tight optimization should be driven by profiling tools.

Right. So, for example, glib has a very smart assembly ntohl() and
LZO is tight code. I would much rather use them than try to reduce
the byte count by a complicated protocol.

> > 4. Keep the socket open until the client gets bored. (Avoids startup
> > time; good for on-line mirroring; good for interactive clients.)
>
> I am afraid I do not quite understand this one. Are you referring to a
> server waiting for a reconnect for a while instead of reconnecting?

What I meant is that I would like to be able to open a connection to a
server, download a file, leave the connection open, decide I need
another file, and then get that one too. You can do this with FTP, and
(kind of) HTTP, but not rsync, which needs to know the command up
front. Of course the server can drop you too by a timeout or whatever.

> If so, that seems to be a standard behavior for network daemons.
>
> > 5. Similarly, no silly tricks with forking, threads, or nonblocking
> > IO: one process, one IO.
>
> Forking or multiple processes can be high cost on some platforms. I am
> not experienced with POSIX threads to judge their portability.
>
> But as long as it is done right, non-blocking I/O is not a problem
> for me.
>
> If you structure the protocol processing where no subroutine ever
> posts a write and then waits for a read, you can set up a library
> that can be used either blocking or non-blocking.

Yes, that's how librsync is structured.

Is it reasonable to assume that some kind of poll/select arrangement
is available everywhere? In other words, can I check to see if input
is available from a socket without needing to block trying to read
from it?

I would hope that only a relatively small layer needs to know about
how and when IO is scheduled. It will make callbacks (or whatever) to
processes that produce and consume data. That layer can be adapted,
or if necessary, rewritten, to use whatever async IO features are
available on the relevant platform.

> Test programs that internally fork() are very troublesome for me.
> Starting a few hundred individually by a script are not.

If we always use fork/exec (aka spawn()) is that OK? Is it only
processes that fork and then continue executing the same program that
cause trouble?

> I can only read UNIX shell scripts of minor complexity.

Apparently Python runs on VMS. I'm in favour of using it for the test
suite; it's much more effective than sh.

> > 12. Try to keep the TCP pipe full in both directions at all times.
> > Pursuing this intently has worked well in rsync, but has also led to
> > a complicated design prone to deadlocks.
>
> Deadlocks can be avoided.

Do you mean that in the technical sense of "deadlock avoidance",
i.e. checking for a cycle of dependencies and failing? That sounds
undesirably complex.

> Make sure if an I/O is initiated, that the
> next step is to return to the protocol dispatching routine.

> > 9. Model files as composed of a stream of bytes, plus an optional
> > table of key-value attributes. Some of these can be distinguished to
> > model ownership, ACLs, resource forks, etc.
>
> Not portable. This will effectively either exclude all non-UNIX or
> make it very difficult to port to them.

"Non-UNIX" is not completely fair; as far as I know MacOS, Amiga,
OS/2, Windows, BeOS, and QNX are {byte stream + attributes + forks}
too.

I realize there are platforms which are record-oriented, but I don't
have much experience on them. How would the rsync algorithm even
operate on such things? Is it sufficient to model them as
ascii+linefeeds internally, and then do any necessary translation away
from that model on IO?

> BINARY files are no real problem. The binary is either meaningful on
> the client or server or it is not. However file attributes may need
> to be maintained. If the file attributes are maintained, it would be
> possible for me to have an OpenVMS indexed file moved up to a UNIX
> server, and then back to another OpenVMS system and be usable.

Possibly it would be nice to have a way to stash attributes that
cannot be represented on the destination filesystem, but perhaps that
is out of scope.

> I recall seeing a comment somewhere in this thread about timestamps
> being left to 16 bits.

No, 32 bits. 16 bits is obviously silly.

> File timestamps for OpenVMS and for Windows NT are in 64 bits, but
> use different base dates.

I think we should use something like 64-bit microseconds-since-1970,
with a precision indicator.

> File attributes need to be stored somewhere, so a reserved directory
> or filename convention will need to be used.
>
> I assume that there will be provisions for a server to be marked as a
> master reference.

What do you mean by "master reference"?

> For flexibility, a client may need to provide filename translation,
> so the original filename (that will be used on the wire) should be
> stored as a file attribute. It also follows that it probably is a
> good idea to store the translated filename as an attribute also.

Can you give us an example? Are you talking about things like
managing case-insensitive systems?

--
Martin
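Martin's question, whether input availability can be checked without
blocking on a read, is exactly what select() answers; select() is part
of POSIX and is also provided by Winsock, so some form of it exists on
most TCP/IP-capable platforms. A self-contained sketch in Python
(which the thread later proposes for the test suite), using a local
socket pair to stand in for a network peer:

```python
import select
import socket

def input_available(sock, timeout=0.0):
    """Return True if sock can be read right now without blocking."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)

# Demonstration: nothing is pending before the peer writes,
# and data is pending afterwards.
a, b = socket.socketpair()
before = input_available(a)        # no data sent yet
b.sendall(b"hello\n")
after = input_available(a, 1.0)    # data now waiting on the socket
```

The same check in C is a `select()` (or `poll()`) call with a zero
timeout; the dispatch layer can use it to decide which protocol step
is runnable.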
> User-Agent: Mozilla/5.0 (X11; U; OpenVMS COMPAQ_AlphaServer_DS10_466_MHz;
> en-US; rv:1.1a) Gecko/20020614

If something as complex as Mozilla can run on OpenVMS then I guess we
really have no excuse :-)

--
Martin
Lenny Foner <foner-rsync@media.mit.education> wrote:

> jw schultz wrote:
> > I find the use of funny chars (including space) in filenames
> > offensive but we need to deal with internationalizations and
> > sheer stupidity.
>
> Regardless of what you think about them, MacOS comes with pathnames
> containing spaces right out of the box (think "System Folder"). Yes,
> rsync needs to not make assumptions about what's legal in a filename.
> Some OS's think slashes are path separators; some put them inside
> individual filenames. Some think [] are separators. We shouldn't
> try to make any assumptions.

Agreed.

For a file distribution program, ideally the server will have, for
each file to be transferred, a list of how the file should be
represented on the platforms that the server knows about. The client
would be able to tell the server about new platforms, but the server
would not be required to remember the information if it did not trust
the client.

As I work through my backlog of e-mail messages, I will give some
possible implementation details as answers to other posts.

-John
wb8tyw@qsl.network
Personal Opinion Only
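The per-platform representation list described above could be modeled
as a small table on the server. A hypothetical sketch (all names here
are invented for illustration, not part of any rsync protocol):

```python
class NameTable:
    """Server-side table: wire filename -> per-platform local names.
    Translations offered by clients are recorded only when the
    client is trusted."""

    def __init__(self):
        self.names = {}   # wire_name -> {platform: local_name}

    def local_name(self, wire_name, platform):
        """Return the stored local form, or the wire name unchanged
        if no translation is known for this platform."""
        return self.names.get(wire_name, {}).get(platform, wire_name)

    def learn(self, wire_name, platform, local_name, trusted):
        """Record a client-supplied translation, if the client is
        trusted. Returns True when the translation was stored."""
        if not trusted:
            return False
        self.names.setdefault(wire_name, {})[platform] = local_name
        return True
```

An untrusted client's offer is simply not remembered; a trusted
client's translation is served back to later clients of that platform.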
Martin Pool wrote:

> On 22 Jul 2002, "John E. Malmberg" <wb8tyw@qsl.network> wrote:
>
>> A clean design allows optimization to be done by the compiler, and
>> tight optimization should be driven by profiling tools.
>
> Right. So, for example, glib has a very smart assembly ntohl() and
> LZO is tight code. I would much rather use them than try to reduce
> the byte count by a complicated protocol.

Many compilers will inline ntohl(), giving the call very low overhead.

>>> 5. Similarly, no silly tricks with forking, threads, or nonblocking
>>> IO: one process, one IO.
>>
>> Forking or multiple processes can be high cost on some platforms. I
>> am not experienced with POSIX threads to judge their portability.
>>
>> But as long as it is done right, non-blocking I/O is not a problem
>> for me.
>>
>> If you structure the protocol processing where no subroutine ever
>> posts a write and then waits for a read, you can set up a library
>> that can be used either blocking or non-blocking.
>
> Yes, that's how librsync is structured.
>
> Is it reasonable to assume that some kind of poll/select arrangement
> is available everywhere? In other words, can I check to see if input
> is available from a socket without needing to block trying to read
> from it?

I can poll, but I prefer to have an I/O completion trigger a
completion routine. But that is not portable. :-)

> I would hope that only a relatively small layer needs to know about
> how and when IO is scheduled. It will make callbacks (or whatever) to
> processes that produce and consume data. That layer can be adapted,
> or if necessary, rewritten, to use whatever async IO features are
> available on the relevant platform.
>
>> Test programs that internally fork() are very troublesome for me.
>> Starting a few hundred individually by a script are not.
>
> If we always use fork/exec (aka spawn()) is that OK? Is it only
> processes that fork and then continue executing the same program
> that cause trouble?

Mainly. I can deal with spawn() much more easily than fork().

>> I can only read UNIX shell scripts of minor complexity.
>
> Apparently Python runs on VMS. I'm in favour of using it for the test
> suite; it's much more effective than sh.

Unfortunately the Python maintainer for VMS retired, and I have not
been able to figure out how to get his source to compile. I have
gotten the official Python to compile and link after fixing only one
severe programming error. However, it still is not running; I am
isolating where the problem is in my "free" time.

>>> 12. Try to keep the TCP pipe full in both directions at all times.
>>> Pursuing this intently has worked well in rsync, but has also led
>>> to a complicated design prone to deadlocks.
>>
>> Deadlocks can be avoided.
>
> Do you mean that in the technical sense of "deadlock avoidance"?
> i.e. checking for a cycle of dependencies and failing? That sounds
> undesirably complex.

No, by not using a complex protocol, so that there are no deadlocks.

>>> 9. Model files as composed of a stream of bytes, plus an optional
>>> table of key-value attributes. Some of these can be distinguished
>>> to model ownership, ACLs, resource forks, etc.
>>
>> Not portable. This will effectively either exclude all non-UNIX or
>> make it very difficult to port to them.
>
> "Non-UNIX" is not completely fair; as far as I know MacOS, Amiga,
> OS/2, Windows, BeOS, and QNX are {byte stream + attributes + forks}
> too.
>
> I realize there are platforms which are record-oriented, but I don't
> have much experience on them. How would the rsync algorithm even
> operate on such things?

Record files need to be transmitted on record boundaries, not
arbitrary boundaries. Also, random access cannot be used: the file
segments need to be transmitted in order. For a UNIX text file, a
record is a line of text delimited by the line-feed character.

[This turned out to be a big problem in porting Samba.
An NT client transfers a large file by sending 64K, skipping 32K,
sending some more, and then sending the 32K later. Samba's own client
does not do this, so the resulting corruption of a record-structured
file did not show up in the initial testing. I still have not found
the ideal fix for Samba, but I have implemented a workaround.]

> Is it sufficient to model them as ascii+linefeeds internally, and
> then do any necessary translation away from that model on IO?

Yes, as long as no partial records are transmitted. Partial records
can be a problem. If I know the rest of the record is coming, then I
can wait for it, but if the rest of the record is going to be skipped,
then it takes more work.

>> BINARY files are no real problem. The binary is either meaningful on
>> the client or server or it is not. However file attributes may need
>> to be maintained. If the file attributes are maintained, it would be
>> possible for me to have an OpenVMS indexed file moved up to a UNIX
>> server, and then back to another OpenVMS system and be usable.
>
> Possibly it would be nice to have a way to stash attributes that
> cannot be represented on the destination filesystem, but perhaps that
> is out of scope.

I would anticipate having an optional attribute file for each
directory, for attributes common to all files in the directory, and an
optional attribute file for each file. This would allow a server to
handle files for other platforms.

>> I recall seeing a comment somewhere in this thread about timestamps
>> being left to 16 bits.
>
> No, 32 bits. 16 bits is obviously silly.

Rushed typing. I meant 32 bits. Yes, 16 bits is obviously silly.

>> File timestamps for OpenVMS and for Windows NT are in 64 bits, but
>> use different base dates.
>
> I think we should use something like 64-bit microseconds-since-1970,
> with a precision indicator.

It sounds like a set of host functions/macros will be needed.

>> File attributes need to be stored somewhere, so a reserved directory
>> or filename convention will need to be used.
>>
>> I assume that there will be provisions for a server to be marked as
>> a master reference.
>
> What do you mean "master reference"?

For mirroring of distributions. The server maintained directly by the
team would be marked as tier 1, the "master reference". The next level
of mirrors would get higher-numbered tiers. This is mostly bookkeeping
for where the files are coming from.

For example, direct access to parts of SAMBA.ORG by everyone would
overload the server, so the use of local mirrors is recommended. The
primary mirrors would be marked as tier 2, and mirrors of them would
get a higher number. Nothing requires this, but it may be useful for
some. A level can only accept updates from a lower-numbered server.

>> For flexibility, a client may need to provide filename translation,
>> so the original filename (that will be used on the wire) should be
>> stored as a file attribute. It also follows that it probably is a
>> good idea to store the translated filename as an attribute also.
>
> Can you give us an example? Are you talking about things like
> managing case-insensitive systems?

Yes, and other issues. Say you have a source module named foo.dat++;
that file name cannot be represented on OpenVMS ODS-2 filesystems. One
way of handling this is to have the OpenVMS client convert it to
FOO.DAT_PLS_PLS when it receives it. However, if the OpenVMS system
wants to be a mirror for distribution, it needs some way to send that
name back out.

For Samba, the file would be stored as FOO.DAT__2B__2B, so Samba
clients will see the original file name, though case is not preserved.
One other issue with OpenVMS is that for ODS-2, file names are limited
to 39 characters, a period delimiter, and then another 39 characters,
so there are limits to the hex expansion.

So the server needs some way of knowing what name each client needs to
see. The portable solution is to have an attribute file that can
optionally sit beside the real file.
This attribute file can hold those attributes that are foreign to the host operating system. So UNIX systems would see the file as foo.dat++, and OpenVMS could see the file as FOO.DAT_PLS_PLS or other local format. Client: Hello server, I am looking for the xxxxx distribution. Server: Accepted, I have the xxxxxx distribution. Server: Here is the first directory, name is main.master Here are the global attributes for the directory. Default format for files in this directory is plain text. Client, directory attributes accepted. Please update attributes: x_openvms_filename: main_master Server (possible response one): Sorry, you are not on the list of clients I can accept updates from. Server (possible response two): Update has been submitted to maintainer for review. Server (possible response three) Update has been accepted to attributes. This way the program can optionally deal with platform specific attributes, but not really need to understand them. The attribute file would probably be in plain text. This takes up a bit more room, but makes it maintainable from a text editor. Of course a platform can store the attributes in any fashion, but I would expect that a file would most commonly be used. -John wb8tyw@qsl.network Personal Opinion Only
To help explain why backup and file distribution have such different
implementation issues, let me give some background.

This is a dump of an OpenVMS native text file. This is the format that
virtually all text editors produce on it.

Dump of file PROJECT_ROOT:[rsync_vms]CHECKSUM.C_VMS;1
on 29-JUL-2002 22:02:21.32
File ID (118449,3,0)   End of file block 8 / Allocated 8

Virtual block number 1 (00000001), 512 (0200) bytes

 67697279 706F4320 20200025 2A2F0002 ../*%.   Copyrig 000000
 72542077 6572646E 41202943 28207468 ht (C) Andrew Tr 000010
 20200024 00363939 31206C6C 65676469 idgell 1996.$.   000020
 50202943 28207468 67697279 706F4320  Copyright (C) P 000030
 39312073 61727265 6B63614D 206C7561 aul Mackerras 19 000040
 72702073 69685420 20200047 00003639 96..G.   This pr 000050

Each record is preceded by a 16-bit count of how long the record is.
While any value can be present in a record, usually only printable
ASCII is present.

When this type of file is read in through a C program, the records are
translated so that it looks like each line of text is terminated by a
line-feed character. So if I am just using a program ported from UNIX
to read text files, there is no problem. And pure binary files are not
a problem, because they have attributes that tell the I/O system that
they are binary, not text.

But the problem comes in when the remote system sends a request to
update the middle of a file. It sends me a byte offset. At that point,
I have to have kept track independently in the program of where the
simulated offset is. As long as the file is always sent in sequence, I
have a hope of getting this right. If the file updates are sent in a
random order, I cannot.

These are the issues in using an rsync-like program for file
distribution. All I need to know is whether the file being transferred
is binary or text. And while the ideal is for the system hosting the
file to identify it, this can be faked by having a mapping of file
types to default attributes. So for text file transfers, as long as
the sections are sent in sequence, there is no problem.

Now for backup: if I assume that the system that will eventually use
the backup understands the file format of the source, I can open the
files as binary, so I do not have to be concerned about keeping track
of how the logical offset maps to the physical offset. However, I then
have a whole new set of issues.

The file must be opened in "binary" mode. On an fopen() call, the "b"
mode qualifier causes the file to be opened in binary mode, so no
translation is done. This flag is documented as part of the ISO C
standard; it has no effect on UNIX, but it is important on other
platforms. For an open() call, a special operating system extension is
needed to open the file in binary mode.

Then there are the file attributes:

CHECKSUM.C_VMS;1              File ID:  (118449,3,0)
Size:            8/8          Owner:    [SYSOP,MALMBERG]
Created:   29-JUL-2002 22:01:37.95
Revised:   29-JUL-2002 22:01:38.01 (1)
Expires:   <None specified>
Backup:    <No backup recorded>
Effective: <None specified>
Recording: <None specified>
File organization:  Sequential
Shelved state:      Online
Caching attribute:  Writethrough
File attributes:    Allocation: 8, Extend: 0, Global buffer count: 0
                    No version limit
Record format:      Variable length, maximum 0 bytes, longest 71 bytes
Record attributes:  Carriage return carriage control
RMS attributes:     None
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:RWED, World:RE
Access Cntrl List:  None
Client attributes:  None

And this is for a simple file format. Files can be indexed or have
multiple keys. And there is no cross-platform API for retrieving all
of these attributes, so how do you determine how to transmit them?

Security is another issue. In some cases the binary values for the
access control entries need to be preserved, and in other cases the
text values need to be preserved. A translation from one set of text
or binary values to another set may also be needed. And again, there
are no cross-platform APIs for returning this information.

So a backup-type application is going to need a lot of
platform-specific tweaks, and some way to pass all this varied
information between the client and server. As each platform is added,
an extension may need to be developed. A server definitely needs to
know whether it is in backup mode as opposed to file distribution
mode. In file distribution mode, only a few file attributes need to be
preserved, and a loss of precision in dates is usually not a problem.

So while the two applications could be done in a single image, I am
still of the opinion that they should be developed separately. Maybe
they could share a common support library, but I think that keeping
them as separate programs may be better for support and development,
especially if you mean for these to be cross-platform. It is likely
that the backup function would otherwise only be useful on a subset of
platforms. Is it fair to burden the people who can only use the file
distribution part of the package with porting the backup portion?

It just seems that it is not too difficult to come up with a
cross-platform file distribution system that uses the principles
developed with rsync. A backup-type application is going to be a
problem for cross-platform use, and is likely to be limited to a
subset of UNIX systems.

Or maybe a build option to build a full-function superlifter, or just
a superlifter lite?

-John
wb8tyw@qsl.network
Personal Opinion Only
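The record format shown in the dump (a 16-bit little-endian byte
count, then the record bytes) can be unpacked with a short routine.
This sketch assumes records start on even byte boundaries, so an
odd-length record carries one pad byte, which matches the stray NUL
visible in the dump between the two Copyright records; it mimics the
line-feed translation John describes the C runtime performing:

```python
import struct

def vms_records_to_text(data):
    """Expand variable-length-record data into LF-terminated lines.
    Each record: 2-byte little-endian count, then <count> bytes,
    then a pad byte if the count is odd (word alignment)."""
    out = []
    pos = 0
    while pos + 2 <= len(data):
        (count,) = struct.unpack_from("<H", data, pos)
        pos += 2
        out.append(data[pos:pos + count] + b"\n")
        pos += count + (count & 1)   # skip the pad byte on odd lengths
    return b"".join(out)
```

The inverse mapping is where the byte-offset problem above comes from:
a Unix-side offset into the LF-terminated stream does not correspond
to any fixed offset in the on-disk record file, so out-of-order
updates cannot be applied without re-deriving record boundaries.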