Now that the documentation section is expanding, I can make a not-so-ill-informed comment about zfs and the Mac OS. If I'm reading the documentation correctly, the news is unfortunately bad.

The "problem" with Mac support is that the Mac stored all of its display-related information in the directory itself. By this I refer to things like the icon's position within its directory window, the "type and creator" flags that are an analog to a file extension, and even things like window sizing, position and scroll location for folders. The idea here was that on a floppy, which the FS was targeted for way back when, a single pass over the flat-file directory would give you EVERYTHING you needed to draw the display. Given the extremely limited bandwidth, this was an extremely practical decision, if potentially limiting.

For the interested, here's the info needed:

http://developer.apple.com/technotes/tn/tn1150.html#FinderInfo

It's a pair of four-byte strings, three 2x16-bit Points and potentially a Rect (four 16-bit values), and a couple of flags, the vast majority of which are no longer used or are duplicated in zfs's directory ZAP anyway.

Now generally the Unix nerds would suggest hanging the Mac's extra information in the xattrs. The problem here is that zfs stores all xattrs separately from the ZAP directory entry. So while it is physically possible to make another ACE for the Mac info and hang it off the xattr "inode" (ZAP), this means that drawing a window in the Finder will require a file system walk! This just isn't going to work; Mac users won't accept PC-slow directory displays -- I know I wouldn't.

What I don't understand is why this disturbingly inflexible design was chosen. Note that the ACL ACE is built to store up to six entries in-line, which likely serves 95% of all cases. Why an identical solution was not used for xattrs absolutely baffles me. An identical six-entry design with an overflow ACE dir would work wonderfully for xattrs and, in this particular case, would store all the needed Mac-related goodness.

So can anyone tell me why xattrs weren't handled in the same way as ACLs? It smells of inside-the-box thinking, but I'm no FS expert and there may very well be a good reason.

And if there isn't a good reason, is it simply too late to fix this? If xattr were moved below gid and used up the pad[4], that would at least give us something useful.

Maury

This message posted from opensolaris.org
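To give a feel for how small the per-file Finder record described above is, here is a minimal sketch that packs it with Python's struct module. It assumes the field order of the FileInfo structure in TN1150 (type, creator, Finder flags, icon location as a 16-bit v/h Point, one reserved word) and big-endian byte order. The names FILE_INFO, pack_file_info and unpack_file_info are made up for this illustration and are not part of any Apple or ZFS interface.

```python
import struct

# Assumed layout (after TN1150's FileInfo): 4-byte type, 4-byte creator,
# 16-bit Finder flags, icon location as two signed 16-bit values (v, h),
# and one reserved 16-bit word. Big-endian, no padding.
FILE_INFO = struct.Struct(">4s4sHhhH")


def pack_file_info(file_type, creator, flags, v, h):
    """Pack the basic per-file Finder record into bytes."""
    return FILE_INFO.pack(file_type, creator, flags, v, h, 0)


def unpack_file_info(raw):
    """Return (type, creator, flags, (v, h)) from a packed record."""
    file_type, creator, flags, v, h, _reserved = FILE_INFO.unpack(raw)
    return file_type, creator, flags, (v, h)


record = pack_file_info(b"TEXT", b"ttxt", 0x0100, 40, 120)
print(len(record), unpack_file_info(record))
```

Under these assumptions the basic record comes to 16 bytes per file.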
I don't disagree with the below; however, you can run your Mac on UFS instead of HFS+. Since UFS hasn't been mac-ified, I'm wondering if the below is actually true for all filesystem types.

On May 2, 2006, at 2:02 PM, Maury Markowitz wrote:
> Now that the documentation section is expanding, I can make a not-so-ill-informed comment about zfs and the Mac OS. If I'm reading the documentation correctly, the news is unfortunately bad.
>
> The "problem" with Mac support is that the Mac stored all of its display-related information in the directory itself. By this I refer to things like the icon's position within its directory window, the "type and creator" flags that are an analog to a file extension, and even things like window sizing, position and scroll location for folders. The idea here was that on a floppy, which the FS was targeted for way back when, a single pass over the flat-file directory would give you EVERYTHING you needed to draw the display. Given the extremely limited bandwidth, this was an extremely practical decision, if potentially limiting.
>
> [...]
>
> Maury

-----
Gregory Shaw, IT Architect
ITCTO Group, Sun Microsystems Inc.
Phone: (303) 673-8273   Fax: (303) 673-8273
1 StorageTek Drive ULVL4-382, Louisville, CO 80028-4382
greg.shaw at sun.com (work)   shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
On Tue, May 02, 2006 at 01:02:59PM -0700, Maury Markowitz wrote:
> The "problem" with Mac support is that the Mac stored all of its display-related information in the directory itself.

As you've mentioned, performance considerations mean that extended attributes may not be the best solution for storing this information.

Unfortunately, embedding the extended attributes in the znode_phys_t (as you suggest) is not really practical. The extended attributes are full-fledged files, with all the associated metadata (permissions, dates, etc). There simply isn't enough space in the znode_phys_t to store even a single extended attribute.

Even if you invented a new interface for name-value extended properties on files, it would be tough to fit many properties in the unused space in the znode_phys_t. But such an interface might be worth considering, if there are important uses for it (the Mac windowing information may be one).

One way to store this information would be to simply add it to the znode_phys_t (by using a couple of words of the padding). This would result in very good performance, since you wouldn't even have to find the attribute with the matching name.

Keep in mind that the whole idea of storing this windowing information with the file presumes that there are not multiple (hard) links to a file. If there are multiple names for the file (i.e. it exists in multiple directories), it probably makes more sense to store the windowing information with the directory entry. That change would be fairly straightforward: use the ZAP to store a larger value in the directory entry, adding words that describe the window information.

--matt
On Tue, May 02, 2006 at 05:10:09PM -0400, Maury Markowitz wrote:
> > Unfortunately, embedding the extended attributes in the znode_phys_t (as you suggest) is not really practical. The extended attributes are full-fledged files
>
> I think that's the disconnect. WHY are they "full-fledged files"?

Because that's what the specification calls for. If they weren't full-fledged files, they wouldn't be compatible with existing interfaces. That wouldn't necessarily be a bad thing, just a different thing. As I mentioned, a new, lighter-weight interface could be designed in addition to extended attributes.

> > windowing information with the directory entry. That change would be fairly straightforward: use the ZAP to store a larger value in the directory entry, adding words that describe the window information.
>
> I think I need a little hand holding here. Where is the ZAP in relation to the directory or file? I read the documentation to imply that the ZAP was the directory (page 46, bottom) effectively. However, looking at it now, I see there is no file name for instance.
>
> Reading over section 5, am I correct that a znode_phys_t is "built up inside" a ZAP (potentially micro), and the ZAP itself holds the znode_phys_t and other additional information? If this is correct, is there a diagram somewhere that illustrates the resulting overall structure?

Ah, I see that you've been reading the on-disk format document. Section 6.2 describes the relation between the ZAP, directory entries, and the znode_phys_t:

    Filesystem directories are implemented as ZAP objects. Each directory holds a set of name-value pairs which contain the names and object numbers for each directory entry. Traversing through a directory tree is as simple as looking up the value for an entry and reading that object number.

    All filesystem objects contain a znode_phys_t structure in the bonus buffer of its dnode. This structure stores the attributes for the filesystem object.

To recap, the ZAP implements a directory, mapping from file name to object number. That object number identifies (i.e. refers to) a particular dnode, which contains a znode_phys_t in the bonus buffer. As I mentioned, the directory could potentially map from file name to object number + some additional information, since the values stored in the ZAP can be variable-length.

Could you elaborate on what makes it seem like there "is no file name" and that the "ZAP itself holds the znode_phys_t"? Then we can change the documentation to make it clear that that is not the case.

--matt
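The recap above (a directory ZAP maps a file name to an object number, the object number identifies a dnode, and the dnode's bonus buffer holds the znode_phys_t) can be modelled in a few lines. This is only a toy sketch with plain dictionaries. The names objects, new_object, link and lookup are invented for the illustration and say nothing about the real on-disk encoding.

```python
# Toy model: object number -> dnode. Each dnode has a "bonus" buffer holding
# the znode attributes and, for directories, ZAP-style name -> number entries.
objects = {}


def new_object(obj_num, **znode_attrs):
    """Create a dnode whose bonus buffer carries the file's attributes."""
    objects[obj_num] = {"bonus": dict(znode_attrs), "entries": {}}
    return obj_num


def link(dir_num, name, obj_num):
    """Add a directory entry: the ZAP maps a file name to an object number."""
    objects[dir_num]["entries"][name] = obj_num


def lookup(dir_num, path):
    """Walk a slash-separated path, one ZAP lookup per component."""
    obj_num = dir_num
    for name in path.split("/"):
        obj_num = objects[obj_num]["entries"][name]
    return obj_num


root = new_object(1, mode=0o755)
docs = new_object(2, mode=0o755)
note = new_object(3, mode=0o644, size=12)
link(root, "docs", docs)
link(docs, "note.txt", note)
link(root, "alias.txt", note)  # hard link: a second name for the same object

# An attribute change made through one name is visible through the other,
# because both names resolve to the same object number.
objects[lookup(root, "docs/note.txt")]["bonus"]["size"] = 99
print(objects[lookup(root, "alias.txt")]["bonus"]["size"])  # 99
```

The file name lives only in the directory entry, and the attributes live only in the object it points to.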
On Wed, May 03, 2006 at 03:22:53PM -0400, Maury Markowitz wrote:
> >> I think that's the disconnect. WHY are they "full-fledged files"?
> >
> > Because that's what the specification calls for.
>
> Right, but that's my concern. To me this sounds like "historically circular" reasoning...

> 20xx) we need a new file system that supports xaddrs
> well xaddrs are this second file, so...

> To me it appears that there is some confusion between the purpose and implementation.

> Certainly if xaddrs were originally introduced to store, well, "x" addrs, then the implementation is a poor one. Years later the _implementation_ was copied, even though it was never a good one.

I think you are confusing the interface with the implementation. ZFS has "copied" (aka. adhered to) a pre-existing interface[*]. Our implementation of that interface is in some ways similar to other implementations. I believe that our implementation is a very good one, but if you have specific suggestions for how it could be improved, we'd love to hear them.

[*] The Solaris extended attributes interface is actually more accurately called "named streams", and has been used as the back-end for CIFS (Windows) and NFSv4 named-streams protocols. See the fsattr(5) manpage.

We appreciate your suggestion that we implement a higher-performance method for storing additional metadata associated with files. This will most likely not be possible within the extended attribute interface, and will require that we design (and applications use) a new interface. Having specific examples of how that interface would be used will help us to design a useful feature.

> The real problem is that there is nothing like a "general overview" of the zfs system as a whole

I agree that a higher-level overview would be useful.

> COMPARING the system with the widely understood UFS would be invaluable, IMHO.

Agreed, thanks for the suggestion. Unfortunately, ZFS and UFS are sufficiently different that I think the comparison would only be useful for a very limited part of ZFS, say from the file/directory down.

> But to the specifics. You asked why I thought it was that the file name did not appear. Well, that's because the term "file name" (or "filename") does not appear anywhere in the document.

Thanks, maybe we should use that keyword in section 6.2 to help when doing a search.

> So then, at a first glance it seems that one would expect to find the directory description in Chapter 6, which has a subsection called "Directories and Directory Traversal".

I believe that that section does in fact describe directories. Perhaps the description could be made more explicit (e.g. "The ZAP object which stores the directory maps from filename to object number. Each entry in the ZAP is a single directory entry. The entry's name is the filename, and its value is the object number which identifies that file.").

> That section describes the znode_phys_t structure.

You're right, it also describes the znode_phys_t. There should be a section break after the first paragraph, before we start talking about the znode_phys_t.

> Maybe I'm going down a dark alley here, but is there any reason this split still exists under zfs? IE, I assumed that the znode_phys_t would be located in the directory ZAP, because to my mind, that's where metadata belongs.

ZFS must support POSIX semantics, part of which is hard links. Hard links allow you to create multiple names (directory entries) for the same file. Therefore, all UNIX filesystems have chosen to store the file information separately from the directory entries (otherwise, you'd have multiple copies, and need pointers between all of them so you could update them all -- yuck).

Hard links suck for FS designers because they constrain our implementation in this way. We'd love to have the flexibility to easily store metadata with the directory entry. We've actually contemplated caching the metadata needed to do a stat(2) in the directory entry, to improve performance of directory traversals like find(1). Perhaps we'll be able to add this performance improvement in a future release.

--matt
On Wed, 2006-05-03 at 17:20, Matthew Ahrens wrote:
> We appreciate your suggestion that we implement a higher-performance method for storing additional metadata associated with files. This will most likely not be possible within the extended attribute interface, and will require that we design (and applications use) a new interface. Having specific examples of how that interface would be used will help us to design a useful feature.

Another potential consumer for an extra-metadata extension is Trusted Extensions, for per-file security labels and similar obscurity.
> ZFS must support POSIX semantics, part of which is hard links. Hard links allow you to create multiple names (directory entries) for the same file. Therefore, all UNIX filesystems have chosen to store the file information separately for the directory entries (otherwise, you'd have multiple copies, and need pointers between all of them so you could update them all -- yuck).

For what it's worth, some file systems have chosen to special-case hard links because they are rare and the directory/inode split hurts performance. Apple's HFS is a case in point. The file metadata ("inode") is part of the directory entry, so that no additional disk access is required to retrieve it. If the file is a hard link, this metadata is a pointer to the shared metadata for the file.

Anton
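The special-casing described above can be sketched as follows. This is a hypothetical in-memory model, not HFS's actual catalog format: an ordinary directory entry carries its metadata inline, and only a hard-linked file pays for an extra hop to a shared record. The names shared, make_entry, stat and add_hard_link, and the link_id parameter, are made up for the illustration.

```python
# Hypothetical model: a directory entry holds either inline metadata or, for
# hard-linked files, a reference to a shared metadata record.
shared = {}  # link id -> metadata shared by several names


def make_entry(**meta):
    """Ordinary file: metadata is stored in the directory entry itself."""
    return {"inline": dict(meta)}


def stat(entry):
    """Return (metadata, accesses): 1 for ordinary files, 2 for hard links."""
    if "link_id" in entry:
        return shared[entry["link_id"]], 2
    return entry["inline"], 1


def add_hard_link(directory, old_name, new_dir, new_name, link_id):
    """Move inline metadata to the shared table and point both names at it."""
    entry = directory[old_name]
    if "link_id" not in entry:
        shared[link_id] = entry.pop("inline")
        entry["link_id"] = link_id
    new_dir[new_name] = {"link_id": entry["link_id"]}


home = {"a.txt": make_entry(size=10), "b.txt": make_entry(size=20)}
other = {}
add_hard_link(home, "b.txt", other, "b-link.txt", link_id=7)
print(stat(home["a.txt"]))        # ({'size': 10}, 1)
print(stat(other["b-link.txt"]))  # ({'size': 20}, 2)
```

The common case stays a single access, at the price of a conversion step the first time a file gains a second name.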
>> ZFS must support POSIX semantics, part of which is hard links. Hard links allow you to create multiple names (directory entries) for the same file. Therefore, all UNIX filesystems have chosen to store the file information separately for the directory entries (otherwise, you'd have multiple copies, and need pointers between all of them so you could update them all -- yuck).
>
> For what it's worth, some file systems have chosen to special-case hard links because they are rare and the directory/inode split hurts performance. Apple's HFS is a case in point. The file metadata ("inode") is part of the directory entry, so that no additional disk access is required to retrieve it. If the file is a hard link, this metadata is a pointer to the shared metadata for the file.

Yes, Microsoft's FAT does it the same way - the dirent is the inode.

This creates locking nightmares in its own right - directory scans/updates may be blocking file access; at the very least, the two race. It might have advantages in some situations, and simplifies the metadata implementation - but at least to me, it also causes headaches ... and an upset stomach every now and then ...

FrankH.
On Thu, May 04, 2006 at 10:05:31AM -0400, Maury Markowitz wrote:
> Hmmm, where in 6.2 is the filename? I see the description of the znode_phys_t, which doesn't have it, and "Each directory holds a set of name-value pairs which contain the names and object numbers for each directory entry." Is that "names" the filenames?

Yes, the names that are stored in directory entries are filenames.

> And is it accurate to go further, that the "object number" is a pointer to a dnode? If so, what is the conceptual difference here with UFS, where the directory stores a filename and pointer to an inode?

Yes. The concept is the same between ZFS and UFS (and every other UNIX filesystem I'm aware of) -- the directory stores a mapping from filename to some number, which points to the structure that describes the file (dnode/znode, inode, etc).

> > We've actually contemplated caching the metadata needed to do a stat(2) in the directory entry, to improve performance of directory traversals like find(1).
>
> So how would this work? I assume the extra data is added as additional key/values into the directory, but how do they keep in sync with changes to the znode_phys_t?

We'd probably make the existing value longer, and store the additional info in the existing name/value entry. Keeping the changes in sync is the challenge.

--matt
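To make the synchronisation problem concrete, here is a minimal sketch of a directory whose values are lengthened to carry a cached copy of the attributes. It is one possible approach under stated assumptions, not a description of ZFS. In particular the back-reference table (names) that lets an attribute change find every directory entry is an assumption of this sketch, and all identifiers are invented.

```python
znodes = {}  # object number -> authoritative attributes
names = {}   # object number -> set of (directory id, name) back references
dirs = {}    # directory id -> {name: (object number, cached attributes)}


def create(dir_id, name, obj_num, **attrs):
    """Add a name for obj_num; the entry value carries a cached stat copy."""
    znodes.setdefault(obj_num, dict(attrs))
    names.setdefault(obj_num, set()).add((dir_id, name))
    dirs.setdefault(dir_id, {})[name] = (obj_num, dict(znodes[obj_num]))


def setattr_synced(obj_num, **changes):
    """Update the znode, then refresh the cached copy in every entry."""
    znodes[obj_num].update(changes)
    for dir_id, name in names[obj_num]:
        dirs[dir_id][name] = (obj_num, dict(znodes[obj_num]))


def list_with_stat(dir_id):
    """A find(1)-style scan that reads only the directory, not the znodes."""
    return {name: cached for name, (_, cached) in dirs[dir_id].items()}


create("d1", "x", 5, size=1)
create("d2", "y", 5)  # hard link: a second name for object 5
setattr_synced(5, size=42)
print(list_with_stat("d1"), list_with_stat("d2"))
```

Every attribute update now costs one write per name, and the file system has to be able to enumerate those names, which is where hard links make the bookkeeping awkward.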
Maury Markowitz <maury_markowitz at hotmail.com> wrote:
> So can anyone tell me why xattrs weren't handled in the same way as ACLs? It smells of inside-the-box-thinking, but I'm no FS expert and there may very well be a good reason.

They are: ACLs (at least in UFS) are inside a shadow inode that is referenced in the main inode. XATTRs are inside a "shadow" XATTR directory that is referenced in the inode.

The difference between what Apple does and what Sun does is that Sun's implementation is not limited and also serves wishes that come from Microsoft.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home)
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work)
Jörg Schilling, D-13353 Berlin
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Gregory Shaw <greg.shaw at sun.com> wrote:
> I don't disagree with the below, however, you can run your mac on UFS instead of HFS+. Since UFS hasn't been mac-ified, I'm wondering if the below is actually true for all filesystem types.

It seems that UFS has been "mac-ified" on MacOS X. IIRC, you may append "/rsrc" to any file name.

Jörg
Matthew Ahrens <ahrens at eng.sun.com> wrote:
> > I think that's the disconnect. WHY are they "full-fledged files"?
>
> Because that's what the specification calls for. If they weren't full-fledged files, they wouldn't be compatible with existing interfaces. That wouldn't necessarily be a bad thing, just a different thing. As I mentioned, a new, lighter-weight interface could be designed in addition to extended attributes.

Do you believe this should be done using a new, different basic implementation?

Note that I would need to know this in order to decide how to support XATTRs in an OS-independent way inside star.

Jörg
Frank Hofmann <Frank.Hofmann at Sun.COM> wrote:
> Yes, Microsoft's FAT does it the same way - the dirent is the inode.
>
> This creates locking nightmares in its own right - directory scans/updates may be blocking file access; at the very least, the two race. It might have advantages in some situations, and simplifies the metadata implementation - but at least to me, it also causes headaches ... and an upset stomach every now and then ...

With FAT, you are right, but there are other ways to implement hard links.

Look at my WOFS from 1990... It uses 'gnodes' that include the filename in one single meta data chunk for a file. Hard links are implemented as inode number related soft links (while symlinks are name related soft links).

If ZFS used my concept, it would not have the problems you have with FAT.

Jörg

"Maury Markowitz" <maury_markowitz at hotmail.com> wrote:
> I believe xattrs were added to store things just like what we're talking about here. Specifically, if I'm not mistaken, many originally used them for ACL storage. Now that zfs promotes ACLs to first-class citizens, it seems that a reevaluation of what people ACTUALLY DO with xattrs and whether or not the current mechanism is "correct" for this role certainly seems in order. If some large percentage of use-cases turns out to be "storing ACLs", and some smaller percentage is "other OS metadata", then certainly it seems that a dedicated "expanding" (as in the zfs ACL system) key/value pair storage system seems to make sense. But that implies more API, which I don't think anyone would want.

XATTRs were implemented only recently, while ACLs have existed for a long time: Solaris 2.4 already had them inside UFS.

The fact that Linux and FreeBSD put them into XATTRs is a hack, and the way they implement ACLs makes them slow.

Jörg
On 07 May 2006, at 17:03, Joerg Schilling wrote:
> Look at my WOFS from 1990... It uses 'gnodes' that include the filename in one single meta data chunk for a file. Hard links are implemented as inode number related soft links (while symlinks are name related soft links).
>
> If ZFS did use my concept, you don't have the problems you have with FAT.

Yes, but WOFS is a write-once filesystem. ZFS is read-write. What happens if you delete the file referenced by the inode-softlinks?

Wout.
Wout Mertens <wmertens at cisco.com> wrote:
> On 07 May 2006, at 17:03, Joerg Schilling wrote:
>
> > Look at my WOFS from 1990... It uses 'gnodes' that include the filename in one single meta data chunk for a file. Hard links are implemented as inode number related soft links (while symlinks are name related soft links).
> >
> > If ZFS did use my concept, you don't have the problems you have with FAT.
>
> Yes, but WOFS is a write-once filesystem. ZFS is read-write. What happens if you delete the file referenced by the inode-softlinks?

WOFS lives on a write-once medium; WOFS itself is not write-once.

I would need to check my papers... there is a solution.

Jörg
On Tue, May 09, 2006 at 05:37:07PM +0200, Joerg Schilling wrote:
> Wout Mertens <wmertens at cisco.com> wrote:
> > On 07 May 2006, at 17:03, Joerg Schilling wrote:
> > > If ZFS did use my concept, you don't have the problems you have with FAT.
> >
> > Yes, but WOFS is a write-once filesystem. ZFS is read-write. What happens if you delete the file referenced by the inode-softlinks?
>
> WOFS lives on a Write once medium, WOFS itself is not write once.
>
> I would need to check my papers.... there is a solution.

If you unlink the original name/inode entry you can mark it as deleted without actually deleting it, thus leaving extant links to it live and fresh; you only have to chase down the other links when the original's directory is removed, though you can do the same to directories, deferring actual removal to later (at the cost of wasting disk space and complicating accounting). And/or you can always lazily update back references, if you throw in a log, so that you can find those when they are needed.

Note that this asymmetric hard-linking approach makes original links fast and others slow; this will surely bother someone :)

Nico
--
On 09 May 2006, at 18:09, Nicolas Williams wrote:
> On Tue, May 09, 2006 at 05:37:07PM +0200, Joerg Schilling wrote:
>> Wout Mertens <wmertens at cisco.com> wrote:
>>>
>>> Yes, but WOFS is a write-once filesystem. ZFS is read-write. What happens if you delete the file referenced by the inode-softlinks?
>>
>> WOFS lives on a Write once medium, WOFS itself is not write once.
>>
>> I would need to check my papers.... there is a solution.
>
> If you unlink the original name/inode entry you can mark it as deleted without actually deleting it, thus leaving extant links to it live and fresh; you only have to chase down the other links when the original's directory is removed, though you can do the same to directories, deferring actual removal to later (at the cost of wasting disk space and complicating accounting). And/or you can always lazily update back references, if you throw in a log, so that you can find those when they are needed.
>
> Note that this asymmetric hard-linking approach makes original links fast and others slow; this will surely bother someone :)

How about, as soon as a file is hardlinked, you move it to a special invisible directory, and make both the target and the source be your special inode-symlink? That solves the directory deletion and makes the access symmetrical...

Wout.
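The "special invisible directory" idea above can be sketched like this. It is a hypothetical model only. The link count (nlink) used to decide when the hidden record can be freed is an assumption added for the sketch and was not part of the proposal, and none of the names correspond to WOFS or ZFS code.

```python
hidden = {}  # invisible directory: inode number -> shared file record
tree = {}    # directory id -> {name: entry}


def create(dir_id, name, ino, data):
    """Ordinary file: the record lives inline in its directory entry."""
    tree.setdefault(dir_id, {})[name] = {"ino": ino, "data": data}


def hardlink(src_dir, src_name, dst_dir, dst_name):
    """On the first link, move the record to the hidden directory. Both names
    then become inode-number soft links, so access is symmetrical."""
    entry = tree[src_dir][src_name]
    if "ref" not in entry:
        ino = entry["ino"]
        hidden[ino] = {"data": entry["data"], "nlink": 1}
        tree[src_dir][src_name] = entry = {"ref": ino}
    hidden[entry["ref"]]["nlink"] += 1
    tree.setdefault(dst_dir, {})[dst_name] = {"ref": entry["ref"]}


def read(dir_id, name):
    """Read through a name, following the inode link if there is one."""
    entry = tree[dir_id][name]
    return hidden[entry["ref"]]["data"] if "ref" in entry else entry["data"]


def unlink(dir_id, name):
    """Drop one name; the hidden record goes away with its last link."""
    entry = tree[dir_id].pop(name)
    if "ref" in entry:
        record = hidden[entry["ref"]]
        record["nlink"] -= 1
        if record["nlink"] == 0:
            del hidden[entry["ref"]]


create("a", "orig", 10, b"hello")
hardlink("a", "orig", "b", "copy")
unlink("a", "orig")        # removing the first name leaves the other intact
print(read("b", "copy"))   # b'hello'
```

Because neither name is the "original" any more, removing either one (or its directory) needs no chasing of other links.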
Wout Mertens <wmertens at cisco.com> wrote:
> >> WOFS lives on a Write once medium, WOFS itself is not write once.
> >>
> >> I would need to check my papers.... there is a solution.
> >
> > [...]
> >
> > Note that this asymmetric hard-linking approach makes original links fast and others slow; this will surely bother someone :)
>
> How about, as soon as a file is hardlinked, you move it to a special invisible directory, and make both the target and the source be your special inode-symlink? That solves the directory deletion and makes the access symmetrical...

Possible implementations are discussed in my master thesis; see e.g. page 1042 of

"OpenSolaris für Anwender, Administratoren und Rechenzentren"
ISBN: 3-540-29236-5

or page 16 of the master thesis:

http://cdrecord.berlios.de/old/private/wofs.ps.gz

Jörg
On 09 May 2006, at 23:48, Joerg Schilling wrote:
> Wout Mertens <wmertens at cisco.com> wrote:
>
>>>> WOFS lives on a Write once medium, WOFS itself is not write once.

Oops, now that I read your thesis, I see. So you can treat a WORM like a normal disk. Cool :)

How come it never got traction? There was a time in the 90s when it would have been great to have a stable writeable filesystem on cheap CD-ROMs. In fact, with dual layer DVDs right now, it still would have its uses. BTW, finding WOFS on Google is not the easiest thing :-(

While reading your thesis, I kept thinking that it should be possible for ZFS to be implemented on top of a WORM as well.

Most of the hard bits are already done, checksums and copy-on-write. The only issue would be the root blocks that can't be overwritten, but that can be fixed like in your thesis, by growing an array from one side of the disk and putting all the data on the other side.

BTW, I would have put the generation nodes at the beginning of the disk and the data at the end, because then you can read the whole array in one big sequential gulp.

>>>> I would need to check my papers.... there is a solution.
>>>
>>> [...]
>>
>> How about, as soon as a file is hardlinked, you move it to a special invisible directory, and make both the target and the source be your special inode-symlink? That solves the directory deletion and makes the access symmetrical...
>
> Possible implementations are discussed in my master thesis:

But it doesn't mention this solution ;-)

Anyway, we might be straying a bit too far from the topic.

Wout.
Wout Mertens <wmertens at cisco.com> wrote:
> >>>> WOFS lives on a Write once medium, WOFS itself is not write once.
>
> Oops, now that I read your thesis, I see. So you can treat a WORM like a normal disk. Cool :)

Thank you ;-)

> How come it never got traction? There was a time in the 90s when it would have been great to have a stable writeable filesystem on cheap CD-ROMs. In fact, with dual layer DVDs right now, it still would have its uses. BTW, finding WOFS on Google is not the easiest thing :-(

I have no idea why nobody was interested. I did talk about it many times in the 90s. Maybe the absence of web servers at that time was the reason that the right people missed the concept.

> While reading your thesis, I kept thinking that it should be possible for ZFS to be implemented on top of a WORM as well.

As it seems that ZFS is similar to WOFS in several places, you may be right.

> Most of the hard bits are already done, checksums and copy-on-write. The only issue would be the root blocks that can't be overwritten, but that can be fixed like in your thesis, by growing an array from one side of the disk and putting all the data on the other side.

Correct.

> BTW, I would have put the generation nodes at the beginning of the disk and the data at the end, because then you can read the whole array in one big sequential gulp.

The gnodes grow backwards, but they are read in forwards.

> > Possible implementations are discussed in my master thesis:
>
> But it doesn't mention this solution ;-)
>
> Anyway, we might be straying a bit too far from the topic.

Well, I did not say it mentions _all_ possible solutions ;-)

Jörg
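The layout discussed above, with data filling the medium from one end and an array of metadata generations growing from the other, amounts to a two-ended allocator over write-once blocks. The following is a rough sketch under that assumption only. The class WormDisk and its methods are invented here and do not reflect the actual WOFS or ZFS formats.

```python
class WormDisk:
    """Write-once block array: data grows up from block 0, metadata
    generations grow down from the last block (an assumed layout)."""

    def __init__(self, nblocks):
        self.blocks = [None] * nblocks
        self.next_data = 0
        self.next_meta = nblocks - 1

    def _write(self, index, payload):
        # The medium is full once the two growing regions have met.
        if self.next_data > self.next_meta:
            raise OSError("medium full")
        assert self.blocks[index] is None, "write-once block reused"
        self.blocks[index] = payload

    def write_data(self, payload):
        """Append a data block at the low end and return its index."""
        index = self.next_data
        self._write(index, payload)
        self.next_data += 1
        return index

    def write_meta(self, payload):
        """Append a new metadata generation at the high end."""
        index = self.next_meta
        self._write(index, payload)
        self.next_meta -= 1
        return index

    def current_root(self):
        """The newest generation is the lowest-numbered metadata block."""
        if self.next_meta == len(self.blocks) - 1:
            return None
        return self.blocks[self.next_meta + 1]


disk = WormDisk(8)
block = disk.write_data(b"file contents v1")
disk.write_meta({"root": block, "gen": 1})
block = disk.write_data(b"file contents v2")
disk.write_meta({"root": block, "gen": 2})
print(disk.current_root())  # {'root': 1, 'gen': 2}
```

Nothing is ever overwritten: an update writes new data plus a new generation record, and older generations stay readable on the medium.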
On Tue, 2006-05-16 at 07:59, Joerg Schilling wrote:
> I have no idea why nobody was interested. I did talk about it many times in the 90s. Maybe the absence of web servers at that time was the reason that the right people did miss the concept.

Someone I worked with in the 1985-1987 timeframe also wrote a read-write filesystem for WORM drives, and, as best as I can tell, it rolled over and sank without leaving a trace...

- Bill