So, with a not inconsiderable amount of pain, I've got attribute caching working on the Mac client. Hooray!

But there is one wrinkle that bugs me: I can't figure out how to cache any information off of the root directory of a filesystem.

In MacOS X (and from what I can tell, most vnode-derived systems), you never get a "lookup" call for the root node in your filesystem; it's managed by the operating system, and when you traverse a mountpoint the operating system then just substitutes "your" root node for the root directory (there's a special function just to fetch the root node information).

The whole setup works out that you never end up with a lookup for the root node (the filesystem path is cached so when you would look up "." in the root directory, it ends up just back at the root node again without a lookup call). And now that I think about it, I'm not sure how you could even do that, since one of the arguments to the lookup RPC is the parent directory and what do you put in for that for the root node?

Obviously I can still fetch the directory attributes when the OS asks for it, but in a perfect world I'd cache that information. I've looked at the Linux client, but it's not clear what happens in the root node case. Can someone shed some light on this? Does Linux cache the root node attribute information? If so, how does it do that?

--Ken
On 2010-08-12, at 13:12, Ken Hornstein wrote:
> So, with a not inconsiderable amount of pain, I've got attribute caching
> working on the Mac client. Hooray!

Yay.

> But there is one wrinkle that bugs me: I can't figure out how to cache any
> information off of the root directory of a filesystem.
>
> In MacOS X (and from what I can tell, most vnode-derived systems), you
> never get a "lookup" call for the root node in your filesystem; it's
> managed by the operating system, and when you traverse a mountpoint the
> operating system then just substitutes "your" root node for the root
> directory (there's a special function just to fetch the root node information).

I believe the Linux VFS will call revalidate on the root inode as part of the path traversal, so that it can present correct information for that inode. See, for example, ll_inode_revalidate_it().

> The whole setup works out that you never end up with a lookup for the
> root node (the filesystem path is cached so when you would look up "."
> in the root directory, it ends up just back at the root node again
> without a lookup call). And now that I think about it, I'm not sure
> how you could even do that, since one of the arguments to the lookup
> RPC is the parent directory and what do you put in for that for the
> root node?

Just leave it blank.

> Obviously I can still fetch the directory attributes when the OS asks for
> it, but in a perfect world I'd cache that information. I've looked at the
> Linux client, but it's not clear what happens in the root node case.
> Can someone shed some light on this? Does Linux cache the root node
> attribute information? If so, how does it do that?

Caching the entries in the root directory is important, since they are traversed all the time, and also change very rarely.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
>> The whole setup works out that you never end up with a lookup for the
>> root node (the filesystem path is cached so when you would look up "."
>> in the root directory, it ends up just back at the root node again
>> without a lookup call). And now that I think about it, I'm not sure
>> how you could even do that, since one of the arguments to the lookup
>> RPC is the parent directory and what do you put in for that for the
>> root node?
>
>Just leave it blank.

Hrm. I'm not sure that will work. Obviously I can call md_getattr() with just the fid of the root directory; that's what is done now if no cached data exists. But even if CONNECT_ATTRFID was supported (which it currently is NOT, as we've discussed previously), you need the parent FID in there, because if you DON'T provide it, the server will throw an LBUG (which, okay, we all know is a bug, but it's still clear that the parent fid needs to be there).

In the case of md_intent_lock() .... hmmm. I see in ll_revalidate_it() that mountpoints are explicitly skipped (and the attributes are fixed up in ll_inode_revalidate_it(), which from what I see unless there is a lock already, you don't get a new one). Does d_mountpoint(de) apply to the root node?

From what I can read ... ll_prep_md_op_data() pretty much requires that the first argument (the parent inode) needs to be filled in. The server gets a little complicated to follow, but I do see that the server does:

    rc = mdt_object_exists(parent);

inside of mdt_getattr_name_lock(). Which sure would imply to me that things are going to be unhappy if you don't supply a parent node.

>Caching the entries in the root directory is important, since they are
>traversed all the time, and also change very rarely.

Right, I haven't gotten to caching the directory pages yet, and everything _from_ the root directory down is fine .. it's just the root itself I can't quite figure out how to cache the attributes :-/

--Ken
Hi Ken,

I hope the attached patches will help in your work. Two patches resolve some performance degradation with access to dentries, and one patch avoids sending two extra RPCs when the client wants access to the LOV EA. You strictly need that one if you re-export Lustre to an SMB network, because Samba asks for the xattrs of each directory entry when doing a listing.

Attachments (scrubbed by the list archive):
  generate-lovea.diff (2538 bytes)
    http://lists.lustre.org/pipermail/lustre-devel/attachments/20100814/ad013a72/attachment.obj
  xattr-interop-1.8.diff (2187 bytes)
    http://lists.lustre.org/pipermail/lustre-devel/attachments/20100814/ad013a72/attachment-0001.obj
  xattr-interop-HEAD.diff (1322 bytes)
    http://lists.lustre.org/pipermail/lustre-devel/attachments/20100814/ad013a72/attachment-0002.obj
Hello!

On Aug 12, 2010, at 8:37 PM, Ken Hornstein wrote:
> In the case of md_intent_lock() .... hmmm. I see in ll_revalidate_it()
> that mountpoints are explicitly skipped (and the attributes are fixed up
> in ll_inode_revalidate_it(), which from what I see unless there is a lock
> already, you don't get a new one). Does d_mountpoint(de) apply to the
> root node?

No, d_mountpoint() applies to a dentry on some filesystem where something is mounted. Mountpoints are skipped just because there is no point in looking at their attrs; the attrs should come from the underlying fs mounted below.

> From what I can read ... ll_prep_md_op_data() pretty much requires that
> the first argument (the parent inode) needs to be filled in. The server
> gets a little complicated to follow, but I do see that the server does:
>
>     rc = mdt_object_exists(parent);
>
> inside of mdt_getattr_name_lock(). Which sure would imply to me that things
> are going to be unhappy if you don't supply a parent node.

Well, there is a simple rule for a root of the fs. The rule says the parent of the root inode is the root inode.

Besides, when you do getattr by fid (which you do from ll_inode_revalidate), you don't have a parent anyway, so you only supply child inode information.

Bye,
Oleg
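
[Editorial sketch] The "parent of the root inode is the root inode" rule above can be illustrated in a few lines of C. This is only a minimal sketch under stated assumptions: struct demo_fid, struct demo_node, demo_fid_eq() and demo_parent_fid() are hypothetical names invented for illustration, not Lustre's real struct lu_fid or any function in the client, and the real request-packing code may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a file identifier; NOT the real Lustre FID layout. */
struct demo_fid {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

/* Hypothetical in-memory node: the root is the only node with no parent pointer. */
struct demo_node {
    struct demo_fid fid;
    const struct demo_node *parent; /* NULL for the filesystem root */
};

/* Field-by-field comparison of two identifiers. */
static bool demo_fid_eq(const struct demo_fid *a, const struct demo_fid *b)
{
    assert(a != NULL && b != NULL);
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/*
 * Pick the identifier to place in the "parent" slot of a request.
 * Assumption taken from the thread: the root acts as its own parent,
 * so the slot is never left empty.
 */
static const struct demo_fid *demo_parent_fid(const struct demo_node *node)
{
    assert(node != NULL);
    return node->parent != NULL ? &node->parent->fid : &node->fid;
}
```

With this shape, a caller never has to special-case the root when filling in the parent argument: demo_parent_fid(&root) simply hands back the root's own identifier.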
>Well, there is a simple rule for a root of the fs. The rule says
>the parent of the root inode is the root inode.

Hm, I had just tried that, but it didn't seem to work; I'll have to see what's going on there.

>Besides, when you do getattr by fid (which you do from ll_inode_revalidate),
>you don't have parent anyway, so you only supply child inode information.

Well, if you just call md_getattr(), yes, you can do that with just the FID in question; that works fine, and I do that now. But if you want to get a lock so you can cache the attributes using md_intent_lock(), then you DEFINITELY need the parent fid; I learned the hard way that if you don't include it then the server will throw an LBUG(). This is presuming you ignore the server's indication of the lack of support for CONNECT_ATTRFID (see previous thread, but the short deal is that it was disabled somewhere between 1.8 and 2.0).

--Ken
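
[Editorial sketch] The distinction drawn above, between fetching attributes without a lock and fetching them together with a lock so that they may be cached, could be modelled roughly as follows. Everything here is an assumption made for illustration: struct sketch_attrs, struct sketch_attr_cache and the sketch_cache_*() functions are hypothetical and do not correspond to md_getattr(), md_intent_lock() or any other real Lustre interface. The only idea taken from the thread is that attributes are safe to reuse only while a lock covering them is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute snapshot; a real client tracks many more fields. */
struct sketch_attrs {
    uint64_t size;
    uint32_t mode;
    int64_t  mtime;
};

/* Hypothetical per-node cache slot guarded by a "lock held" flag. */
struct sketch_attr_cache {
    bool lock_held;
    struct sketch_attrs attrs;
};

/*
 * Record freshly fetched attributes. They are treated as reusable only
 * if the fetch also returned a lock (lock_granted == true); a lockless
 * fetch is stored but never served from the cache.
 */
static void sketch_cache_store(struct sketch_attr_cache *c,
                               const struct sketch_attrs *a,
                               bool lock_granted)
{
    assert(c != NULL && a != NULL);
    c->attrs = *a;
    c->lock_held = lock_granted;
}

/* Copy cached attributes into *out and return true only while the lock is held. */
static bool sketch_cache_lookup(const struct sketch_attr_cache *c,
                                struct sketch_attrs *out)
{
    assert(c != NULL && out != NULL);
    if (!c->lock_held)
        return false;
    *out = c->attrs;
    return true;
}

/* Called when the lock is revoked: the cached copy can no longer be trusted. */
static void sketch_cache_lock_cancelled(struct sketch_attr_cache *c)
{
    assert(c != NULL);
    c->lock_held = false;
}
```

In this model a miss from sketch_cache_lookup() means the caller has to go back to the server, which matches the "fetch the attributes every time the OS asks" behaviour described for the root node when no lock can be obtained.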