Goswin von Brederlow
2006-Oct-12 11:35 UTC
[Lustre-discuss] Kernel oops with dangling symlinks
Hi, I''m using Lustre 1.4.6 and a vanilla 2.6.15 kernel and dangling symlinks on lustre cause kernel oopses. I have a nfs export mounted on /data/nfs, lustre mounted on /data/lustre (both by the automounter) and a dangling symlink /data/lustre/nonexistent -> /data/nfs/nonexistent. When I run ''cat /data/lustre/nonexistent'' (gives a ''No such file or directory'') the mount count of /data/nfs seem to get reduced one more time than it is raised. This results in the mount count reaching 0 and /data/nfs being sort of umounted. fs/super.c: generic_shutdown_super() gets called and prints ''VFS: Busy inodes after umount. Self-destruct in 5 seconds. Have a nice day...'' Roughly 5 seconds later I get a kernel oops from some NFS function on trying to access /data/nfs. Can anyone reproduce this behaviour or even have a patch for the problem? MfG Goswin
Hello! On Thu, Oct 12, 2006 at 07:34:52PM +0200, Goswin von Brederlow wrote:> When I run ''cat /data/lustre/nonexistent'' (gives a ''No such file or > directory'') the mount count of /data/nfs seem to get reduced one more > time than it is raised. This results in the mount count reaching 0 and > /data/nfs being sort of umounted. fs/super.c: > generic_shutdown_super() gets called and prints ''VFS: Busy inodes > after umount. Self-destruct in 5 seconds. Have a nice day...'' > Roughly 5 seconds later I get a kernel oops from some NFS function > on trying to access /data/nfs.You must be running on x86_64 or other 64 bit arch on client. We run into this problem two days ago as well. The simple workaround for you would be to change return type for ll_follow_link from int to long (in llite/symlink.c). Of course this is not a proper fix (which is under work), but it will get rid of the oops and restore normal operations. Bye, Oleg
Goswin von Brederlow
2006-Oct-13 02:23 UTC
[Lustre-discuss] Kernel oops with dangling symlinks
Oleg Drokin <green@clusterfs.com> writes:> Hello! > > On Thu, Oct 12, 2006 at 07:34:52PM +0200, Goswin von Brederlow wrote: > >> When I run ''cat /data/lustre/nonexistent'' (gives a ''No such file or >> directory'') the mount count of /data/nfs seem to get reduced one more >> time than it is raised. This results in the mount count reaching 0 and >> /data/nfs being sort of umounted. fs/super.c: >> generic_shutdown_super() gets called and prints ''VFS: Busy inodes >> after umount. Self-destruct in 5 seconds. Have a nice day...'' >> Roughly 5 seconds later I get a kernel oops from some NFS function >> on trying to access /data/nfs. > > You must be running on x86_64 or other 64 bit arch on client. > We run into this problem two days ago as well. > The simple workaround for you would be to change return type for ll_follow_link > from int to long (in llite/symlink.c). > Of course this is not a proper fix (which is under work), but it will get rid > of the oops and restore normal operations. > > Bye, > OlegHi, I actualy found that problem 5 minutes after my mail when I run the debug output against the source to see the code path taken for a dangling symlink in lustre and non lustre. I noticed that lustre retruns -2 (debug output in from ll_follow_link) but was quite surprised that __do_follow_link didn''t see that as an error. The relevant code fragment is void *cookie; cookie = dentry->d_inode->i_op->follow_link(dentry, nd); error = PTR_ERR(cookie); if (!IS_ERR(cookie)) { and the prototype is void * (*follow_link) (struct dentry *, struct nameidata *); but thanks to your mail it is pretty clear now. Lustre has static int ll_follow_link(struct dentry *dentry, struct nameidata *nd) resulting in 0xFFFFFFFE being returned instead of 0xFFFFFFFFFFFFFFFE. The correct fix is to change the function prototype to void *, not to long though. The function should also use ERR_PTR(rc) from <linux/err.h> to convert the error code into what linux expects. Just to be correct in case the error encoding ever changes. MfG Goswin
On Oct 13, 2006 10:22 +0200, Goswin von Brederlow wrote:> Oleg Drokin <green@clusterfs.com> writes: > > The simple workaround for you would be to change return type for > > ll_follow_link from int to long (in llite/symlink.c). > > Of course this is not a proper fix (which is under work), but it will get > > rid of the oops and restore normal operations. > > The correct fix is to change the function prototype to void *, not to > long though. The function should also use ERR_PTR(rc) from > <linux/err.h> to convert the error code into what linux expects. Just > to be correct in case the error encoding ever changes.I believe Oleg is aware of that - it''s just that Lustre needs to run on many older kernels as well. The proper fix is to have a configure check for the return type of ->follow_link and add some #ifdefs to the Lustre code to change the return type appropriately. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Hello! On Fri, Oct 13, 2006 at 10:22:25AM +0200, Goswin von Brederlow wrote:> The correct fix is to change the function prototype to void *, not to > long though. The function should also use ERR_PTR(rc) from > <linux/err.h> to convert the error code into what linux expects. Just > to be correct in case the error encoding ever changes.Well, lustre need to compile on a lot of kernels. The correct fix is to have int for kernels that expect int and void * for kernels that expect void *. That would be proper fix and that''s what we plan to include ultimately into our released tree. Bye, Oleg
Goswin von Brederlow
2006-Oct-13 07:00 UTC
[Lustre-discuss] Kernel oops with dangling symlinks
Oleg Drokin <green@clusterfs.com> writes:> Hello! > > On Fri, Oct 13, 2006 at 10:22:25AM +0200, Goswin von Brederlow wrote: > >> The correct fix is to change the function prototype to void *, not to >> long though. The function should also use ERR_PTR(rc) from >> <linux/err.h> to convert the error code into what linux expects. Just >> to be correct in case the error encoding ever changes. > > Well, lustre need to compile on a lot of kernels. > The correct fix is to have int for kernels that expect int and void * > for kernels that expect void *. That would be proper fix and that''s > what we plan to include ultimately into our released tree. > > Bye, > OlegWas that changed between 2.4.x and 2.6.x? I only looked at the docs of the 2.6.15 thats why I came up with void *. No offense ment. MfG Goswin
Hello! On Fri, Oct 13, 2006 at 03:00:14PM +0200, Goswin von Brederlow wrote:> > Well, lustre need to compile on a lot of kernels. > > The correct fix is to have int for kernels that expect int and void * > > for kernels that expect void *. That would be proper fix and that''s > > what we plan to include ultimately into our released tree. > Was that changed between 2.4.x and 2.6.x? I only looked at the docs of > the 2.6.15 thats why I came up with void *. No offense ment.This was in fact changed at least twice. With 2.4 it was returning int and was supposed to actually follow symlink. With early 2.6 (up to 2.6.14, I think) it returns int and might just set symlink info in nameidata to be followed separately, now after 2.6.15 it returns void * with rest of semantic similar to earlier 2.6 Bye, Oleg
Andreas Dilger wrote:> On Oct 13, 2006 10:22 +0200, Goswin von Brederlow wrote: >> Oleg Drokin <green@clusterfs.com> writes: >>> The simple workaround for you would be to change return type for >>> ll_follow_link from int to long (in llite/symlink.c). >>> Of course this is not a proper fix (which is under work), but it will get >>> rid of the oops and restore normal operations. >> The correct fix is to change the function prototype to void *, not to >> long though. The function should also use ERR_PTR(rc) from >> <linux/err.h> to convert the error code into what linux expects. Just >> to be correct in case the error encoding ever changes. > > I believe Oleg is aware of that - it''s just that Lustre needs to run on > many older kernels as well. The proper fix is to have a configure check > for the return type of ->follow_link and add some #ifdefs to the Lustre > code to change the return type appropriately.What about the fix in b1_5 one month ago? I didn''t see further objections. I suggest that we simply convert "int foo()" to "void * foo()". In this way, we do not need configure check and #ifdefs. Isn''t configure test fragile and #ifdefs ugly? For 2.6.13+, we are correct in both API and ABI. For older kernel, we are still correct in ABI. The only badness is that it brings a compile warning to old kernels. Also we don''t need to worry about removing configure tests and #ifdefs when old kernels are gone. -- Qi Yong System Software Engineer Cluster File Systems, Inc.