Dan
2007-Nov-27 18:00 UTC
[Lustre-discuss] Exporting Lustre volume via NFS hangs NFS daemons
Hi,

I'm running Lustre 1.6.3 on 64-bit RHEL 4 Update 5. I've exported the Lustre volume via NFS to 20 32-bit RHEL 3 Update 6 clients. At first I can mount, create files, and write to them, but then the NFS daemons all go into an I/O wait state and never return. Since they cannot be killed, the only solution is rebooting. Increasing the number of daemons lets it run longer before all of them are hung. I get no errors on the console or in the logs; maybe I'm not looking in the right place? Trying to unmount the NFS export waits forever unless I use a lazy umount.

This is similar to bug #13030, but that fix landed in 1.6.3, which I'm running. How do I make this work? Suggestions?

Thanks,
Dan
Oleg Drokin
2007-Nov-27 18:14 UTC
[Lustre-discuss] Exporting Lustre volume via NFS hangs NFS daemons
Hello!

On Nov 27, 2007, at 1:00 PM, Dan wrote:
> I'm running Lustre 1.6.3 on a 64 bit RHEL 4 update 5. I've exported
> the Lustre volume via NFS to 20 32 bit RHEL 3 update 6 clients. I
> can mount, create files and write to them at first but the NFS
> daemons all go into an I/O wait state and never return. Since they
> cannot be killed the only solution is rebooting. Increasing the
> number of daemons allows it to run longer before all available are
> hung up. I get no errors to console or logs maybe I'm not looking
> in the right place? Trying to unmount the NFS export results in
> infinite wait unless I use lazy umount.
>
> This is similar to bug #13030 but that was landed for 1.6.3 which
> I'm running. How do I make this work? Suggestions? Thanks,

Can you please issue sysrq-t when this happens and provide us with traces of hung nfs daemons?

Bye,
Oleg
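For readers following along, here is a minimal sketch of one way to issue the sysrq-t requested above without console keyboard access. It assumes the kernel exposes /proc/sysrq-trigger, that the script runs as root, and that a Python 3 interpreter is available (which may not be the case on a stock RHEL 4 install). The function name trigger_sysrq and its parameters are made up for this illustration; nothing here comes from the original posts.

```python
def trigger_sysrq(command: str = "t",
                  trigger_path: str = "/proc/sysrq-trigger") -> None:
    """Write one SysRq command character to the kernel trigger file.

    The default command "t" asks the kernel to dump the state and stack
    of every task into the kernel log. trigger_path is a parameter only
    so that the function can be exercised against an ordinary file.
    """
    if len(command) != 1:
        raise ValueError("a SysRq command is a single character")
    with open(trigger_path, "w") as trigger:
        trigger.write(command)


# Typical use, as root, while the NFS daemons are hung:
#     trigger_sysrq("t")
# The task dump then lands in the kernel log, not on stdout.
```

The dump can be large, so the kernel log buffer may not hold all of it; the system log file is usually the more complete place to look.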
Dan
2007-Nov-30 20:34 UTC
[Lustre-discuss] Exporting Lustre volume via NFS hangs NFS daemons
Hi Oleg,

I'm working on getting the sysrq-t output, but since the system is in a secure area, the output can't be taken out on digital media. I can retype the parts you need, assuming you can tell me which ones! I guess that could be difficult to specify. Otherwise I can fax the output to you. Suggestions?

Thank you!
Dan

-----Original Message-----
From: Oleg Drokin <Oleg.Drokin at Sun.COM>
Date: Tuesday, Nov 27, 2007 10:14 am
Subject: Re: [Lustre-discuss] Exporting Lustre volume via NFS hangs NFS daemons
To: Dan <dan at nerp.net>
CC: Lustre-discuss <lustre-discuss at clusterfs.com>

Hello!

On Nov 27, 2007, at 1:00 PM, Dan wrote:
> I'm running Lustre 1.6.3 on a 64 bit RHEL 4 update 5. I've exported
> the Lustre volume via NFS to 20 32 bit RHEL 3 update 6 clients. I
> can mount, create files and write to them at first but the NFS
> daemons all go into an I/O wait state and never return. Since they
> cannot be killed the only solution is rebooting. Increasing the
> number of daemons allows it to run longer before all available are
> hung up. I get no errors to console or logs maybe I'm not looking
> in the right place? Trying to unmount the NFS export results in
> infinite wait unless I use lazy umount.
>
> This is similar to bug #13030 but that was landed for 1.6.3 which
> I'm running. How do I make this work? Suggestions? Thanks,

Can you please issue sysrq-t when this happens and provide us with traces of hung nfs daemons?

Bye,
Oleg
Oleg Drokin
2007-Nov-30 23:26 UTC
[Lustre-discuss] Exporting Lustre volume via NFS hangs NFS daemons
Hello!

On Nov 30, 2007, at 3:34 PM, Dan wrote:
> I'm working on getting the sysrq-t output but since the system is in
> a secure area it can't be had on digital media. I can retype the
> parts you need assuming you can tell me! I guess that could be
> difficult to specify! Otherwise I can fax the output to you.

A good start would be to pick out only the nfs daemons in D state and send the traces for those.

As a less labor-intensive alternative, I guess you can just convert the fax into some sort of image file and send it to the list or, if the image is too big, create a bug in our bugzilla and attach the image there. Please CC me on the bug (there is a CC field on the bugzilla "new bug" screen).

Bye,
Oleg
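To narrow down which traces to retype, the nfs daemons in D state can be identified by PID first. The following is a rough sketch, assuming a procps-style ps that accepts "-eo pid,stat,comm", that the NFS server threads show up with the command name "nfsd", and that Python 3 is available. The helper names nfsd_in_d_state and list_hung_nfsd are invented for this example.

```python
import subprocess
from typing import List


def nfsd_in_d_state(ps_output: str) -> List[int]:
    """Return the PIDs of nfsd processes whose state starts with 'D'.

    Expects text in the layout printed by `ps -eo pid,stat,comm`:
    one process per line with PID, state flags, and command name.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, stat, comm = fields
        # The header line is skipped because "PID" is not numeric.
        if pid.isdigit() and stat.startswith("D") and comm.strip() == "nfsd":
            pids.append(int(pid))
    return pids


def list_hung_nfsd() -> List[int]:
    """Run ps on the local machine and filter its output."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return nfsd_in_d_state(result.stdout)
```

The resulting PIDs can then be looked up in the sysrq-t dump, so only the traces of those tasks need to be copied out.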
Oleg Drokin
2007-Dec-13 18:49 UTC
[Lustre-discuss] Exporting Lustre volume via NFS hangs NFS daemons
Hello!

On Dec 13, 2007, at 1:25 PM, Dan Redig wrote:
> I finally got the dump out of the area. It's in a searchable PDF
> found here: http://XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX I have established
> ways of getting information out of the area now, what else do you
> want?

Ok, it turns out I recently fixed something like this bug. See https://bugzilla.lustre.org/show_bug.cgi?id=14360 for the patch.

While you are at it, you might also be interested in the fixes from https://bugzilla.lustre.org/show_bug.cgi?id=14379.

Also, Lustre exported via NFS with the 1.6.x releases suffers from some severe performance degradation at the moment. While there is no 1.6 patch yet, related development is being done in bug 13371, which you might want to monitor.

Bye,
Oleg