<Derek.Whayman@barclayscapital.com>
2008-Jan-29 09:19 UTC
Deadly interaction between Autofs/NFS and Ruby
I managed to track down a vicious bug where Ruby gets itself into a tight loop and eats all your memory, reducing your server to a gibbering wreck (RHEL5, Ruby 1.8.5).

I originally caused it by applying a Puppet manifest (puppet standalone) out of my autofs-mounted home directory. In the course of its actions, it restarted autofs. I then tried to reapply and the machine ground to a halt.

The problem actually occurs in facter.rb, where it tries to shell out to run %x{"uname -s"} in Resolution.exec. Ruby never reaches the shell or uname - it gets into a tight loop, with strace showing a lot of mmap and getcwd syscalls. Then your memory goes away.

This can be reproduced without autofs by

    mount server:/export/myhomedir /mnt/myhomedir
    cd /mnt/myhomedir/subdir
    umount -l server:/export/myhomedir /mnt/myhomedir
    mount server:/export/myhomedir /mnt/myhomedir

umount -f doesn't work. I'm guessing autofs uses -l (lazy unmount).

Then any of the following produce, ahem, interesting results

    <0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt
    % which blah
    Can't get current working directory
    <255> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt
    % /bin/pwd
    /bin/pwd: couldn't find directory entry in `../..' with matching i-node
    <1> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt
    % facter
    (dead - time to kill -9)
    <1> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt
    % ls -id .
    282523 .
    <0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt
    % ls -id ..
    174555 ..
    <0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt
    % ls -id ../..
    174555 ../..
    <0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt
    % ls -id ../../..
    174555 ../../..
    <0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs) /mnt/sa_dewha/munt

The mount point is "broken".
This inability to go "up" past the mount point appears to be what makes /bin/pwd upset (and it's the same issue whether you do this with an autofs restart or a umount -l). Incidentally, a simple call to getcwd(3) works fine (proven with a 5 line C program). I suspect Ruby is doing something similar but behaving very badly when a call that should be a foregone conclusion misbehaves.

It's hard to apportion where the fix should go, but I believe Ruby shouldn't bomb so badly even when presented with a broken file system mount.

Does anyone out there have any deeper knowledge of these things before I try my luck submitting a case to Red Hat (there are some Ruby developers on the payroll there)...

Cheers,
Derek
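
As an aside on the `ls -id` output in the message above, here is a minimal Ruby sketch of the same check: walk upwards through ".." and see whether the real root directory is ever reached. The helper names inode_chain and reaches_root? are made up for illustration and are not part of Facter or Puppet. The sketch assumes the broken state looks like the session shown (".." and "../.." resolving to the same inode, which is not the inode of "/"). It has not been tried on the affected RHEL5 / Ruby 1.8.5 box, and it is a diagnostic only, not a fix for the loop.

```ruby
# Hypothetical diagnostic helpers (not part of Facter or Puppet).
# They only use File.stat on relative paths, so they never ask for the
# current working directory by name.

# Return [path, inode] pairs for start, start/.., start/../.., and so on,
# mirroring the "ls -id ." / "ls -id .." commands in the session above.
def inode_chain(start = ".", depth = 4)
  path = start
  (1..depth).map do
    pair = [path, File.stat(path).ino]
    path = File.join(path, "..")
    pair
  end
end

# Walk upwards via ".." and report whether "/" is reachable from start.
# Returns false if ".." stops making progress before "/" is found.
def reaches_root?(start = ".", max_depth = 256)
  root = File.stat("/")
  path = start
  previous = nil
  max_depth.times do
    st = File.stat(path)
    # Same device and inode as "/" means we arrived at the real root.
    return true if st.dev == root.dev && st.ino == root.ino
    # ".." resolved to the directory we were already in, and it is not "/":
    # a dead end like the repeated inode in the session above.
    return false if previous && previous.dev == st.dev && previous.ino == st.ino
    previous = st
    path = File.join(path, "..")
  end
  false
rescue SystemCallError
  # Treat any stat failure along the way as "cannot resolve".
  false
end

if __FILE__ == $0
  inode_chain.each { |path, ino| puts "#{ino} #{path}" }
  puts(reaches_root? ? "cwd resolves up to /" : "cannot walk up to / from here")
end
```

Run from inside the suspect directory. If the assumption above holds, the second line of the output onwards would repeat one inode and the final message would report that "/" cannot be reached.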
On Jan 29, 2008, at 8:19 PM, <Derek.Whayman@barclayscapital.com> wrote:

> It's hard to apportion where the fix should go, but I believe Ruby
> shouldn't bomb so badly even when presented with a broken file system
> mount.

Yeah, this is a painfully common scenario -- PWD going away in some form. Ruby should certainly be able to handle it.

> Does anyone out there have any deeper knowledge of these things
> before I try my luck submitting a case to Red Hat (there are some Ruby
> developers on the payroll there)...

Nope, I think you've pegged it.

-- 
The time to repair the roof is when the sun is shining.
    -- John F. Kennedy
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com