<Derek.Whayman@barclayscapital.com>
2008-Jan-29 09:19 UTC
Deadly interaction between Autofs/NFS and Ruby
I managed to track down a vicious bug where Ruby gets itself into a tight
loop and eats all your memory, reducing your server to a gibbering wreck
(RHEL5, Ruby 1.8.5).
I originally caused it by applying a Puppet manifest (puppet standalone)
out of my autofs-mounted home directory. In the course of its actions,
it restarted autofs. I then tried to reapply and the machine ground to
a halt.
The problem actually occurs in facter.rb, where it tries to shell out to
run %x{"uname -s"} in Resolution.exec. Ruby never reaches the shell or
uname - it gets into a tight loop, strace showing a lot of mmap and
getcwd syscalls. Then your memory goes away.
This can be reproduced without autofs by running:
mount server:/export/myhomedir /mnt/myhomedir
cd /mnt/myhomedir/subdir
umount -l server:/export/myhomedir /mnt/myhomedir
mount server:/export/myhomedir /mnt/myhomedir
umount -f doesn't work. I'm guessing autofs uses -l (lazy unmount).
Then any of the following produce, ahem, interesting results:
<0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
% which blah
Can't get current working directory
<255> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
% /bin/pwd
/bin/pwd: couldn't find directory entry in `../..' with matching i-node
<1> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
% facter
(dead - time to kill -9)
<1> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
% ls -id .
282523 .
<0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
% ls -id ..
174555 ..
<0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
% ls -id ../..
174555 ../..
<0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
% ls -id ../../..
174555 ../../..
<0> sa_dewha@engpsr0155.intranetdev.barcapdev.com (0 jobs)
/mnt/sa_dewha/munt
The mount point is "broken". This inability to go "up" past the mount
point appears to be what makes /bin/pwd upset (and it's the same issue
whether you do this with an autofs restart or an umount -l).
Incidentally, a simple call to getcwd(3) works fine (proven with a 5-line
C program). I suspect Ruby is doing something similar but behaving very
badly when a call that should be a foregone conclusion misbehaves.
It's hard to apportion where the fix should go, but I believe Ruby
shouldn't bomb so badly even when presented with a broken file system
mount.
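As a stopgap on the caller's side, one option might be to leave the stale
directory before shelling out. The sketch below is hypothetical:
run_from_root is an illustrative name, not anything in Facter or Puppet,
and it has not been tested against the broken mount described above, so
whether it actually sidesteps the loop on Ruby 1.8.5 is an assumption.

```ruby
# Hypothetical helper (not part of Facter or Puppet): switch to a
# directory that can always be resolved before running a command, so the
# command is launched with "/" as its working directory instead of the
# stale one.
def run_from_root(command)
  # Plain Dir.chdir without a block: Ruby only saves the old working
  # directory when a block is given, so the old one is not looked up here.
  Dir.chdir("/")
  output = %x{#{command}}
  # Return the trimmed output on success, nil if the command failed.
  $?.success? ? output.chomp : nil
end

puts run_from_root("uname -s")
```

Note that this changes the working directory of the whole process for
good, which is acceptable for a one-shot tool but may not be for a
long-running daemon.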
Does anyone out there have any deeper knowledge of these things before I
try my luck submitting a case to Red Hat (there are some Ruby developers
on the payroll there)...
Cheers,
Derek
On Jan 29, 2008, at 8:19 PM, <Derek.Whayman@barclayscapital.com> wrote:
> It's hard to apportion where the fix should go, but I believe Ruby
> shouldn't bomb so badly even when presented with a broken file system
> mount.
Yeah, this is a painfully common scenario -- PWD going away in some
form. Ruby should certainly be able to handle it.
> Does anyone out there have any deeper knowledge of these things
> before I try my luck submitting a case to Red Hat (there are some Ruby
> developers on the payroll there)...
Nope, I think you've pegged it.
--
The time to repair the roof is when the sun is shining.
    -- John F. Kennedy
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com