Sklyar, Oleg (London)
2009-Jan-29 15:09 UTC
[Rd] after some time R stopped returning from Rmpi calls
Hi,

this is not exactly a developer question, but maybe you have noticed similar behaviour before.

For quite some time R and Rmpi were working perfectly for me, until one day they just stopped doing so without any changes to the configs. R still spawns jobs as requested, and if they are small they run through and return. But as soon as their duration exceeds 5 s or so, the spawned processes go to sleep and never return to the head node. Below is the top output from one of the slave nodes running the spawned jobs; as you can see, their status is sleeping (S).

It looks like a communication problem between the master and the slave nodes, but this behaviour *is* user-specific: exactly the same script works for some users and just hangs for others.

Rmpi is installed with a default R CMD INSTALL without additional arguments. LD_LIBRARY_PATH is set, and the whole setup *was* working with the same config.

Has anybody experienced similar problems with Rmpi and LAM before?

Thank you,
Oleg

RHEL 5 x86_64, 16-core Opteron
LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University

I am running quite a dated version of R, but a recent Rmpi:

> sessionInfo()
R version 2.9.0 Under development (unstable) (2008-09-30 r46585)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base

other attached packages:
[1] Rmpi_0.5-5

  PID USER      PR  NI  VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
 7699 osklyar   16   0 19128  1448  1000 S    0  0.0   0:00.02 lamd
 7807 osklyar   16   0  8652   992   824 S    0  0.0   0:00.01 Rslaves.sh
 7808 osklyar   16   0  8656   992   824 S    0  0.0   0:00.01 Rslaves.sh
 7809 osklyar   16   0  8652   992   824 S    0  0.0   0:00.00 Rslaves.sh
 7810 osklyar   17   0  8656   992   824 S    0  0.0   0:00.01 Rslaves.sh
 7811 osklyar   18   0  8656   992   824 S    0  0.0   0:00.02 Rslaves.sh
 7812 osklyar   18   0  8656   992   824 S    0  0.0   0:00.02 Rslaves.sh
 7813 osklyar   18   0  8656   992   824 S    0  0.0   0:00.02 Rslaves.sh
 7814 osklyar   18   0  8656   992   824 S    0  0.0   0:00.02 Rslaves.sh
 7815 osklyar   15   0  165m   60m  4568 S    0  0.2   0:03.66 R
 7816 osklyar   16   0  161m   56m  4568 S    0  0.2   0:03.51 R
 7817 osklyar   15   0  161m   56m  4584 S    0  0.2   0:03.82 R
 7818 osklyar   16   0  161m   56m  4568 S    0  0.2   0:03.31 R
 7819 osklyar   16   0  165m   61m  4568 S    0  0.2   0:03.59 R
 7820 osklyar   15   0  162m   58m  4568 S    0  0.2   0:03.43 R
 7821 osklyar   16   0  162m   58m  4568 S    0  0.2   0:03.26 R
 7824 osklyar   16   0  161m   56m  4568 S    0  0.2   0:03.49 R
 7973 osklyar   15   0 87208  1880  1140 S    0  0.0   0:00.00 sshd
 7974 osklyar   15   0 72332  1716  1276 S    0  0.0   0:00.01 bash

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
osklyar at maninvestments.com