Paul,
On 30 May 2008 at 15:47, Paul Hewson wrote:
| Hello,
|
| We have R working with Rmpi/openmpi, but I'm a little worried.
| Specifically, (a) the -np flag doesn't seem to override the hostfile (it
| works fine with fortran hello world) and (b) I appear to have twice as many
| processes running as I think I should.
|
| Rmpi version 0.5.5
| Openmpi version 1.1
That's old. Open MPI 1.2.* fixed and changed a lot of things. I am happy
with 1.2.6, the default on Debian.
| Viglen HPC with (effectively) 9 blades and 8 nodes on each blade.
| myhosts file contains details of the 9 blades, but specifies that there are 4
| slots on each blade (to make sure I leave room for other users).
|
| When running mpirun -bynode -np 2 -hostfile myhosts R --slave --vanilla
| task_pull.R
|
| 1. I get as many R slaves as there are slots defined in my myhosts file
| (there are 36 slots defined, and I get 36 slaves, regardless of the setting
| of -np); the master goes on the first machine in the myhosts file.
| 2. The .Rout file confirms that I have 1 comm with 1 master and 36 slaves
| 3. When I top each blade it indicates that there are in fact 8 processes
| running on each blade and
| 4. When I pstree each blade it indicates that there are two orted processes,
| each with 4 subprocesses.
You never showed us task_pull.R ... And as I readily acknowledge that this
can be tricky, why don't you experiment with a simpler setting? Consider this
token littler [1] invocation (or use Rscript if you prefer / have only that):
edd@ron:~> r -e'library(Rmpi); cat("Hello rank",
mpi.comm.rank(0), "size", mpi.comm.size(0), "on",
mpi.get.processor.name(), "\n")'
Hello rank 0 size 1 on ron
edd@ron:~>
So without an outer mpirun (or orterun as the Open MPI group now calls it) we
get one instance. Makes sense.
Now with two hosts defined on the fly, and two instances each:
edd@ron:~> orterun -n 4 -H ron,joe r -e'library(Rmpi);
cat("Hello rank", mpi.comm.rank(0), "size",
mpi.comm.size(0), "on", mpi.get.processor.name(), "\n")'
Hello rank 0 size 4 on ron
Hello rank 2 size 4 on ron
Hello rank 3 size 4 on joe
Hello rank 1 size 4 on joe
edd@ron:~>
Adding '-bynode' and using '-np 4' instead of '-n 4' does not change anything.
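
For completeness: the same test works with a hostfile in place of '-H'. A
sketch of the Open MPI hostfile syntax that caps the slots per machine
(hostnames hypothetical, not your actual myhosts):

# myhosts: one line per machine, slots caps the processes placed there
ron slots=2
joe slots=2

orterun -bynode -np 4 -hostfile myhosts r -e'...'

With Open MPI 1.2.*, '-np' then launches at most that many processes across
the listed slots.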
| From the point of view of getting a job done this ***seems*** OK (it's
| running very quickly), but it doesn't seem quite right - given I'm
| sharing the machine with other users and so on. Is there something I've
| missed in the usage of mpirun with R/Rmpi?
I cannot quite determine from what you said here what your objective is.
What exactly are you trying to do that you are not getting done? Using fewer
instances? Maybe that is in fact an Open MPI 1.2.* versus 1.1.* issue.
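One possible culprit, stated as a guess since we haven't seen task_pull.R: if
it calls Rmpi's mpi.spawn.Rslaves() without an explicit 'nslaves' argument,
it spawns one slave per available slot, which would give you 36 slaves no
matter what '-np' says. A minimal sketch of capping that from within R:

library(Rmpi)
mpi.spawn.Rslaves(nslaves = 8)   # spawn exactly 8, not one per slot
## ... task-pull work goes here ...
mpi.close.Rslaves()              # shut the slaves down
mpi.quit()                       # and exit the master cleanly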
One thing to note is that if you wrap all this in the excellent snow package
by Tierney et al, then Open MPI's '-n' can always be one, as you determine
from _within_ R how many nodes you want:
edd@ron:~> orterun -bynode -np 1 -H ron,joe r -e'library(snow); cl
<- makeCluster(4, "MPI"); res <- clusterCall(cl, function()
Sys.info()["nodename"]); print(do.call(rbind, res))'
Loading required package: utils
Loading required package: Rmpi
4 slaves are spawned successfully. 0 failed.
nodename
[1,] "joe"
[2,] "ron"
[3,] "joe"
[4,] "ron"
edd@ron:~>
Note the outer '-np 1' and the inner makeCluster(4, "MPI") to give you 4
slaves. If you use a larger '-np $N' you will get $N instances, each starting
as many nodes as makeCluster asks for.
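
As a standalone script (file name hypothetical, say snowtest.R) the same
pattern, with an explicit shutdown added, would be:

library(snow)
cl <- makeCluster(4, "MPI")   # spawn four slaves from within R
res <- clusterCall(cl, function() Sys.info()["nodename"])
print(do.call(rbind, res))
stopCluster(cl)               # stop the slaves when done

run via the same outer invocation, e.g. 'orterun -np 1 -H ron,joe r snowtest.R'.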
Hope this helps, Dirk
[1] Littler can be had via Debian / Ubuntu or from
http://dirk.eddelbuettel.com/code/littler.html
--
Three out of two people have difficulties with fractions.