Samu Mäntyniemi
2007-Dec-20 13:54 UTC
[R] Multicore computation in Windows network: How to set up Rmpi
R-users,

My question is related to earlier posts about the benefits of quad-core over dual-core computers. I am trying to set up a cluster of Windows XP computers so that I could eventually make use of 10-20 CPUs; for now, while learning how to do this, I am playing around with two laptops. I thought the snow package would come in handy in this situation, but to use snow I would probably need to install the Rmpi package first. I might also like to use Rmpi directly.

To use Rmpi on Windows, I need to install MPI middleware, for which I have found two options:

MPICH2: http://www.mcs.anl.gov/research/projects/mpich2/
DeinoMPI: http://mpi.deino.net/

First I tried MPICH2 1.06 + R-2.6.0 + Rmpi 0.5-5. (I downloaded Windows binaries of Rmpi from the Rmpi website: http://www.stats.uwo.ca/faculty/yu/Rmpi/) With MPICH2 I managed to connect my computers so that I could remotely launch Rgui on both machines, but R hung when calling "library(Rmpi)". If only one Rgui was launched on the localhost, "library(Rmpi)" worked without errors, but trying to use "mpi.spawn.Rslaves()" resulted in an error message, and so did "mpi.universe.size()". (In my current setup I cannot reproduce this error message, but I can go back to that setup if it seems to be an important piece of information.)

After that I removed MPICH2 from the system and installed DeinoMPI instead, so my setup was DeinoMPI 1.1.0 + R-2.6.0 + Rmpi 0.5-6. This setup seems to work on a single machine: "mpi.universe.size()" returns the correct number of CPU cores, and "mpi.spawn.Rslaves()" creates one master and one slave on the local single-core computer. However, trying to use two computers results in behavior similar to MPICH2: Rgui gets started on both, but R hangs when trying "library(Rmpi)". According to the advice given on the Rmpi website, this behavior could be due to firewall settings; however, the result is the same if I take down all my firewalls.
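For reference, the single-machine test I am running is essentially the following (a minimal sketch; all function names are from the Rmpi package, and the session assumes a working MPI installation):

```r
library(Rmpi)

# Number of CPU slots MPI knows about (should match the host configuration).
mpi.universe.size()

# Spawn one slave R process per available slot (minus the master),
# then shut everything down cleanly.
mpi.spawn.Rslaves()
mpi.remote.exec(Sys.info()[["nodename"]])  # which host each slave runs on
mpi.close.Rslaves()
mpi.quit()
```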
To debug this, I tried to run the example MPI programs provided with the DeinoMPI installation. The example programs work as expected: both machines participate in the parallel computation and the result is shown on the master node. This makes me believe that the problem is likely related to the R and Rmpi configuration, or to the settings I used when launching Rgui via mpiexec.

Settings on the Mpiexec tab:

Application: "C:\Program Files\R\R-2.6.0\bin\Rgui.exe"
Number of processes: 2
Hosts: "akva26 samu" (these are the computers with DeinoMPI installed; the DeinoMPI Cluster tab shows that they are ready to accept MPI jobs)
localroot: checked

Other options were left empty.

I am sure that someone has tried this before, and I was hoping to find such users on this mailing list. Could you kindly share your experiences with this issue? For example, does anyone have a working setup with DeinoMPI? According to the Rmpi website, DeinoMPI is the easiest way to set up MPI on a single Windows machine, but I am not sure whether this also means that one cannot expect it to work across multiple computers.

Regards,
Samu Mäntyniemi

------------------------------------------
Samu Mäntyniemi
Researcher
Fisheries and Environmental Management Group (FEM)
Department of Biological and Environmental Sciences
Biocenter 3, room 4414
Viikinkaari 1
P.O. Box 65
FIN-00014 University of Helsinki
Phone: +358 9 191 58710
Fax: +358 9 191 58257
email: samu.mantyniemi at helsinki.fi
personal webpage: http://www.helsinki.fi/people/samu.mantyniemi/
FEM webpage: http://www.helsinki.fi/science/fem/
Samu Mäntyniemi
2007-Dec-21 08:07 UTC
[R] Multicore computation in Windows network: How to set up Rmpi
Some progress on my problem.

Samu Mäntyniemi wrote:
> With MPICH2 I managed to connect my computers so that I was able to
> remotely launch Rgui on both machines but R hung when calling
> "library(Rmpi)". If only one Rgui was launched on the localhost,
> "library(Rmpi)" worked without errors, but trying to use
> "mpi.spawn.Rslaves()" resulted in an error message, and so did
> "mpi.universe.size()". (In my current setup I cannot reproduce this
> error message, but I can go back to this setup if this seems to be an
> important piece of information)

I went back to the MPICH2 installation to see what the error was:

"ERROR in names(HOSTNAMES)<-base: attempt to set an attribute on NULL"

Rethinking the problem, I realized that unlike with DeinoMPI, I need to write the host names manually in the "Configurable settings" window, and in order to have one CPU available on the local machine, I need to write "myhostname:2".

After these changes, MPICH2 1.06 + R-2.6.0 + Rmpi 0.5-5 works on the single machine in the same way as my DeinoMPI installation: the correct number of CPUs is detected and I can "mpi.spawn.Rslaves()".

I will try this with two hosts next and see if there is more luck with MPICH2 than with DeinoMPI.

Samu
Samu Mäntyniemi
2007-Dec-29 11:03 UTC
[R] Multicore computation in Windows network: How to set up Rmpi
Hello!
I finally got MPICH2 1.06 + R 2.6.1 + Rmpi 0.5-5 working with multiple
computers. The key was to realize that the number of processes should be
one when launching Rgui via mpiexec, not the number of
master + slaves, as I had first wrongly understood.
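In other words, the launch looks roughly like this (a sketch; the path is from my setup, and it assumes mpiexec is on the PATH):

```shell
# Request only ONE process (-n 1): the Rgui master.
# Rmpi then starts the worker R processes itself via mpi.spawn.Rslaves(),
# using the hosts configured for MPICH2 -- do NOT pass the total
# master+slaves count here.
mpiexec -n 1 "C:\Program Files\R\R-2.6.0\bin\Rgui.exe"
```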
However, I seem to have a new problem which I have not been able to
figure out:
After loading Rmpi, the first attempt at mpi.spawn.Rslaves() always
spawns the slaves on the local machine instead of on both machines. If I
close the slaves and spawn again, then one slave gets spawned on the remote
machine. Each time I close and then spawn again, the order of machines
is different, and eventually I get back to the situation where all
slaves are on the local machine. Continued spawning and closing
seems to reveal a pattern. I see similar behavior with more
than two machines; it just takes more spawn-close cycles before all my
slave machines have been spawned on.
Below is an example session with two machines. This pattern shows up
every time I start R and run this script. How can I control the spawning so
that I get everything right on the first call of mpi.spawn.Rslaves()?
Regards,
Samu
<R>
>
> library(Rmpi)
> sessionInfo()
R version 2.6.1 (2007-11-26)
i386-pc-mingw32
locale:
LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rmpi_0.5-5
> mpi.universe.size()
[1] 2
> mpichhosts()
master slave1 slave2
"clustermaster" "clustermaster" "clusterslave1"
> mpi.spawn.Rslaves()
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: ClusterMaster
slave1 (rank 1, comm 1) of size 3 is running on: ClusterMaster
slave2 (rank 2, comm 1) of size 3 is running on: ClusterMaster
> mpi.close.Rslaves()
[1] 1
> mpi.spawn.Rslaves()
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: ClusterMaster
slave1 (rank 1, comm 1) of size 3 is running on: ClusterSlave1
slave2 (rank 2, comm 1) of size 3 is running on: ClusterMaster
> mpi.close.Rslaves()
[1] 1
> mpi.spawn.Rslaves()
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: ClusterMaster
slave1 (rank 1, comm 1) of size 3 is running on: ClusterMaster
slave2 (rank 2, comm 1) of size 3 is running on: ClusterSlave1
> mpi.close.Rslaves()
[1] 1
> mpi.spawn.Rslaves()
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: ClusterMaster
slave1 (rank 1, comm 1) of size 3 is running on: ClusterMaster
slave2 (rank 2, comm 1) of size 3 is running on: ClusterMaster
> mpi.close.Rslaves()
[1] 1
> mpi.spawn.Rslaves()
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: ClusterMaster
slave1 (rank 1, comm 1) of size 3 is running on: ClusterSlave1
slave2 (rank 2, comm 1) of size 3 is running on: ClusterMaster
> mpi.close.Rslaves()
[1] 1
> mpi.spawn.Rslaves()
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: ClusterMaster
slave1 (rank 1, comm 1) of size 3 is running on: ClusterMaster
slave2 (rank 2, comm 1) of size 3 is running on: ClusterSlave1
> mpi.close.Rslaves()
[1] 1
>
>
> mpi.spawn.Rslaves()
2 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 3 is running on: ClusterMaster
slave1 (rank 1, comm 1) of size 3 is running on: ClusterMaster
slave2 (rank 2, comm 1) of size 3 is running on: ClusterMaster
> mpi.close.Rslaves()
[1] 1
>
</R>
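(For what it is worth, once Rmpi loads cleanly, the snow layer I am ultimately aiming for can be set up on top of it. A minimal sketch, with function names from the snow and Rmpi packages, assuming a working MPI environment:)

```r
library(Rmpi)
library(snow)

# One worker per slot MPI reports, leaving one slot for the master.
cl <- makeMPIcluster(mpi.universe.size() - 1)

# Trivial check: ask each worker for its host name.
clusterCall(cl, function() Sys.info()[["nodename"]])

stopCluster(cl)
mpi.exit()
```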