Huntsinger, Reid
2004-Apr-21 19:25 UTC
[R] RE: [openMosix-general] openMosix and R: File I/O issues?
It's generally said that OM needs a lot more swap than plain Linux. There has to be someplace to juggle processes around. We use 4 GB nodes with something like 12 GB swap; that plus round-robin logins to distribute home nodes seemed to solve a lot of our problems.

To use oMFS to allow processes to write from the node they've migrated to, you also need DFSA enabled. That sounds like a great idea, but it seems to cause problems in our setup: with DFSA off we go a lot longer without oMFS misbehavior or other odd phenomena than when DFSA is on.

How many processes do you run at a time?

Reid Huntsinger

-----Original Message-----
From: Jim Thomas [mailto:james at staarfunds.com]
Sent: Wednesday, April 21, 2004 3:14 PM
To: Huntsinger, Reid
Cc: r-help at stat.math.ethz.ch; openmosix-general at lists.sourceforge.net
Subject: Re: [openMosix-general] openMosix and R: File I/O issues?

Memory could be an issue. The three nodes in the cluster we are using only have 512 MB. The machines are otherwise identical in terms of hardware. They are dual PIIIs (purchased from Penguin Computing in 2001). (If you need specs on the motherboard etc., we can get that info together.) The installed operating system is RHEL-3, standard workstation with some additional libraries (e.g., all of the development ones) and packages (e.g., tarball-based installs of R & Octave). Swap was set to twice the RAM.

Not sure how big the R processes are. Under Matlab and using a network version of LVQ, the processes were under 256 MB. They could be a lot larger in R, given that KNN is explicitly used in certain stages of LVQ and that should consume a lot of memory.

oMFS is not currently being used, nor any other network filesystem. Do you think it would help by allowing the large processes to remain on the nodes to which they migrated when writing files?

The actual file I/O is very small: only approximately 5 - 50 results are being written into a file that remains under 2k in size.
However, I believe that these results are not derived until LVQ finishes, so there's no way to write a few results periodically.

Thanks for your help!

Jim

Huntsinger, Reid wrote:

>Can you have the R processes open the file, write, and close periodically?
>Just to see where they die? Is it that the file never gets flushed to disk
>before the process dies for some reason? Maybe processes migrate back for
>file i/o and the total required memory is just too large? R processes can
>easily have large memory requirements.
>
>More details on your setup would probably help: are you using omfs? other
>network file systems? How is the cluster set up? How much RAM and swap? How
>big are the R processes?
>
>Reid Huntsinger
>
>-----Original Message-----
>From: openmosix-general-admin at lists.sourceforge.net
>[mailto:openmosix-general-admin at lists.sourceforge.net] On Behalf Of Jim
>Thomas
>Sent: Tuesday, April 20, 2004 11:16 AM
>To: r-help at stat.math.ethz.ch; openmosix-general at lists.sourceforge.net
>Subject: [openMosix-general] openMosix and R: File I/O issues?
>
>
>Hi there,
>
>We're attempting to run an LVQ analysis over a cluster of machines via R
>and openMosix. R spawns several child processes simply by writing
>commands to several files and using system() to start a slave process.
> The processes migrate perfectly, and often finish with no reported
>errors, writing their results into respective files for the parent
>process to piece together. However, occasionally we have had the
>problem that the results from a child process never make it into a file.
> The process finishes, and exits, with no errors - but the file never
>turns up. Repeated tests with the same data have shown that the
>specific process that dies is random, and stress tests of R I/O have
>shown that there are no issues there. Does anyone know of I/O issues
>with openMosix, either specifically related to R or not?
>
>using:
>openMosix kernel 2.4.21
>R 1.8.1
>RedHat Enterprise Edition
>
>Thanks,
>Jim
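
The suggestion in this thread of having each process open the file, write, and close periodically, to see where a process dies and whether its output ever reaches disk, can be sketched roughly as below. This is only an illustration in Python rather than R; the function names `append_checkpoint` and `last_checkpoint` and the log line layout are hypothetical and do not come from the thread. In R the same pattern would presumably use a connection opened in append mode and closed after every write.

```python
import os
import time


def append_checkpoint(path, message):
    """Append one progress line, then force it to disk before closing."""
    with open(path, "a") as handle:
        # Hypothetical line layout: timestamp, process id, free-text message.
        handle.write(f"{time.time():.3f} pid={os.getpid()} {message}\n")
        handle.flush()             # push Python's buffer to the operating system
        os.fsync(handle.fileno())  # ask the operating system to commit to disk


def last_checkpoint(path):
    """Return the message of the last checkpoint line, or None if none exists."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        lines = handle.read().splitlines()
    if not lines:
        return None
    # Drop the timestamp and pid fields, keep the message.
    return lines[-1].split(" ", 2)[2]
```

Each child would call `append_checkpoint` at a few known stages (start, before the final write, after it), and the parent can call `last_checkpoint` on every child's log after a run to see how far the process with the missing result file got.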