thr3ads.net - R help - [R] many datasets run with one R script in a computer cluster [Oct 2010]

Martin Hughes

2010-Oct-08 16:33 UTC

[R] many datasets run with one R script in a computer cluster

Hello Everyone

I have an R script (and a source file in which I keep my functions)
that I need to run on 70 data sets (each consisting of a pair of files).

I wish to run these data sets on a computer cluster that is run by my
uni (however, they cannot help me with this problem, but say it is
doable).

The cluster is clever enough that if I set my data up as follows:
within one folder called 'work' there are 70 subfolders, each of which
contains a pair of files, each pair of files having a unique first part
(e.g. CottonEA05, as in the example text below),

then if I have one R script to run the analysis within the main
folder, it will open each subfolder, run the R script and output the
results into that subfolder.

The problem is that this R script needs to have some kind of wildcard
element, so that, for example, in the script below R will replace
CottonEA05 with whatever the unique identifier is for the
particular subfolder it is looking through, e.g. change the file name to
Martin_M_STAGE.txt or bananas_M_STAGE.txt etc.

Can R do this? I.e. can it look at a file name, change the file name
within the script to match that file name, and then run the
analysis?

Or do I have to use another programme that does that?

###
# "CottonEA05" is the part of the file name that differs for each dataset
m<-read.table("CottonEA05_M_STAGE.txt")

# the first column is used as row names; the remaining columns form the matrix
M<-as.matrix(m[,-1])
rownames(M)<-m[,1]
pa<-read.table("CottonEA05_D_STAGE.txt",header=TRUE)
timetable<-read.table("TimeBinLookup.txt",header=TRUE,sep="\t")
PA<-as.matrix(pa[,-1])
rownames(PA)<-pa[,1]
OCHAR<-c()

source("DISPARITY.R")
library(calibrate)
###


Thanks
Martin

-- 
Martin Hughes
MPhil/PhD Research in Biology
Rm 1.07,  4south
University of Bath
Department of Biology and Biochemistry
Claverton
Bath    BA2 7AY
Tel: 01225 385 437
M.Hughes at bath.ac.uk
http://www.bath.ac.uk/bio-sci/biodiversity-lab/hughes.html

David Winsemius

2010-Oct-08 18:30 UTC

On Oct 8, 2010, at 12:33 PM, Martin Hughes wrote:
>
> Hello Everyone
>
> I have an R script (and a source file in which I keep my functions)
> that I need to run on 70 data sets (each consisting of a pair of
> files).
>
> I wish to run these data sets on a computer cluster that is run by
> my uni (however, they cannot help me with this problem, but say it is
> doable).
>
> The cluster is clever enough that if I set my data up as follows:
> within one folder called 'work' there are 70 subfolders, each of which
> contains a pair of files, each pair of files having a unique first
> part (e.g. CottonEA05, as in the example text below),
>
> then if I have one R script to run the analysis within the main
> folder, it will open each subfolder, run the R script and output the
> results into that subfolder.
>
> The problem is that this R script needs to have some kind of
> wildcard element, so that, for example, in the script below R will replace
> CottonEA05 with whatever the unique identifier is for the
> particular subfolder it is looking through, e.g. change the file name to
> Martin_M_STAGE.txt or bananas_M_STAGE.txt etc.
>
> Can R do this? I.e. can it look at a file name, change the file name
> within the script to match that file name, and then run
> the analysis?
It can certainly read a directory and return the file names as a
vector. And you can certainly use sub() on that vector to keep only the
leading characters before the first occurrence of a given character.

?list.files

(Which also has pattern matching facilities through its second  
argument.)
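
As a rough illustration of that second argument, here is a minimal sketch that picks out only the "_M_STAGE.txt" file from a folder. The temporary folder and the two empty files are made up purely so the example runs on its own; on the cluster the script would presumably just look in its working directory.

```r
# Sketch only: demo_dir and its files are hypothetical stand-ins
# for one dataset subfolder.
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("CottonEA05_M_STAGE.txt",
                                  "CottonEA05_D_STAGE.txt")))

# pattern is a regular expression: "\\." matches a literal period and
# "$" anchors the match at the end of the file name.
m_file <- list.files(demo_dir, pattern = "_M_STAGE\\.txt$")
m_file
```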

This lists the files in my working directory and then keeps only the
characters before the first period:

 > filist <- list.files()
 > str(filist)
  chr [1:295] "_train_1.dat" "~Show.Dot.Files.txt" ...
 > first <- sub("\\..+$","", filist)
 > str(first)
  chr [1:295] "_train_1" "~Show" "~UCONN" "2001VBTANB" ...
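
Applied to the script in the question, the same idea might look like the sketch below. It assumes each subfolder holds exactly one "_M_STAGE.txt" file and that the matching "_D_STAGE.txt" file shares its prefix; the folder name "bananas" and the toy tables are invented so the example is self-contained. If the cluster runs the script inside each subfolder, work_dir would simply be ".".

```r
# Sketch only: work_dir and the toy data are hypothetical.
work_dir <- file.path(tempdir(), "bananas")
dir.create(work_dir, showWarnings = FALSE)
toy <- data.frame(taxon = c("a", "b"), x = c(1, 2), y = c(3, 4))
write.table(toy, file.path(work_dir, "bananas_M_STAGE.txt"),
            row.names = FALSE, col.names = FALSE)
write.table(toy, file.path(work_dir, "bananas_D_STAGE.txt"),
            row.names = FALSE)

# Find the M file, then strip the fixed suffix to recover the prefix.
m_file <- list.files(work_dir, pattern = "_M_STAGE\\.txt$")
stopifnot(length(m_file) == 1)   # assumption: one dataset per folder
prefix <- sub("_M_STAGE\\.txt$", "", m_file)

# Build both file names from the prefix instead of hard-coding them.
m  <- read.table(file.path(work_dir, paste0(prefix, "_M_STAGE.txt")))
pa <- read.table(file.path(work_dir, paste0(prefix, "_D_STAGE.txt")),
                 header = TRUE)

# Same steps as in the original script: first column becomes row names.
M <- as.matrix(m[, -1])
rownames(M) <- as.character(m[, 1])
```

The rest of the script (PA, timetable, source("DISPARITY.R") and so on) would follow unchanged, since only the two file names depend on the prefix.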

Was that what you were asking?

-- 
David.
David Winsemius, MD
West Hartford, CT
