Martin Hughes
2010-Oct-08 16:33 UTC
[R] many datasets run with one R script in a computer cluster
Hello Everyone,

I have an R script (and a source file in which I keep my functions) that I need to run on 70 data sets (each consisting of a pair of files).

I wish to run these data sets on a computer cluster that is run by my uni (HOWEVER they cannot help me with this problem but say it is do-able).

The cluster is clever enough that if I set my data up as follows: within one folder called 'work' there are 70 subfolders, each of which contains a pair of files, each pair of files having a unique first part (eg CottonEA05, as in the example text below), then if I have one R script to run the analysis within the main folder, it will open each subfolder, run the R script and output the results into that subfolder.

The problem is that this R script needs to have some kind of wild card element, so that for example in the script below, R will replace CottonEA05 with whatever the unique identifier is for the particular subfolder it's looking through, eg change it to Martin_M_STAGE.txt or bananas_M_STAGE.txt etc.

Can R do this? ie can it look at a file title, change the file name within the script to be the same as that file title, and then run the analysis? OR do I have to use another programme that does that?

###
m<-read.table("CottonEA05_M_STAGE.txt")
# "CottonEA05" is what is different for each dataset

M<-as.matrix(m[,-c(1)])
rownames(M)<-(m[,1])
pa<-read.table("CottonEA05_D_STAGE.txt",header=T)
timetable<-read.table("TimeBinLookup.txt",header=T,sep="\t")
PA<-as.matrix(pa[,-c(1)])
rownames(PA)<-(pa[,1])
OCHAR<-c()

source("DISPARITY.R")
library(calibrate)
###

Thanks
Martin

--
Martin Hughes
MPhil/PhD Research in Biology
Rm 1.07, 4south
University of Bath
Department of Biology and Biochemistry
Claverton
Bath BA2 7AY
Tel: 01225 385 437
M.Hughes at bath.ac.uk
http://www.bath.ac.uk/bio-sci/biodiversity-lab/hughes.html
David Winsemius
2010-Oct-08 18:30 UTC
[R] many datasets run with one R script in a computer cluster
On Oct 8, 2010, at 12:33 PM, Martin Hughes wrote:

> The problem is that this script for R needs to have some kind of
> wild card element so for example in the script below, R will replace
> CottonEA05 with the whatever the unique identifier is for the
> particular subfolder its looking through eg change it to
> Martin_M_STAGE.txt or bananas_M_STAGE.txt etc
>
> Can R do this? ie can it look a file title, and change the file name
> within the script to be the same as that file title, and then run
> the analysis

It can certainly read a directory and return the file names into a vector. And you can certainly do sub() on that vector to strip out everything except the leading characters before the first occurrence of a given character.

?list.files

(Which also has pattern matching facilities through its second argument.)

This reads the files in my working directory and then returns only the characters before the first period:

    > filist <- list.files()
    > str(filist)
     chr [1:295] "_train_1.dat" "~Show.Dot.Files.txt" ...
    > first <- sub("\\..+$","", filist)
    > str(first)
     chr [1:295] "_train_1" "~Show" "~UCONN" "2001VBTANB" ...

Was that what you were asking?

--
David.

> OR do I have to use another programme that does that?

David Winsemius, MD
West Hartford, CT
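A minimal sketch of how the list.files() plus sub() suggestion might be applied to the script in the original question. It assumes that each subfolder contains exactly one file ending in "_M_STAGE.txt"; the thread does not confirm this, nor whether the cluster runs the script with the subfolder as the working directory. The helper name dataset_prefix and the temporary demo folder are hypothetical and exist only so the example runs without real data.

```r
# Hypothetical helper: recover the dataset prefix (e.g. "CottonEA05") from a
# folder by finding the single file ending in "_M_STAGE.txt" and stripping
# that suffix. Assumes exactly one such file per folder.
dataset_prefix <- function(dir = ".") {
  m_file <- list.files(dir, pattern = "_M_STAGE\\.txt$")
  if (length(m_file) != 1) {
    stop("expected exactly one *_M_STAGE.txt file in ", dir)
  }
  sub("_M_STAGE\\.txt$", "", m_file)
}

# Demonstration in a temporary folder with two empty placeholder files,
# standing in for one of the 70 subfolders.
demo_dir <- file.path(tempdir(), "CottonEA05_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("CottonEA05_M_STAGE.txt", "CottonEA05_D_STAGE.txt")
)))

prefix <- dataset_prefix(demo_dir)

# Build the two file names from the prefix instead of hard-coding them.
m_path  <- file.path(demo_dir, paste0(prefix, "_M_STAGE.txt"))
pa_path <- file.path(demo_dir, paste0(prefix, "_D_STAGE.txt"))

print(prefix)  # [1] "CottonEA05"
```

If the cluster does run the script from inside each subfolder, the hard-coded names in the original script could then be replaced with something like read.table(paste0(prefix, "_M_STAGE.txt")) and read.table(paste0(prefix, "_D_STAGE.txt"), header=T), with prefix obtained from dataset_prefix(). The rest of the script would stay unchanged.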