Debs Majumdar
2011-Oct-21 21:32 UTC
[R] Reading in and modifying multiple datasets in a loop
Hi,

I have been given a set of around 300 files where there are 5 files corresponding to each chunk.

E.g. Chunk 1 for chr1 contains these 5 files:

    chr1.one.phased.impute2.chunk1
    chr1.one.phased.impute2.chunk1_info
    chr1.one.phased.impute2.chunk1_info_by_sample
    chr1.one.phased.impute2.chunk1_summary
    chr1.one.phased.impute2.chunk1_warnings

For chr1 there are 47 chunks, chr2 has 42 chunks ... and it ends at chr22 with 23 chunks.

I am using the DatABEL package to convert them to databel format using the following command:

    impute2databel(genofile="chr1.one.phased.impute2.chunk1", samplefile="chr1.one.phased.impute2.chunk1_info", outfile="chr1.chunk1", makeprob=TRUE, old=FALSE)

which uses two files per chunk.

Is there a way I can automate this so that the code goes through each chunk of each chromosome and does the conversion to databel format?

Thanks,

-Debs
On 21.10.2011 23:32, Debs Majumdar wrote:
> I am using the DatABEL package to convert them databel format using the following command:
>
> impute2databel(genofile="chr1.one.phased.impute2.chunk1", samplefile="chr1.one.phased.impute2.chunk1_info", outfile="chr1.chunk1", makeprob=TRUE, old=FALSE)
>
> which uses two files per chunk.
>
> Is there a way I can automate this so that the code goes through each chunk of each chromosome and does the conversion to databel format.

Yes, probably (all untested):

owd <- setwd(pth)
fls <- list.files(pattern="^chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
for(i in ufls){
    of <- strsplit(i, "\\.")[[1]]
    of <- paste(of[1], tail(of, 1), sep=".")
    impute2databel(genofile = i,
                   samplefile = paste(i, "info", sep="_"),
                   outfile = of,
                   makeprob=TRUE, old=FALSE)
}
setwd(owd)

Uwe Ligges
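The name handling in the loop above can be checked without touching any files. The following is a minimal sketch, assuming a hypothetical vector `fls` of example names in place of the `list.files()` result, and a hypothetical data frame `calls` that only collects the arguments; it does not call `impute2databel()` itself.

```r
# Hypothetical file names standing in for list.files(pattern = "^chr")
fls <- c("chr1.one.phased.impute2.chunk1",
         "chr1.one.phased.impute2.chunk1_info",
         "chr1.one.phased.impute2.chunk1_info_by_sample",
         "chr1.one.phased.impute2.chunk1_summary",
         "chr1.one.phased.impute2.chunk1_warnings",
         "chr2.one.phased.impute2.chunk1",
         "chr2.one.phased.impute2.chunk1_info")

# Everything before the first underscore is the chunk's base name
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))

# Dry run: collect the three file arguments per chunk, no conversion yet
calls <- data.frame(
  genofile   = ufls,
  samplefile = paste(ufls, "info", sep = "_"),
  outfile    = sapply(strsplit(ufls, "\\."),
                      function(p) paste(p[1], tail(p, 1), sep = ".")),
  stringsAsFactors = FALSE
)
print(calls)
```

Inspecting `calls` before running the real loop is a cheap way to catch a wrong pattern or an unexpected file name.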
Debs Majumdar
2011-Oct-24 21:10 UTC
[R] Reading in and modifying multiple datasets in a loop
Thanks Uwe. This works perfectly.

#######

owd <- setwd(pth)
fls <- list.files(pattern="^chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
for(i in ufls){
    of <- strsplit(i, "\\.")[[1]]
    of <- paste(of[1], tail(of, 1), sep=".")
    impute2databel(genofile = i,
                   samplefile = paste(i, "info", sep="_"),
                   outfile = of,
                   makeprob=TRUE, old=FALSE)
}
setwd(owd)

####

I have a question regarding how strsplit works.

When my files are the following:

    chr1.one.phased.impute2.chunk1
    chr1.one.phased.impute2.chunk1_info
    chr1.one.phased.impute2.chunk1_info_by_sample
    chr1.one.phased.impute2.chunk1_summary
    chr1.one.phased.impute2.chunk1_warnings

    ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))

This works like a charm.

I have another dataset where the files are

    study1_chr1.one.phased.impute2.chunk1
    study1_chr1.one.phased.impute2.chunk1_info
    study1_chr1.one.phased.impute2.chunk1_info_by_sample
    study1_chr1.one.phased.impute2.chunk1_summary
    study1_chr1.one.phased.impute2.chunk1_warnings

... and so on.

I wanted to run the same loop, but I was unable to change strsplit so that it works when the files are named as above. I tried

    ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))

but this knocks off "study1" (modified code below). What modification do I need to make to make this run?

####

fls <- list.files(pattern="study1_chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))

library(GenABEL)

for(i in ufls){
    of <- strsplit(i, "\\.")[[1]]
    of <- paste(of[1], tail(of, 1), sep=".")
    impute2databel(genofile = i,
                   samplefile = paste(i, "info", sep="_"),
                   outfile = of,
                   makeprob=TRUE, old=FALSE)
}

#####

Thanks,

Debs
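To see why taking element 2 drops the prefix, it helps to look at what `strsplit()` returns for one of these names. A small sketch, using three of the file names from the message as example input; the variable names `pieces` and `second` are made up for illustration, and joining the first two pieces back together is only one possible workaround, which assumes the base name always contains exactly one underscore.

```r
fls <- c("study1_chr1.one.phased.impute2.chunk1",
         "study1_chr1.one.phased.impute2.chunk1_info",
         "study1_chr1.one.phased.impute2.chunk1_info_by_sample")

# strsplit() cuts at every underscore, so "study1" becomes piece 1
pieces <- strsplit(fls, "_")
print(pieces[[2]])

# Piece 2 alone is the chunk name without its "study1" prefix
second <- unique(sapply(pieces, "[", 2))
print(second)

# Possible workaround (assumption: the base name is always pieces 1 and 2)
ufls <- unique(sapply(pieces, function(p) paste(p[1:2], collapse = "_")))
print(ufls)
```

With `ufls` built this way, the `genofile` and `samplefile` arguments in the loop keep the `study1_` prefix and so match the names on disk.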
On 24.10.2011 23:10, Debs Majumdar wrote:
> I have another dataset where the files are
>
> study1_chr1.one.phased.impute2.chunk1
> study1_chr1.one.phased.impute2.chunk1_info
> study1_chr1.one.phased.impute2.chunk1_info_by_sample
> study1_chr1.one.phased.impute2.chunk1_summary
> study1_chr1.one.phased.impute2.chunk1_warnings
>
> ... and so on.
>
> and I wanted to run the same loop but I was unable to change strsplit so that it will work when the files are named as above:
>
> I tried
>
> ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))
>
> but this knocks off "study1" (modified code below).

unique(gsub("(_.*)_.*", "\\1", x))

Should do if there is a first underscore.

Uwe Ligges
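The effect of the suggested pattern can be tried on the file names from the question. This is a sketch, not a tested solution for the real data: because `.*` is greedy, the pattern removes only the part after the last underscore, so a name ending in `_info_by_sample` is shortened to `..._info_by` rather than to the base name. The anchored alternative below is an assumption of this example, not something proposed in the thread, and it requires that neither the study prefix nor the chunk name contains an underscore. The names `greedy` and `ufls` are illustrative.

```r
x <- c("study1_chr1.one.phased.impute2.chunk1",
       "study1_chr1.one.phased.impute2.chunk1_info",
       "study1_chr1.one.phased.impute2.chunk1_info_by_sample",
       "study1_chr1.one.phased.impute2.chunk1_summary",
       "study1_chr1.one.phased.impute2.chunk1_warnings")

# Pattern from the reply: drops the text after the LAST underscore,
# provided the name contains at least two underscores
greedy <- unique(gsub("(_.*)_.*", "\\1", x))
print(greedy)

# Alternative (assumption): keep "<prefix>_<chunk name>" and drop
# everything from the second underscore onwards
ufls <- unique(sub("^([^_]*_[^_]*)_.*$", "\\1", x))
print(ufls)
```

If the `_info_by_sample` files are present in the directory, the second form yields a single base name per chunk, which is what the conversion loop expects.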