thr3ads.net - R help - [R] Reading in and modifying multiple datasets in a loop [Oct 2011]


Debs Majumdar

2011-Oct-21 21:32 UTC

[R] Reading in and modifying multiple datasets in a loop

Hi,

I have been given a set of around 300 files where there are 5 files
corresponding to each chunk.

E.g. Chunk 1 for chr1 contains these 5 files:

        chr1.one.phased.impute2.chunk1
        chr1.one.phased.impute2.chunk1_info
        chr1.one.phased.impute2.chunk1_info_by_sample
        chr1.one.phased.impute2.chunk1_summary
        chr1.one.phased.impute2.chunk1_warnings

For chr1 there are 47 chunks, chr2 has 42 chunks ... and it ends at chr22 with 23
chunks.

I am using the DatABEL package to convert them to databel format using the
following command:

impute2databel(genofile="chr1.one.phased.impute2.chunk1",
               samplefile="chr1.one.phased.impute2.chunk1_info",
               outfile="chr1.chunk1", makeprob=TRUE, old=FALSE)

which uses two files per chunk.

Is there a way I can automate this so that the code goes through each chunk of
each chromosome and does the conversion to databel format?

Thanks,

-Debs

Uwe Ligges

2011-Oct-22 17:25 UTC


On 21.10.2011 23:32, Debs Majumdar wrote:
>
> Hi,
>
>    I have been given a set of around 300 files where there are 5 files
> corresponding to each chunk.
>
> E.g. Chunk 1 for chr1 contains these 5 files:
>
>          chr1.one.phased.impute2.chunk1
>          chr1.one.phased.impute2.chunk1_info
>          chr1.one.phased.impute2.chunk1_info_by_sample
>          chr1.one.phased.impute2.chunk1_summary
>          chr1.one.phased.impute2.chunk1_warnings
>
> For chr1 there are 47 chunks, chr2 has 42 chunks ... and it ends at chr22
> with 23 chunks.
>
> I am using the DatABEL package to convert them to databel format using the
> following command:
>
> impute2databel(genofile="chr1.one.phased.impute2.chunk1",
>                samplefile="chr1.one.phased.impute2.chunk1_info",
>                outfile="chr1.chunk1", makeprob=TRUE, old=FALSE)
>
> which uses two files per chunk.
>
> Is there a way I can automate this so that the code goes through each chunk
> of each chromosome and does the conversion to databel format?

Yes, probably (all untested):

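# pth: path to the directory holding the chunk files (set this first)
owd <- setwd(pth)
# all files whose names start with "chr"
fls <- list.files(pattern="^chr")
# drop the "_info", "_summary", ... suffixes: one base name per chunk
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
for(i in ufls){
     # "chr1.one.phased.impute2.chunk1" -> "chr1.chunk1"
     of <- strsplit(i, "\\.")[[1]]
     of <- paste(of[1], tail(of, 1), sep=".")
     impute2databel(genofile = i,
                    samplefile = paste(i, "info", sep="_"),
                    outfile = of,
                    makeprob=TRUE, old=FALSE)
}
# restore the previous working directory
setwd(owd)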
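
The file-name handling in the loop above can be checked without DatABEL installed. What follows is only a minimal sketch: the vector fls holds made-up names for two chunks (an assumption for illustration), nothing is read from disk, and impute2databel is never called. It just shows which genofile, samplefile and outfile values the loop would produce.

```r
# Made-up file names standing in for list.files(pattern="^chr")
suffixes <- c("", "_info", "_info_by_sample", "_summary", "_warnings")
fls <- c(paste0("chr1.one.phased.impute2.chunk1", suffixes),
         paste0("chr2.one.phased.impute2.chunk7", suffixes))

# Keep everything before the first underscore: one base name per chunk
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))

# Collect the arguments the loop would pass for each chunk
calls <- lapply(ufls, function(i) {
  parts <- strsplit(i, "\\.")[[1]]
  list(genofile   = i,
       samplefile = paste(i, "info", sep = "_"),
       outfile    = paste(parts[1], tail(parts, 1), sep = "."))
})

# Show genofile -> outfile for every chunk
print(sapply(calls, function(a) paste(a$genofile, "->", a$outfile)))
```

Inspecting the list this way before the real run makes it easy to spot a chunk whose names do not follow the expected pattern.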



Uwe Ligges



>
> Thanks,
>
>   -Debs
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Debs Majumdar

2011-Oct-24 21:10 UTC


Thanks Uwe. This works perfectly.

#######


owd <- setwd(pth)
fls <- list.files(pattern="^chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
for(i in ufls){
     of <- strsplit(i, "\\.")[[1]]
     of <- paste(of[1], tail(of, 1), sep=".")
     impute2databel(genofile = i,
                    samplefile = paste(i, "info", sep="_"),
                    outfile = of,
                    makeprob=TRUE, old=FALSE)
}
setwd(owd)

####


I have a question regarding how strsplit works.

When my files are the following:

        chr1.one.phased.impute2.chunk1
        chr1.one.phased.impute2.chunk1_info
        chr1.one.phased.impute2.chunk1_info_by_sample
        chr1.one.phased.impute2.chunk1_summary
        chr1.one.phased.impute2.chunk1_warnings

ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))

This works like a charm.

I have another dataset where the files are

        study1_chr1.one.phased.impute2.chunk1
        study1_chr1.one.phased.impute2.chunk1_info
        study1_chr1.one.phased.impute2.chunk1_info_by_sample
        study1_chr1.one.phased.impute2.chunk1_summary
        study1_chr1.one.phased.impute2.chunk1_warnings

... and so on.

I wanted to run the same loop, but I was unable to change the strsplit call so
that it works when the files are named as above.

I tried

ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))

but this knocks off "study1" (modified code below). What modification
do I need to make to get this to run?

####

fls <- list.files(pattern="study1_chr")
ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))

library(GenABEL)

for(i in ufls){
     of <- strsplit(i, "\\.")[[1]]
     of <- paste(of[1], tail(of, 1), sep=".")
     impute2databel(genofile = i,
                    samplefile = paste(i, "info", sep="_"),
                    outfile = of,
                    makeprob=TRUE, old=FALSE)
}

#####

Thanks,

Debs
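
The reason index 2 drops "study1" is that strsplit(fls, "_") cuts each name at every underscore, so the study prefix becomes piece 1 and the chunk name becomes piece 2. The short sketch below shows the pieces for a few made-up names (the vector fls is an assumption for illustration) and one possible way to glue the first two pieces back together; it is not the only way to do it.

```r
# Made-up file names with a "study1_" prefix
fls <- c("study1_chr1.one.phased.impute2.chunk1",
         "study1_chr1.one.phased.impute2.chunk1_info",
         "study1_chr1.one.phased.impute2.chunk1_info_by_sample")

# Every underscore is a cut point, so the prefix is a separate piece
pieces <- strsplit(fls, "_")
print(pieces[[3]])

# Taking piece 2 alone loses the "study1" prefix
second <- unique(sapply(pieces, "[", 2))
print(second)

# Pasting pieces 1 and 2 back together keeps it
first_two <- unique(sapply(pieces, function(p) paste(p[1:2], collapse = "_")))
print(first_two)
```

This only works if every name has exactly one underscore before the chunk name; names with a different prefix layout would need a different rule.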



Uwe Ligges

2011-Oct-26 07:23 UTC


On 24.10.2011 23:10, Debs Majumdar wrote:
> Thanks Uwe. This works perfectly.
>
> #######
>
>
> owd <- setwd(pth)
> fls <- list.files(pattern="^chr")
> ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
> for(i in ufls){
>       of <- strsplit(i, "\\.")[[1]]
>       of <- paste(of[1], tail(of, 1), sep=".")
>       impute2databel(genofile = i,
>                      samplefile = paste(i, "info", sep="_"),
>                      outfile = of,
>                      makeprob=TRUE, old=FALSE)
> }
> setwd(owd)
>
> ####
>
>
> I have a question regarding how strsplit works.
>
> When my files are the following:
>
>          chr1.one.phased.impute2.chunk1
>          chr1.one.phased.impute2.chunk1_info
>          chr1.one.phased.impute2.chunk1_info_by_sample
>          chr1.one.phased.impute2.chunk1_summary
>          chr1.one.phased.impute2.chunk1_warnings
>
> ufls <- unique(sapply(strsplit(fls, "_"), "[", 1))
>
> This works like a charm.
>
> I have another dataset where the files are
>
>
>          study1_chr1.one.phased.impute2.chunk1
>          study1_chr1.one.phased.impute2.chunk1_info
>          study1_chr1.one.phased.impute2.chunk1_info_by_sample
>          study1_chr1.one.phased.impute2.chunk1_summary
>          study1_chr1.one.phased.impute2.chunk1_warnings
>
> ... and so on.
>
> I wanted to run the same loop, but I was unable to change the strsplit call
> so that it works when the files are named as above.
>
> I tried
>
> ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))

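unique(sub("^([^_]*_[^_]*)_.*$", "\\1", fls))

Should do if there is always exactly one underscore before the chunk name: it
keeps everything up to the second underscore and leaves names without a
second underscore unchanged.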
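
A pattern like this is easy to check on a few made-up names before running the conversion. The sketch below is only an illustration under stated assumptions: the vector x is invented, and the anchored sub() pattern is one possible way to keep the text up to the second underscore. For comparison it also runs the greedy "(_.*)_.*" pattern, where .* runs to the last underscore, so the "_info_by_sample" name comes out ending in "_info_by".

```r
# Made-up file names with a "study1_" prefix
x <- c("study1_chr1.one.phased.impute2.chunk1",
       "study1_chr1.one.phased.impute2.chunk1_info",
       "study1_chr1.one.phased.impute2.chunk1_info_by_sample",
       "study1_chr1.one.phased.impute2.chunk1_summary",
       "study1_chr1.one.phased.impute2.chunk1_warnings")

# Keep "<prefix>_<chunk name>", drop everything from the second underscore on
base <- unique(sub("^([^_]*_[^_]*)_.*$", "\\1", x))
print(base)

# Greedy pattern for comparison: leaves a stray "..._info_by" entry
greedy <- unique(gsub("(_.*)_.*", "\\1", x))
print(greedy)

# Output name built as in the loop: first and last dot-separated fields
parts <- strsplit(base, "\\.")[[1]]
of <- paste(parts[1], tail(parts, 1), sep = ".")
print(of)
```

With base names like these the rest of the loop can stay as it was, since the samplefile is still the base name plus "_info".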

Uwe Ligges


> but this knocks off "study1" (modified code below). What modification
> do I need to make to get this to run?
>
> ####
>
> fls <- list.files(pattern="study1_chr")
> ufls <- unique(sapply(strsplit(fls, "_"), "[", 2))
>
> library(GenABEL)
>
> for(i in ufls){
>       of <- strsplit(i, "\\.")[[1]]
>       of <- paste(of[1], tail(of, 1), sep=".")
>       impute2databel(genofile = i,
>                      samplefile = paste(i, "info", sep="_"),
>                      outfile = of,
>                      makeprob=TRUE, old=FALSE)
> }
>
> #####
>
> Thanks,
>
>   Debs
