thr3ads.net - R help - [R] Loops for repetitive task [Aug 2011]

a217

2011-Aug-10 04:31 UTC

[R] Loops for repetitive task

Hello,

I have an R script that I use as a template to perform a task for multiple
files (in this case, multiple chromosomes).

What I would like to do is to utilize a simple loop to parse through each
chromosome number so that I don't have to type the same code over and over
again in the R console.

I've tried using:

for(i in 1:22){
etc..
}

and replacing each chromosome number with [[i]], but that did not seem to
work.

Below is the script I have. Basically everywhere you see a '2' I would like
there to be an 'i' so that the script can be applied in a general sense.
################################Code###############################

chr2.data<-read.table(file="chr2.out.txt", header=F)
colnames(chr2.data)<-c("chr","start","end","base1","base2","totalreads","methylation","strand")
splc2<-split(chr2.data, paste(chr2.data$chr))
chr2.df<-as.data.frame(t(sapply(splc2, function(x)
list(TR=NROW(x[['totalreads']]),   
RG1=sum(x[['totalreads']]>=1),
percent=(NROW(x[['totalreads']]>=1)/sum(x[['totalreads']]))))))
chr2.df.summ<-as.data.frame(t(sapply(splc2, function(x)
summary(x$methylation))))
chr2.summ<-cbind(chr2.df,chr2.df.summ)

##################################################################


Here are some sample input files in case you'd like to test the code:
##########
# chr1.out.txt
##########
chr1	100	159	104	104	1	0.05	+
chr1	100	159	145	145	1	0.04	+
chr1	200	260	205	205	1	0.12	+
chr1	500	750	600	600	1	0.09	+

##########
# chr2.out.txt
##########
chr2	100	200	105	105	1	0.03	+
chr2	100	200	110	110	1	0.08	+
chr2	300	400	350	350	0	0	+


The code works perfectly fine just typing everything out by hand, but that
is very inefficient given that there are 24 chromosomes for each dataset. I
am just looking for any suggestions as to how I can write a general version
of this code.


--
View this message in context:
http://r.789695.n4.nabble.com/Loops-for-repetitive-task-tp3732022p3732022.html
Sent from the R help mailing list archive at Nabble.com.

Peter Alspach

2011-Aug-10 05:01 UTC

Tena koe

Try something along the following lines:

chrData <- vector('list', 22)
names(chrData) <- paste('chr', 1:22, sep='')

for (i in seq_along(chrData))
{
  chrData[[i]] <- read.table(file=paste('chr', i, '.out.txt', sep=''),
                             header=FALSE)
  ...
}
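
One way the elided loop body might be filled in is sketched below. This is an illustration under stated assumptions, not the code from the reply: the helper `summarise_chr()`, the second list `chrSummary`, the `file.exists()` guard, and the temporary demo directory are all hypothetical additions. Only the column names and the sample data come from the original post.

```r
# Column names taken from the original post.
cols <- c("chr", "start", "end", "base1", "base2",
          "totalreads", "methylation", "strand")

# Hypothetical helper: one-row summary of a single chromosome's data frame.
summarise_chr <- function(d) {
  data.frame(TR  = nrow(d),                        # number of rows
             RG1 = sum(d$totalreads >= 1),         # rows with at least one read
             mean_methylation = mean(d$methylation))
}

# Write the two sample files from the original post to a temporary directory
# so the sketch runs on its own; real data would already be on disk.
data_dir <- file.path(tempdir(), "chr_demo")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("chr1\t100\t159\t104\t104\t1\t0.05\t+",
             "chr1\t100\t159\t145\t145\t1\t0.04\t+",
             "chr1\t200\t260\t205\t205\t1\t0.12\t+",
             "chr1\t500\t750\t600\t600\t1\t0.09\t+"),
           file.path(data_dir, "chr1.out.txt"))
writeLines(c("chr2\t100\t200\t105\t105\t1\t0.03\t+",
             "chr2\t100\t200\t110\t110\t1\t0.08\t+",
             "chr2\t300\t400\t350\t350\t0\t0\t+"),
           file.path(data_dir, "chr2.out.txt"))

chrData <- vector("list", 22)
names(chrData) <- paste0("chr", 1:22)
chrSummary <- chrData        # second list with the same names, for summaries

for (i in seq_along(chrData)) {
  f <- file.path(data_dir, paste0("chr", i, ".out.txt"))
  if (!file.exists(f)) next  # skip chromosomes that have no file
  chrData[[i]] <- read.table(f, header = FALSE, col.names = cols,
                             stringsAsFactors = FALSE)
  chrSummary[[i]] <- summarise_chr(chrData[[i]])
}

# Drop the empty slots and stack the summaries, one row per chromosome read.
result <- do.call(rbind, Filter(Negate(is.null), chrSummary))
print(result)
```

The point of the sketch is that the loop index is used twice: `paste0()` builds the file name as a string, and `[[i]]` selects the slot of a list. Keeping the results in a list avoids having to create separately named objects such as `chr2.data` for every chromosome.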

HTH ....

Peter Alspach

Dennis Murphy

2011-Aug-10 13:12 UTC

Hi:

Try this:

## Function that takes a data frame as input and outputs a data frame:
chrSumm <- function(d) {   # d is a data frame
    colnames(d) <- c("chr", "start", "end", "base1", "base2",
                     "totalreads", "methylation", "strand")
    TR <- nrow(d)                        # number of rows
    RG1 <- sum(d['totalreads'] >= 1)     # rows with at least one read
    percent <- TR/RG1
    methylSumm <- summary(d$methylation)
    names(methylSumm) <- c('Min', 'Q1', 'Median', 'Mean', 'Q3', 'Max')
    data.frame(TR, RG1, percent, as.data.frame(as.list(methylSumm)))
}

# Read the data files into a list and apply the function to each file in turn,
# resulting in a data frame

# vector of file names
files <- c('chr1.out.txt', 'chr2.out.txt')
# use lapply() to read files into a list
filelist <- lapply(files, read.table, header = FALSE)
# Use the ldply() function from the plyr package to
# process the list and return a data frame
library('plyr')
ldply(filelist, chrSumm)

# Result from your example:
> ldply(filelist, chrSumm)
  TR RG1 percent  Min     Q1 Median    Mean     Q3  Max
1  4   4     1.0 0.04 0.0475   0.07 0.07500 0.0975 0.12
2  3   2     1.5 0.00 0.0150   0.03 0.03667 0.0550 0.08
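
For readers who would rather not depend on plyr, roughly the same result can be sketched in base R: `lapply()` returns a list of one-row data frames and `do.call(rbind, ...)` stacks them. The block below is an illustration rather than part of the reply above. It restates `chrSumm()` in compact form so that it runs on its own, writes the two sample files to a temporary directory, and assumes that the 24 chromosomes are 1 to 22 plus X and Y, with files named `chr<N>.out.txt`. That naming scheme and the `file.exists()` filter are assumptions.

```r
# Compact restatement of chrSumm() so this block is self-contained.
chrSumm <- function(d) {
  colnames(d) <- c("chr", "start", "end", "base1", "base2",
                   "totalreads", "methylation", "strand")
  TR  <- nrow(d)
  RG1 <- sum(d$totalreads >= 1)
  ms  <- as.numeric(summary(d$methylation))
  names(ms) <- c("Min", "Q1", "Median", "Mean", "Q3", "Max")
  data.frame(TR, RG1, percent = TR / RG1, as.list(ms))
}

# Write the sample files from the original post to a temporary directory.
data_dir <- file.path(tempdir(), "chr_demo")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("chr1\t100\t159\t104\t104\t1\t0.05\t+",
             "chr1\t100\t159\t145\t145\t1\t0.04\t+",
             "chr1\t200\t260\t205\t205\t1\t0.12\t+",
             "chr1\t500\t750\t600\t600\t1\t0.09\t+"),
           file.path(data_dir, "chr1.out.txt"))
writeLines(c("chr2\t100\t200\t105\t105\t1\t0.03\t+",
             "chr2\t100\t200\t110\t110\t1\t0.08\t+",
             "chr2\t300\t400\t350\t350\t0\t0\t+"),
           file.path(data_dir, "chr2.out.txt"))

# Assumed naming scheme for all 24 chromosomes: chr1 ... chr22, chrX, chrY.
chroms <- paste0("chr", c(1:22, "X", "Y"))
files  <- file.path(data_dir, paste0(chroms, ".out.txt"))
names(files) <- chroms

# Keep only the files that are actually present.
files <- files[file.exists(files)]

# Read each file, summarise it, and stack the one-row summaries.
filelist <- lapply(files, read.table, header = FALSE)
summ <- do.call(rbind, lapply(filelist, chrSumm))
print(summ)
```

Naming the `files` vector means the list returned by `lapply()` carries the chromosome names along, which makes it easier to tell which summary row belongs to which file.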

HTH,
Dennis

