Hello,

I have an R script that I use as a template to perform a task for multiple files (in this case, multiple chromosomes).

What I would like to do is use a simple loop to step through each chromosome number so that I don't have to type the same code over and over again in the R console.

I've tried using:

for(i in 1:22){
etc..
}

and replacing each chromosome number with [[i]], but that did not seem to work.

Below is the script I have. Basically, everywhere you see a '2' I would like there to be an 'i' so that the script can be applied in a general sense.

################################Code###############################

chr2.data<-read.table(file="chr2.out.txt", header=FALSE)
colnames(chr2.data)<-c("chr","start","end","base1","base2","totalreads","methylation","strand")
splc2<-split(chr2.data, paste(chr2.data$chr))
chr2.df<-as.data.frame(t(sapply(splc2, function(x)
list(TR=NROW(x[['totalreads']]), RG1=sum(x[['totalreads']]>=1),
percent=(NROW(x[['totalreads']]>=1)/sum(x[['totalreads']]))))))
chr2.df.summ<-as.data.frame(t(sapply(splc2, function(x)
summary(x$methylation))))
chr2.summ<-cbind(chr2.df,chr2.df.summ)

##################################################################

Here are some sample input files in case you'd like to test the code:

##########
# chr1.out.txt
##########
chr1 100 159 104 104 1 0.05 +
chr1 100 159 145 145 1 0.04 +
chr1 200 260 205 205 1 0.12 +
chr1 500 750 600 600 1 0.09 +

##########
# chr2.out.txt
##########
chr2 100 200 105 105 1 0.03 +
chr2 100 200 110 110 1 0.08 +
chr2 300 400 350 350 0 0 +

The code works perfectly fine when I type everything out by hand, but that is very inefficient given that there are 24 chromosomes for each dataset. I am just looking for any suggestions as to how I can write a general version of this code.

--
View this message in context: http://r.789695.n4.nabble.com/Loops-for-repetitive-task-tp3732022p3732022.html
Sent from the R help mailing list archive at Nabble.com.
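The file names themselves can be generated rather than typed. A minimal sketch, assuming the 24 chromosomes are 1-22 plus X and Y and that their files follow the same naming pattern (chrX.out.txt and chrY.out.txt are assumed names, not confirmed in the post). Note that `[[i]]` cannot be spliced into an object name like `chr2.data`; a named list with one element per chromosome does that job instead (`chroms` and `chr.data` are illustrative names):

```r
# Assumed labels: autosomes 1-22 plus the sex chromosomes X and Y (24 in total)
chroms <- c(1:22, "X", "Y")
files <- paste0("chr", chroms, ".out.txt")
head(files, 3)
# [1] "chr1.out.txt" "chr2.out.txt" "chr3.out.txt"

# R never substitutes i into a name such as chr2.data, so keep one list
# element per chromosome and index it with [[i]] or by name instead
chr.data <- setNames(vector("list", length(chroms)), paste0("chr", chroms))
names(chr.data)[c(1, 23, 24)]
# [1] "chr1" "chrX" "chrY"
```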
Hello,

Try something along the following lines:

chrData <- vector('list', 22)
names(chrData) <- paste('chr', 1:22, sep='')
for (i in seq_along(chrData))
{
  chrData[[i]] <- read.table(file=paste('chr', i, '.out.txt', sep=''), header=FALSE)
  ...
}

HTH

Peter Alspach

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of a217
> Sent: Wednesday, 10 August 2011 4:32 p.m.
> To: r-help at r-project.org
> Subject: [R] Loops for repetitive task
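Peter's loop leaves the per-chromosome processing as `...`. One way a complete loop could look, as a minimal sketch assuming the eight-column layout from the question: because each file holds a single chromosome, it skips the split() step and stores one summary row per chromosome (TR, RG1 and percent as in the original script, plus a few methylation statistics), then stacks the rows. The names `data_dir`, `cols`, `chroms` and `results` are illustrative, and the sample files are written to a temporary directory only so the sketch runs on its own:

```r
# Sample files from the question, written to a temp directory so this runs standalone.
# In real use, set data_dir to the folder holding chr1.out.txt ... chr22.out.txt.
data_dir <- tempdir()
writeLines(c("chr1 100 159 104 104 1 0.05 +",
             "chr1 100 159 145 145 1 0.04 +",
             "chr1 200 260 205 205 1 0.12 +",
             "chr1 500 750 600 600 1 0.09 +"),
           file.path(data_dir, "chr1.out.txt"))
writeLines(c("chr2 100 200 105 105 1 0.03 +",
             "chr2 100 200 110 110 1 0.08 +",
             "chr2 300 400 350 350 0 0 +"),
           file.path(data_dir, "chr2.out.txt"))

cols <- c("chr", "start", "end", "base1", "base2",
          "totalreads", "methylation", "strand")
chroms <- 1:2                      # e.g. 1:22 or c(1:22, "X", "Y") for a full dataset
results <- vector("list", length(chroms))
names(results) <- paste0("chr", chroms)

for (i in seq_along(chroms)) {
  # Build the file name from the chromosome label instead of hard-coding "2"
  f <- file.path(data_dir, paste0("chr", chroms[i], ".out.txt"))
  d <- read.table(f, header = FALSE, col.names = cols)
  # One summary row per chromosome (each file holds a single chromosome)
  results[[i]] <- data.frame(
    chr     = names(results)[i],
    TR      = nrow(d),
    RG1     = sum(d$totalreads >= 1),
    # NROW(x >= 1) in the original script is just the row count, so this is the same ratio
    percent = nrow(d) / sum(d$totalreads),
    Min     = min(d$methylation),
    Median  = median(d$methylation),
    Mean    = mean(d$methylation),
    Max     = max(d$methylation)
  )
}

# Stack the per-chromosome rows into a single data frame
chr.summ <- do.call(rbind, results)
print(chr.summ)
```

For a full dataset only `chroms` (and `data_dir`) need to change; `results` keeps one named element per chromosome, so a single chromosome's row can still be pulled out with `results[["chr2"]]`.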
Hi:

Try this:

## Function that takes a data frame as input and outputs a data frame:
chrSumm <- function(d) {
  # d is a data frame
  colnames(d) <- c("chr","start","end","base1","base2",
                   "totalreads","methylation","strand")
  TR <- nrow(d)
  RG1 <- sum(d$totalreads >= 1)
  percent <- TR/RG1
  methylSumm <- summary(d$methylation)
  names(methylSumm) <- c('Min', 'Q1', 'Median', 'Mean', 'Q3', 'Max')
  data.frame(TR, RG1, percent, as.data.frame(as.list(methylSumm)))
}

# Read the data files into a list and apply the function to each element of the list,
# resulting in a data frame

# vector of file names
files <- c('chr1.out.txt', 'chr2.out.txt')
# use lapply() to read files into a list
filelist <- lapply(files, read.table, header = FALSE)

# Use the ldply() function from the plyr package to
# process the list and return a data frame
library('plyr')
ldply(filelist, chrSumm)

# Result from your example:
> ldply(filelist, chrSumm)
  TR RG1 percent  Min     Q1 Median    Mean     Q3  Max
1  4   4     1.0 0.04 0.0475   0.07 0.07500 0.0975 0.12
2  3   2     1.5 0.00 0.0150   0.03 0.03667 0.0550 0.08

HTH,
Dennis
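If installing plyr is not an option, the same idea works in base R. A minimal sketch, assuming the same eight-column files: it restates chrSumm() with quantile() (type 7, quantile()'s default, which is also what summary() uses) so the column names are explicit, names the list after the files so each row keeps its chromosome label, and uses do.call(rbind, ...) in place of ldply(). The variable names are illustrative, and the sample files are written to a temporary directory so the sketch runs on its own:

```r
# Sample files from the question, written to a temp directory so this runs standalone
data_dir <- tempdir()
writeLines(c("chr1 100 159 104 104 1 0.05 +",
             "chr1 100 159 145 145 1 0.04 +",
             "chr1 200 260 205 205 1 0.12 +",
             "chr1 500 750 600 600 1 0.09 +"),
           file.path(data_dir, "chr1.out.txt"))
writeLines(c("chr2 100 200 105 105 1 0.03 +",
             "chr2 100 200 110 110 1 0.08 +",
             "chr2 300 400 350 350 0 0 +"),
           file.path(data_dir, "chr2.out.txt"))

# chrSumm() restated with quantile(); type 7 is the default and matches summary()
chrSumm <- function(d) {
  colnames(d) <- c("chr", "start", "end", "base1", "base2",
                   "totalreads", "methylation", "strand")
  TR  <- nrow(d)
  RG1 <- sum(d$totalreads >= 1)
  q   <- quantile(d$methylation, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  data.frame(TR, RG1, percent = TR / RG1,
             Min = q[1], Q1 = q[2], Median = q[3],
             Mean = mean(d$methylation), Q3 = q[4], Max = q[5])
}

files <- file.path(data_dir, c("chr1.out.txt", "chr2.out.txt"))
filelist <- lapply(files, read.table, header = FALSE)
# Name each element after its file ("chr1", "chr2") so the label survives stacking
names(filelist) <- sub("\\.out\\.txt$", "", basename(files))

# Base-R stand-in for plyr::ldply(): summarise every element, then stack the rows
summ <- do.call(rbind, lapply(filelist, chrSumm))
summ <- data.frame(chr = names(filelist), summ, row.names = NULL)
print(summ)
```

The numbers should agree with the ldply() result shown in Dennis's reply; the only difference is the extra chr column identifying each row.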