Hello,

I have over 1000 csv data sets I need to read into R, so I want to read them in using a loop. The data sets are named pheno_1000ind_4000m_add_h70_prog_1_2.csv, pheno_1000ind_4000m_add_h70_prog_1_3.csv, ... so I need 2 loops (for the last 2 numbers in the names). What I would like to do is the following:

    setwd("C:/Research3/simulation1/second_gen")
    d1<-read.csv("pheno_1000ind_4000m_add_h70_prog_1_2.csv")
    d2<-read.csv("pheno_1000ind_4000m_add_h70_prog_1_3.csv")
    d3<-read.csv("pheno_1000ind_4000m_add_h70_prog_2_3.csv")
    .
    .
    .

I am wondering how I can accomplish this with a loop. Any suggestion is appreciated!
I tried the following but it does not work:

    data <- lapply(
      paste(("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",[1:2],"_",[2:3],".csv",sep=''),
      read.csv, header=TRUE, sep=',' )
    names(data) <- paste("d", LETTERS[1:3], sep='')

Thanks!
Reka
Hi Reka,
Try this:

    header<-"C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog"
    for(index1 in 1:2) {
     for(index2 in 2:3)
      read.csv(paste(paste(header,index1,index2,sep="_"),".csv",sep=""))
    }

Jim

On Sat, Feb 6, 2016 at 4:53 PM, Reka Howard <howardr at iastate.edu> wrote:
> Hello,
> I have over 1000 csv data sets I need to read into R, so I want to read
> them in using a loop.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
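A note on the nested loop: called this way, read.csv() returns each data frame but nothing keeps it. The following is a minimal sketch, not the poster's code, of the same two loops storing every result in a named list. The temporary folder, the short prefix "pheno_demo_prog", and the "d_1_2" style names are assumptions made up for the demonstration; file.exists() skips combinations such as 2_2 that were never written.

```r
# Self-contained demo: write three tiny CSV files to a temporary folder,
# then read them back with two nested loops, keeping every data frame.
folder <- file.path(tempdir(), "nested_loop_demo")   # hypothetical folder
dir.create(folder, showWarnings = FALSE)
prefix <- file.path(folder, "pheno_demo_prog")       # hypothetical prefix
for (p in c("1_2", "1_3", "2_3")) {
  write.csv(data.frame(id = 1:2, y = c(0.5, 1.5)),
            paste0(prefix, "_", p, ".csv"), row.names = FALSE)
}

result <- list()
for (index1 in 1:2) {
  for (index2 in 2:3) {
    fname <- paste0(prefix, "_", index1, "_", index2, ".csv")
    if (!file.exists(fname)) next   # e.g. the 2_2 file was never written
    result[[paste0("d_", index1, "_", index2)]] <- read.csv(fname)
  }
}
names(result)   # "d_1_2" "d_1_3" "d_2_3"
```

Each data frame is then available as result$d_1_2, result[["d_1_3"]], and so on, rather than as separate d1, d2, d3 objects.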
Normally one wants not only to read the data, but to save it in an object as well. Here are some modifications toward achieving that (untested):

    header<-"C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog"
    fnums <- expand.grid( a = 1:2, b = 2:3 )
    result <- vector( "list", nrow( fnums ) )
    for ( idx in seq.int( nrow( fnums ) ) ) {
      result[[ idx ]] <- read.csv( paste( paste( header
                                               , fnums$a[ idx ]
                                               , fnums$b[ idx ]
                                               , sep = "_" )
                                        , ".csv"
                                        , sep = "" ) )
      # optionally remember which file each data record came from
      # assumes none of your input columns are labelled "a" or "b"
      result[[ idx ]]$a <- fnums$a[ idx ]
      result[[ idx ]]$b <- fnums$b[ idx ]
    }
    # you could also put all of the data into one data frame
    result2 <- do.call( rbind, result )

    # you could also do all of this in one dplyr pipe
    library(dplyr)
    result3 <- (   expand.grid( a = 1:2, b = 2:3 )
               %>% rowwise  # work through each row of the a/b combinations
               %>% do( data.frame( a = .$a
                                 , b = .$b
                                 , read.csv( paste( paste( header
                                                         , .$a
                                                         , .$b
                                                         , sep = "_" )
                                                  , ".csv"
                                                  , sep = "" ) )
                                 )
                     )
               %>% as.data.frame
               )

On Sat, 6 Feb 2016, Jim Lemon wrote:
> Hi Reka,
> Try this:
> [...]

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
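One caveat on expand.grid( a = 1:2, b = 2:3 ): it produces all four combinations, including 2_2, whereas the original post lists only 1_2, 1_3 and 2_3. If the files exist only for pairs with the first number smaller than the second (an assumption, the thread does not confirm it), combn() generates exactly those pairs. A sketch under that assumption, with a made-up temporary folder and short file prefix:

```r
# Self-contained demo of reading one file per pair (a, b) with a < b.
folder <- file.path(tempdir(), "combn_demo")   # hypothetical folder
dir.create(folder, showWarnings = FALSE)

pairs <- t(combn(3, 2))                        # rows: (1,2), (1,3), (2,3)
fnames <- file.path(folder,
                    sprintf("pheno_demo_prog_%d_%d.csv",   # hypothetical prefix
                            pairs[, 1], pairs[, 2]))

# Create small stand-in files so the example runs anywhere.
for (f in fnames) {
  write.csv(data.frame(y = c(1.0, 2.0)), f, row.names = FALSE)
}

# Read each file and record which pair it came from in columns a and b
# (assumes no input column is already named "a" or "b").
result <- Map(function(f, a, b) cbind(a = a, b = b, read.csv(f)),
              fnames, pairs[, 1], pairs[, 2])

# Stack everything into one data frame.
result2 <- do.call(rbind, unname(result))
dim(result2)   # 6 rows, 3 columns
```

With the real data, combn(n, 2) for the largest index n would replace combn(3, 2), and the prefix would be the full path used in the thread.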
> I tried the following but it does not work:
>
> data <- lapply(
>   paste(("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",[1:2],"_",[2:3],".csv",sep=''),
>   read.csv, header=TRUE, sep=',' )
> names(data) <- paste("d", LETTERS[1:3], sep='')

I tried that and R complained about syntax errors - unexpected commas, mismatched parentheses, illegal square brackets, etc.

Using lapply like this is a perfectly fine way to solve the problem, but you need to get the details right. I find it easier to break that statement into parts and make sure each part is working. E.g., after a minimal cleanup of your code the file names would be computed as

    fileNames <- paste("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_", 1:2 ,"_", 2:3 ,".csv",sep='')
    print(fileNames) # do they look right? You said you wanted 1_2, 1_3, 2_3 but that will give you only 2 of them

or perhaps you want all the files in that directory with a given pattern

    fileNames <- dir("C:/Research3/simulation1/second_gen",
        pattern="^pheno_1000ind_4000m_add_h70_prog_[[:digit:]]+_[[:digit:]]+\\.csv$",
        full.names=TRUE, ignore.case=TRUE)
    head(fileNames) # keep at it until the fileNames list looks good
    tail(fileNames)

Then read the data from the files with

    data <- lapply(fileNames, read.csv, header=TRUE, sep=",")

If there are errors reading the files in csv format you could try

    data <- lapply(fileNames, function(fileName) { cat(fileName, "\n");
        read.csv(fileName, header=TRUE, sep=",")})

so you can see the name of the first offending file.

When you attach names you probably want to get the names from the fileNames variable, perhaps just the digits part

    # the "_" before the capture group keeps the greedy ".*" from
    # swallowing the leading digits of a multi-digit first number
    names(data) <- gsub("^.*_([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1", fileNames)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Feb 5, 2016 at 9:53 PM, Reka Howard <howardr at iastate.edu> wrote:
> Hello,
> I have over 1000 csv data sets I need to read into R, so I want to read
> them in using a loop.
> [...]
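Two points in the message above can be checked without touching any files. The sketch below uses a made-up short prefix and made-up paths purely for illustration: it shows that paste() recycles its arguments element-wise (so 1:2 and 2:3 yield only two names), that outer() is one way to get every combination, and that anchoring the name-extraction pattern on an underscore keeps multi-digit indices intact.

```r
prefix <- "pheno_demo_prog_"   # hypothetical short prefix

# paste() pairs its vector arguments element by element:
recycled <- paste(prefix, 1:2, "_", 2:3, ".csv", sep = "")
recycled   # "pheno_demo_prog_1_2.csv" "pheno_demo_prog_2_3.csv"

# outer() applies the function to every (a, b) combination instead:
allCombos <- as.vector(outer(1:2, 2:3,
                             function(a, b) paste0(prefix, a, "_", b, ".csv")))
length(allCombos)   # 4

# Extracting short names from full paths (paths are invented examples).
# The "_" in front of the capture group stops the greedy ".*" from
# consuming the leading digits of the first number.
fileNames <- c("C:/demo/pheno_demo_prog_1_2.csv",
               "C:/demo/pheno_demo_prog_12_345.csv")
shortNames <- sub("^.*_([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1", fileNames)
shortNames   # "d_1_2" "d_12_345"
```

Without the leading underscore in the pattern, the second name would come out as "d_2_345", which matters once the indices run past 9.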
Computing filenames is a dangerous, backwards approach. If you already _have_ files, it's wrong to create filenames from assumptions. Rather, you need to capture the existing filenames with an appropriate use of list.files(), and then process that vector.

Computing filenames only has a place when you are creating new files.

Cheers,
Boris

On Feb 6, 2016, at 6:27 PM, William Dunlap via R-help <r-help at r-project.org> wrote:
> Using lapply like this a perfectly fine way to solve the problem but you
> need to get the details right. I find it easier to break that statement
> into parts and make sure each part is working.
> [...]
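The list.files() approach recommended above can be sketched end to end as follows. Everything here is illustrative: the temporary folder, the short "pheno_demo_prog" prefix and the unrelated notes.txt file are assumptions created only so the example runs anywhere.

```r
# Set up a throwaway folder with three matching CSV files and one
# unrelated file that the pattern should ignore.
folder <- file.path(tempdir(), "listfiles_demo")   # hypothetical folder
dir.create(folder, showWarnings = FALSE)
for (p in c("1_2", "1_3", "2_3")) {
  write.csv(data.frame(id = 1:3),
            file.path(folder, paste0("pheno_demo_prog_", p, ".csv")),
            row.names = FALSE)
}
writeLines("not data", file.path(folder, "notes.txt"))

# Capture the file names that actually exist, then process that vector.
fileNames <- list.files(folder,
                        pattern = "^pheno_demo_prog_[0-9]+_[0-9]+\\.csv$",
                        full.names = TRUE)
data <- lapply(fileNames, read.csv)

# Name each element after its file, minus folder and extension.
names(data) <- tools::file_path_sans_ext(basename(fileNames))
length(data)   # 3
```

Because the names come from the directory listing, a missing or extra file changes the length of the list instead of causing a read error halfway through.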
Try this:

    # install package HelpersMG from CRAN including dependencies
    install.packages("HelpersMG")
    # Update to the latest version
    install.packages("http://www.ese.u-psud.fr/epc/conservation/CRAN/HelpersMG.tar.gz", repos=NULL, type="source")

    # Use the function read_folder()
    library("HelpersMG")
    content_as_list <- read_folder(folder = "C:/Research3/simulation1/second_gen", wildcard = "*.csv", read = read.csv)

I created this function because I had exactly the same problem that you described!

Sincerely,
Marc

On 06/02/2016 06:53, Reka Howard wrote:
> Hello,
> I have over 1000 csv data sets I need to read into R, so I want to read
> them in using a loop.
> [...]