thr3ads.net - R help - [R] reading in multiple data sets in 2 loops [Feb 2016]


Reka Howard

2016-Feb-06 05:53 UTC

[R] reading in multiple data sets in 2 loops

Hello,
I have over 1000 csv data sets I need to read into R, so I want to read
them in using a loop. The data sets are named as
pheno_1000ind_4000m_add_h70_prog_1_2.csv,
pheno_1000ind_4000m_add_h70_prog_1_3.csv, ... so I need 2 loops (for the
last 2 numbers in the names). What I would like to do is the following:

setwd("C:/Research3/simulation1/second_gen")
d1<-read.csv("pheno_1000ind_4000m_add_h70_prog_1_2.csv")
d2<-read.csv("pheno_1000ind_4000m_add_h70_prog_1_3.csv")
d3<-read.csv("pheno_1000ind_4000m_add_h70_prog_2_3.csv")
.
.
.

I am wondering how I can accomplish this with a loop. Any suggestion is
appreciated!
I tried the following but it does not work:

data <- lapply(
 paste(("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",[1:2],"_",[2:3],".csv",sep=''),
read.csv, header=TRUE, sep=',' )
names(data) <- paste("d", LETTERS[1:3], sep='')

Thanks!
Reka


Jim Lemon

2016-Feb-06 09:44 UTC


Hi Reka,
Try this:

header<-"C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog"
for(index1 in 1:2) {
 for(index2 in 2:3)
  read.csv(paste(paste(header,index1,index2,sep="_"),".csv",sep=""))
}

Jim


Jeff Newmiller

2016-Feb-06 20:01 UTC


Normally one wants not only to read the data, but to save it in an object 
as well. Here are some modifications toward achieving that (untested):

header<-"C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog"
fnums <- expand.grid( a = 1:2, b = 2:3 )
result <- vector( "list", nrow( fnums ) )
for ( idx in seq.int( nrow( fnums ) ) ) {
   result[[ idx ]] <- read.csv( paste( paste( header
                                            , fnums$a[ idx ]
                                            , fnums$b[ idx ]
                                            , sep = "_"
                                            )
                                     , ".csv"
                                     , sep = ""
                                     )
                              )
   # optionally remember which file each data record came from
   # assumes none of your input columns are labelled "a" or "b"
   result[[ idx ]]$a <- fnums$a[ idx ]
   result[[ idx ]]$b <- fnums$b[ idx ]
}

# you could also put all of the data into one data frame
result2 <- do.call( rbind, result )

# you could also do all of this in one dplyr pipe
library(dplyr)
result3 <- (   expand.grid( a = 1:2, b = 2:3 )
            %>% rowwise # work through each row of the a/b combinations
            %>% do( data.frame( a = .$a
                              , b = .$b
                              , read.csv( paste( paste( header
                                                      , .$a
                                                      , .$b
                                                      , sep = "_"
                                                      )
                                               , ".csv"
                                               , sep = ""
                                               )
                                        )
                              )
                  )
            %>% as.data.frame
            )



William Dunlap

2016-Feb-06 23:27 UTC


> I tried the following but it does not work:
>
>     data <- lapply(
>      paste(("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",[1:2],"_",[2:3],".csv",sep=''),
>     read.csv, header=TRUE, sep=',' )
>     names(data) <- paste("d", LETTERS[1:3], sep='')

I tried that and R complained about syntax errors - unexpected commas,
mismatched parentheses, illegal square brackets, etc.

Using lapply like this is a perfectly fine way to solve the problem, but you
need to get the details right.  I find it easier to break that statement
into parts and make sure each part is working.  E.g., after a minimal
cleanup of your code the file names would be computed as
    fileNames <- paste("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",
                       1:2, "_", 2:3, ".csv", sep='')
    print(fileNames) # do they look right?  You said you wanted 1_2, 1_3,
                     # 2_3 but that will give you only 2 of them
or perhaps you want all the files in that directory with a given pattern
    fileNames <- dir("C:/Research3/simulation1/second_gen",
                     pattern="^pheno_1000ind_4000m_add_h70_prog_[[:digit:]]+_[[:digit:]]+\\.csv$",
                     full.names=TRUE, ignore.case=TRUE)
    head(fileNames) # keep at it until the fileNames list looks good
    tail(fileNames)
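
To make the recycling behaviour concrete, here is a small sketch. paste() pairs its vector arguments element by element, so two index vectors of length 2 yield only two names. expand.grid() enumerates every combination, and combn() enumerates only the pairs whose first index is smaller than the second. That last option is an assumption about the naming scheme, suggested by the 1_2, 1_3, 2_3 example. The short `prefix` is a hypothetical stand-in for the real path.

```r
prefix <- "pheno_prog_"  # hypothetical short stand-in for the real path prefix

# Element-wise pairing: 1 goes with 2, 2 goes with 3
paired <- paste(prefix, 1:2, "_", 2:3, ".csv", sep = "")

# Every combination of first and second index
grid <- expand.grid(a = 1:2, b = 2:3)
allCombos <- paste(prefix, grid$a, "_", grid$b, ".csv", sep = "")

# Only pairs with first index < second index (assumed naming scheme)
pairs <- combn(3, 2)  # 2 x 3 matrix, one pair per column
upperPairs <- paste(prefix, pairs[1, ], "_", pairs[2, ], ".csv", sep = "")

print(paired)
print(allCombos)
print(upperPairs)
```

Whichever variant is used, printing the vector before reading anything is a cheap way to confirm that the names match what is on disk.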

Then read the data from the files with
    data <- lapply(fileNames, read.csv, header=TRUE, sep=",")
If there are errors reading the files in csv format you could try
    data <- lapply(fileNames, function(fileName) {
        cat(fileName, "\n")
        read.csv(fileName, header=TRUE, sep=",")
    })
so you can see the name of the first offending file.
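
If one would rather keep going past a bad file than stop at it, a possible variation is to wrap each read in tryCatch() and collect the failures afterwards. This is only a sketch: the helper name `readOne` and the two temporary files are made up for illustration.

```r
# Two throwaway paths: one valid CSV and one file that does not exist
goodFile <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c(2.5, 3.5, 4.5)), goodFile, row.names = FALSE)
badFile <- tempfile(fileext = ".csv")  # never created, so reading it fails

fileNames <- c(goodFile, badFile)

# Hypothetical helper: return the data frame, or NULL after reporting the failure
readOne <- function(fileName) {
  tryCatch(
    read.csv(fileName, header = TRUE, sep = ","),
    error = function(e) {
      message("Could not read ", fileName, ": ", conditionMessage(e))
      NULL
    }
  )
}

data <- lapply(fileNames, readOne)

# Names of the files that could not be read
failed <- fileNames[vapply(data, is.null, logical(1))]
```

With over 1000 files this gives a complete list of problem files in one pass, at the cost of having to drop the NULL entries before combining the data.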

When you attach names you probably want to get the names from the fileNames
variable, perhaps just the digits part
    # the "_" before the group stops the greedy ".*" from swallowing leading digits
    names(data) <- gsub("^.*_([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1", fileNames)
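
A naming pattern of this kind is worth checking against multi-digit indices before it is applied to the real list. The sketch below uses made-up file names (the `C:/some/dir` path is hypothetical) to compare a pattern with a bare greedy `.*` in front of the digits against one anchored on the preceding underscore.

```r
# Made-up file names, including multi-digit indices
fileNames <- c(
  "C:/some/dir/pheno_1000ind_4000m_add_h70_prog_1_2.csv",
  "C:/some/dir/pheno_1000ind_4000m_add_h70_prog_12_345.csv"
)

# Greedy ".*" directly before the digits absorbs all but the last digit
# of the first index
greedy <- sub("^.*([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1", fileNames)

# Anchoring on the underscore before the first index keeps it whole
anchored <- sub("^.*_([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1", basename(fileNames))

print(greedy)
print(anchored)
```

The two results agree for single-digit indices and differ for the second file, which is exactly the case a quick head() of the names can miss.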



Bill Dunlap
TIBCO Software
wdunlap tibco.com


Boris Steipe

2016-Feb-07 01:18 UTC


Computing filenames is a dangerous, backwards approach. If you already _have_
files, it's wrong to create filenames from assumptions. Rather you need to
capture the existing filenames with an appropriate use of list.files(), and then
process that vector. Computing filenames only has a place when you are creating
new files.
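
A minimal sketch of that workflow, assuming a throwaway folder under tempdir() in place of the real directory. The folder name `second_gen_demo`, the file contents, and the choice to name list elements after the file base names are all illustrative assumptions.

```r
# Build a throwaway directory with a few files that mimic the naming scheme
dataDir <- file.path(tempdir(), "second_gen_demo")  # hypothetical demo folder
dir.create(dataDir, showWarnings = FALSE)
for (tag in c("1_2", "1_3", "2_3")) {
  write.csv(data.frame(id = 1:2, value = c(0.1, 0.2)),
            file.path(dataDir, paste0("pheno_1000ind_4000m_add_h70_prog_", tag, ".csv")),
            row.names = FALSE)
}
writeLines("not data", file.path(dataDir, "notes.txt"))  # should be ignored

# Capture the names that actually exist, then process that vector
fileNames <- list.files(dataDir,
                        pattern = "^pheno_1000ind_4000m_add_h70_prog_[0-9]+_[0-9]+\\.csv$",
                        full.names = TRUE)
data <- lapply(fileNames, read.csv)

# Label each data frame with the name of the file it came from
names(data) <- sub("\\.csv$", "", basename(fileNames))

print(names(data))
```

Because the vector comes from the directory listing, a missing or unexpected index combination shows up as a shorter or longer list rather than as a read error.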

Cheers,
Boris





Marc Girondot

2016-Feb-07 13:51 UTC


Try this:

# install package HelpersMG from CRAN including dependencies
install.packages("HelpersMG")
# Update to the latest version
install.packages("http://www.ese.u-psud.fr/epc/conservation/CRAN/HelpersMG.tar.gz",
repos=NULL, type="source")

# Use the function read_folder()
library("HelpersMG")

content_as_list <- read_folder(folder = 
"C:/Research3/simulation1/second_gen", wildcard = "*.csv",
   read = read.csv)

I created this function because I had exactly the same problem that
you described!

Sincerely,

Marc

