Dear R users,

I have various vector geometry operations to perform on 3-D coordinate data stored in multiple (500+) csv files. The code I have written for the calculations works just fine. I have written a 'for' loop to automate extracting the coordinates from the files and performing the analyses. The loop works reasonably well, but if the number of csv files is greater than the number of output variables, the latter get repeated in the output data frame until they equal the number of files read. I think this is because at one stage I force the output into a matrix in order to transpose it, so that I can have variables in columns and observations in rows - apparently this forces the matrix to be square... Is there another way to do this that avoids the redundant columns?

Also, I use the filenames (extracted with Sys.glob) as rownames, but this inserts the entire file path - is there a way to extract just the actual .csv filename and use this?

My code is below (with three, much simplified, operations). A typical (again, much simplified) csv file looks like this:

x   y   z
10  5   2
20  15  12
30  25  22

I'm using OSX 10.5.6 and R 2.8.1.

Thanks a lot for any advice.
Steve

# ----------------------------------------START R-CODE-----------------------------------
filenames <- Sys.glob("/Users/Desktop/Test/*.csv") # get names of files to process # use * to get all

variables <- data.frame(1:length(filenames)) # preallocate assuming multiple values from each file
# creates a dataframe with the same length of rows as the number of .csv files to process

for (i in seq_along(filenames)){
  input <- read.csv(filenames[i], header=TRUE, na.strings="NA")
  data.frame("input")
  attach(input)

  result.A <- x[2]*y[1]
  result.B <- y[2]-x[1]
  result.C <- x[3]+y[1]

  results <- c(result.A, result.B, result.C) # concatenate result vectors

  variables[i] <- results
}

variables <- as.data.frame(t(as.matrix(variables))) # turn result vectors into a matrix, then transpose it and output as a data frame

# add column and row names
c.names <- c("ResultA", "ResultB", "ResultC") # set names for result vectors
colnames(variables) <- c.names
rownames(variables) <- filenames

# export to csv file
write.csv(variables, file="/Users/Desktop/Test.csv")
# ----------------------------------------END R-CODE-----------------------------------

--
View this message in context: http://www.nabble.com/Looping-multiple-output-values-to-dataframe-tp21981108p21981108.html
Sent from the R help mailing list archive at Nabble.com.
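The repeated values appear to come from recycling rather than from the transpose. `variables` starts as a data frame with one row per file, so each short `results` vector assigned with `variables[i] <- results` is stretched to that many rows. One way to keep the loop and avoid this is to preallocate a matrix with one row per file and one column per result, then fill one row per iteration. The following is a minimal sketch under stated assumptions: the temporary folder, the demo files and the name `result.names` are hypothetical and only make the example runnable.

```r
# Hypothetical setup: write three small CSV files to a temporary folder
# so the sketch runs anywhere; a real folder would replace 'folder'.
folder <- file.path(tempdir(), "loop_demo")
dir.create(folder, showWarnings = FALSE)
for (k in 1:3) {
  demo <- data.frame(x = c(10, 20, 30) + k, y = c(5, 15, 25), z = c(2, 12, 22))
  write.csv(demo, file.path(folder, sprintf("file%d.csv", k)), row.names = FALSE)
}

filenames <- Sys.glob(file.path(folder, "*.csv"))
result.names <- c("ResultA", "ResultB", "ResultC")

# One row per file, one column per result; NA until a row is filled in
variables <- matrix(NA_real_,
                    nrow = length(filenames),
                    ncol = length(result.names),
                    dimnames = list(filenames, result.names))

for (i in seq_along(filenames)) {
  input <- read.csv(filenames[i], header = TRUE, na.strings = "NA")
  # Refer to columns through 'input' instead of attach()
  variables[i, ] <- c(input$x[2] * input$y[1],
                      input$y[2] - input$x[1],
                      input$x[3] + input$y[1])
}

variables <- as.data.frame(variables)  # rows = files, columns = results
print(dim(variables))
```

Because the matrix already has the intended shape, no transpose is needed and the number of columns no longer depends on the number of files.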
I think something like this should work better:

docalc <- function(thisfile){
  input <- read.csv(thisfile, header=TRUE, na.strings="NA") # read the file passed in by sapply
  attach(input)
  result.A <- x[2]*y[1]
  result.B <- y[2]-x[1]
  result.C <- x[3]+y[1]
  results <- c(result.A, result.B, result.C) # concatenate result vectors
  names(results) <- c("ResultA", "ResultB", "ResultC")
  return(results)
}

variables <- sapply(filenames,docalc)

--
Levi Waldron
post-doctoral fellow
Jurisica Lab, Ontario Cancer Institute
Division of Signaling Biology
IBM Life Sciences Discovery Centre
TMDT 9-304D
101 College Street
Toronto, Ontario M5G 1L7
(416)581-7453
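When the function returns a named vector of fixed length, `sapply` simplifies the results into a matrix with one column per input, so the results end up in rows and the files in columns. A small sketch of that shape, using a stand-in function instead of file reads (`toy.calc` and `inputs` are hypothetical names chosen for illustration):

```r
# Stand-in for docalc(): returns three named values per input
toy.calc <- function(v) {
  c(ResultA = v * 2, ResultB = v - 1, ResultC = v + 10)
}

# Named inputs play the role of the file names
inputs <- c(first.csv = 1, second.csv = 2, third.csv = 3, fourth.csv = 4)

by.column <- sapply(inputs, toy.calc)  # results in rows, inputs in columns
by.row <- t(by.column)                 # one row per input, one column per result

print(dim(by.column))
print(dim(by.row))
```

Transposing with `t()` gives the layout asked for in the original post, with observations in rows and variables in columns, and the row names are taken from the names of the input vector.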
Thanks a lot Levi. Your code was much shorter and more elegant. With a few minor alterations I got this (see below) to work.

Does anyone know if there is a way to automate getting only the csv filenames in a folder (rather than the whole file path)? Or to automate extracting the file names from the file paths, once they have been extracted using Sys.glob? Thanks.

Steve

# ----------------------------------------START R-CODE-----------------------------------
filenames <- Sys.glob("/Users/Desktop/Test/*.csv") # get names of files to process # use * to get all

variables <- data.frame(1:length(filenames)) # preallocate assuming multiple values from each file
# creates a dataframe with the same length of rows as the number of .csv files to process

docalc <- function(filenames){
  input <- read.csv(filenames, header=TRUE, na.strings="NA")
  attach(input)
  result.A <- x[2]*y[1]
  result.B <- y[2]-x[1]
  result.C <- x[3]+y[1]
  results <- c(result.A, result.B, result.C) # concatenate result vectors
  names(results) <- c("ResultA", "ResultB", "ResultC") # set names for result vectors
  return(results)
}

variables <- t(sapply(filenames, docalc)) # transpose and sapply filenames

# export to csv file
write.csv(variables, file="/Users/Desktop/Test.csv")
# ----------------------------------------END R-CODE-----------------------------------
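A side note on `attach()`: called inside a function that runs once per file, it adds another copy of the data frame to the search path on every call, and later copies mask earlier ones. A possible variant that avoids this uses `with()`, which evaluates an expression against the columns of `input` without touching the search path. This is a sketch, not the code from the thread; the demo file written to `tempdir()` is hypothetical.

```r
docalc <- function(thisfile) {
  input <- read.csv(thisfile, header = TRUE, na.strings = "NA")
  # with() looks up x and y as columns of 'input';
  # nothing is added to the search path
  with(input, c(ResultA = x[2] * y[1],
                ResultB = y[2] - x[1],
                ResultC = x[3] + y[1]))
}

# Hypothetical demonstration file matching the sample layout in the thread
demo.file <- file.path(tempdir(), "docalc_demo.csv")
write.csv(data.frame(x = c(10, 20, 30), y = c(5, 15, 25), z = c(2, 12, 22)),
          demo.file, row.names = FALSE)

print(docalc(demo.file))
```

The function can be passed to `sapply` in the same way as before, for example `t(sapply(filenames, docalc))`.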
?list.files

... in particular the pattern argument

--
David Winsemius

On Feb 12, 2009, at 3:38 PM, Stropharia wrote:

> Does anyone know if there is a way to automate getting only the csv
> filenames in a folder (rather than the whole file path)? Or to automate
> extracting the file names from the file paths, once they have been
> extracted using Sys.glob? Thanks.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
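Two base R options seem to fit the question about file names: `basename()` strips the directory from paths already collected with `Sys.glob`, and `list.files()` returns bare file names by default, with `pattern` taken as a regular expression rather than a glob. A minimal sketch, assuming a hypothetical temporary folder with made-up file names:

```r
# Hypothetical folder with two CSV files and one unrelated file
folder <- file.path(tempdir(), "names_demo")
dir.create(folder, showWarnings = FALSE)
invisible(file.create(file.path(folder, c("a.csv", "b.csv", "notes.txt"))))

# Option 1: keep full paths for reading, strip the directory for labels
filenames <- Sys.glob(file.path(folder, "*.csv"))
short.names <- basename(filenames)

# Option 2: list.files() gives names without the path unless
# full.names = TRUE; the pattern "\\.csv$" matches names ending in .csv
listed <- list.files(folder, pattern = "\\.csv$")

print(short.names)
print(listed)
```

With the first option, the full paths in `filenames` remain available for `read.csv`, and the short names can be applied afterwards with `rownames(variables) <- basename(filenames)`.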