Dear R help list,

I am very new to R and I apologize in advance if this has been answered before. I have done my best to google/R search what I need but no luck. Here is what I am attempting:

I have hundreds of .csv files that I need to sort based on a single column of alphanumeric data. All of the files contain matrices that have identical dimensions and headers, however the data table doesn't begin until the 74th line in each file. Doing some searching, I have been able to create an object with elements consisting of each file in the folder containing the targets (please note this is my working directory):

filenames<-list.files()
alldata<-lapply(filenames, read.csv, skip=73, header=TRUE)

At this point I believe I have created an object with N elements (where N=# files in the wd), each containing the matrix I am attempting to sort. I am completely lost as to how I can sort each matrix based on a single column (say, "Name") and then either overwrite the source files or write to a new directory all of the sorted data. I half wonder if I should be creating individual objects for each file that I read in, but I haven't been able to figure this out either. Please note that I am trying to sort these files individually - would a loop be more efficient?

I appreciate the help,
BustedAvi
You appear to have a good start. If you type

alldata[[1]]

do you get what you expect for the first file?

This is not tested, but I would start with something like this:

sorteddata <- lapply(alldata, function(df) df[order(df$Name), ])

## then this will overwrite
for (id in seq_along(filenames)) {
  write.csv(sorteddata[[id]], filenames[id])
}

## or change it to something like this for new files
for (id in seq_along(filenames)) {
  write.csv(sorteddata[[id]], paste('sorted_', filenames[id], sep=''))
}

And you'll want to check the other arguments to write.csv(), or possibly use write.table().

For learning purposes:

tmp <- alldata[[1]]
tmp[order(tmp$Name),]   ## to sort by Name
tmp[order(tmp[,2]),]    ## to sort by 2nd column

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 5/18/12 9:56 AM, "Matthew Ouellette" <mouellette89 at gmail.com> wrote:

>[...]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
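The read, sort, and write steps discussed above can be sketched end to end as follows. This is only an illustration under stated assumptions: the demo files, the directory names `csv_in` and `csv_sorted`, and the helper `make_file()` are made up for the example. One point worth noting is that `read.csv(skip=73)` discards the first 73 lines, so writing the sorted table back with `write.csv()` alone would lose them; the sketch re-reads those lines with `readLines()` and writes them back first, which may or may not be what is wanted.

```r
# Hypothetical demo data: two small CSV files, each with 73 preamble lines
# before the real header row (file names and contents are made up).
in_dir  <- file.path(tempdir(), "csv_in")
out_dir <- file.path(tempdir(), "csv_sorted")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

make_file <- function(path, names) {
  preamble <- paste("preamble line", 1:73)
  body <- c("Name,Value", paste(names, seq_along(names), sep = ","))
  writeLines(c(preamble, body), path)
}
make_file(file.path(in_dir, "a.csv"), c("b2", "a10", "a1"))
make_file(file.path(in_dir, "b.csv"), c("z", "m", "c"))

# Pick up only .csv files, with full paths so the working directory
# does not matter.
filenames <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

for (f in filenames) {
  preamble <- readLines(f, n = 73)           # the lines read.csv() skips
  df <- read.csv(f, skip = 73, header = TRUE, stringsAsFactors = FALSE)
  df <- df[order(df$Name), , drop = FALSE]   # sort rows by the Name column

  out <- file.path(out_dir, basename(f))
  writeLines(preamble, out)                  # put the preamble back first
  # Append the sorted table below the preamble. write.table() may warn
  # about appending column names to an existing file; that is intended here.
  suppressWarnings(
    write.table(df, out, sep = ",", row.names = FALSE,
                col.names = TRUE, append = TRUE, qmethod = "double")
  )
}
```

Writing to a separate directory keeps the source files untouched, so the result can be checked before anything is overwritten.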
Hello,

Try the following.

# Make some data
alldata <- list(matrix(rnorm(12), ncol=3), matrix(sample(100), ncol=10))
(alldata <- lapply(alldata, function(x){colnames(x) <- c("Name", LETTERS[2:ncol(x)]); x}))

# This does the trick
all.order <- lapply(alldata, function(x) order(x[, "Name"]))
lapply(seq_along(alldata), function(i) alldata[[i]][all.order[[i]], ])

Hope this helps,

Rui Barradas

--
View this message in context: http://r.789695.n4.nabble.com/Sort-across-multiple-csv-tp4630531p4630537.html
Sent from the R help mailing list archive at Nabble.com.
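Because the sort column is described as alphanumeric, it may help to see how `order()` treats character data. The sketch below assumes hypothetical IDs made of a letter prefix followed by digits; the real "Name" values may look different, in which case the two `sub()` patterns would need adjusting.

```r
# Hypothetical IDs of the form <letters><digits>
Name <- c("a10", "a2", "b1", "a1")

# Plain order() compares character strings left to right,
# so "a10" sorts before "a2".
plain <- Name[order(Name)]

# Split each ID into its letter prefix and its numeric suffix,
# then order by prefix first and by number second.
prefix <- sub("[0-9]+$", "", Name)               # strip trailing digits
number <- as.integer(sub("^[^0-9]*", "", Name))  # strip leading non-digits
natural <- Name[order(prefix, number)]

print(plain)
print(natural)
```

If plain string order is what is wanted, `order(df$Name)` alone is enough; the two-key version only matters when the numeric part should be compared as a number.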
On May 18, 2012, at 12:56 PM, Matthew Ouellette wrote:

> [...]
> At this point I believe I have created an object with N elements (where N=#
> files in the wd), each containing the matrix I am attempting to sort. I am
> completely lost as to how I can sort each matrix

You should learn to use precise terminology to refer to R objects. You have a list of dataframes (not matrices). You can loop over them and return a list of transformed (e.g., sorted) dataframes:

alldata <- lapply(alldata, function(x) x[order(x[["Name"]]), ])

> based on a single column
> (say, "Name") and then either overwrite

The above code would overwrite `alldata`.

> the source files or write to a new
> directory all of the sorted data.

If you didn't want it overwritten then assign it to a different name.

> I half wonder if I should be creating
> individual objects for each file that I read in, but I haven't been able to
> figure this out either.

Much better to stick with lists.

> Please note that I am trying to sort these files
> individually - would a loop be more efficient?

`lapply` is really a loop.

> I appreciate the help,
> BustedAvi

David Winsemius, MD
West Hartford, CT
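To illustrate the two points above (the elements are data frames, and `lapply` is really a loop), here is a small sketch. The list `alldata` below is a made-up stand-in for what `lapply(filenames, read.csv, ...)` would return, and the element names `f1.csv` and `f2.csv` are hypothetical.

```r
# Hypothetical stand-in for the list returned by lapply(filenames, read.csv, ...)
alldata <- list(
  f1.csv = data.frame(Name = c("b", "a", "c"), Value = 1:3),
  f2.csv = data.frame(Name = c("z", "x", "y"), Value = 4:6)
)

# Each element is a data frame, not a matrix
print(sapply(alldata, class))

# lapply version: returns a new list, one sorted data frame per element
sorted_lapply <- lapply(alldata, function(x) x[order(x[["Name"]]), ])

# Equivalent explicit for loop over the same list
sorted_loop <- vector("list", length(alldata))
names(sorted_loop) <- names(alldata)
for (i in seq_along(alldata)) {
  x <- alldata[[i]]
  sorted_loop[[i]] <- x[order(x[["Name"]]), ]
}

# Both approaches give the same result
print(identical(sorted_lapply, sorted_loop))
```

Keeping the data frames in one named list also means each result can be matched back to its source file by name, which is one reason to prefer a list over separate objects.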