Hi All,

I want to import several .dat files from the same folder every day of the week.

Let's say that on day 1 I have about 100 files in the folder. Using this code, everything works perfectly (maybe there is a more efficient way to do it):

filenames <- list.files(path="pathtofile", full.names=TRUE)
library(plyr)
import.list <- llply(filenames, read.table, header=TRUE, sep="",
                     na.strings="NA", dec=".", strip.white=TRUE)
#
# MERGE and RESHAPE: some of the files have different columns
library(reshape)
data3 <- merge_recurse(import.list)
#

At the end of each day I will export the merged data as a csv file to a different folder.

My question is: how can I import the new files in the folder on day 2 (day 3, day 4, etc.) without importing the files already imported on previous days?

Any help would be appreciated.
Martin
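On the "more efficient way" aside in the question: a dependency-free version of the same read-and-merge step can be written in base R with lapply() and Reduce(). This is a sketch, not a drop-in replacement. It assumes merge_recurse() behaves like repeated full outer joins (merge(..., all=TRUE)) on the shared column names, and the helper names read_and_merge and export_day, the "\\.dat$" pattern, and the "merged_YYYYMMDD.csv" output name are hypothetical choices for illustration.

```r
# Base-R sketch of the read / merge / export steps from the question.
# read_and_merge() and export_day() are hypothetical helper names.
read_and_merge <- function(filenames) {
  # Same read.table() arguments as in the original llply() call
  import.list <- lapply(filenames, read.table, header = TRUE, sep = "",
                        na.strings = "NA", dec = ".", strip.white = TRUE)
  # Full outer join on shared column names, applied left to right;
  # files with no columns in common would produce a Cartesian product
  Reduce(function(x, y) merge(x, y, all = TRUE, sort = FALSE), import.list)
}

export_day <- function(data, outdir, date = Sys.Date()) {
  # Assumed naming scheme: one csv per day, e.g. merged_20130422.csv
  outfile <- file.path(outdir,
                       paste0("merged_", format(date, "%Y%m%d"), ".csv"))
  write.csv(data, outfile, row.names = FALSE)
  invisible(outfile)
}

# Usage ("pathtofile" and "pathtooutput" are placeholders):
# filenames <- list.files(path = "pathtofile", pattern = "\\.dat$",
#                         full.names = TRUE)
# data3 <- read_and_merge(filenames)
# export_day(data3, "pathtooutput")
```

Restricting list.files() to names ending in .dat is an assumption; the original call lists every file in the folder.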
On 04/22/2013 05:36 AM, Martin Lavoie wrote:
> My question is how can I import on day 2 (day3, day4, etc) the new files in
> the folder but without importing the files already imported from the
> previous days.

Hi Martin,

Here is one method:

# first set up a dummy file for the first run to avoid an error
# (writeLines() avoids relying on a shell for the ">" redirection)
writeLines("xxx", "old.filenames.tab")  # obviously don't do this again
all.filenames <- list.files(path="pathtofile", full.names=TRUE)
# read.table() returns a data frame; take its first column so that
# %in% compares against a character vector of file names
old.filenames <- read.table("old.filenames.tab", stringsAsFactors=FALSE)[[1]]
filenames <- all.filenames[!(all.filenames %in% old.filenames)]
write.table(all.filenames, file="old.filenames.tab", row.names=FALSE)
library(plyr)
import.list <- llply(filenames, read.table, header=TRUE, sep="",
                     na.strings="NA", dec=".", strip.white=TRUE)
...

Jim
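Jim's approach can also be wrapped in two small functions. This is a sketch under some assumptions: the log file name "processed.log" and the function names new_files and mark_processed are hypothetical, and the log is plain text with one path per line, read with readLines() instead of read.table(). One design difference from the snippet above: the log is updated only after the files have been processed, so a failed import does not mark those files as done.

```r
# Sketch of the "keep a list of processed files" idea.
# new_files(), mark_processed() and "processed.log" are hypothetical names.
new_files <- function(path, logfile = "processed.log") {
  all.filenames <- list.files(path = path, full.names = TRUE)
  # On the first run there is no log yet, so nothing counts as processed
  old.filenames <- if (file.exists(logfile)) readLines(logfile) else character(0)
  # Same test as Jim's; setdiff(all.filenames, old.filenames) is equivalent here
  all.filenames[!(all.filenames %in% old.filenames)]
}

mark_processed <- function(filenames, logfile = "processed.log") {
  old.filenames <- if (file.exists(logfile)) readLines(logfile) else character(0)
  # Rewrite the log with the old names plus the newly processed ones
  writeLines(unique(c(old.filenames, filenames)), logfile)
}

# Daily usage ("pathtofile" is a placeholder):
# filenames <- new_files("pathtofile")
# ... import and merge filenames as in the original code ...
# mark_processed(filenames)
```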
Jim Lemon showed one option: keep a list of the files that have already been processed and compare against this list to see which files are new. Here are a couple of other possibilities:

- Move each file from the input folder to an archive folder after processing it (file.rename or a system command). That way only new files to be processed will be in the input folder, and you still have all the files in the archive folder.

- Use the file.info function to see which files have been accessed, modified, or created since a given time (the saved time of the last processing, or just the last 24 hours) and only process those files.

- Keep a list of which files have already been processed, as Jim showed (I might use the setdiff function instead of %in%, but that is probably more a matter of preference).

On Sun, Apr 21, 2013 at 1:36 PM, Martin Lavoie <martin21skifond@gmail.com> wrote:
> My question is how can I import on day 2 (day3, day4, etc) the new files in
> the folder but without importing the files already imported from the
> previous days.
--
Gregory (Greg) L. Snow Ph.D.
538280@gmail.com
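Greg's first two suggestions (archiving processed files, and selecting files by timestamp with file.info) can be sketched as two helpers. These are illustrations under stated assumptions, not code from the thread: the names archive_files and files_modified_since and the archive folder are hypothetical, and the timestamp version uses the modification time (mtime) only. file.rename generally cannot move files between different drives or file systems; file.copy followed by file.remove is a common fallback in that case.

```r
# Option 1: move processed files into an archive folder (hypothetical helper).
archive_files <- function(filenames, archive_dir) {
  if (!dir.exists(archive_dir)) dir.create(archive_dir, recursive = TRUE)
  ok <- file.rename(filenames, file.path(archive_dir, basename(filenames)))
  if (!all(ok)) {
    warning("could not move: ", paste(filenames[!ok], collapse = ", "))
  }
  invisible(ok)
}

# Option 2: list regular files modified after a given time (hypothetical helper).
files_modified_since <- function(path, since) {
  filenames <- list.files(path = path, full.names = TRUE)
  info <- file.info(filenames)
  # which() drops NA results, e.g. for a file deleted while listing
  filenames[which(!info$isdir & info$mtime > since)]
}

# Usage ("pathtofile", "pathtoarchive" and "last_run.rds" are placeholders):
# filenames <- list.files("pathtofile", full.names = TRUE)
# ... import and merge filenames ...
# archive_files(filenames, "pathtoarchive")
#
# last_run <- readRDS("last_run.rds")        # time saved by the previous run
# filenames <- files_modified_since("pathtofile", last_run)
# saveRDS(Sys.time(), "last_run.rds")
```

Archiving keeps the input folder itself as the record of what is pending, so no separate log is needed; the timestamp approach leaves files in place but depends on the saved time being updated reliably after each run.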