Hi R list,

I would like to automate, or at least speed up, the process by which I take several separate datasets stored in .csv format, import them, and merge them by a common variable. So far I have greatly sped up the loading step, but I cannot think of a way to automate merging all the datasets into a common data.frame.

My apologies if this has been covered; any R search suggestions are appreciated.

# All scripts run from the base directory
rm(list=ls())
setwd('/Users/myuser/Documents/workfolder/')

# List all .csv files in the directory
files <- list.files()
files <- files[grep('\\.csv$', files)]
# Create a label for each file (e.g. June08.csv becomes June08)
labels <- gsub('\\.csv$', '', files)

# Load all .csv datasets and assign each one to its label
item <- vector()  # will hold the union of all items across datasets
for(i in seq_along(files)){
    X <- read.csv(files[i])
    item <- union(item, X$Item_Name)
    assign(labels[i], X)
}

# What is loaded
ls()
# [1] "files"  "i"  "item"  "June01"  "June02"  "June03"  "labels"

# What does everything look like?
str(June03)
#'data.frame': 992 obs. of 8 variables:
# $ Item_Name : Factor w/ 992 levels "Birds","Fish",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance : int 30 30 50 450 75 550 100 500 250 75 ...

str(June01)
#'data.frame': 819 obs. of 8 variables:
# $ Item_Name : Factor w/ 819 levels "Birds","Turtles",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance : int 30 50 450 750 550 100 500 250 275 450 ...

# Here is where I'm stuck...
# I would like to:
#   - Create a data.frame with an index column composed of the union of all items
#   - Create one column per loaded dataset, holding that dataset's 'Occurance'
#     values and labeled with the dataset's name (e.g. June01)
#   - Automate this procedure so that I do not have to manually type in each
#     column addition when I have a new dataset

# This is my current strategy, but when I have new datasets I have to
# manually set up the preallocation and the merge
allData <- data.frame(Item=item, June01=NA, June02=NA, June03=NA)
allData[match(June01$Item_Name, allData$Item),]$June01 <- June01$Occurance
allData[match(June02$Item_Name, allData$Item),]$June02 <- June02$Occurance
allData[match(June03$Item_Name, allData$Item),]$June03 <- June03$Occurance

# Any help to automate this process is greatly appreciated!

sessionInfo()
#R version 2.9.0 (2009-04-17)
#i386-apple-darwin8.11.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base

Jon Loehrke
Graduate Research Assistant
Department of Fisheries Oceanography
School for Marine Science and Technology
University of Massachusetts
200 Mill Road, Suite 325
Fairhaven, MA 02719
jloehrke@umassd.edu
T 508-910-6393
F 508-910-6396
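One way the manual match() strategy above might be generalised is to keep the data frames in a named list instead of separate variables, so that the per-dataset column can be filled in a loop. The sketch below is only an illustration under that assumption: the `datasets` list and its tiny contents are made up here and stand in for the real June01, June02, ... data frames.

```r
# Hypothetical stand-ins for the data frames read from the CSV files
datasets <- list(
  June01 = data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50)),
  June02 = data.frame(Item_Name = c("Birds", "Fish"),    Occurance = c(30, 450))
)

# Union of all item names across the datasets, in order of first appearance
item <- unique(unlist(lapply(datasets, function(d) as.character(d$Item_Name))))

# Start with the index column, then add one column per dataset
allData <- data.frame(Item = item, stringsAsFactors = FALSE)
for (nm in names(datasets)) {
  d <- datasets[[nm]]
  # match() finds each union item in this dataset; items it lacks become NA
  allData[[nm]] <- d$Occurance[match(allData$Item, as.character(d$Item_Name))]
}
allData
```

With this layout, adding a new dataset only means adding another element to the list; no column has to be preallocated by hand.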
See if this works for you:

# read each file into a list element, then rbind into a single data frame
input <- do.call(rbind, lapply(files, function(.file){
    X <- read.csv(.file)
    X$label <- sub('\\.csv$', '', .file)  # add the file's label
    X
}))

# use the reshape package
require(reshape)
i.melt <- melt(input, id=c("label", "Item_Name"), measure="Occurance")
output <- cast(i.melt, Item_Name ~ label)

On Fri, Jun 12, 2009 at 9:27 AM, Jon Loehrke <jloehrke@umassd.edu> wrote:
> Hi R list,
> I would like to automate, or speed up the process from which I take
> several separate datasets, stored in .csv format, import and merge
> them by a common variable.
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
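For readers who would rather stay in base R, a similar wide table can probably be built by merging the per-file data frames pairwise with Reduce() and merge(). This is a sketch of an alternative technique, not the reshape approach suggested above; the temporary directory and the two tiny CSV files are invented here purely so the example is self-contained.

```r
# Write two small hypothetical CSV files to a temporary directory
dir <- tempfile("csvdemo")
dir.create(dir)
write.csv(data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50)),
          file.path(dir, "June01.csv"), row.names = FALSE)
write.csv(data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(30, 450)),
          file.path(dir, "June02.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file, keep the two columns of interest, and rename
# 'Occurance' after the file (June01.csv -> June01)
pieces <- lapply(files, function(f) {
  X <- read.csv(f, stringsAsFactors = FALSE)[, c("Item_Name", "Occurance")]
  names(X)[2] <- sub("\\.csv$", "", basename(f))
  X
})

# Full outer join of all pieces on Item_Name; items absent from a file become NA
merged <- Reduce(function(a, b) merge(a, b, by = "Item_Name", all = TRUE), pieces)
merged
```

Because `all = TRUE` requests an outer join, the Item_Name column ends up holding the union of items across all files, which is the index column asked for in the original post.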
I think the last should be:

output <- cast(i.melt, Item_Name ~ label, sum)

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
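The aggregation function matters when an item appears more than once within a single file: as far as I recall, cast() then has to aggregate and falls back to length() (a row count) unless a function such as sum is supplied. The base R sketch below only illustrates that difference with tapply(); it is not the reshape code itself, and the small `input` data frame is hypothetical.

```r
# Hypothetical long-format data in which "Birds" occurs twice in June01
input <- data.frame(
  label     = c("June01", "June01", "June01", "June02"),
  Item_Name = c("Birds",  "Birds",  "Fish",   "Birds"),
  Occurance = c(30, 20, 450, 75)
)

# Counting rows per cell (what a length() aggregation reports)
counts <- tapply(input$Occurance, list(input$Item_Name, input$label), length)

# Summing Occurance per cell (the analogue of passing sum as the aggregator)
totals <- tapply(input$Occurance, list(input$Item_Name, input$label), sum)

counts
totals
```

If every item is unique within each file, no aggregation is needed and the two forms of the cast() call should give the same table.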