Hi R list,
I would like to automate, or at least speed up, the process by which I take
several separate datasets stored in .csv format, import them, and merge
them by a common variable. So far I have greatly sped up the loading
step, but I cannot think of a way to automate merging all the
datasets into a single data.frame.
My apologies if this has been covered; any R search suggestions are
appreciated.
# All scripts function out of the base directory
rm(list=ls())
setwd('/Users/myuser/Documents/workfolder/')
# Check files and list all .csv in directory
files<-list.files()
files<-files[grep('\\.csv$', files)] # escape the dot and anchor at the end of the name
# Create labels for each file (ex. June08.csv becomes June08)
labels<-sub('\\.csv$', '', files)
# Load all .csv datasets and assign name
item<-character() # start an empty index of all items in datasets
for(i in seq_along(files)){ # seq_along() is safe when no files are found
    X<-read.csv(files[i])
    item<-union(item, as.character(X$Item_Name))
    assign(labels[i], X)
}
# What is loaded
ls()
# [1] "files" "i" "item" "June01" "June02" "June03" "labels"
# What does everything look like?
str(June03)
#'data.frame': 992 obs. of 8 variables:
# $ Item_Name : Factor w/ 992 levels "Birds","Fish",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance : int 30 30 50 450 75 550 100 500 250 75 ...
str(June01)
#'data.frame': 819 obs. of 8 variables:
# $ Item_Name : Factor w/ 819 levels "Birds","Turtles",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance : int 30 50 450 750 550 100 500 250 275 450 ...
# Here is where I'm stuck...
# I would like to:
#  - Create a data.frame with an index column composed of the union of
#    all items
#  - Create columns in the frame by merging the 'Occurance' column of each
#    loaded dataset, each labeled by its dataset name (e.g. June01)
#  - Automate this procedure so that I do not have to manually type in
#    each column addition when I have a new dataset.
# This is my current strategy, but when I have new datasets I have to
# manually set up the preallocation and merger
allData<-data.frame(Item=item, June01=NA, June02=NA, June03=NA)
allData[match(June01$Item_Name, allData$Item),]$June01 <- June01$Occurance
allData[match(June02$Item_Name, allData$Item),]$June02 <- June02$Occurance
allData[match(June03$Item_Name, allData$Item),]$June03 <- June03$Occurance
# Any help to automate this process is greatly appreciated!!!
sessionInfo()
#R version 2.9.0 (2009-04-17)
#i386-apple-darwin8.11.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
Jon Loehrke
Graduate Research Assistant
Department of Fisheries Oceanography
School for Marine Science and Technology
University of Massachusetts
200 Mill Road, Suite 325
Fairhaven, MA 02719
jloehrke@umassd.edu
T 508-910-6393
F 508-910-6396
See if this works for you:
# read each file into a list, then rbind into a single data frame
input <- do.call(rbind, lapply(files, function(.file){
    X <- read.csv(.file)
    X$label <- sub('\\.csv$', '', .file) # add the file label (e.g. June01)
    X
}))
# use the reshape package
library(reshape)
# the column is Item_Name (capital N) in the str() output above
i.melt <- melt(input, id.vars=c("label", "Item_Name"),
    measure.vars="Occurance")
# one row per item, one column per file label
output <- cast(i.melt, Item_Name ~ label)
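
If you would rather stay in base R, the same wide table can be built with merge(). What follows is only a minimal sketch under stated assumptions: the two toy files, their contents, and the name demo_dir are made up for illustration, and only the Item_Name and Occurance columns are kept from each file.

```r
# Toy stand-ins for the real files (hypothetical data, written to a temp dir)
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
write.csv(data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50)),
          file.path(demo_dir, "June01.csv"), row.names = FALSE)
write.csv(data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(30, 30)),
          file.path(demo_dir, "June02.csv"), row.names = FALSE)

files  <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
labels <- sub("\\.csv$", "", basename(files))

# Read each file, keep the key and the count, and name the count after the file
pieces <- lapply(seq_along(files), function(i) {
    X <- read.csv(files[i], stringsAsFactors = FALSE)[, c("Item_Name", "Occurance")]
    names(X)[2] <- labels[i]
    X
})

# Full outer join of all pieces on Item_Name; items absent from a file become NA
allData <- Reduce(function(a, b) merge(a, b, by = "Item_Name", all = TRUE), pieces)
print(allData)
```

Because the column names come from the file names, dropping a new .csv into the directory adds a column without any further typing. The repeated merge() can get slow with many large files, in which case the rbind-and-reshape route is probably the better choice.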
On Fri, Jun 12, 2009 at 9:27 AM, Jon Loehrke <jloehrke@umassd.edu> wrote:
> [snip]
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
I think the last should be:
output <- cast(i.melt, Item_Name ~ label, sum)
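
The aggregation function only matters when an item occurs more than once within the same file, since several values then have to be collapsed into one cell. A rough base R analogue using tapply() shows the effect; the data below are made up for illustration, and tapply() is a stand-in here, not what cast() does internally.

```r
# Hypothetical long-format data; "Birds" appears twice in June01
input <- data.frame(
    label     = c("June01", "June01", "June01", "June02"),
    Item_Name = c("Birds", "Birds", "Turtles", "Birds"),
    Occurance = c(30, 20, 50, 30)
)

# One row per item, one column per file; duplicate entries are summed
wide <- with(input, tapply(Occurance, list(Item_Name, label), sum))
print(wide)
```

Note that tapply() leaves item/file combinations that never occur as NA. As far as I understand the fill default in reshape, cast() with sum would report 0 for those cells instead, so check which of the two you want before relying on either.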