thr3ads.net - R help - [R] Automate a data load and merge [Jun 2009]


Jon Loehrke

2009-Jun-12 13:27 UTC

[R] Automate a data load and merge

Hi R list,
	I would like to automate, or at least speed up, the process by which I take
several separate datasets stored in .csv format, import them, and merge
them by a common variable.  So far I have greatly sped up the loading
process but cannot think of a way to automate the merger of all
datasets into a common data.frame.
	My apologies if this has been covered; any R search suggestions are
appreciated.

# All scripts function out of the base directory
rm(list=ls())
setwd('/Users/myuser/Documents/workfolder/')

# Check files and list all .csv in directory
files<-list.files()
files<-files[grep('\\.csv$', files)]  # escape the dot and anchor at the end of the name
# Create labels for each file (ex. June08.csv becomes June08)
labels<-gsub('\\.csv$', '', files)

# Load all .csv datasets and assign name

item<-vector() # preallocate an index of all items in datasets
for(i in seq_along(files)){  # seq_along() also copes with an empty file list
	X<-read.csv(files[i])
	item<-union(item, X$Item_Name)
	assign(labels[i], X)		
	}
# What is loaded
ls()
# [1] "files"  "i"      "item"   "June01" "June02" "June03" "labels"

# What does everything look like?
str(June03)
#'data.frame':	992 obs. of  8 variables:
# $ Item_Name     : Factor w/ 992 levels "Birds","Fish",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance     : int  30 30 50 450 75 550 100 500 250 75 ...

str(June01)
#'data.frame':	819 obs. of  8 variables:
# $ Item_Name     : Factor w/ 819 levels "Birds","Turtles",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance     : int  30 50 450 750 550 100 500 250 275 450 ...

# Here is where I'm stuck...
# I would like to:
#	Create a data.frame with an index column composed of the union of all items
#	Create columns in the frame by merging the 'Occurance' column of each
#	loaded dataset, labeled by the dataset's name (e.g. June01)
#	Automate this procedure so that I do not have to manually type in
#	each column addition when I have a new dataset.

# This is my current strategy, but when I have new datasets I have to
# manually set up the preallocation and merger

allData<-data.frame(Item=item, June01 =NA, June02=NA,  June03 =NA)
allData[match(June01$Item_Name, allData$Item ),]$June01 <-  
June01$Occurance
allData[match(June02$Item_Name, allData$Item ),]$June02 <-  
June02$Occurance
allData[match(June03$Item_Name, allData$Item ),]$June03 <-  
June03$Occurance

# Any help to automate this process is greatly appreciated!!!

sessionInfo()
#R version 2.9.0 (2009-04-17)
#i386-apple-darwin8.11.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base


Jon Loehrke
Graduate Research Assistant
Department of Fisheries Oceanography
School for Marine Science and Technology
University of Massachusetts
200 Mill Road, Suite 325
Fairhaven, MA 02719
jloehrke@umassd.edu
T 508-910-6393
F 508-910-6396
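
The hand-written match() strategy in the question can be looped over the datasets if they are kept in a named list rather than as separate objects created with assign(). The following is only a minimal sketch under that assumption: the `datasets` list and its values are made up for illustration, and it presumes each item appears at most once per dataset.

```r
# Made-up stand-ins for the data frames that read.csv() would return.
datasets <- list(
    June01 = data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50),
                        stringsAsFactors = FALSE),
    June03 = data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(30, 450),
                        stringsAsFactors = FALSE)
)

# Union of all item names across datasets, as in the original loop.
item <- Reduce(union, lapply(datasets, function(X) X$Item_Name))

# One column per dataset, filled by matching on the item name;
# items absent from a dataset are left as NA.
allData <- data.frame(Item = item, stringsAsFactors = FALSE)
for (lab in names(datasets)) {
    X <- datasets[[lab]]
    allData[[lab]] <- X$Occurance[match(allData$Item, X$Item_Name)]
}
allData
```

Adding a new dataset then only means adding another element to the list; no column has to be typed in by hand.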



jim holtman

2009-Jun-12 16:25 UTC


See if this works for you:

# read into a list and then rbind to single data frame
input <- do.call(rbind, lapply(files, function(.file){
    X <- read.csv(.file)
    X$label <- gsub('\\.csv$', '', .file)  # add name
    X
}))
# use the reshape package
library(reshape)
i.melt <- melt(input, id=c("label", "Item_Name"), measure="Occurance")
output <- cast(i.melt, Item_Name ~ label)
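
For readers without the reshape package, roughly the same wide table can be built in base R by reading the files into a list and folding merge() over it. This is a sketch, not the method proposed in this thread: the temporary directory, file contents, and values below are invented for illustration, and it assumes each Item_Name occurs at most once per file (otherwise merge() would multiply rows).

```r
# Toy stand-ins for the real files; names and values are made up.
dir <- file.path(tempdir(), "merge_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50)),
          file.path(dir, "June01.csv"), row.names = FALSE)
write.csv(data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(30, 450)),
          file.path(dir, "June03.csv"), row.names = FALSE)

files  <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
labels <- sub("\\.csv$", "", basename(files))

# Read each file, keep the key and the value column, and rename the value
# column after the file it came from.
frames <- lapply(seq_along(files), function(i) {
    X <- read.csv(files[i], stringsAsFactors = FALSE)[, c("Item_Name", "Occurance")]
    names(X)[2] <- labels[i]
    X
})

# Full outer join of all frames on Item_Name; items missing from a file get NA.
allData <- Reduce(function(a, b) merge(a, b, by = "Item_Name", all = TRUE), frames)
allData
```

Because the column names come from the file names, dropping a new .csv into the directory adds a column without any change to the script.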





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


jim holtman

2009-Jun-12 16:27 UTC


I think the last line should be:

output <- cast(i.melt, Item_Name ~ label, sum)
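
Why an aggregation function such as sum matters can be seen on a small example where one item appears twice under the same label. The sketch below uses base R's tapply() in place of cast(), purely for illustration; the data are made up and the result is a matrix rather than a data frame.

```r
# Made-up long-format data: one row per (label, item);
# "Birds" appears twice in June01.
input <- data.frame(
    label     = c("June01", "June01", "June01", "June03"),
    Item_Name = c("Birds", "Birds", "Turtles", "Birds"),
    Occurance = c(30, 5, 50, 30),
    stringsAsFactors = FALSE
)

# Sum Occurance within each Item_Name x label cell; empty cells come back as NA.
wide <- tapply(input$Occurance, list(input$Item_Name, input$label), sum)
wide
```

Here the two "Birds" rows for June01 are combined into a single cell, and "Turtles", which has no June03 row, is left as NA.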






-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
