Hi all,

I've got a script that generates a few moderate-size data frames and then puts them together into one big data frame at the end, in order to write that data frame to disk so that it can be re-opened later on.

I'm trying to trim down the memory requirements of this script, so I was wondering if there is any way to append to a data frame already saved on disk (just like appending to a text file). All the data frames here have identical row names; what I want to do is tack on additional columns to a data frame stored in the working directory.

Alternatively, is there another data structure that would allow me to do this (and could preferably be converted to a data frame)?

Thanks in advance,
Ken
Duncan Murdoch
2005-Oct-11 20:38 UTC
[R] Any way to add to data frame saved as .rData file?
Ken Termiso wrote:
> Hi all,
>
> I've got a script that generates a few moderate-size data frames, and then
> puts them together into one big data frame at the end in order to write that
> data frame to disk, so that it may be re-opened later on...
>
> I'm trying to trim down memory requirements in this script, so I was
> wondering if there was any way to append to a data frame already saved on
> disk (just like appending to a text file)..all the data frames here have
> identical row names; what I want to do is to tack on additional columns to a
> data frame stored in the working directory...

No, I don't think so.

>
> Alternatively, is there another data structure that would allow me to do
> this (and could preferably be converted to a data frame) ?

I'd put the extra columns in their own data frame, and save that to disk (use dates/times/process ids or some other unique identifier in the filenames to distinguish them). When you need access to a mixture of columns, load (or source, depending how you did the save) the columns you need, and cbind them together into one big data frame.

If you are concerned about memory requirements when producing the pieces, watch out that you don't write out so much data that you'll never have enough memory to load all you need at once.

Duncan Murdoch
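Duncan's suggestion (save each group of extra columns in its own file, then load and cbind only what is needed) might look roughly like the sketch below. The helper names save_piece(), load_piece() and combine_pieces() are hypothetical, and tempfile() is used here as a stand-in for the dates/times/process ids Duncan mentions for making file names unique. It uses plain save()/load(), matching the .rData files in the subject line.

```r
# Hypothetical helpers sketching "one file per group of columns, cbind on load".

# Save one data frame to its own uniquely named .RData file; return the path.
save_piece <- function(df, dir = tempdir()) {
  path <- tempfile("piece_", tmpdir = dir, fileext = ".RData")  # unique name
  piece <- df
  save(piece, file = path)          # stored under the object name "piece"
  path
}

# Load a single piece into a private environment so nothing is overwritten.
load_piece <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env$piece
}

# Load only the requested pieces and cbind them into one data frame.
combine_pieces <- function(paths) {
  pieces <- lapply(paths, load_piece)
  rn <- rownames(pieces[[1]])
  same <- vapply(pieces, function(p) identical(rownames(p), rn), logical(1))
  if (!all(same)) stop("pieces do not share identical row names")
  do.call(cbind, pieces)
}
```

Only the pieces named in `paths` are ever in memory at once, which is the point Duncan makes about not writing more than you can later load together.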
ugh! scan(what= does this... thx anyway,
Have you looked at the g.data package? It might be useful (but may still require some redesign of your dataset).

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111

>>> "Ken Termiso" <jerk_alert at hotmail.com> 10/13/05 08:14AM >>>
>
>I'd put the extra columns in their own data frame, and save that to disk
>(use dates/times/process ids or some other unique identifier in the
>filenames to distinguish them). When you need access to a mixture of
>columns, load (or source, depending how you did the save) the columns you
>need, and cbind them together into one big data frame.
>
>If you are concerned about memory requirements when producing the pieces,
>watch out that you don't write out so much data that you'll never have
>enough memory to load all you need at once.
>
>Duncan Murdoch

hmm...maybe i should just be dumping to a text file instead of a data frame..is there any way (without using a real SQL database) in R to create a file that i can selectively load certain columns from?

if not, maybe i should break the data frame up into pieces (as you suggested) and create a separate file that keeps track of which columns are stored in which files (like a hashtable) and just load the small file of keys each time i need to load something..

whaddya think??

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
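Ken's "small file of keys" idea could be sketched as below: every group of columns goes into its own file, and a tiny index maps each column name to the file that holds it, so a lookup reads only the index plus the files actually needed. This is an illustration under assumptions, not the g.data package's API: store_columns(), load_columns() and the index file name "index.rds" are all hypothetical, and saveRDS()/readRDS() are used for the per-piece files.

```r
# Hypothetical helpers: an index ("hashtable") of column name -> file path.

# Save a data frame as one piece and record its columns in the index file.
store_columns <- function(df, dir) {
  index_file <- file.path(dir, "index.rds")
  path <- tempfile("cols_", tmpdir = dir, fileext = ".rds")  # unique file name
  saveRDS(df, path)
  new_rows <- data.frame(column = names(df), file = path,
                         stringsAsFactors = FALSE)
  index <- if (file.exists(index_file)) rbind(readRDS(index_file), new_rows) else new_rows
  saveRDS(index, index_file)
  invisible(path)
}

# Load only the requested columns, in the requested order.
load_columns <- function(cols, dir) {
  index <- readRDS(file.path(dir, "index.rds"))    # only the small key file
  files <- index$file[match(cols, index$column)]   # first match wins
  if (anyNA(files))
    stop("unknown column(s): ", paste(cols[is.na(files)], collapse = ", "))
  # read each needed piece once, then pull out the requested columns
  pieces <- lapply(setNames(unique(files), unique(files)), readRDS)
  out <- lapply(seq_along(cols), function(k) pieces[[files[k]]][cols[k]])
  do.call(cbind, out)
}
```

Each piece file is still read whole, so grouping columns that are usually needed together keeps the cost of a lookup low.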
Thanks everyone for your help. For simplicity, I elected to stay with a text file and transpose it so that each new row of data is really a column. In this transposed file, the header is really the row labels; the first cell has the name of the row labels ("RowID" in this case).

Here's code for what I ended up doing, in case anyone wants it (or wants to improve it):

    outfile <- "mydata.txt"    # the file name must be a quoted string
    rowlabels <- 1:10000

    # make the first row of the file hold the row labels
    zz <- file(outfile, "w")
    cat(c("RowID", rowlabels), file = zz, sep = "\t")
    cat("\n", file = zz)
    close(zz)                  # flush to disk so scan() below sees the data

    # 's' is a unique string contained in the title(s) of the column(s) you want;
    # returns those columns as a data frame (reads the global 'outfile')
    grep_text <- function(s)
    {
      # read only the first field of every line, i.e. the column titles
      titles <- unlist(scan(file = outfile, what = list("RowID"), flush = TRUE, quiet = TRUE))
      g <- grep(toString(s), titles)   # line numbers in outfile holding the wanted data
      if (length(g) == 0) stop("no column title matches ", s)
      cols <- lapply(g, function(i) {
        x <- scan(file = outfile, what = character(), skip = i - 1, nlines = 1, quiet = TRUE)
        as.numeric(x[-1])              # drop the title, convert to a numeric vector
      })
      names(cols) <- titles[g]         # keep the stored titles as column names
      data.frame(cols, check.names = FALSE)
    }

You would use grep_text(s) to return a data frame of the columns whose titles contain the string s. If I had a column named "Year05_population" in the "mydata.txt" file, to return a data frame named 'df' with only that one column titled "Year05_population" I would simply type:

    outfile <- "mydata.txt"
    df <- grep_text("Year05_population")

>From: "Greg Snow" <greg.snow at ihc.com>
>To: jerk_alert at hotmail.com, murdoch at stats.uwo.ca
>CC: gunter.berton at gene.com, r-help at stat.math.ethz.ch
>Subject: Re: [R] Any way to add to data frame saved as .rData file?
>Date: Thu, 13 Oct 2005 12:53:10 -0600
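The code above writes the header line and reads columns back, but doesn't show how a new column gets added to the transposed file. A minimal sketch, assuming the same layout (one tab-separated line per column, title first): append_column() and read_column() are hypothetical names, and read_column() deliberately swaps grep() for exact match() so that, say, "Year05" does not also pick up "Year05_population".

```r
# Hypothetical helpers for the transposed text file: one line per data column.

# Append one column as a new line: title, then the values, tab-separated.
append_column <- function(outfile, title, values) {
  stopifnot(!grepl("[[:space:]]", title))  # titles must not contain whitespace
  cat(paste(c(title, as.character(values)), collapse = "\t"), "\n",
      file = outfile, append = TRUE, sep = "")
}

# Read back one column by its exact title.
read_column <- function(outfile, title) {
  titles <- unlist(scan(outfile, what = list(""), flush = TRUE, quiet = TRUE))
  i <- match(title, titles)               # exact match, unlike grep()
  if (is.na(i)) stop("no column titled ", title)
  x <- scan(outfile, what = character(), skip = i - 1, nlines = 1, quiet = TRUE)
  as.numeric(x[-1])                       # drop the title
}
```

Appending only touches the end of the file, so nothing already on disk has to be loaded into memory to add a column, which was the goal of the original question.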
Maybe Matching Threads
- Rounding fractional numbers to nearest fraction
- Problem building/checking library that requires input from user
- Problems with scan() in a tab-sep .txt file with cells that have '///' (three frontslashes)
- Isolating string containing only file name from complete path
- ordering a data frame to same order as a chr vector