thr3ads.net - R help - [R] Any way to add to data frame saved as .rData file? [Oct 2005]

Ken Termiso

2005-Oct-11 17:29 UTC

[R] Any way to add to data frame saved as .rData file?

Hi all,

I've got a script that generates a few moderate-size data frames, and then 
puts them together into one big data frame at the end in order to write that 
data frame to disk, so that it may be re-opened later on...

I'm trying to trim down memory requirements in this script, so I was 
wondering if there was any way to append to a data frame already saved on 
disk (just like appending to a text file)..all the data frames here have 
identical row names; what I want to do is to tack on additional columns to a 
data frame stored in the working directory...

Alternatively, is there another data structure that would allow me to do 
this (and could preferably be converted to a data frame) ?

Thanks in advance,
Ken

Duncan Murdoch

2005-Oct-11 20:38 UTC


Ken Termiso wrote:
> Hi all,
> 
> I've got a script that generates a few moderate-size data frames, and then
> puts them together into one big data frame at the end in order to write that
> data frame to disk, so that it may be re-opened later on...
> 
> I'm trying to trim down memory requirements in this script, so I was 
> wondering if there was any way to append to a data frame already saved on 
> disk (just like appending to a text file)..all the data frames here have 
> identical row names; what I want to do is to tack on additional columns to a
> data frame stored in the working directory...

No, I don't think so.

> Alternatively, is there another data structure that would allow me to do 
> this (and could preferably be converted to a data frame) ?

I'd put the extra columns in their own data frame, and save that to disk 
(use dates/times/process ids or some other unique identifier in the 
filenames to distinguish them).  When you need access to a mixture of 
columns, load (or source, depending how you did the save) the columns 
you need, and cbind them together into one big data frame.

If you are concerned about memory requirements when producing the 
pieces, watch out that you don't write out so much data that you'll 
never have enough memory to load all you need at once.
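
A minimal sketch of the approach described above, assuming two small pieces that share the same row names. The object names, the file names and the use of tempdir() are all hypothetical, chosen only for illustration:

```r
# Hypothetical example data: two pieces with identical row names.
piece_a <- data.frame(height = c(1.5, 1.7, 1.6), row.names = c("r1", "r2", "r3"))
piece_b <- data.frame(weight = c(60, 72, 65), row.names = c("r1", "r2", "r3"))

# Build file names containing a timestamp and the process id,
# so that pieces written by different runs do not collide.
stamp  <- paste(format(Sys.time(), "%Y%m%d-%H%M%S"), Sys.getpid(), sep = "_")
file_a <- file.path(tempdir(), paste0("piece_a_", stamp, ".RData"))
file_b <- file.path(tempdir(), paste0("piece_b_", stamp, ".RData"))

# Save each piece to its own file, then drop it from memory.
save(piece_a, file = file_a)
save(piece_b, file = file_b)
rm(piece_a, piece_b)

# Later: load only the pieces that are needed and bind them column-wise.
load(file_a)  # restores the object 'piece_a'
load(file_b)  # restores the object 'piece_b'
stopifnot(identical(rownames(piece_a), rownames(piece_b)))
combined <- cbind(piece_a, piece_b)
print(combined)
```

Note that load() restores objects under the names they were saved with, so each piece needs a distinct object name (or should be loaded into its own environment) to avoid overwriting another piece.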

Duncan Murdoch

Ken Termiso

2005-Oct-13 15:24 UTC


ugh!

scan(what=   does this...

thx anyway,

Greg Snow

2005-Oct-13 18:53 UTC


Have you looked at the g.data package?  It might be useful 
(but may still require some redesign of your dataset).

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111

>>> "Ken Termiso" <jerk_alert at hotmail.com> 10/13/05 08:14AM >>>
>
>I'd put the extra columns in their own data frame, and save that to disk
>(use dates/times/process ids or some other unique identifier in the
>filenames to distinguish them).  When you need access to a mixture of
>columns, load (or source, depending how you did the save) the columns you
>need, and cbind them together into one big data frame.
>
>If you are concerned about memory requirements when producing the pieces,
>watch out that you don't write out so much data that you'll never have
>enough memory to load all you need at once.
>
>Duncan Murdoch

hmm...maybe i should just be dumping to a text file instead of a data
frame..is there any way (without using a real SQL database) in R to create a
file that i can selectively load certain columns from?

if not, maybe i should break the data frame up into pieces (as you
suggested) and create a separate file that keeps track of which columns are
stored in which files (like a hashtable) and just load the small file of
keys each time i need to load something..

whaddya think??
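
The "file of keys" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: save_chunk(), load_columns(), the store directory and the index file are hypothetical names, and saveRDS()/readRDS() are used for the individual files:

```r
# Hypothetical store: each chunk of columns goes into its own .rds file,
# and a small index data frame records which column lives in which file.
store_dir <- file.path(tempdir(), "column_store")
unlink(store_dir, recursive = TRUE)          # start from an empty store
dir.create(store_dir, showWarnings = FALSE)
index_file <- file.path(store_dir, "index.rds")

# Save a data frame of columns and register its columns in the index.
save_chunk <- function(chunk, chunk_name) {
  chunk_file <- file.path(store_dir, paste0(chunk_name, ".rds"))
  saveRDS(chunk, chunk_file)
  entry <- data.frame(column = names(chunk), file = chunk_file,
                      stringsAsFactors = FALSE)
  index <- if (file.exists(index_file)) rbind(readRDS(index_file), entry) else entry
  saveRDS(index, index_file)
}

# Load only the files that contain the requested columns.
load_columns <- function(columns) {
  index <- readRDS(index_file)               # the small "file of keys"
  unknown <- setdiff(columns, index$column)
  if (length(unknown) > 0)
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  files  <- unique(index$file[index$column %in% columns])
  pieces <- lapply(files, readRDS)
  do.call(cbind, pieces)[, columns, drop = FALSE]
}

# Usage with made-up data:
save_chunk(data.frame(a = 1:3, b = 4:6), "chunk1")
save_chunk(data.frame(c = 7:9), "chunk2")
wanted <- load_columns(c("a", "c"))
print(wanted)
```

Only the index and the files holding the requested columns are read, so memory use depends on how the columns were grouped into chunks.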

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Ken Termiso

2005-Oct-24 16:29 UTC


thx everyone for your help...for simplicity, i elected to stay with a text 
file and transpose it so that each new row of data is really a column...in 
this transposed file, the header is really the row labels. the first cell 
has the name of the row labels ("RowID" in this case)...

here's code for what i ended up doing, in case anyone wants it (or wants to 
improve it) :


outfile <- "mydata.txt"

rowlabels <- 1:10000

# make the first line of the file hold "RowID" followed by the row labels
zz <- file(outfile, "w")
cat(c("RowID", rowlabels), file = zz, sep = "\t")
cat("\n", file = zz)
close(zz)

# 's' is a unique string that is contained in the title of the col or cols
# that you want (it is used as a grep pattern)
grep_text <- function(s)
{
	# flush = TRUE discards everything after the first field of each line,
	# so this reads only the title at the start of every line
	temp_header <- scan(file = outfile, what = list(""), flush = TRUE, quiet = TRUE)
	temp_header <- unlist(temp_header)
	g <- grep(toString(s), temp_header)  # line numbers in outfile with the data you want

	if(length(g) == 0)
		stop("no column title matches '", s, "'")

	for(i in seq_along(g))
	{
		temp_file <- scan(file = outfile, what = character(), skip = g[i]-1, nlines = 1, quiet = TRUE)
		temp_file <- as.numeric(temp_file[-1])  # drop title, convert to num vector

		if(i == 1)
			tf_df <- data.frame(temp_file)
		else
			tf_df[i] <- temp_file
	}

	names(tf_df) <- temp_header[g]  # label each column with its title
	return(tf_df)
}


you would use grep_text(s) to return a data frame with the columns whose 
titles contain the string s...if i had a column named "Year05_population" in 
the "mydata.txt" file, to return a data frame named 'df' with only that one 
column, titled "Year05_population", i would simply type :

outfile <- "mydata.txt"
df <- grep_text("Year05_population")
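
The step that adds a new column to the transposed file is not shown in the post. A minimal sketch of how it might look, assuming each column is written as one tab-separated line that starts with its title; append_column() and the temporary file are hypothetical:

```r
# Append one column to the transposed file as a single line:
# the column title first, then the values, separated by tabs.
append_column <- function(title, values, path) {
  con <- file(path, "a")   # "a" opens the file for appending
  on.exit(close(con))
  cat(c(title, values), file = con, sep = "\t")
  cat("\n", file = con)
}

# Hypothetical demo file with five row labels in its first line.
demo_file <- tempfile(fileext = ".txt")
cat(c("RowID", 1:5), file = demo_file, sep = "\t")
cat("\n", file = demo_file, append = TRUE)

append_column("Year05_population", c(10, 20, 30, 40, 50), demo_file)
append_column("Year06_population", c(11, 21, 31, 41, 51), demo_file)

file_lines <- readLines(demo_file)
print(file_lines)
```

Because each new column is just another line at the end of the file, nothing already on disk has to be read back into memory in order to add it.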


