Thomas Pujol
2007-Dec-06 17:10 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
R-help users,
Thanks in advance for any assistance ... I truly appreciate your expertise. I
searched the help pages and could not figure this out, and think you can
probably offer some helpful tips. I apologize if I missed something, which I
probably did.
I have data for many "samples". (e.g. 1950, 1951, 1952, etc.)
For each "sample", I have many data-frames. (e.g. temp.1952,
births.1952, gdp.1952, etc.)
(Because the data is rather "large", and for other reasons, I have chosen to
store the data as individual files rather than as a list of data frames.)
I wish to write a function that enables me to "run" any of many
custom "functions/processes" on each sample of data.
I currently accomplish this with a custom function that uses
"eval(parse(t=text.i2))" and "gsub(pat, rep, x)" (the gsub call changes the
"sample number" in each line of text I submit to "eval(parse(t=text.i2))").
Is there a better/preferred/more flexible way to do this?
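One possible alternative to building code as text (a sketch, not something proposed in the thread): construct the call as a language object with base R's bquote() and as.name(), so the date is spliced in as a symbol and nothing passes through gsub() or parse(). The sample data and fn.mn are copied from the code below; make.call is a hypothetical helper name.

```r
# Sample data and fn.mn as in the post
temp.1951 <- c(11, 13, 15);  births.1951 <- c(123, 156, 178)
temp.1952 <- c(21, 23, 25);  births.1952 <- c(223, 256, 278)
fn.mn <- function(x, y) {
  res <- x - y
  names(res) <- 'mn'
  res
}

# Hypothetical helper: build the call for one date as a language object.
# .() splices in a symbol whose name is constructed from the date;
# no text is parsed.
make.call <- function(tempdate) {
  bquote(fn.mn(x = .(as.name(paste("temp", tempdate, sep = "."))),
               y = .(as.name(paste("births", tempdate, sep = ".")))))
}

for (tempdate in c("1951", "1952")) {
  cl <- make.call(tempdate)
  print(cl)         # e.g. fn.mn(x = temp.1951, y = births.1951)
  print(eval(cl))   # evaluated in the current (here: global) environment
}
```

eval() evaluates in the environment it is called from, so inside a function the temp.* and births.* objects are found through the usual scoping rules; pass envir = explicitly if they live somewhere else.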
One issue/obstacle that I have encountered: Some of the custom functions I use
need to take as input the value of "d" in the loop below.
(Please see the sample function "fn.mn.d" below.)
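A minimal sketch of handing the loop position d to such a function, assuming the objects follow the post's prefix.date naming scheme and live in the caller's environment: the process is passed as a function instead of as text, and d is supplied only when that function has a formal argument named d. run.by.date and its prefixes argument are hypothetical; fn.mn and fn.mn.d are copied from the code below.

```r
# Sample data and functions copied from the post
temp.1951 <- c(11, 13, 15);  births.1951 <- c(123, 156, 178)
temp.1952 <- c(21, 23, 25);  births.1952 <- c(223, 256, 278)
fn.mn <- function(x, y) {
  res <- x - y
  names(res) <- 'mn'
  res
}
fn.mn.d <- function(x, y, d) {x[d] - y[d]}

# Hypothetical helper: call FUN once per date on the objects named
# "<prefix>.<date>". If FUN has a formal argument called d, the position of
# the date in 'dates' (as in the post's loop) is passed as well.
run.by.date <- function(FUN, dates, prefixes = c(x = "temp", y = "births"),
                        envir = parent.frame()) {
  force(envir)                      # caller's environment holds temp.1951 etc.
  out <- vector("list", length(dates))
  names(out) <- dates
  for (d in seq_along(dates)) {
    # named list of arguments, e.g. list(x = temp.1951, y = births.1951)
    args <- lapply(prefixes, function(p)
      get(paste(p, dates[d], sep = "."), envir = envir))
    if ("d" %in% names(formals(FUN))) args$d <- d
    out[[d]] <- do.call(FUN, args)
  }
  out
}

run.by.date(fn.mn,   c("1951", "1952"))
run.by.date(fn.mn.d, c("1951", "1952"))
```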
#creates sample data
temp.1951 <- c(11,13,15)
births.1951 <- c(123, 156, 178)
temp.1952 <- c(21,23,25)
births.1952 <- c(223, 256, 278)
#######################
#function that looks for a pattern "pat.i" within "x" and replaces it with "rep.i"
recurse <- function(x, pat.i, rep.i) {
  # helper: substitute only inside character vectors, leave anything else as-is
  f <- function(x, pat, rep) if (mode(x) == "character") gsub(pat, rep, x) else x
  if (length(x) == 0) return(x)
  # walk nested lists element by element; otherwise substitute directly
  if (is.list(x)) for (i in seq_along(x)) x[[i]] <- recurse(x[[i]], pat.i, rep.i)
  else x <- f(x, pat.i, rep.i)
  x
  #f <- function(x) if (mode(x) == "character") gsub("a", "green", x) else x
} # end recurse
#######################
#######################
#function that processes code submitted as "text.i" for each date in "dates.i"
fn.dateloop <- function(text.i, dates.i) {
  for (d in seq_along(dates.i)) {
    tempdate <- dates.i[d]
    # substitute the date for every "#" in the code text, then run it
    text.i2 <- recurse(text.i, pat.i = '#', rep.i = tempdate)
    temp0 <- eval(parse(text = text.i2))
    # file name = name of the result's first element + date, e.g. "mn.1951"
    tempname <- paste(names(temp0)[1], tempdate, sep = '.')
    save(list = 'temp0', file = tempname)
  } # next d
} # end fn.dateloop
#######################
#####################
#a sample custom function that I want to run on each sample of data
fn.mn <- function(x, y) {
res = x - y
names(res) = 'mn'
res
}
#####################
#####################
#example of function that takes d as input...
#I have not been able to get this to work with the custom function "fn.dateloop" above
#I request assistance in learning how to accomplish this
fn.mn.d <- function(x, y, d) {x[d] - y[d]}
#####################
#####################
setwd('c:/') #specifies location where sample data will be saved
getwd() #checks location
fn.mn(x=temp.1951, y=births.1951)
fn.mn(x=temp.1952, y=births.1952)
#
fn.dateloop(text.i = "fn.mn(x=get('temp.#'), y=get('births.#'))", dates.i = c('1951', '1952'))
get(load('mn.1951'))
get(load('mn.1952'))
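For reference, the recurse() helper above behaves much like base R's rapply() with how = "replace", which walks a nested list and applies a function only to elements of the requested class. A sketch; fill.template is a hypothetical name, and fixed = TRUE (an assumption, not in the original) makes gsub() treat the pattern literally, which gives the same result for "#".

```r
# Hypothetical helper: substitute 'rep' for 'pat' in every character element
# of x, recursing into nested lists and leaving other elements unchanged.
fill.template <- function(x, pat, rep) {
  if (is.list(x)) {
    rapply(x, function(s) gsub(pat, rep, s, fixed = TRUE),
           classes = "character", how = "replace")
  } else if (is.character(x)) {
    # rapply() expects a list, so plain character vectors are handled here
    gsub(pat, rep, x, fixed = TRUE)
  } else {
    x
  }
}

fill.template(list("temp.#", list("births.#", 42)), "#", "1951")
fill.template("fn.mn(x=get('temp.#'), y=get('births.#'))", "#", "1952")
```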
Emmanuel Charpentier
2007-Dec-06 23:00 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Thomas Pujol wrote:
> Is there a better/preferred/more flexible way to do this?

Beware: what follows is the advice of someone used to working with data through
an RDBMS and SQL; as anyone should know, everything looks like a nail to a man
with a hammer. Caveat emptor...

Unless I misunderstand you, you are trying to process, piece by piece, a large
dataset made of a large number of reasonably sized, independent chunks. What
you're trying to do looks to me a bit like reinventing the SAS macro language.
What's the point? IMNSHO, "large" datasets that are used only piecewise are
much better handled in a real database (RDBMS), queried at runtime via, for
example, Brian Ripley's RODBC.

In your example, I'd create a table Births with all your data plus the relevant
year. Off the top of my head:

# Do this ONCE in the lifetime of your data: an RDBMS is probably more
# apt than R data frames for this kind of management
library(RODBC)
channel <- odbcConnect(WhateverYouHaveToUseForYourFavoriteDBMS)
sqlSave(channel, tablename = "Births",
        rbind(cbind(data.frame(Year = rep(1952, nrow(births.1952))), births.1952),
              cbind(data.frame(Year = rep(1953, nrow(births.1953))), births.1953)
              # , ... ^W^Y ad nauseam ...
              ))
rm(births.1951, births.1952, ...) # get back breathing space

Beware: certain data types may be tricky to save! I got bitten by Dates
recently... See the RODBC documentation, your DBMS documentation and the
"R Data Import/Export" manual.

At analysis time, you may use the result of the relevant query exactly like one
of your data frames. Instead of:

foo(..., data = births.1952, ...)

type:

foo(..., data = sqlQuery(channel, "select * from \"Births\" where \"Year\"=1952;"), ...)
# Syntax illustrating talking to a "picky" DBMS...

Furthermore, the variable "Year" carries your "d" information. Problem
(dis)solved. You may loop (or even sapply()...) at will on d:

for (year in 1952:1978) {
  query <- sprintf("select * from \"Births\" where \"Year\"=%d;", year)
  foo(..., data = sqlQuery(channel, query), ...)
  ...
}

If you already use a DBMS with some connection to R (via RODBC or otherwise),
use that. If not, SQLite is a very lightweight library that lets you use a
(very considerable) subset of SQL92 to manipulate your data. I understand that
some people on this list have undertaken the creation of an SQLite-based
package dedicated to this kind of large-data management.

HTH,

Emmanuel Charpentier
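Emmanuel's suggestion relies on a DBMS reached through RODBC; the same "one table with a Year column" idea can be sketched in base R alone, assuming the combined data fits in memory (which may not hold for the "large" data in the question). The sample vectors come from the original post; all.data and per.year are illustrative names.

```r
# Sample data from the original post, as vectors
temp.1951 <- c(11, 13, 15);  births.1951 <- c(123, 156, 178)
temp.1952 <- c(21, 23, 25);  births.1952 <- c(223, 256, 278)

# One data frame with an explicit Year column instead of one object per year
years <- c(1951, 1952)
all.data <- do.call(rbind, lapply(years, function(yr) {
  data.frame(Year   = yr,
             temp   = get(paste("temp", yr, sep = ".")),
             births = get(paste("births", yr, sep = ".")))
}))

# The per-year "process" now sees the year as ordinary data;
# split() returns a list named by year
per.year <- lapply(split(all.data, all.data$Year), function(df)
  df$temp - df$births)
per.year[["1952"]]
```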
Thomas Pujol
2007-Dec-07 15:12 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Emmanuel,
Thanks for your reply. Please allow me to clarify. I am already using an RDBMS
extensively to store the data, and have used SQL and ODBC to extract the data
into a set of R files. (I have experimented with this a bit, and for my
specific application, storing the data in R seems to improve speed and
convenience. For example, I can extract the data only once, store it as an
R file, and then reuse it any number of times without ever again needing to
"hit" the RDBMS.)
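For the "extract once, store as an R file" step, current versions of R also offer saveRDS()/readRDS(), which store a single object without its name, so the reading code chooses the name and no get(load(...)) round trip is needed. A sketch; the data frame contents and the file path are hypothetical stand-ins for the extracted data.

```r
# Hypothetical stand-in for data extracted once from the RDBMS
births.1952 <- data.frame(month = 1:3, births = c(223, 256, 278))

# Save one object per file; saveRDS() does not store the object's name
path <- file.path(tempdir(), "births.1952.rds")   # any writable path works
saveRDS(births.1952, file = path)

# Later (e.g. in another session): read it back under any convenient name
b <- readRDS(path)
identical(b, births.1952)   # TRUE
```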
What I am trying to do: I need to perform certain
operations/processes/custom functions on each "sample". I can easily write the
code to do this using a "FOR-loop", but then I would need a separate loop for
each process I want to run, and would end up re-writing much of the code within
the "FOR-loop".
I have many different "processes" I might want to perform on each
sample on any given day. So instead of always re-writing the same loop, I want
to write a function that takes as its input the "process", and then
goes and runs it on each sample.
Thanks
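One way to write "a function that takes the process as its input" without string substitution is to capture the process as an unevaluated expression with substitute() and evaluate it once per sample, with that sample's objects bound to generic names. A sketch assuming the post's prefix.date naming; run.on.samples and the generic names temp, births, sample and d are hypothetical.

```r
# Sample data and the post's fn.mn.d
temp.1951 <- c(11, 13, 15);  births.1951 <- c(123, 156, 178)
temp.1952 <- c(21, 23, 25);  births.1952 <- c(223, 256, 278)
fn.mn.d <- function(x, y, d) {x[d] - y[d]}

# Hypothetical helper: 'expr' is R code written once in terms of the generic
# names temp, births, d and sample; it is re-evaluated for every sample.
run.on.samples <- function(expr, samples, prefixes = c("temp", "births"),
                           envir = parent.frame()) {
  expr <- substitute(expr)   # capture the code without evaluating it
  force(envir)               # caller's environment holds temp.1951 etc.
  res <- lapply(seq_along(samples), function(d) {
    vars <- lapply(prefixes, function(p)
      get(paste(p, samples[d], sep = "."), envir = envir))
    names(vars) <- prefixes
    # per-sample objects visible under generic names; functions such as
    # fn.mn.d are found through the enclosing environment 'envir'
    eval(expr, c(vars, list(d = d, sample = samples[d])), envir)
  })
  names(res) <- samples
  res
}

run.on.samples(temp - births, c("1951", "1952"))
run.on.samples(fn.mn.d(temp, births, d), c("1951", "1952"))
```

Because substitute() relies on non-standard evaluation, a helper like this is convenient at the prompt but awkward to call from inside other functions; passing a function and its arguments instead avoids that.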
Gabor Grothendieck
2007-Dec-07 18:41 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Use the same names (births, temp, ...) in each Rdata file and then load
each file into its own environment or proto object:
library(proto); x1951 <- proto() # or x1951 <- new.env()
load("1951.rda", envir = x1951)
Then pass the environment or proto object to each of your functions:
f <- function(x) x$difference <- x$births - x$temp
f(x1951)
The above completely avoids renaming variables and instead treats each
year as an object. If you use proto objects, the home page
is: http://r-proto.googlecode.com
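Gabor's suggestion also works with plain environments from base R, without the proto package. The sketch below assumes a hypothetical file layout in tempdir() (one .rda file per year, each holding objects with identical names), loads each year into its own environment and applies his f to every year; because environments are not copied when passed to a function, f needs no return value.

```r
# Hypothetical files: one .rda per year, each containing "temp" and "births"
data.dir <- tempdir()
sample.data <- list(
  "1951" = list(temp = c(11, 13, 15), births = c(123, 156, 178)),
  "1952" = list(temp = c(21, 23, 25), births = c(223, 256, 278))
)
for (yr in names(sample.data)) {
  save(list = c("temp", "births"), envir = list2env(sample.data[[yr]]),
       file = file.path(data.dir, paste0(yr, ".rda")))
}

# Load each file into its own environment (base R; proto is optional)
years <- lapply(names(sample.data), function(yr) {
  e <- new.env()
  load(file.path(data.dir, paste0(yr, ".rda")), envir = e)
  e
})
names(years) <- names(sample.data)

# Gabor's f: it modifies the environment in place
f <- function(x) x$difference <- x$births - x$temp
for (e in years) f(e)

years[["1952"]]$difference
```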
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.