Thomas Pujol
2007-Dec-06 17:10 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
R-help users,

Thanks in advance for any assistance ... I truly appreciate your expertise. I searched help and could not figure this out, and think you can probably offer some helpful tips. I apologize if I missed something, which I'm sure I probably did.

I have data for many "samples". (e.g. 1950, 1951, 1952, etc.)

For each "sample", I have many data-frames. (e.g. temp.1952, births.1952, gdp.1952, etc.)

(Because the data is rather "large" (and for other reasons), I have chosen to store the data as individual files, as opposed to a list of data frames.)

I wish to write a function that enables me to "run" any of many custom "functions/processes" on each sample of data.

I currently accomplish this by using a custom function that uses:
"eval(parse(t=text.i2)) ", and "gsub(pat, rep, x)" (this changes the "sample number" for each line of text I submit to "eval(parse(t=text.i2))" ).

Is there a better/preferred/more flexible way to do this?

One issue/obstacle that I have encountered: Some of the custom functions I use need to take as input the value of "d" in the loop below.
(Please see the sample function "fn.mn.d" below.)
#creates sample data
temp.1951 <- c(11,13,15)
births.1951 <- c(123, 156, 178)
temp.1952 <- c(21,23,25)
births.1952 <- c(223, 256, 278)

#######################
#function that looks for a pattern "pat.i" within "x", and replaces it with "rep.i"
recurse <- function(x, pat.i,rep.i) {
  f <- function(x,pat,rep) if (mode(x) == "character") gsub(pat, rep, x) else x
  if (length(x) == 0) return(x)
  if (is.list(x)) for(i in seq_along(x)) x[[i]] <- recurse(x[[i]], pat.i,rep.i)
  else x <- f(x,pat.i,rep.i)
  x
  #f <- function(x) if (mode(x) == "character") gsub("a", "green", x) else x
}# end recurse end
#######################

#######################
#function that processes code submitted as "text.i" for each date in "dates.i"
fn.dateloop <- function(text.i, dates.i ) {
  for(d in 1: length(dates.i) ) {
    tempdate <- dates.i[d]
    text.i2 <- recurse(text.i, pat.i='#', rep.i=tempdate)
    temp0=eval(parse(t=text.i2))
    tempname <- paste(names(temp0)[1], tempdate, sep='.')
    save(list='temp0', file = tempname)
  } # next d
} # end fn.dateloop
#######################

#####################
#a sample custom function that I want to run on each sample of data
fn.mn <- function(x, y) {
  res = x - y
  names(res) = 'mn'
  res
}
#####################

#####################
#example of function that takes d as input...
#I have not been able to get this to work with the custom function "fn.dateloop" above
#I request assistance in learning how to accomplish this
fn.mn.d <- function(x, y, d) {x[d] - y[d]}
#####################

#####################
setwd('c:/') #specifies location where sample data will be saved
getwd() #checks location
fn.mn(x=temp.1951, y=births.1951)
fn.mn(x=temp.1952, y=births.1952)
#
fn.dateloop(text.i = "fn.mn(x=get('temp.#'), y=get('births.#') )" , dates.i=c('1951','1952') )
get(load('mn.1951'))
get(load('mn.1952'))
Emmanuel Charpentier
2007-Dec-06 23:00 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Thomas Pujol wrote:
> [...]
> Is there a better/preferred/more flexible way to do this?

Beware: what follows is the advice of someone used to working with data through an RDBMS and SQL; as everyone should know, everything is a nail to a man with a hammer. Caveat emptor...

Unless I misunderstand you, you are trying to process piecewise a large dataset made of a large number of reasonably-sized independent chunks. What you're trying to do looks to me a bit like reinventing the SAS macro language. What's the point?

IMNSHO, "large" datasets that are used only piecewise are much better handled in a real database (RDBMS), queried at runtime via, for example, Brian Ripley's RODBC.

In your example, I'd create a table births with all your data + the relevant year.
Off the top of my head:

    # Do that ONCE in the lifetime of your data: an RDBMS is probably more
    # apt than R dataframes for this kind of management
    library(RODBC)
    channel <- odbcConnect(WhateverYouHaveToUseForYourFavoriteDBMS)
    sqlSave(channel, tablename="Births",
            rbind(cbind(data.frame(Year=rep(1952,nrow(births.1952))), births.1952),
                  cbind(data.frame(Year=rep(1953,nrow(births.1953))), births.1953),
                  # ... ^W^Y ad nauseam ...
                  ))
    rm(births.1951, births.1952, ...) # get back breathing space

Beware: certain data types may be tricky to save! I got bitten by Dates recently... See the RODBC documentation, your DBMS documentation and the "R Data Import/Export" guide...

At analysis time, you may use the result of the relevant query exactly as you would one of your dataframes. Instead of:

    foo(... data=births.1952, ...)

type:

    foo(... data=sqlQuery(channel, "select * from \"Births\" where \"Year\"=1952;"), ...)
    # Syntax illustrating talking to a "picky" DBMS...

Furthermore, the variable "Year" carries your "d" information. Problem (dis)solved. You may loop (or even sapply()...) at will on d:

    for(year in 1952:1978) {
      query <- sprintf("select * from \"Births\" where \"Year\"=%d;", year)
      foo(... data=sqlQuery(channel,query), ...)
      ...
    }

If you already use a DBMS with some connection to R (via RODBC or otherwise), use that. If not, sqlite is a very lightweight library that lets you use a (very considerable) subset of SQL92 to manipulate your data. I understand that some people on this list have undertaken the creation of a sqlite-based package dedicated to this kind of large data management.

HTH,

Emmanuel Charpentier
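The single-table idea in this reply can be tried without a database server. The following is only a minimal sketch, assuming the DBI and RSQLite packages are installed (the reply itself uses RODBC and only mentions sqlite in passing); the in-memory database, the table layout and the name `by.year` are illustrative assumptions, not something taken from the thread.

```r
library(DBI)   # assumption: the DBI and RSQLite packages are installed

# Stand-in data: one data frame per year, mimicking the original post
births.1951 <- data.frame(births = c(123, 156, 178))
births.1952 <- data.frame(births = c(223, 256, 278))

# Stack the per-year frames into one table with an explicit Year column
births.all <- rbind(
  cbind(Year = 1951, births.1951),
  cbind(Year = 1952, births.1952)
)

# Throwaway in-memory SQLite database (illustrative choice, not RODBC)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Births", births.all)

# Fetch one year at a time; the year is bound as a query parameter
by.year <- lapply(c(1951, 1952), function(year) {
  dbGetQuery(con, "SELECT births FROM Births WHERE Year = ?",
             params = list(year))
})
names(by.year) <- c("1951", "1952")

dbDisconnect(con)
```

Because the year is an ordinary column, the loop variable is available to whatever runs on each chunk, which is the point the reply makes about "d".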
Thomas Pujol
2007-Dec-07 15:12 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Emmanuel,

Thanks for your reply. Please allow me to clarify.

I am already making extensive use of an RDBMS to store the data, and have used SQL and ODBC to extract the data into a set of R files. (I have experimented with this a bit, and for my specific application, storing the data in R seems to improve speed and convenience. For example, I can extract the data only once, store it as an R file, and then use the data any number of times without ever needing to "hit" the RDBMS again.)

What I am trying to do: I need to perform certain operations/processes/custom-functions on each "sample". I can easily write the code to do this, using a "FOR-loop". But I would then need a separate loop for each process I want to run, and would end up re-writing much of the code within the "FOR-loop". I have many different "processes" I might want to perform on each sample on any given day. So instead of always re-writing the same loop, I want to write a function that takes the "process" as its input, and then runs it on each sample.

Thanks
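A function that takes the "process" as its input and runs it on every sample can be written by passing the process as an R function rather than as text. The following is only a minimal sketch, assuming the per-year objects sit in the workspace under the `name.year` convention of the first post; `run.on.samples`, `vars` and `envir` are hypothetical names, not from the thread. No eval(parse()) or gsub() is involved, and the loop index d is handed over only when the process declares a `d` argument.

```r
# Sample data and example processes, as in the original post
temp.1951   <- c(11, 13, 15)
births.1951 <- c(123, 156, 178)
temp.1952   <- c(21, 23, 25)
births.1952 <- c(223, 256, 278)

fn.mn   <- function(x, y) { res <- x - y; names(res) <- "mn"; res }
fn.mn.d <- function(x, y, d) x[d] - y[d]

# Hypothetical helper: run `process` once per date.
# `vars` maps argument names of `process` to object-name prefixes.
run.on.samples <- function(process, dates, vars, envir = parent.frame()) {
  force(envir)  # fix the lookup environment before entering lapply()
  results <- lapply(seq_along(dates), function(d) {
    # look up e.g. "temp.1951" and "births.1951" for this date
    args <- lapply(vars, function(prefix)
      get(paste(prefix, dates[d], sep = "."), envir = envir))
    # pass the loop index only if the process asks for it
    if ("d" %in% names(formals(process))) args$d <- d
    do.call(process, args)
  })
  names(results) <- dates
  results
}

dates <- c("1951", "1952")
vars  <- c(x = "temp", y = "births")
res.mn   <- run.on.samples(fn.mn,   dates, vars)
res.mn.d <- run.on.samples(fn.mn.d, dates, vars)
```

Each result is a list with one element per date, so saving to one file per date (as fn.dateloop does) could be added inside the inner function if needed.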
Gabor Grothendieck
2007-Dec-07 18:41 UTC
[R] using "eval(parse(text)) " , gsub(pattern, replacement, x) , to process "code" within a loop/custom function
Use the same names (births, temp, ...) in each Rdata file and then load each file into its own environment or proto object:

    library(proto)
    x1951 <- proto()  # or x1951 <- new.env()
    load("1951.rda", envir = x1951)

Then pass the environment or proto object to each of your functions:

    f <- function(x) x$difference <- x$births - x$temp
    f(x1951)

The above completely avoids renaming variables and instead treats each year as an object. If you use proto objects the home page is: http://r-proto.googlecode.com
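The one-environment-per-year suggestion can be tried end to end with base R only. The following is a minimal sketch using new.env() rather than proto; the temporary directory, the file names and the helper `load.year` are assumptions made for illustration, and the sample values are the ones from the original post.

```r
# Write one .rda file per year into a temporary directory, using the SAME
# object names (temp, births) in every file. Stand-in for real data files.
data.dir <- tempdir()
samples <- list(
  "1951" = list(temp = c(11, 13, 15), births = c(123, 156, 178)),
  "1952" = list(temp = c(21, 23, 25), births = c(223, 256, 278))
)
for (yr in names(samples)) {
  temp   <- samples[[yr]]$temp
  births <- samples[[yr]]$births
  save(temp, births, file = file.path(data.dir, paste0(yr, ".rda")))
}
rm(temp, births)

# Hypothetical helper: load one year's file into its own environment,
# so identical names in different years never clash
load.year <- function(yr, data.dir) {
  e <- new.env()
  load(file.path(data.dir, paste0(yr, ".rda")), envir = e)
  e
}
years <- lapply(names(samples), load.year, data.dir = data.dir)
names(years) <- names(samples)

# A process refers to plain names and receives the year as an object
f <- function(x) x$births - x$temp
diffs <- lapply(years, f)
```

Running a different process on every year is then a single lapply() call over `years`, with no renaming of variables and no text substitution.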