Hi, I am trying to use sub, regexpr on expressions like log(D) ~ log(N)+I(log(N)^2)+log(t) being a model specification. The aim is to produce: "ln D ~ ln N + ln^2 N + ln t" The variable names N, t may change, the number of terms too. I succeded only partially, help on regular expressions is hard to understand for me, examples on my case are rare. The help page on R-help for grep etc. and "regular expressions" What I am doing: (f <- log(D) ~ log(N)+I(log(N)^2)+log(t)) (ft <- sub("","",f)) # creates string with parts of formula, how to do it simpler? (fu <- paste(ft[c(2,1,3)],collapse=" ")) # converts to one string Then I want to use \1 for backreferences something like (fv <- sub("log( [:alpha:] N )^ [:alpha:)","ln \\1^\\2",fu)) to change "log(g)^7" to "ln^7 g", and to eliminate I(): sub("I(blabla)","\\1",fv) # I(xxx) -> xxx The special characters are making trouble, sub acceps "(", ")" only in pairs. Code for experimentation: trysub <- function(s,t,e) { ii<-0; for (i1 in c(TRUE,FALSE)) for (i2 in c(TRUE,FALSE)) for (i3 in c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE)) print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," "," ~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes: ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, useBytes=i4)),sep=""));invisible(0) } trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases 5,6,13..16, the rest unsubstituted trysub("log(","ln ",fu) # B: no substitutions; errors for cases 1..4,7.. 12 # typical errors: "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n" trysub("log\(","ln ",fu) # C: same as A trysub("log\\(","ln ",fu) # D: no substitutions; errors for cases 15,16 # typical errors: "15 ~ext ~perl ~fixed useBytes: Error in sub(pattern, replacement, x, ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression 'log\\('\n" trysub("log\\(([:alpha:]+)\\)","ln \1",fu) # no substitutions, no errors # E: typical errors: "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, x, ignore.case, useBytes) : \n\tinvalid regular expression 'log\\(([:alpha:]+)\\)'\n" Thanks for help Christian PS. The explanations in the documents -- Dr. Christian W. Hoffmann, Swiss Federal Research Institute WSL Mathematics + Statistical Computing Zuercherstrasse 111 CH-8903 Birmensdorf, Switzerland Tel +41-44-7392-277 (office) -111(exchange) Fax +41-44-7392-215 (fax) christian.hoffmann at wsl.ch http://www.wsl.ch/staff/christian.hoffmann International Conference 5.-7.6.2006 Ekaterinburg Russia "Climate changes and their impact on boreal and temperate forests" http://ecoinf.uran.ru/conference/
Note that [:alpha:] is a pre-defined character class and should only be used inside []. And metacharacters need to be quoted. See ?regexp.> f <- log(D) ~ log(N)+I(log(N)^2)+log(t) > f1 <- deparse(f) > f1[1] "log(D) ~ log(N) + I(log(N)^2) + log(t)" Now we have a string. (f2 <- gsub("I\\((.*)\\) ", "\\1 ", f1)) [1] "log(D) ~ log(N) + log(N)^2 + log(t)" (f3 <- gsub("(?U)log\\((.*)\\)", "ln \\1", f2, perl=TRUE)) [1] "ln D ~ ln N + ln N^2 + ln t" (f4 <- gsub("ln ([[:alpha:]])\\^([[:digit:]])", "ln^\\2 \\1", f3)) [1] "ln D ~ ln N + ln^2 N + ln t" That should give you some ideas to be going on with. On Fri, 27 Jan 2006, Christian Hoffmann wrote:> Hi, > > I am trying to use sub, regexpr on expressions like > > log(D) ~ log(N)+I(log(N)^2)+log(t) > > being a model specification. > > The aim is to produce: > > "ln D ~ ln N + ln^2 N + ln t" > > The variable names N, t may change, the number of terms too. > > I succeded only partially, help on regular expressions is hard to > understand for me, examples on my case are rare. The help page on R-help > for grep etc. and "regular expressions" > > What I am doing: > > (f <- log(D) ~ log(N)+I(log(N)^2)+log(t)) > (ft <- sub("","",f)) # creates string with parts of formula, how to do > it simpler? > (fu <- paste(ft[c(2,1,3)],collapse=" ")) # converts to one string > > Then I want to use \1 for backreferences something like > > (fv <- sub("log( [:alpha:] N )^ [:alpha:)","ln \\1^\\2",fu)) > > to change "log(g)^7" to "ln^7 g", > > and to eliminate I(): sub("I(blabla)","\\1",fv) # I(xxx) -> xxx > > The special characters are making trouble, sub acceps "(", ")" only in > pairs.>From ?regexpAny metacharacter with special meaning may be quoted by preceding it with a backslash. The metacharacters are '. \ | ( ) [ { ^ $ * + ?'.> Code for experimentation: > > trysub <- function(s,t,e) { > ii<-0; for (i1 in c(TRUE,FALSE)) for (i2 in c(TRUE,FALSE)) for (i3 in > c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE)) > print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," "," > ~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes: > ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, > useBytes=i4)),sep=""));invisible(0) } > > trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases > 5,6,13..16, the rest unsubstituted > > trysub("log(","ln ",fu) # B: no substitutions; errors for > cases 1..4,7.. 12 # typical errors: > "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, > x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n" > > trysub("log\(","ln ",fu) # C: same as A > > trysub("log\\(","ln ",fu) # D: no substitutions; errors for > cases 15,16 # typical errors: > "15 ~ext ~perl ~fixed useBytes: Error in sub(pattern, replacement, x, > ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression > 'log\\('\n" > > trysub("log\\(([:alpha:]+)\\)","ln \1",fu) # no substitutions, no errors > # E: typical errors: > "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, > x, ignore.case, useBytes) : \n\tinvalid regular expression > 'log\\(([:alpha:]+)\\)'\n" > > > > Thanks for help > Christian > > PS. The explanations in the documents > -- > Dr. Christian W. Hoffmann, > Swiss Federal Research Institute WSL > Mathematics + Statistical Computing > Zuercherstrasse 111 > CH-8903 Birmensdorf, Switzerland > > Tel +41-44-7392-277 (office) -111(exchange) > Fax +41-44-7392-215 (fax) > christian.hoffmann at wsl.ch > http://www.wsl.ch/staff/christian.hoffmann > > International Conference 5.-7.6.2006 Ekaterinburg Russia > "Climate changes and their impact on boreal and temperate forests" > http://ecoinf.uran.ru/conference/ > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Hello, Here is what I got after playing a little bit with your problem: # First of all, if you prefer 'ln' instead of 'log', why not to define: ln <- function(x) log(x) ln2 <- function(x) log(x)^2 ln3 <- function(x) log(x)^3 ln4 <- function(x) log(x)^4 # ... as many function as powers you need # Then, your formula is now closer to what you want # which makes the whole code easier to read for you: Form <- ln(D) ~ ln(N) + ln2(N) + ln(t) # Same as your original formula # Here is the function to transform it in a more readable string: formulaTransform <- function(form, as.expression = FALSE) { if (!inherits(form, "formula")) stop("'form' must be a 'formula' object!") # Transform the formula into a string (is it a better way?) Res <- paste(as.character(form)[c(2, 1, 3)], collapse = " ") if (as.expression) { # Transform the formula in a nice expression # Change '~' into '==' Res <- sub("~", "%~~%", Res) # How to do '~' in an expression? # Eliminate brackets Res <- gsub("[(]([A-Za-z0-9._]*)[)]", " ~ \\1", Res) # Transform powers Res <- gsub("ln([2-9])", "ln^\\1", Res) Res <- eval(parse(text = Res)) } else { # Make a nicer string # Eliminate brackets Res <- gsub("[(]([A-Za-z0-9._]*)[)]", " \\1", Res) # Transform powers Res <- gsub("ln([2-9])", "ln^\\1", Res) } # Return the result return(Res) } # Here is a nicer presentation as a string formulaTransform(Form) # Here is an even nicer presentation (creating an expression) plot(1:3, type = "n") text(2, 2, formulaTransform(Form, TRUE)) # The later form is really interesting when you use, for instance, # greek letters for variables, or so... Form2 <- ln(alpha) ~ ln(beta) + ln2(beta) + ln3(beta) formulaTransform(Form2) plot(1:3, type = "n") text(2, 2, formulaTransform(Form2, TRUE)) # ... but this could be refined even more! Best, Philippe Grosjean ..............................................<??}))><........ ) ) ) ) ) ( ( ( ( ( Prof. Philippe Grosjean ) ) ) ) ) ( ( ( ( ( Numerical Ecology of Aquatic Systems ) ) ) ) ) Mons-Hainaut University, Pentagone (3D08) ( ( ( ( ( .............................................................. Christian Hoffmann wrote:> Hi, > > I am trying to use sub, regexpr on expressions like > > log(D) ~ log(N)+I(log(N)^2)+log(t) > > being a model specification. > > The aim is to produce: > > "ln D ~ ln N + ln^2 N + ln t" > > The variable names N, t may change, the number of terms too. > > I succeded only partially, help on regular expressions is hard to > understand for me, examples on my case are rare. The help page on R-help > for grep etc. and "regular expressions" > > What I am doing: > > (f <- log(D) ~ log(N)+I(log(N)^2)+log(t)) > (ft <- sub("","",f)) # creates string with parts of formula, how to do > it simpler? > (fu <- paste(ft[c(2,1,3)],collapse=" ")) # converts to one string > > Then I want to use \1 for backreferences something like > > (fv <- sub("log( [:alpha:] N )^ [:alpha:)","ln \\1^\\2",fu)) > > to change "log(g)^7" to "ln^7 g", > > and to eliminate I(): sub("I(blabla)","\\1",fv) # I(xxx) -> xxx > > The special characters are making trouble, sub acceps "(", ")" only in > pairs. Code for experimentation: > > trysub <- function(s,t,e) { > ii<-0; for (i1 in c(TRUE,FALSE)) for (i2 in c(TRUE,FALSE)) for (i3 in > c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE)) > print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," "," > ~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes: > ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, > useBytes=i4)),sep=""));invisible(0) } > > trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases > 5,6,13..16, the rest unsubstituted > > trysub("log(","ln ",fu) # B: no substitutions; errors for > cases 1..4,7.. 12 # typical errors: > "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, > x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n" > > trysub("log\(","ln ",fu) # C: same as A > > trysub("log\\(","ln ",fu) # D: no substitutions; errors for > cases 15,16 # typical errors: > "15 ~ext ~perl ~fixed useBytes: Error in sub(pattern, replacement, x, > ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression > 'log\\('\n" > > trysub("log\\(([:alpha:]+)\\)","ln \1",fu) # no substitutions, no errors > # E: typical errors: > "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, > x, ignore.case, useBytes) : \n\tinvalid regular expression > 'log\\(([:alpha:]+)\\)'\n" > > > > Thanks for help > Christian > > PS. The explanations in the documents
There are some interactive regex tools around. I use a python one sometimes. You just then have to be careful re escaping and the style of regular expressions used in the tool you worked with and the target environment. Christian Hoffmann wrote:> Hi, > > I am trying to use sub, regexpr on expressions like > > log(D) ~ log(N)+I(log(N)^2)+log(t) > > being a model specification. > > The aim is to produce: > > "ln D ~ ln N + ln^2 N + ln t" > > The variable names N, t may change, the number of terms too. > > I succeded only partially, help on regular expressions is hard to > understand for me, examples on my case are rare. The help page on R-help > for grep etc. and "regular expressions"
In this post: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/30590.html Thomas Lumley provided a function to traverse a formula recursively. We can modify it as shown to transform ln(m)^n to ln^n(m) producing proc2. We then bundle everything up into proc3 which uses substitute to translate log to ln and remove (, the calls proc2 to do the aforementioned substitute and finally we use simple character processing to clean up the rest. Although this is substantially longer in terms of lines of code we did not have to write many of them because proc2 is actually just a modification of the code in the indicated post and the character processing becomes extremely simple. Also its more powerful able to handle expressions like: log(D) ~ log(log(N)^2)^3 proc2 <-function(formula){ process<-function(expr){ if (length(expr)==1) return(expr) if(length(expr)==2) { expr[[2]] <- process(expr[[2]]) return(expr) } if ( expr[[1]]==as.name("^") && length(expr[[2]])==2 && expr[[2]][[1]] == as.name("ln") && class(idx <- expr[[3]]) == "numeric") { expr <- as.call(list(as.name(paste("ln",idx,sep = "^")), expr[[2]][[2]])) expr[[2]] <- process(expr[[2]]) return(expr) } expr[[2]]<-process(expr[[2]]) expr[[3]]<-process(expr[[3]]) return(expr) } formula[[3]]<-process(formula[[3]]) formula } proc3 <- function(f) { # replace log with ln result <- do.call("substitute", list(f, list(log = as.name("ln")))) # remove I result <- do.call("substitute", list(result, list(I = as.name("(")))) # transform ln(m)^n to ln^n(m) result <- proc2(result) # now clean up using simple character substitutions result <- deparse(result) # ( -> space result <- gsub("[(]", " ", result) # remove " and ) gsub("[\")]", "", result) } # tests proc3( log(D) ~ log(N)+I(log(N)^2)+log(t) ) # "ln D ~ ln N + ln^2 N + ln t" proc3( log(D) ~ log(log(N)^2)^3) # "ln D ~ ln^3 ln^2 N" On 1/27/06, Christian Hoffmann <christian.hoffmann at wsl.ch> wrote:> Hi, > > I am trying to use sub, regexpr on expressions like > > log(D) ~ log(N)+I(log(N)^2)+log(t) > > being a model specification. > > The aim is to produce: > > "ln D ~ ln N + ln^2 N + ln t" > > The variable names N, t may change, the number of terms too. > > I succeded only partially, help on regular expressions is hard to > understand for me, examples on my case are rare. The help page on R-help > for grep etc. and "regular expressions" > > What I am doing: > > (f <- log(D) ~ log(N)+I(log(N)^2)+log(t)) > (ft <- sub("","",f)) # creates string with parts of formula, how to do > it simpler? > (fu <- paste(ft[c(2,1,3)],collapse=" ")) # converts to one string > > Then I want to use \1 for backreferences something like > > (fv <- sub("log( [:alpha:] N )^ [:alpha:)","ln \\1^\\2",fu)) > > to change "log(g)^7" to "ln^7 g", > > and to eliminate I(): sub("I(blabla)","\\1",fv) # I(xxx) -> xxx > > The special characters are making trouble, sub acceps "(", ")" only in > pairs. Code for experimentation: > > trysub <- function(s,t,e) { > ii<-0; for (i1 in c(TRUE,FALSE)) for (i2 in c(TRUE,FALSE)) for (i3 in > c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE)) > print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," "," > ~"),"perl",ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes: > ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, > useBytes=i4)),sep=""));invisible(0) } > > trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases > 5,6,13..16, the rest unsubstituted > > trysub("log(","ln ",fu) # B: no substitutions; errors for > cases 1..4,7.. 12 # typical errors: > "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, > x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n" > > trysub("log\(","ln ",fu) # C: same as A > > trysub("log\\(","ln ",fu) # D: no substitutions; errors for > cases 15,16 # typical errors: > "15 ~ext ~perl ~fixed useBytes: Error in sub(pattern, replacement, x, > ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression > 'log\\('\n" > > trysub("log\\(([:alpha:]+)\\)","ln \1",fu) # no substitutions, no errors > # E: typical errors: > "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, > x, ignore.case, useBytes) : \n\tinvalid regular expression > 'log\\(([:alpha:]+)\\)'\n" > > > > Thanks for help > Christian > > PS. The explanations in the documents > -- > Dr. Christian W. Hoffmann, > Swiss Federal Research Institute WSL > Mathematics + Statistical Computing > Zuercherstrasse 111 > CH-8903 Birmensdorf, Switzerland > > Tel +41-44-7392-277 (office) -111(exchange) > Fax +41-44-7392-215 (fax) > christian.hoffmann at wsl.ch > http://www.wsl.ch/staff/christian.hoffmann > > International Conference 5.-7.6.2006 Ekaterinburg Russia > "Climate changes and their impact on boreal and temperate forests" > http://ecoinf.uran.ru/conference/ > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >