Hi,

I am trying to use sub and regexpr on expressions like

log(D) ~ log(N)+I(log(N)^2)+log(t)

being a model specification.

The aim is to produce:

"ln D ~ ln N + ln^2 N + ln t"

The variable names N, t may change, and so may the number of terms.

I succeeded only partially. I find the help on regular expressions hard to
understand, and examples matching my case are rare. The help page on R-help
for grep etc. and "regular expressions"

What I am doing:

(f <- log(D) ~ log(N)+I(log(N)^2)+log(t))
(ft <- sub("","",f)) # creates string with parts of formula, how to do it simpler?
(fu <- paste(ft[c(2,1,3)],collapse=" ")) # converts to one string
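
On the "how to do it simpler?" question, here is a minimal sketch of two standard ways to get a formula as text: as.character(), which splits it into the operator, left-hand side and right-hand side, and deparse(), which gives one string. sub("", "", f) only works because sub() coerces its argument with as.character(). The names parts and fu2 are illustrative; width.cutoff = 500L (the largest value deparse() accepts) is an assumption to keep long formulas on one line.

```r
# The formula from the question
f <- log(D) ~ log(N) + I(log(N)^2) + log(t)

# as.character() on a formula gives its three parts: "~", LHS, RHS
parts <- as.character(f)

# Reassembling in the order LHS, "~", RHS reproduces the question's 'fu'
fu <- paste(parts[c(2, 1, 3)], collapse = " ")

# deparse() turns the whole formula into one string directly;
# the paste() only matters if a very long formula is still split
fu2 <- paste(deparse(f, width.cutoff = 500L), collapse = " ")

fu
fu2
```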

Then I want to use \1 for backreferences, something like

(fv <- sub("log( [:alpha:] N )^ [:alpha:)","ln \\1^\\2",fu))

to change "log(g)^7" to "ln^7 g",

and to eliminate I(): sub("I(blabla)","\\1",fv) # I(xxx) -> xxx

The special characters are making trouble; sub accepts "(" and ")" only in
pairs. Code for experimentation:

trysub <- function(s,t,e) {
  ii <- 0
  for (i1 in c(TRUE,FALSE)) for (i2 in c(TRUE,FALSE))
  for (i3 in c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE))
    print(paste(ii <- ii+1,
                ifelse(i1," "," ~"), "ext",
                ifelse(i2," "," ~"), "perl",
                ifelse(i3," "," ~"), "fixed ",
                ifelse(i4," "," ~"), "useBytes: ",
                try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, useBytes=i4)),
                sep=""))
  invisible(0)
}

trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases
# 5,6,13..16, the rest unsubstituted

trysub("log(","ln ",fu) # B: no substitutions; errors for
# cases 1..4,7..12; typical errors:
"3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement,
x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n"

trysub("log\(","ln ",fu) # C: same as A

trysub("log\\(","ln ",fu) # D: no substitutions; errors for
# cases 15,16; typical errors:
"15 ~ext ~perl ~fixed useBytes: Error in sub(pattern, replacement, x,
ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression
'log\\('\n"

trysub("log\\(([:alpha:]+)\\)","ln \1",fu) # no substitutions, no errors
# E: typical errors:
"3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement,
x, ignore.case, useBytes) : \n\tinvalid regular expression
'log\\(([:alpha:]+)\\)'\n"
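
Cases B to E come down to two layers of quoting: R's string parser first, then the regular-expression engine. A minimal sketch of working forms, assuming the string fu built above (retyped here so the block runs on its own; b1, b2, b3 are illustrative names). Note also that "ln \1" in case E is an R octal escape (the character \001), not a backreference; the replacement has to be written "\\1".

```r
fu <- "log(D) ~ log(N) + I(log(N)^2) + log(t)"

# The regex  log\(  is typed as "log\\(" in R: the string parser removes
# one backslash, and the regex engine sees  \(  as a literal "("
b1 <- sub("log\\(", "ln ", fu)

# With fixed = TRUE the pattern is matched literally, so no escaping is needed
b2 <- sub("log(", "ln ", fu, fixed = TRUE)

# A POSIX class such as [:alpha:] belongs inside a bracket expression,
# i.e. [[:alpha:]]; "\\1" in the replacement is the backreference
b3 <- gsub("log\\(([[:alpha:]]+)\\)", "ln \\1", fu)

b1
b3
```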
Thanks for help
Christian
PS. The explanations in the documents
--
Dr. Christian W. Hoffmann,
Swiss Federal Research Institute WSL
Mathematics + Statistical Computing
Zuercherstrasse 111
CH-8903 Birmensdorf, Switzerland
Tel +41-44-7392-277 (office) -111(exchange)
Fax +41-44-7392-215 (fax)
christian.hoffmann at wsl.ch
http://www.wsl.ch/staff/christian.hoffmann
International Conference 5.-7.6.2006 Ekaterinburg Russia
"Climate changes and their impact on boreal and temperate forests"
http://ecoinf.uran.ru/conference/
Note that [:alpha:] is a pre-defined character class and should only be used
inside []. And metacharacters need to be quoted. See ?regexp.

> f <- log(D) ~ log(N)+I(log(N)^2)+log(t)
> f1 <- deparse(f)
> f1
[1] "log(D) ~ log(N) + I(log(N)^2) + log(t)"

Now we have a string.

> (f2 <- gsub("I\\((.*)\\) ", "\\1 ", f1))
[1] "log(D) ~ log(N) + log(N)^2 + log(t)"
> (f3 <- gsub("(?U)log\\((.*)\\)", "ln \\1", f2, perl=TRUE))
[1] "ln D ~ ln N + ln N^2 + ln t"
> (f4 <- gsub("ln ([[:alpha:]])\\^([[:digit:]])", "ln^\\2 \\1", f3))
[1] "ln D ~ ln N + ln^2 N + ln t"

That should give you some ideas to be going on with.

On Fri, 27 Jan 2006, Christian Hoffmann wrote:

> The special characters are making trouble, sub accepts "(", ")" only in
> pairs.

From ?regexp

  Any metacharacter with special meaning may be quoted by preceding it
  with a backslash. The metacharacters are '. \ | ( ) [ { ^ $ * + ?'.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
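
The three gsub() steps above can be collected into one function. This is a minimal sketch, not code from the thread: the name to_ln_string is hypothetical, and its patterns are assumptions that go a little beyond the one-off versions. With two I() terms, the greedy (.*) in the f2 step would match from the first "I(" to the last ") ", and [[:alpha:]] in f4 matches only single-letter names. The sketch assumes log() arguments contain no parentheses and that each I() holds at most one inner pair, as in I(log(N)^2).

```r
# Hypothetical helper generalising the three gsub() steps
to_ln_string <- function(f) {
  # One string, even if deparse() would otherwise split a long formula
  s <- paste(deparse(f, width.cutoff = 500L), collapse = " ")
  # Step 1: I(...) -> ..., assuming at most one inner pair of parentheses
  s <- gsub("I\\(([^()]*(\\([^()]*\\))?[^()]*)\\)", "\\1", s)
  # Step 2: log(x) -> ln x, for arguments without parentheses
  s <- gsub("log\\(([^()]*)\\)", "ln \\1", s)
  # Step 3: ln x^k -> ln^k x, allowing multi-character names and powers
  gsub("ln ([[:alnum:]._]+)\\^([[:digit:]]+)", "ln^\\2 \\1", s)
}

to_ln_string(log(D) ~ log(N) + I(log(N)^2) + log(t))
to_ln_string(log(Y) ~ log(x1) + I(log(x1)^2) + I(log(x1)^3) + log(z))
```

The second call shows the two generalisations at work: several I() terms in one formula and a variable name longer than one letter.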
Hello,
Here is what I got after playing a little bit with your problem:
# First of all, if you prefer 'ln' instead of 'log', why not define:
ln <- function(x) log(x)
ln2 <- function(x) log(x)^2
ln3 <- function(x) log(x)^3
ln4 <- function(x) log(x)^4
# ... as many functions as powers you need
# Then, your formula is now closer to what you want
# which makes the whole code easier to read for you:
Form <- ln(D) ~ ln(N) + ln2(N) + ln(t) # Same as your original formula
# Here is the function to transform it into a more readable string:
formulaTransform <- function(form, as.expression = FALSE) {
  if (!inherits(form, "formula"))
    stop("'form' must be a 'formula' object!")
  # Transform the formula into a string (is there a better way?)
  Res <- paste(as.character(form)[c(2, 1, 3)], collapse = " ")
  if (as.expression) { # Transform the formula into a nice expression
    # Change '~' into '%~~%' (how to do '~' in an expression?)
    Res <- sub("~", "%~~%", Res)
    # Eliminate brackets
    Res <- gsub("[(]([A-Za-z0-9._]*)[)]", " ~ \\1", Res)
    # Transform powers
    Res <- gsub("ln([2-9])", "ln^\\1", Res)
    Res <- eval(parse(text = Res))
  } else { # Make a nicer string
    # Eliminate brackets
    Res <- gsub("[(]([A-Za-z0-9._]*)[)]", " \\1", Res)
    # Transform powers
    Res <- gsub("ln([2-9])", "ln^\\1", Res)
  }
  # Return the result
  return(Res)
}
# Here is a nicer presentation as a string
formulaTransform(Form)
# Here is an even nicer presentation (creating an expression)
plot(1:3, type = "n")
text(2, 2, formulaTransform(Form, TRUE))
# The latter form is really interesting when you use, for instance,
# Greek letters for variables, or so...
Form2 <- ln(alpha) ~ ln(beta) + ln2(beta) + ln3(beta)
formulaTransform(Form2)
plot(1:3, type = "n")
text(2, 2, formulaTransform(Form2, TRUE))
# ... but this could be refined even more!
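
The comment that Form is the "Same as your original formula" can be checked numerically. A minimal sketch, assuming simulated data: the data frame dat, its coefficient values and the helper make_ln_power are illustrative, not from the thread. Both formulas are fitted with lm() and their coefficients compared; make_ln_power shows one way to generate ln3, ln4, ... instead of writing each by hand.

```r
# ln() and ln2() as defined in the message above
ln  <- function(x) log(x)
ln2 <- function(x) log(x)^2

# Hypothetical generator for further powers
make_ln_power <- function(k) {
  force(k)                      # fix k now, not when the function is first called
  function(x) log(x)^k
}
ln3 <- make_ln_power(3)

# Simulated positive data, used only to compare the two model formulas
set.seed(1)
n   <- 50
dat <- data.frame(N = runif(n, 1, 10), t = runif(n, 1, 10))
dat$D <- exp(1 + 0.5 * log(dat$N) + 0.2 * log(dat$N)^2 -
             0.3 * log(dat$t) + rnorm(n, sd = 0.1))

fit_log <- lm(log(D) ~ log(N) + I(log(N)^2) + log(t), data = dat)
fit_ln  <- lm(ln(D) ~ ln(N) + ln2(N) + ln(t), data = dat)

# Same coefficients; only the term labels differ
all.equal(unname(coef(fit_log)), unname(coef(fit_ln)))
names(coef(fit_ln))
```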
Best,
Philippe Grosjean

--
Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons-Hainaut University, Pentagone (3D08)
There are some interactive regex tools around; I use a Python one sometimes.
You then just have to be careful about escaping, and about differences
between the style of regular expressions used by the tool you worked with
and by the target environment.
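
As an illustration of how much the regex flavour matters, here is a minimal sketch comparing R's default engine with PCRE (perl = TRUE) on the greedy versus lazy behaviour behind the (?U) flag used earlier in the thread. The string s and the result names are illustrative.

```r
s <- "log(D) ~ log(N)"

# Greedy .* runs on to the LAST ")" in the string
greedy <- sub("log\\((.*)\\)", "ln \\1", s)

# (?U) makes quantifiers lazy, but it is PCRE syntax and needs perl = TRUE
lazy <- gsub("(?U)log\\((.*)\\)", "ln \\1", s, perl = TRUE)

# A negated bracket expression avoids the issue in either engine
portable <- gsub("log\\(([^()]*)\\)", "ln \\1", s)

greedy
lazy
portable
```

The last form is the one least tied to a particular tool, which is useful when a pattern is developed in one environment and used in another.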

In this post:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/30590.html

Thomas Lumley provided a function to traverse a formula recursively.
We can modify it as shown to transform ln(m)^n to ln^n(m), producing
proc2. We then bundle everything up into proc3, which uses substitute
to translate log to ln and to replace I with (, then calls proc2 to do the
transformation just described, and finally uses simple character processing
to clean up the rest.

Although this is substantially longer in terms of lines of code, we did not
have to write many of them, because proc2 is just a modification of the
code in the indicated post, and the character processing becomes extremely
simple. It is also more powerful, being able to handle expressions like:

log(D) ~ log(log(N)^2)^3

proc2 <- function(formula) {
  process <- function(expr) {
    if (length(expr) == 1)
      return(expr)
    if (length(expr) == 2) {
      expr[[2]] <- process(expr[[2]])
      return(expr)
    }
    # ln(m)^n  ->  `ln^n`(m); identical() is a safer test than == on names
    if (identical(expr[[1]], as.name("^")) && length(expr[[2]]) == 2 &&
        identical(expr[[2]][[1]], as.name("ln")) &&
        is.numeric(idx <- expr[[3]])) {
      expr <- as.call(list(as.name(paste("ln", idx, sep = "^")),
                           expr[[2]][[2]]))
      expr[[2]] <- process(expr[[2]])
      return(expr)
    }
    expr[[2]] <- process(expr[[2]])
    expr[[3]] <- process(expr[[3]])
    return(expr)
  }
  formula[[3]] <- process(formula[[3]])
  formula
}

proc3 <- function(f) {
  # replace log with ln
  result <- do.call("substitute", list(f, list(log = as.name("ln"))))
  # replace I(...) with (...)
  result <- do.call("substitute", list(result, list(I = as.name("("))))
  # transform ln(m)^n to ln^n(m)
  result <- proc2(result)
  # now clean up using simple character substitutions
  result <- deparse(result)
  # ( -> space
  result <- gsub("[(]", " ", result)
  # remove ) and the quotes or backticks deparse() puts around
  # non-syntactic names such as ln^2
  result <- gsub("[\"`)]", "", result)
  # collapse the double space left where "+ (" used to be
  gsub(" +", " ", result)
}

# tests
proc3( log(D) ~ log(N)+I(log(N)^2)+log(t) ) # "ln D ~ ln N + ln^2 N + ln t"
proc3( log(D) ~ log(log(N)^2)^3)            # "ln D ~ ln^3 ln^2 N"
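
To see what process() in proc2 walks over, a formula can be taken apart as a call tree. A minimal sketch using the second test formula; show_tree is a hypothetical helper, not part of Lumley's code or of base R. It recurses over the same structure: element 1 of a call is the function, the remaining elements are its arguments.

```r
f <- log(D) ~ log(log(N)^2)^3

# A formula is a call to `~`: f[[1]] is the operator, f[[2]] the
# left-hand side and f[[3]] the right-hand side that proc2 processes
rhs <- f[[3]]
rhs[[1]]        # the `^` operator
rhs[[2]]        # log(log(N)^2), a call of length 2
rhs[[3]]        # the numeric constant 3

# Hypothetical helper: print one node per line, indented by depth
show_tree <- function(e, depth = 0) {
  # Label: the function name for calls, the name or value otherwise
  head <- if (is.call(e)) e[[1]] else e
  lab  <- if (is.symbol(head)) as.character(head) else deparse(head)
  cat(strrep("  ", depth), lab, "\n", sep = "")
  if (is.call(e)) for (arg in as.list(e)[-1]) show_tree(arg, depth + 1)
  invisible(NULL)
}
show_tree(f)
```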