David Winsemius
2009-Dec-01 19:41 UTC
[R] Cut intervals (character) to numeric midpoint; regex problem
Starting with the head of a 499-element matrix whose column names are
now the labels from a cut() operation, I needed to get to a vector of
midpoints to serve as the basis for plotting a calibration curve
(exp(linear predictor) vs. ...):
> dput(head(dimnames(mtcal)[2][[1]]))  # was starting point
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]")
I started this message intending to ask for an answer, but kept
asking myself whether I had really checked the docs and tested my
understanding. I eventually solved it using the gsubfn() function from
the gsubfn package:
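library(gsubfn)
# Two capture groups (lower and upper endpoint) are passed as x and y
# to the formula, which returns their average
testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
                               ~ (as.numeric(x) + as.numeric(y))/2, testvec))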
# I did discover that carriage returns in the middle of the pattern will
# not give the desired results, so if your mail client breaks the pattern
# line above, be sure to rejoin it in the console.
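For readers without the gsubfn package, here is a minimal base-R sketch of the same idea (not the method used above), assuming every label has the form "(lower,upper]": capture each endpoint with sub() and a backreference, then average. The names lower, upper and midpoints are illustrative.

```r
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]")

# Lower bound: everything between the opening "(" and the first comma
lower <- as.numeric(sub("^\\(([^,]*),.*$", "\\1", testvec))
# Upper bound: everything after the first comma, up to the closing "]"
upper <- as.numeric(sub("^[^,]*,(.*)\\]$", "\\1", testvec))

# Midpoint of each interval (illustrative variable names)
midpoints <- (lower + upper) / 2
midpoints
```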
The extra "?"s after the decimal point are there because without them I
got 4 NAs around the median linear predictor, where the labels have
integer endpoints (-1 and 1) with no decimal point:
> dimnames(mtcal)[2][[1]][which(is.na(testintvl))]
[1] "(-1.008,-1]" "(-1,-0.9922]" "(0.9914,1]" "(1,1.009]"
So a better test vector would be:
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]",
             "(-1.008,-1]", "(-1,-0.9922]",
             "(0.9914,1]", "(1,1.009]")
> testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
+                                ~ (as.numeric(x) + as.numeric(y))/2, testvec))
> testintvl
[1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680 -1.0040 -0.9961 0.9957 1.0045
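To see why the optional parts matter, here is a small check, assuming the earlier pattern required a literal decimal point and at least one fractional digit (the post does not show that earlier pattern). A label that fails to match is left unchanged by gsubfn(), and as.numeric() then turns it into NA. The names strict and lenient are illustrative.

```r
labels <- c("(-2.597,-2.539]", "(-1.008,-1]", "(1,1.009]")

# Hypothetical strict pattern: each endpoint needs "." and fractional digits
strict  <- "\\((-?[[:digit:]]+\\.[[:digit:]]+),(-?[[:digit:]]+\\.[[:digit:]]+)\\]"
# Lenient pattern: "." and fractional digits are optional,
# so integer endpoints such as -1 and 1 also match
lenient <- "\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]"

grepl(strict, labels)   # TRUE FALSE FALSE
grepl(lenient, labels)  # TRUE TRUE TRUE
```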
I offer this to those who may feel regex-challenged (as I often do).
The gsubfn function is pretty slick. I don't see an author listed for
the function, but the package documentation lists Gabor Grothendieck as
its author.
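If the breaks passed to cut() are still at hand, the midpoints can also be computed from them directly, with no string parsing. This is an alternative, not what the thread does; a minimal sketch, assuming a hypothetical vector lp of linear predictors and hand-picked breaks (neither comes from the original post):

```r
# Hypothetical linear predictor values and breaks (illustration only)
lp     <- c(-1.7, -0.4, 0.2, 1.1, -1.2, 0.9)
breaks <- c(-2, -1, 0, 1.5)

# cut() labels each interval as "(lower,upper]"
grp <- cut(lp, breaks)
levels(grp)  # "(-2,-1]" "(-1,0]" "(0,1.5]"

# Midpoint of each interval: lower break plus half the interval width
mids <- head(breaks, -1) + diff(breaks) / 2
mids         # -1.5 -0.5 0.75
```

Because cut() formats its labels to a limited number of significant digits (see its dig.lab argument), midpoints parsed back out of the labels can differ slightly from those computed from the unrounded breaks.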
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Gabor Grothendieck
2009-Dec-01 19:47 UTC
[R] Cut intervals (character) to numeric midpoint; regex problem
You also might want to look at
demo("gsubfn-cut")
On Tue, Dec 1, 2009 at 2:41 PM, David Winsemius <dwinsemius@comcast.net> wrote:
Henrique Dallazuanna
2009-Dec-01 20:01 UTC
[R] Cut intervals (character) to numeric midpoint; regex problem
Perhaps this should work too:
sapply(strsplit(gsub("^\\W|\\W$", "", testvec), ","),
       function(x) sum(as.numeric(x))/2)
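As a quick check, a sketch that runs this strsplit() approach on the extended ten-label test vector and compares it with the midpoints reported earlier in the thread. vapply() is swapped in for sapply() only to pin the result type to numeric; the names stripped, endpoints and mids are illustrative.

```r
testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]",
             "(-1.008,-1]", "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]")

# Drop the single non-word character at each end: "(" and "]"
stripped  <- gsub("^\\W|\\W$", "", testvec)
# Split "lower,upper" on the comma into a two-element character vector
endpoints <- strsplit(stripped, ",")
# Average the two endpoints of each interval
mids <- vapply(endpoints, function(x) sum(as.numeric(x)) / 2, numeric(1))
mids
```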
On Tue, Dec 1, 2009 at 5:41 PM, David Winsemius <dwinsemius at comcast.net> wrote:
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O