David Winsemius
2009-Dec-01 19:41 UTC
[R] Cut intervals (character) to numeric midpoint; regex problem
Starting with the head of a 499-element matrix whose column names are now the labels from a cut() operation, I needed to get to a vector of midpoints to serve as the basis for plotting a calibration curve (exp(linear predictor) vs. ...):

> dput(head(dimnames(mtcal)[2][[1]]))  # was starting point

testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]")

I started this message intending to ask for an answer, but kept asking myself whether I had really checked the docs and tested my understanding. I eventually solved it using the gsubfn() function from the gsubfn package:

library(gsubfn)
testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
                               ~ (as.numeric(x) + as.numeric(y))/2, testvec))

I did discover that carriage returns in the middle of the pattern will not give the desired results, so if your mail client breaks the pattern across lines, be sure to rejoin it in the console.

The extra "?"s after the decimal points are there because, without them, I had 4 NAs around the median linear predictor:

> dimnames(mtcal)[2][[1]][which(is.na(testintvl))]
[1] "(-1.008,-1]"  "(-1,-0.9922]" "(0.9914,1]"   "(1,1.009]"

Each of these labels has an integer endpoint (-1 or 1) with no decimal point, so the pattern did not match, gsubfn() returned the label unchanged, and as.numeric() turned it into NA. So a better test vector would be:

testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]",
             "(-1.008,-1]", "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]")

> testintvl <- as.numeric(gsubfn("\\((-?[[:digit:]]+\\.?[[:digit:]]*),(-?[[:digit:]]+\\.?[[:digit:]]*)\\]",
+                                ~ (as.numeric(x) + as.numeric(y))/2, testvec))
> testintvl
 [1] -5.8500 -2.9800 -2.8160 -2.7120 -2.6325 -2.5680 -1.0040 -0.9961  0.9957
[10]  1.0045

I offer this to those who may feel regex-challenged (as I often do). The gsubfn() function is pretty slick. I don't see an author listed for the function, but the author of the package documentation is Gabor Grothendieck.

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
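If installing gsubfn is not an option, roughly the same extraction can be sketched in base R with sub() and backreferences. This is a sketch, not part of gsubfn: the helper name interval_midpoints() is hypothetical, and the pattern assumes plain decimal endpoints (no exponent notation such as 1e+05). A label that does not match comes back as NA, with a coercion warning, just as in the gsubfn version.

```r
# Hypothetical base-R helper (not part of gsubfn): midpoints of cut()-style
# interval labels such as "(-1.008,-1]".
interval_midpoints <- function(labels) {
  # One endpoint: optional minus sign, digits, optional decimal point, digits.
  # The optional "\\.?" is what lets integer endpoints like -1 or 1 match.
  num <- "(-?[[:digit:]]+\\.?[[:digit:]]*)"
  # Any single opening bracket, lower endpoint, comma, upper endpoint,
  # any single closing bracket
  pattern <- paste0("^.", num, ",", num, ".$")
  # sub() returns a label unchanged when it does not match, so such labels
  # become NA (with a warning) in as.numeric()
  lower <- as.numeric(sub(pattern, "\\1", labels))
  upper <- as.numeric(sub(pattern, "\\2", labels))
  (lower + upper) / 2
}

testvec <- c("(-8.616,-3.084]", "(-3.084,-2.876]", "(-2.876,-2.756]",
             "(-2.756,-2.668]", "(-2.668,-2.597]", "(-2.597,-2.539]",
             "(-1.008,-1]", "(-1,-0.9922]", "(0.9914,1]", "(1,1.009]")
interval_midpoints(testvec)
```

On the ten-element test vector this should reproduce the testintvl values printed above.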
Gabor Grothendieck
2009-Dec-01 19:47 UTC
[R] Cut intervals (character) to numeric midpoint; regex problem
You also might want to look at demo("gsubfn-cut")

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Henrique Dallazuanna
2009-Dec-01 20:01 UTC
[R] Cut intervals (character) to numeric midpoint; regex problem
Perhaps this should work too:

sapply(strsplit(gsub("^\\W|\\W$", "", testvec), ","), function(x) sum(as.numeric(x))/2)
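The strsplit() one-liner avoids writing a number pattern at all, so it needs no optional decimal point and leaves the parsing of each endpoint to as.numeric(). The sketch below expands the same idea into a commented helper. The name split_midpoints() is hypothetical, and mean() is used in place of sum()/2, which is equivalent for two endpoints. Because \\W matches any non-word character, the first and last characters are stripped whichever bracket style is used, so "[a,b)" labels from cut(..., right = FALSE) and endpoints in exponent notation should work as well.

```r
# Hypothetical helper expanding the strsplit() one-liner above.
split_midpoints <- function(labels) {
  # Remove one leading and one trailing non-word character, i.e. the
  # interval brackets "(" or "[" and "]" or ")"
  inner <- gsub("^\\W|\\W$", "", labels)
  # Split "lower,upper" at the comma into the two endpoints
  parts <- strsplit(inner, ",", fixed = TRUE)
  # Average each pair; vapply() guarantees a plain numeric vector
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}

# An "(a,b]" label, a right = FALSE style "[a,b)" label, and exponent notation
split_midpoints(c("(-1,-0.9922]", "[1,1.009)", "(9.99e+04,1.5e+05]"))

# Applied directly to the levels of a cut() factor
split_midpoints(levels(cut(c(0, 5, 10), breaks = c(0, 5, 10), include.lowest = TRUE)))
```

Unlike the regex versions, this relies on each label containing exactly one comma, which should hold for cut() labels.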
-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O