Jing Liu
2010-Dec-17 13:34 UTC
[R] Matching a pattern of vector of character strings in another vector of character strings
Dear all,

My question is illustrated by the following example. I have a matrix M:

> M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow=3)
> colnames(M) <- c("2006","2007","2008","2009","2010")
> M
     2006 2007 2008 2009 2010
[1,] "0"  "1"  "1"  "*"  "0"
[2,] "0"  "0"  "0"  "1"  "1"
[3,] "1"  "1"  "0"  "1"  "*"
> pattern <- c("0","1")

I would like to find, for each row, whether it contains exactly the pattern of two character strings "0" "1", that is, a "0" immediately followed by a "1". If it does, in which year does the pattern start? For example, it should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.

As far as I know, the grep family of functions cannot search for a pattern that consists of two or more character strings. I could do it with a loop, but I am looking for something more efficient than a loop. How should I do it? I would really appreciate your help.

Best regards,
Jing Liu
Liviu Andronic
2010-Dec-17 13:58 UTC
[R] Matching a pattern of vector of character strings in another vector of character strings
On Fri, Dec 17, 2010 at 2:34 PM, Jing Liu <quiet_jing0920 at hotmail.com> wrote:
> > M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow=3)
> > colnames(M) <- c("2006","2007","2008","2009","2010")
> > M
>      2006 2007 2008 2009 2010
> [1,] "0"  "1"  "1"  "*"  "0"
> [2,] "0"  "0"  "0"  "1"  "1"
> [3,] "1"  "1"  "0"  "1"  "*"
>
> > pattern <- c("0","1")
>
> I would like to find, for each row, if it contains exactly the pattern of two character strings, beginning with a "0" and followed by a "1", i.e, exactly "0" "1". If it does, at which year?
> E.g. It should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.

I could only think of this:

> apply(M, 1, function(z) grep('01', paste(z, collapse='')))
[1] 1 1 1
> apply(M, 1, function(z) grepl('01', paste(z, collapse='')))
[1] TRUE TRUE TRUE

Both calls only report that each row contains the pattern; they don't return the position of the matched string. So this isn't what you wanted.

Regards
Liviu
Petr Savicky
2010-Dec-17 14:24 UTC
[R] Matching a pattern of vector of character strings in another vector of character strings
On Fri, Dec 17, 2010 at 09:34:57PM +0800, Jing Liu wrote:
> I would like to find, for each row, if it contains exactly the pattern of two character strings, beginning with a "0" and followed by a "1", i.e, exactly "0" "1". If it does, at which year?
> E.g. It should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.

If the pattern is always c("0","1"), the number of rows is large and the number of years is relatively small, then this may also be computed using matrix calculations, looping over the years rather than over the rows. For example:

M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow=3)
colnames(M) <- c("2006","2007","2008","2009","2010")
year <- colnames(M)
status <- rep(NA, times=nrow(M))
for (i in seq(length(year) - 1)) {
    # rows with "0" in year i and "1" in year i+1
    status[M[, i] == "0" & M[, i+1] == "1"] <- year[i]
}
status
# [1] "2006" "2008" "2008"

Petr Savicky.
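The loop above overwrites status on every pass, so a row that contains the pattern more than once ends up with the latest matching year. The following is a minimal sketch, not taken from the thread, of a loop-free variant that reports the earliest match instead. It compares the matrix against a copy shifted by one column; the names hit, first and first_year are made up for illustration, and rows without a match are assumed to map to NA.

```r
M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow = 3)
colnames(M) <- c("2006","2007","2008","2009","2010")
pattern <- c("0", "1")

n <- ncol(M)
# hit[i, j] is TRUE when row i has pattern[1] in column j and pattern[2] in column j + 1
hit <- M[, -n, drop = FALSE] == pattern[1] & M[, -1, drop = FALSE] == pattern[2]

# column index of the first TRUE in each row (hit * 1 converts logical to numeric)
first <- max.col(hit * 1, ties.method = "first")
# max.col() returns 1 for an all-FALSE row, so mark those rows explicitly
first[rowSums(hit) == 0] <- NA

first_year <- colnames(M)[first]
first_year
```

For the example matrix this gives the same years as the loop, because each row contains the pattern only once.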
David Winsemius
2010-Dec-17 14:39 UTC
[R] Matching a pattern of vector of character strings in another vector of character strings
On Dec 17, 2010, at 8:34 AM, Jing Liu wrote:
> I would like to find, for each row, if it contains exactly the pattern of two character strings, beginning with a "0" and followed by a "1", i.e, exactly "0" "1". If it does, at which year?
> E.g. It should return 2006 for row 1, 2008 for row 2 and 2008 for row 3.
>
> For as far as I know, the variations of the grep function group cannot search for a pattern that has 2 or more character strings. I could do it with a loop but I seek a more efficient way than a loop. How should I do it? Really appreciated for your help!!!

You can just paste() each row with collapse="_" and then use the grep-ish functions as you were hoping to:

> m2 <- apply(M, 1, paste, collapse="_")
> colnames(M)[(regexpr("0_1", m2)+1)/2]  # assuming every element is exactly one character
[1] "2006" "2008" "2008"

With one-character elements and a one-character separator, element k starts at character 2k-1 of the pasted string, so (position+1)/2 converts the match position back to a column index.

--
David Winsemius, MD
West Hartford, CT
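One caveat with the regexpr() approach: for a row that does not contain the pattern, regexpr() returns -1, the index becomes 0, and indexing with 0 silently drops that row from the result. The sketch below is an illustration rather than part of the original reply. It adds a hypothetical fourth row with no match (M2) and maps it to NA so that the result keeps one entry per row; it still assumes one character per element.

```r
M <- matrix(c("0","0","1","1","0","1","1","0","0","*","1","1","0","1","*"), nrow = 3)
colnames(M) <- c("2006","2007","2008","2009","2010")

# hypothetical extra row that never contains "0" followed by "1"
M2 <- rbind(M, c("1", "1", "1", "1", "1"))

m2 <- apply(M2, 1, paste, collapse = "_")
# fixed = TRUE treats "0_1" as a literal string, not a regular expression
pos <- as.integer(regexpr("0_1", m2, fixed = TRUE))

# character position -> column index; no match (-1) becomes NA
idx <- ifelse(pos > 0, (pos + 1L) %/% 2L, NA_integer_)
years <- colnames(M2)[idx]
years
```

The first three entries match the result above and the fourth is NA, so the output lines up with the rows of the matrix.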