Boris Steipe
2017-Apr-26 00:33 UTC
[R] Counting enumerated items in each element of a character vector
I should add: there's a str_count() function in the stringr package.

library(stringr)
str_count(text1, "Example")
# [1] 5 5 5 5

I guess that would be the neater solution.

B.

> On Apr 25, 2017, at 8:23 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>
> How about:
>
> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>
> Splitting each string on its five occurrences of "Example" gives six elements, so length(x) - 1 is the number of matches. You can use any regex instead of "Example" if you need to tweak what you are looking for.
>
> B.
>
>> On Apr 25, 2017, at 8:14 PM, Dan Abner <dan.abner99 at gmail.com> wrote:
>>
>> Hi all,
>>
>> I am looking for a streamlined way of counting the number of enumerated
>> items in each element of a character vector. For example:
>>
>> text1<-c("This is an example.
>> List 1
>> 1) Example 1
>> 2) Example 2
>> 10) Example 10
>> List 2
>> 1) Example 1
>> 2) Example 2
>> These have been examples.","This is another example.
>> List 1
>> 1. Example 1
>> 2. Example 2
>> 10. Example 10
>> List 2
>> 1. Example 1
>> 2. Example 2
>> These have been examples.","This is a third example. List 1 1) Example 1.
>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
>> been examples."
>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>>
>> text1
>>
>> I would like the result to be c(5,5,5,5). Notice that sometimes there are
>> leading hard returns, other times not. Sometimes there are separate lists,
>> and the same numbers are used in the enumerated items multiple times within
>> each character string. Sometimes the leading numbers for the enumerated
>> items exceed single digits. Notice that the delimiter may be ) or a period
>> (.). If the delimiter is a period and there are hard returns (example 2),
>> then I expect it will be easy enough to differentiate sentences ending
>> with a number from enumerated items. However, I imagine it would be much
>> more difficult to differentiate the two in example 4.
>>
>> Any suggestions are appreciated.
>>
>> Best,
>>
>> Dan
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
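One caveat about the strsplit() approach: as documented in ?strsplit, a match at the very end of a string does not produce a trailing empty element, so length(x) - 1 undercounts when a string ends with the pattern. It works for text1 only because every string ends with "examples." rather than "Example". A minimal base-R sketch that counts matches directly with gregexpr() and regmatches() avoids this; the helper names count_matches and count_by_split are hypothetical, and on text1 the first should agree with str_count().

```r
# Hypothetical helper: count regex matches in each element of a character vector.
# regmatches() returns character(0) for an element with no match, so lengths()
# reports 0 there.
count_matches <- function(x, pattern, ...) {
  lengths(regmatches(x, gregexpr(pattern, x, ...)))
}

# Boris's strsplit() approach, wrapped as a hypothetical helper for comparison.
count_by_split <- function(x, pattern) {
  vapply(strsplit(x, pattern), function(parts) length(parts) - 1L, integer(1))
}

s <- c("Example 1, Example 2", "ends with Example", "no match here")
count_matches(s, "Example")   # 2 1 0
count_by_split(s, "Example")  # 2 0 0  (the trailing match is lost)
```

A match at the start of a string is handled correctly by both, because strsplit() then returns a leading "" element.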
Michael Hannon
2017-Apr-26 03:40 UTC
[R] Counting enumerated items in each element of a character vector
I like Boris's "Hadley" solution. For the record, I've appended a version that uses regular expressions, the only benefit of which is that it could be generalized to find more-complicated patterns.

-- Mike

counts <- sapply(text1, function(next_string) {
    loc_example <- length(gregexpr("Example", next_string)[[1]])
    loc_example
}, USE.NAMES=FALSE)

> counts
[1] 5 5 5 5
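The gregexpr() version has one edge case worth knowing: when a string contains no match, gregexpr() returns -1 for that element, so length() reports 1 instead of 0. The sketch below, with hypothetical helper names, compares that approach against a variant that counts only positive match positions.

```r
# The gregexpr() + length() approach, wrapped as a hypothetical helper.
count_length <- function(x, pattern) {
  sapply(x, function(s) length(gregexpr(pattern, s)[[1]]), USE.NAMES = FALSE)
}

# Variant: gregexpr() marks "no match" with -1, so count only positive positions.
count_positive <- function(x, pattern) {
  vapply(gregexpr(pattern, x), function(m) sum(m > 0), integer(1))
}

s <- c("Example 1, Example 2", "no match here")
count_length(s, "Example")    # 2 1  (the 1 is wrong)
count_positive(s, "Example")  # 2 0
```

For text1 both give the same result, since every string contains at least one match.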
Ista Zahn
2017-Apr-26 03:47 UTC
[R] Counting enumerated items in each element of a character vector
stringr::str_count (and the stringi::stri_count function it wraps) interprets the pattern argument as a regular expression by default.

Best,
Ista
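Since the pattern is a regular expression, one could match the enumerated-item markers themselves rather than the word "Example". The following is a heuristic sketch, not a general solution: it assumes an item marker is a number followed by a closing parenthesis anywhere, or a number followed by a period only when it starts a line. That covers the layouts of Dan's first three strings, but, as Dan anticipated, it cannot tell a marker like "10." in the run-on fourth string apart from a sentence ending in a number, so it will not return 5 there. The function name count_items and the test strings are hypothetical; stringr::str_count() should accept the same patterns.

```r
# Heuristic sketch: count enumerated-item markers instead of the word "Example".
count_items <- function(x) {
  # "1)", "10)", ...: a number followed by a closing parenthesis, anywhere.
  paren <- gregexpr("\\d+\\)", x, perl = TRUE)
  # "1.", "10.", ...: only at the start of a line and followed by whitespace,
  # so a mid-line sentence ending in a number ("Example 1. 2)") is not counted.
  dot <- gregexpr("(?m)^\\d+\\.\\s", x, perl = TRUE)
  # gregexpr() returns -1 for "no match", so count positive positions only.
  n_paren <- vapply(paren, function(m) sum(m > 0), integer(1))
  n_dot <- vapply(dot, function(m) sum(m > 0), integer(1))
  n_paren + n_dot
}

s_paren_lines  <- "Intro.\nList 1\n1) Example 1\n2) Example 2\n10) Example 10\nDone."
s_dot_lines    <- "Intro.\nList 1\n1. Example 1\n2. Example 2\n10. Example 10\nDone."
s_paren_inline <- "Intro. List 1 1) Example 1. 2) Example 2. 10) Example 10. Done."
count_items(c(s_paren_lines, s_dot_lines, s_paren_inline))  # 3 3 3
```

Counting markers rather than "Example" matters once the item text varies; the trade-off is that the period-delimited case still depends on line breaks being present.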