Boris Steipe
2017-Apr-26 00:33 UTC
[R] Counting enumerated items in each element of a character vector
I should add: there's a str_count() function in the stringr package.

library(stringr)
str_count(text1, "Example")
# [1] 5 5 5 5

I guess that would be the neater solution.

B.

> On Apr 25, 2017, at 8:23 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>
> How about:
>
> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>
> Splitting your string on the five "Examples" in each gives six elements. length(x) - 1 is the number of
> matches. You can use any regex instead of "Example" if you need to tweak what you are looking for.
>
> B.
>
>> On Apr 25, 2017, at 8:14 PM, Dan Abner <dan.abner99 at gmail.com> wrote:
>>
>> Hi all,
>>
>> I am looking for a streamlined way of counting the number of enumerated
>> items in each element of a character vector. For example:
>>
>> text1<-c("This is an example.
>> List 1
>> 1) Example 1
>> 2) Example 2
>> 10) Example 10
>> List 2
>> 1) Example 1
>> 2) Example 2
>> These have been examples.","This is another example.
>> List 1
>> 1. Example 1
>> 2. Example 2
>> 10. Example 10
>> List 2
>> 1. Example 1
>> 2. Example 2
>> These have been examples.","This is a third example. List 1 1) Example 1.
>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
>> been examples."
>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>>
>> text1
>>
>> ==
>>
>> I would like the result to be c(5,5,5,5). Notice that sometimes there are
>> leading hard returns, other times not. Sometimes there are separate lists
>> and the same numbers are used in the enumerated items multiple times within
>> each character string. Sometimes the leading numbers for the enumerated
>> items exceed single digits. Notice that the delimiter may be ) or a period
>> (.). If the delimiter is a period and there are hard returns (example 2),
>> then I expect that it will be easy enough to differentiate sentences ending
>> with a number from enumerated items. However, I imagine it would be much
>> more difficult to differentiate the two for example 4.
>>
>> Any suggestions are appreciated.
>>
>> Best,
>>
>> Dan
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
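
One caveat with the strsplit() approach may be worth a small illustration: strsplit() does not return a trailing empty string when the match sits at the very end of the input, so length(x) - 1 can undercount there. The sketch below uses a made-up vector x (not the poster's text1) and compares the split-based count with a base R alternative built on gregexpr() and regmatches(). It is one possible way to do this, not the method proposed in the thread.

```r
# Hypothetical input, chosen to show the edge case (not the thread's text1).
x <- c("Example one, Example two", "ends with Example", "nothing")

# Split-based count, as in the post above: number of pieces minus one.
split_counts <- unlist(lapply(strsplit(x, "Example"), function(p) length(p) - 1))

# Match-based count: regmatches() returns character(0) when nothing matches,
# so lengths() yields 0 for such elements.
match_counts <- lengths(regmatches(x, gregexpr("Example", x)))

print(split_counts)  # the second element is undercounted
print(match_counts)
```

For text1 as posted, every element ends with "examples." rather than with a match, so both approaches should agree there.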
Michael Hannon
2017-Apr-26 03:40 UTC
[R] Counting enumerated items in each element of a character vector
I like Boris's "Hadley" solution. For the record, I've appended a
version that uses regular expressions, the only benefit of which is
that it could be generalized to find more-complicated patterns.

-- Mike

counts <- sapply(text1, function(next_string) {
    # gregexpr() returns -1 when there is no match, so count only positive positions
    loc_example <- gregexpr("Example", next_string)[[1]]
    sum(loc_example > 0)
}, USE.NAMES=FALSE)

> counts
[1] 5 5 5 5
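
As a rough sketch of the generalization mentioned above, the pattern could be widened to require an enumerator (digits followed by ")" or ".") in front of the word. The function name count_items, the default pattern, and the input x below are all assumptions made for illustration. The pattern is not claimed to reproduce c(5,5,5,5) on the original text1, where one item in the fourth element has no leading number.

```r
# Hypothetical helper: count matches of an enumerator pattern per string.
count_items <- function(strings, pattern = "[0-9]+[.)][[:space:]]+Example") {
  vapply(strings, function(s) {
    hits <- gregexpr(pattern, s)[[1]]
    sum(hits > 0)  # gregexpr() reports -1 for "no match"
  }, integer(1), USE.NAMES = FALSE)
}

# Made-up input covering ")" and "." delimiters, a hard return, and no items.
x <- c("1) Example 1\n2) Example 2",
       "1. Example 1. 2. Example 2. 10. Example 10.",
       "no items here")

print(count_items(x))
```

Because the pattern requires the word "Example" right after the delimiter, a sentence that merely ends in a number (such as "Example 10.") is not counted as a new item.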
Ista Zahn
2017-Apr-26 03:47 UTC
[R] Counting enumerated items in each element of a character vector
stringr::str_count (and stringi::stri_count that it wraps) interpret the pattern argument as a regular expression by default.

Best,
Ista
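
To make the regex-by-default point concrete, here is a small sketch with a made-up input x in which the pattern is a regex metacharacter. It assumes the stringr package is installed (the guard skips the calls otherwise) and uses stringr's fixed() modifier to request a literal match.

```r
# Hypothetical input: "." means "any character" when read as a regex.
x <- c("a.b.c", "no dots")

if (requireNamespace("stringr", quietly = TRUE)) {
  # Default: pattern is a regular expression, so "." matches every character.
  regex_counts <- stringr::str_count(x, ".")
  # fixed() asks for a literal comparison, so only real periods are counted.
  fixed_counts <- stringr::str_count(x, stringr::fixed("."))
  print(regex_counts)
  print(fixed_counts)
}
```

For a plain word such as "Example" the two interpretations give the same counts; the distinction matters once the pattern contains characters like ".", ")" or "(".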