Dan Abner
2017-Apr-26 00:14 UTC
[R] Counting enumerated items in each element of a character vector
Hi all,

I am looking for a streamlined way of counting the number of enumerated items in each element of a character vector. For example:

text1<-c("This is an example.
List 1
1) Example 1
2) Example 2
10) Example 10
List 2
1) Example 1
2) Example 2
These have been examples.","This is another example.
List 1
1. Example 1
2. Example 2
10. Example 10
List 2
1. Example 1
2. Example 2
These have been examples.","This is a third example. List 1 1) Example 1. 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have been examples."
,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example 10. List 2 Example 1. 2. Example 2. These have been examples.")

text1

==

I would like the result to be c(5,5,5,5). Notice that sometimes there are leading hard returns, other times not. Sometimes there are separate lists, and the same numbers are used for the enumerated items multiple times within each character string. Sometimes the leading numbers for the enumerated items exceed single digits. Notice that the delimiter may be ) or a period (.).

If the delimiter is a period and there are hard returns (example 2), then I expect it will be easy enough to differentiate sentences ending with a number from enumerated items. However, I imagine it would be much more difficult to differentiate the two for example 4.

Any suggestions are appreciated.

Best,

Dan
Boris Steipe
2017-Apr-26 00:23 UTC
[R] Counting enumerated items in each element of a character vector
How about:

unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))

Splitting each string on its five occurrences of "Example" gives six elements, so length(x) - 1 is the number of matches. You can use any regex instead of "Example" if you need to tweak what you are looking for.

B.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
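One caveat with the strsplit() idea, sketched here in base R: strsplit() does not return a trailing empty string, so a match at the very end of an element goes uncounted by length(x) - 1. Counting match positions with gregexpr() avoids that. The helper name count_fixed and the vector x are made up for this illustration and are not part of the thread.

```r
# Hypothetical helper: count literal occurrences of `needle` in each element.
count_fixed <- function(x, needle) {
  hits <- gregexpr(needle, x, fixed = TRUE)
  # gregexpr() returns -1 for "no match", so count only positive positions.
  vapply(hits, function(h) sum(h > 0L), integer(1))
}

# Made-up test strings; the second one ends with the search term.
x <- c("Example 1 Example 2", "ends with Example", "nothing")

count_fixed(x, "Example")
# [1] 2 1 0

# The strsplit() approach misses the match at the end of the second string.
lengths(strsplit(x, "Example", fixed = TRUE)) - 1L
# [1] 2 0 0
```

For the text1 vector in the question every element ends with "These have been examples.", so both approaches should agree there; the difference only shows up when the search term closes the string.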
Boris Steipe
2017-Apr-26 00:33 UTC
[R] Counting enumerated items in each element of a character vector
I should add: there's a str_count() function in the stringr package.

library(stringr)
str_count(text1, "Example")
# [1] 5 5 5 5

I guess that would be the neater solution.

B.
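Counting the word "Example" works only because every item in the sample happens to contain it. If the items were arbitrary text, one possible tweak is to match the enumeration markers themselves. The following is a rough base-R sketch under stated assumptions, not a definitive answer: it assumes a marker is a run of digits followed by ")" or ".", preceded by whitespace or the start of the string, and followed by a capitalised word. The function name count_enumerated and the vector items are hypothetical.

```r
# Hypothetical helper: count "<digits>)" or "<digits>." markers that are
# preceded by whitespace (or string start) and followed by a capital letter.
count_enumerated <- function(x) {
  pattern <- "(?<!\\S)\\d+[.)]\\s+(?=[A-Z])"
  hits <- gregexpr(pattern, x, perl = TRUE)
  # regmatches() yields character(0) when there is no match, so lengths() is 0.
  lengths(regmatches(x, hits))
}

# Made-up input: two lists with hard returns, one inline list, and no list.
items <- c("List 1\n1) Apples\n2) Pears\n10) Plums\nList 2\n1) Figs\n2) Dates",
           "1. Apples 2. Pears 3. Plums",
           "No list here.")

count_enumerated(items)
# [1] 5 3 0
```

This heuristic does not resolve the ambiguity raised in the original question: a sentence that ends in a number and is followed by a capitalised word, such as "See Example 2. These are done.", is counted as one item. The same pattern could presumably be passed to str_count() as well, since it accepts regular expressions.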