thr3ads.net - R help - [R] Frequency of a character in a string [Nov 2016]

Charles C. Berry

2016-Nov-14 19:55 UTC

[R] Frequency of a character in a string

On Mon, 14 Nov 2016, Marc Schwartz wrote:
>
>> On Nov 14, 2016, at 11:26 AM, Charles C. Berry <ccberry at ucsd.edu> wrote:
>>
>> On Mon, 14 Nov 2016, Bert Gunter wrote:
>>[stuff deleted]
> Hi,
>
> Both gsub() and strsplit() are using regex based pattern matching 
> internally. That being said, they are ultimately calling .Internal code, 
> so both are pretty fast.
>
> For comparison:
>
> ## Create a 1,000,000 character vector
> set.seed(1)
> Vec <- paste(sample(letters, 1000000, replace = TRUE), collapse = "")
>
>> nchar(Vec)
> [1] 1000000
>
> ## Split the vector into single characters and tabulate
>> table(strsplit(Vec, split = "")[[1]])
>
>    a     b     c     d     e     f     g     h     i     j     k     l
> 38664 38442 38282 38496 38540 38623 38548 38288 38143 38493 38184 38621
>    m     n     o     p     q     r     s     t     u     v     w     x
> 38306 38725 38705 38144 38529 38809 38575 38355 38386 38364 38904 38310
>    y     z
> 38265 38299
>
>
> ## Get just the count of "a"
>> table(strsplit(Vec, split = "")[[1]])["a"]
>    a
> 38664
>
>> nchar(gsub("[^a]", "", Vec))
> [1] 38664
>
>
> ## Check performance
>> system.time(table(strsplit(Vec, split = "")[[1]])["a"])
>   user  system elapsed
>  0.100   0.007   0.107
>
>> system.time(nchar(gsub("[^a]", "", Vec)))
>   user  system elapsed
>  0.270   0.001   0.272
>
>
> So, the above would suggest that using strsplit() is somewhat faster 
> than using gsub(). However, as Chuck notes, in the absence of more 
> exhaustive benchmarking, the difference may or may not be more 
> generalizable.
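
To make the two approaches Marc compares concrete, here is a minimal sketch on a short string. The input `s` and the variable names are illustrative assumptions, not from the thread. It also shows one edge case worth knowing: looking up a character that never occurs in the table gives NA rather than 0.

```r
# Small illustrative input (hypothetical; not from the thread)
s <- "banana"

# Approach 1: split into single characters, tabulate, pick one letter
tab <- table(strsplit(s, split = "")[[1]])
count_table <- as.integer(tab["a"])

# Approach 2: delete everything that is not "a", then measure what is left
count_gsub <- nchar(gsub("[^a]", "", s))

# Edge case: a letter absent from the string has no entry in the table,
# so the lookup yields NA instead of 0
absent <- as.integer(tab["z"])
```

The gsub() version returns 0 for an absent letter, which may be the more convenient behaviour when counts feed into arithmetic.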

Whether splitting on fixed strings rather than treating them as
regex'es (i.e.`fixed=TRUE') makes a big difference seems to depend on
what you split:

First repeating what Marc did...

> system.time(table(strsplit(Vec, split = "",fixed=TRUE)[[1]])["a"])
   user  system elapsed
  0.132   0.010   0.139
> system.time(table(strsplit(Vec, split = "",fixed=FALSE)[[1]])["a"])
   user  system elapsed
  0.130   0.010   0.138

... fixed=TRUE hardly matters. But the idiom I proposed...

> system.time(sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=TRUE)) - 1))
   user  system elapsed
  0.017   0.000   0.018
> system.time(sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=FALSE)) - 1))
   user  system elapsed
  0.104   0.000   0.104

... is 5 times faster with fixed=TRUE for this case.

This result matches Marc's count:

> sum(lengths(strsplit(paste0("X", Vec, "X"),"a",fixed=FALSE)) - 1)
[1] 38664

Chuck
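
A note on why the idiom pads with "X": strsplit() keeps a leading empty piece when a string starts with the split character but drops the trailing empty piece when it ends with one, so "number of pieces minus one" can undercount. The sketch below shows this on toy strings. The helper names `count_unpadded` and `count_padded` and the vector `toy` are made up for illustration, and the padding assumes the character being counted is not "X" itself.

```r
# Hypothetical helpers for illustration; not part of the thread
count_unpadded <- function(x, ch) {
  lengths(strsplit(x, ch, fixed = TRUE)) - 1L
}
count_padded <- function(x, ch) {
  # Assumes ch is a single character other than "X"
  lengths(strsplit(paste0("X", x, "X"), ch, fixed = TRUE)) - 1L
}

toy <- c("banana", "abca", "", "xyz")

count_unpadded(toy, "a")  # undercounts when a string ends in "a"
count_padded(toy, "a")    # 3 2 0 0
```

Padding both ends guarantees that every occurrence sits between two non-empty pieces, so the subtraction is safe for any position of the target character.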

Bert Gunter

2016-Nov-14 20:23 UTC

[R] Frequency of a character in a string

Chuck, Marc, and anyone else who still has interest in this odd little
discussion ...

Yes, and with fixed = TRUE my approach took 1/3 as much time as
Chuck's with a 10 element vector each element of which is a character
string of length 1e5:

> set.seed(1001)
> x <- sapply(1:10, function(x)paste0(sample(letters,1e5,rep=TRUE),collapse = ""))
> system.time(sum(lengths(strsplit(paste0("X", x, "X"),"a",fixed=TRUE)) - 1))
   user  system elapsed
  0.012   0.000   0.012
> system.time(nchar(gsub("[^a]", "", x,fixed = TRUE)))
   user  system elapsed
  0.004   0.000   0.004

Best,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Hervé Pagès

2016-Nov-14 20:26 UTC

[R] Frequency of a character in a string

Hi,

FWIW using gsub( , fixed=TRUE) is faster than using gsub( , fixed=FALSE)
or strsplit( , fixed=TRUE):

   set.seed(1)
   Vec <- paste(sample(letters, 5000000, replace = TRUE), collapse = "")

   system.time(res1 <- nchar(gsub("[^a]", "", Vec)))
   #  user  system elapsed
   # 0.585   0.000   0.586

   system.time(res2 <- lengths(strsplit(Vec,"a",fixed=TRUE)) - 1L)
   #  user  system elapsed
   # 0.061   0.000   0.061

   system.time(res3 <- nchar(Vec) - nchar(gsub("a", "", Vec, fixed=TRUE)))
   #  user  system elapsed
   # 0.039   0.000   0.039

   identical(res1, res2)
   # [1] TRUE
   identical(res1, res3)
   # [1] TRUE

The gsub( , fixed=TRUE) solution also uses slightly less memory than the
strsplit( , fixed=TRUE) solution.

Cheers,
H.


> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
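
The nchar() difference in Hervé's timings above can be wrapped in a small function. This is only a sketch: the name `count_fixed` and the vector `toy` are assumptions made for illustration. It also shows a second reason to prefer fixed = TRUE beyond speed: a target such as "." is taken literally instead of being read as a regex wildcard.

```r
# Hypothetical helper; the name count_fixed is not from the thread
count_fixed <- function(x, ch) {
  # Number of characters lost when every literal occurrence of ch is deleted
  nchar(x) - nchar(gsub(ch, "", x, fixed = TRUE))
}

toy <- c("banana", "a.b.c", "")

count_fixed(toy, "a")   # 3 1 0
count_fixed(toy, ".")   # 0 2 0  ("." is matched literally)

# Without fixed = TRUE the "." is a wildcard and matches every character,
# so the "count" is just the length of each string
wildcard <- nchar(toy) - nchar(gsub(".", "", toy))   # 6 5 0
```

As written the helper is meant for a single-character target; a longer fixed pattern would remove several characters per match, so the difference would no longer equal the number of matches.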

Bert Gunter

2016-Nov-14 20:44 UTC

[R] Frequency of a character in a string

(Sheepishly)...

Yes, thank you Hervé. It would have been nice if I had given correct
solutions. Of course, fixed = TRUE could not have worked with the "[^a]"
character class!

Here's what I found with a 10 element vector each member of which is a
1e5 length string:

> system.time((lengths(strsplit(paste0("X", x, "X"),"a",fixed=TRUE)) - 1))
   user  system elapsed
  0.013   0.000   0.013
> system.time(nchar(gsub("[^a]", "", x,fixed = FALSE)))
   user  system elapsed
  0.251   0.000   0.252
## WAYYYY slower

> system.time(nchar(x) - nchar(gsub("a", "", x,fixed = TRUE)))
   user  system elapsed
  0.007   0.000   0.007
## twice as fast



Clearly and unsurprisingly, the message is to avoid fixed = FALSE;
after that, it seems mostly to be: who cares?!


Cheers,
Bert
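
Why the earlier fixed = TRUE timing was misleading can be seen on a toy string: with fixed = TRUE the pattern "[^a]" is searched for as a literal four-character string, so nothing is removed and nchar() reports the full length rather than the number of "a"s. A small sketch, with the input `s` and the variable names assumed for illustration:

```r
s <- "banana"

# Regex: "[^a]" means "any character except a", so only the a's survive
regex_count <- nchar(gsub("[^a]", "", s))                  # 3

# Fixed: "[^a]" is a literal string that does not occur in s,
# so s comes back unchanged and nchar() is just the string length
fixed_result <- nchar(gsub("[^a]", "", s, fixed = TRUE))   # 6
```

So the fast fixed = TRUE call was quick because it did no work, not because it counted faster.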




William Dunlap

2016-Nov-14 20:57 UTC

[R] Frequency of a character in a string

Here is another variant, v3, and a change to your first example
so it returns the same value as your second example.

> set.seed(1001)
> x <- sapply(1:100,function(x)paste0(sample(letters,rpois(1,1e5),rep=TRUE),collapse = ""))
> system.time(v1 <- lengths(strsplit(paste0("X", x, "X"),"a",fixed=TRUE)) -1)
   user  system elapsed
   0.47    0.00    0.49
> system.time(v2 <- nchar(gsub("[^a]", "", x)))
   user  system elapsed
   2.53    0.00    2.53
> system.time(v3 <- nchar(x) - nchar(gsub("a", "", x, fixed=TRUE)))
   user  system elapsed
   0.08    0.00    0.08
> all.equal(v1,v2)
[1] TRUE
> all.equal(v1,v3)
[1] TRUE


Bill Dunlap
TIBCO Software
wdunlap tibco.com
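
The change to the first example amounts to dropping the outer sum(): the strsplit() idiom then returns one count per element, like the nchar()-based versions, instead of a single total. A toy sketch, with the input vector `x` and the names `per_element` and `total` assumed for illustration:

```r
x <- c("banana", "abca", "xyz")

# One count per element of x
per_element <- lengths(strsplit(paste0("X", x, "X"), "a", fixed = TRUE)) - 1L
# Single total across all elements, as in the sum() form
total <- sum(per_element)

per_element   # 3 2 0
total         # 5

# The nchar() difference gives the same per-element counts
v3 <- nchar(x) - nchar(gsub("a", "", x, fixed = TRUE))
all.equal(per_element, v3)   # TRUE
```

Comparing the approaches with all.equal() only makes sense once they return values of the same shape, which is why the sum() had to go.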

