How do you determine whether one string is a substring of another?
Does it have to match at the beginning, or can it match anywhere? How
large is your set of strings? Can you use table() as you describe,
work out which strings group together as substrings of a longer
string, and then just add the counts together? You can use grep() or
regexpr() to test whether one string is contained in another.
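
As a rough sketch of that idea, and only under some assumptions: the data below are made up, a match anywhere in the longer string counts, matching is literal (fixed = TRUE) rather than by regular expression, and a string contained in several longer strings is added to each of them. The name my_set follows the original post; the other names are just illustrative.

```r
# Hypothetical example data; 'my_set' is the name used in the original post.
my_set <- c("abab", "ababaaa", "abab", "xyz", "ababaaa", "yz")

tb <- table(my_set)   # plain frequency table
u  <- names(tb)       # the distinct strings

# contains[i, j] is TRUE when u[j] occurs anywhere inside u[i]
# (literal matching, no regular expressions).
contains <- matrix(
  unlist(lapply(u, function(sub) grepl(sub, u, fixed = TRUE))),
  nrow = length(u), dimnames = list(u, u)
)

# A string is "maximal" if no string other than itself contains it.
is_maximal <- colSums(contains) == 1

# For each maximal string, add up the counts of every string it contains.
# Assumption: a substring of several maximal strings counts toward each.
collapsed <- sapply(u[is_maximal], function(s) sum(tb[contains[s, ]]))

print(collapsed)
```

If matches should only count at the beginning of the longer string, startsWith(u, sub) could replace the grepl() call. The pairwise comparison grows with the square of the number of distinct strings, so this is only reasonable for a moderately sized set.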
On 10/3/07, Dieter Vanderelst <dieter_vanderelst at emailengine.org> wrote:
> Hi list,
>
> I'm currently processing textual data and I would really appreciate
> some help with one of my problems.
>
> I have a set of strings and I want to count how often each of these
> strings appears in this set.
>
> This is not very difficult and can be done as:
>
> TB<-table(my_set)
> plot(TB)
>
> However, I also want to collapse across sub-strings. That is, I want a
> sub-string ss of string S to be counted as an occurrence of string S.
>
> So, 'abab' should be included in the count of 'ababaaa' and should
> not be listed as a separate entry in the frequency table.
>
> Does somebody have a pointer to a way to do this? I have been checking
> out the CRAN packages for handling DNA sequences, but this has not
> really brought me closer to a solution.
>
> Thanks,
> Dieter Vanderelst
>
> ------------------------------------------
> Dieter Vanderelst
> Eindhoven University of Technology
> Faculty of Industrial Design
> Designed Intelligence Group
> Den Dolech 2
> 5612 AZ Eindhoven
> The Netherlands
> Tel +31 40 247 91 11
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?