thr3ads.net - R help - [R] Making a table: collapsing across sub-strings [Oct 2007]


Dieter Vanderelst

2007-Oct-03 15:25 UTC

[R] Making a table: collapsing across sub-strings

Hi list,

I'm currently processing textual data and I would really appreciate some
help with one of my problems.

I have a set of strings and I want to count how often each of these
strings appears in the set.

This is not very difficult and can be done as:

TB <- table(my_set)
plot(TB)

However, I also want to collapse across sub-strings. That is, I want a
sub-string ss of string S to be counted as an occurrence of string S.

So, 'abab' should be included in the count of 'ababaaa' and should not
be listed as a separate entry in the frequency table.

Does anybody have a pointer to a way to do this? I have been checking
out the CRAN packages for handling DNA sequences, but this has not
really brought me closer to a solution.

Thanks,
Dieter Vanderelst

------------------------------------------
Dieter Vanderelst
Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11

jim holtman

2007-Oct-03 15:39 UTC


[R] Making a table: collapsing across sub-strings

How do you determine if one string is a substring of another?  Does it
only match at the beginning, or anywhere?  How large is your set of
strings?  Can you use table as you describe, then determine which
strings are substrings of which, and then just add the numbers together?
You can use grep/regexpr to determine if one string is a substring of
another.
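
A minimal sketch of that suggestion, under assumptions the thread does not settle: a sub-string may match anywhere in the longer string, and when a string is contained in several longer strings its count is credited to the longest one (the first one on ties). The function name collapse_counts and the sample data are hypothetical, made up for illustration.

```r
# Hypothetical helper: tabulate a character vector, then fold the count of
# every string into the longest other string that contains it.
collapse_counts <- function(x) {
  tb <- table(x)
  strings <- names(tb)
  counts <- setNames(as.vector(tb), strings)
  keep <- rep(TRUE, length(strings))

  for (i in seq_along(strings)) {
    s <- strings[i]
    # fixed = TRUE matches s literally (no regex), anywhere in the string.
    is_super <- strings != s & grepl(s, strings, fixed = TRUE)
    if (any(is_super)) {
      supers <- strings[is_super]
      # Assumption: credit the longest containing string; ties go to the first.
      target <- supers[which.max(nchar(supers))]
      counts[target] <- counts[target] + counts[s]
      keep[i] <- FALSE  # s no longer gets its own entry
    }
  }
  counts[keep]
}

# Made-up example data standing in for my_set.
my_set <- c("abab", "ababaaa", "abab", "ababaaa", "xyz", "xy", "q")
collapsed <- collapse_counts(my_set)
print(collapsed)
```

With this sample data, 'abab' is folded into 'ababaaa' (count 4), 'xy' into 'xyz' (count 2), and 'q' keeps its own count of 1. Each distinct string is compared against every other distinct string, so this may be slow for a very large set. If matches should only count at the beginning of the longer string, startsWith(strings, s) could replace the grepl() call.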

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
