Starting with the first entry:
A00096:A00096:A00096:A00096:A02178:A02178:A07776
and supposing no other entry in the set contains an identical subvector,
the algorithm would slide along the vector, first in pairs, then in trios,
then in sets of four, etc., and count the occurrences of each subvector:
A00096:A00096                                     3
A00096:A02178                                     1
A02178:A02178                                     1
A02178:A07776                                     1
A00096:A00096:A00096                              2
A00096:A00096:A02178                              1
A00096:A02178:A02178                              1
A02178:A02178:A07776                              1
A00096:A00096:A00096:A00096                       1
A00096:A00096:A00096:A02178                       1
A00096:A00096:A02178:A02178                       1
A00096:A02178:A02178:A07776                       1
A00096:A00096:A00096:A00096:A02178                1
A00096:A00096:A00096:A02178:A02178                1
A00096:A00096:A02178:A02178:A07776                1
A00096:A00096:A00096:A00096:A02178:A02178         1
A00096:A00096:A00096:A02178:A02178:A07776         1
A00096:A00096:A00096:A00096:A02178:A02178:A07776  1
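
To make the sliding-window counting above concrete, here is a rough R
sketch of what I have in mind. It is only an illustration under my own
assumptions: the function names subvectors and subvector_counts are made
up for this example, a "subvector" is taken to mean a contiguous run of
at least 2 ids, and I have not tried it on the full ~20000 entries.

```r
# Hypothetical helper: all contiguous runs of at least `min_len` ids
# found in one ":"-separated entry, returned as ":"-joined strings.
subvectors <- function(entry, min_len = 2) {
  ids <- strsplit(entry, ":", fixed = TRUE)[[1]]
  n <- length(ids)
  if (n < min_len) return(character(0))
  unlist(lapply(min_len:n, function(k) {
    # slide a window of width k along the ids
    vapply(seq_len(n - k + 1),
           function(i) paste(ids[i:(i + k - 1)], collapse = ":"),
           character(1))
  }))
}

# Hypothetical helper: pool the subvectors of every entry and tabulate them.
subvector_counts <- function(entries, min_len = 2) {
  table(unlist(lapply(entries, subvectors, min_len = min_len)))
}

x <- c("A00096:A00096:A00096:A00096:A02178:A02178:A07776",
       "A00046:A00076:A01101:A04146:A05671:A07169")
counts <- subvector_counts(x)
counts["A00096:A00096"]         # count is 3
counts["A00096:A00096:A00096"]  # count is 2
```

Each entry of n ids yields n*(n-1)/2 windows, so the work grows
quadratically with entry length; for entries of the sizes shown that
should be manageable, but I have not measured it.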
On Fri, Apr 17, 2009 at 1:04 PM, jim holtman <jholtman@gmail.com> wrote:
> Can you provide the output that you would expect from the data you
> gave. I am not sure what you mean by a 'subvector'.
>
> On Fri, Apr 17, 2009 at 5:25 AM, Albert Vilella <avilella@gmail.com>
> wrote:
> > Hi,
> >
> > I've got a list of ~20000 elements that look like this:
> >
> > [1]
> > "A00096:A00096:A00096:A00096:A02178:A02178:A07776"
> >
> > [2]
> > "A00046:A00076:A01101:A04146:A05671:A07169"
> >
> > [3]
> > "A00038:A00932:A02185:A02370:A02818:A02818:A02818:A02818:A04732:A07142:A07142"
> >
> > [4]
> > "A00096:A01352:A01352:A02023:A05001:A05001:A07776"
> >
> > [5]
> > "A00036:A00047:A00059:A00503:A00904:A00904:A00904:A01023:A01023:A01399:A02029:A03941:A07679"
> >
> > [6]
> > "A00041:A00533:A00855:A02178:A02178:A02178:A05671:A05671:A05671:A05671:A05671:A05671:A05671"
> > ...
> >
> > And I would like to have a table with the frequency of occurrences for
> > matching subvectors in all elements, i.e., not
> > only the number of times a vector is found but also how many times a
> > subvector (of at least 2 ids) is found.
> >
> > How can I do that?
> > Thanks in advance,
> > Albert.
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>