thr3ads.net - R help - [R] Pattern recognition [Jul 2010]

Pablo Cerdeira

2010-Jul-29 03:01 UTC

[R] Pattern recognition

Dear all,

I'm trying to find a technique for pattern recognition over a large
dataset, but I really don't have any idea how to do that in R.

Here is a sample of the data:

id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,

The id column identifies the object. The columns 1, 2, 3, ... then hold
information about that object, in sequence.

I'd like to recognize the patterns. For example:

- In this sample, "208" is the most common value in column 1: it appears
  12 times in 19 rows, or about 63%.
- After a "208" in column 1, I usually have a "69" in column 2 (11 of the
  12 such rows in this sample).
- In column 3 we can find a fork: sometimes I have a "180" (as in line 1),
  sometimes a "352".
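
The three observations above can be checked with base R. What follows is only a minimal sketch, assuming the sample is pasted into a string (called `csv_text` here) and read with `read.csv`. The helper `next_after` is a hypothetical name introduced for illustration, not a function from any package. It reports, for a given prefix of values, the share of each value found in the next position.

```r
# Sample rows copied from the post; short sequences are padded with empty fields.
csv_text <- "id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,
"

# check.names = FALSE keeps the column names "1", "2", ... as written;
# empty fields in numeric columns are read as NA.
dat <- read.csv(text = csv_text, check.names = FALSE)

# 1. How often does each value occur in column 1?
first <- sort(table(dat[["1"]]), decreasing = TRUE)
print(first)                          # "208" occurs 12 times in these 19 rows
print(round(prop.table(first), 2))

# 2. and 3. Hypothetical helper: given the first k values of a sequence
# (the prefix), return the share of each value seen in position k + 1.
next_after <- function(dat, prefix) {
  steps <- as.matrix(dat[, -1])       # drop the id column
  k <- length(prefix)
  hit <- apply(steps[, seq_len(k), drop = FALSE], 1,
               function(r) all(!is.na(r) & r == prefix))
  prop.table(table(steps[hit, k + 1]))
}

after_208    <- next_after(dat, 208)          # what follows "208" in column 1?
after_208_69 <- next_after(dat, c(208, 69))   # the fork in column 3
print(after_208)
print(after_208_69)
```

Calling `next_after` with longer and longer prefixes walks down one branch of the tree of possible combinations, which is one way to get the "chance of each pattern" numbers by hand.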

I'd like to identify these patterns by plotting two graphs:

- A dendrogram showing the chance of each possible combination occurring.
- A scatter plot (dispersion graph) identifying the possible clusters.
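
One possible way to get both plots with base R is sketched below. It rests on assumptions that are not in the post: the distance between two objects is taken to be the share of the 15 positions where their values differ (a missing value only matches another missing value), clustering uses `hclust` with average linkage, and the scatter plot uses classical multidimensional scaling via `cmdscale`. Note that a dendrogram from `hclust` groups similar objects together; it does not by itself show the probability of each branch.

```r
# Sample rows copied from the post; short sequences are padded with empty fields.
csv_text <- "id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
1482662,336,369,75,495,34,,,,,,,,,,
1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
1484738,336,495,34,475,391,,,,,,,,,,
"
dat <- read.csv(text = csv_text, check.names = FALSE)

# One row per object, one column per position in the sequence.
steps <- as.matrix(dat[, -1])
rownames(steps) <- dat$id

# Assumed distance: share of positions where two sequences differ.
# Two missing values count as a match; a value against NA is a mismatch.
n <- nrow(steps)
d <- matrix(0, n, n, dimnames = list(rownames(steps), rownames(steps)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    a <- steps[i, ]
    b <- steps[j, ]
    same <- (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
    d[i, j] <- mean(!same)
  }
}

# Dendrogram: hierarchical clustering of the objects.
hc <- hclust(as.dist(d), method = "average")
plot(hc, main = "Objects clustered by position-wise mismatch",
     xlab = "id", sub = "")

# Scatter plot: 2-D coordinates from classical multidimensional scaling,
# coloured by cluster. k = 3 is an arbitrary choice for illustration.
xy <- cmdscale(as.dist(d), k = 2)
groups <- cutree(hc, k = 3)
plot(xy, col = groups, pch = 19, xlab = "MDS 1", ylab = "MDS 2")
text(xy, labels = rownames(xy), pos = 3, cex = 0.6)
```

A position-wise distance treats a sequence shifted by one step as almost entirely different, so an edit-distance measure may suit this kind of data better; the sketch only shows the overall workflow of distance matrix, `hclust`, and `cmdscale`.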

Does anybody have any idea on how to do something like this?

Many thanks in advance,


-- 
*Pablo de Camargo Cerdeira*
pablo@fgv.br
pablo.cerdeira@gmail.com
+55 (21) 3799-6065


Pablo Cerdeira

2010-Jul-29 03:16 UTC


[R] Pattern recognition

This is a good example of what I'm looking for:

[image: dendrogram.jpg]


Best

-- 
*Pablo de Camargo Cerdeira*
pablo@fgv.br
pablo.cerdeira@gmail.com
+55 (21) 3799-6065
