This is a good example of what I'm looking for:
[image: dendrogram.jpg]
Best
On Thu, Jul 29, 2010 at 12:01 AM, Pablo Cerdeira
<pablo.cerdeira@gmail.com> wrote:
>
> Dear all,
>
> I'm trying to use some technique to do pattern recognition over a large
> dataset. I really have no idea how to do that using R.
>
> Here is a sample of the data:
>
> id,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
> 1480010,208,69,180,465,465,241,241,69,584,26,75,578,507,75,284
> 1480183,208,69,352,476,531,495,163,241,69,584,69,584,69,484,69
> 1480210,208,69,352,465,476,369,495,241,69,584,69,584,69,54,497
> 1480234,208,69,180,465,241,69,69,584,54,583,352,497,3,158,3
> 1480556,208,69,180,151,497,151,465,241,69,151,3,25,516,405,158
> 1481098,208,69,465,241,69,584,241,584,69,180,497,369,584,75,284
> 1482149,208,69,180,465,241,69,584,507,584,69,151,3,158,3,336
> 1482269,208,69,180,241,69,507,476,69,584,507,69,516,484,484,3
> 1482386,208,69,180,180,69,180,69,352,465,531,495,163,241,69,578
> 1482422,208,471,69,180,465,241,584,507,561,390,75,284,497,163,34
> 1482662,336,369,75,495,34,,,,,,,,,,
> 1482887,471,74,180,584,390,74,180,238,497,208,69,484,238,465,238
> 1482892,521,584,471,74,180,180,584,497,497,507,507,74,390,74,513
> 1483275,471,74,180,497,208,69,484,465,465,531,495,241,163,241,69
> 1483376,74,180,471,497,208,69,484,465,465,531,495,163,241,241,69
> 1484082,180,497,208,69,163,69,163,69,180,497,497,369,69,465,241
> 1484501,208,69,476,69,584,507,476,497,369,584,69,54,3,336,495
> 1484555,208,69,484,238,465,238,495,163,241,69,584,69,584,69,516
> 1484738,336,495,34,475,391,,,,,,,,,,
>
> The id column identifies the object. The columns after it (1, 2, 3, ...)
> give me information about the object as a sequence.
>
> I'd like to recognize the patterns. For example:
>
> - As you can see, the number "208" is the most common value in column 1.
>   It appears 12 times in the 19 rows above, or about 63%.
> - Usually, after a "208", I have a "69" in column 2: 11 of the 12 rows
>   whose first column is "208", or about 92%.
> - In column 3 we can find a fork. Sometimes I have a "180" (line 1),
>   sometimes a "352".
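
The kind of counts listed above can be tabulated in base R with table() and
prop.table(). What follows is only a minimal sketch: csv_text holds just the
first three columns of six of the sample rows, and the names seqs, col1_share
and transitions are placeholders chosen for illustration. The full file would
be loaded with read.csv() instead of the inline text.

```r
# Illustrative subset of the sample: six rows, first three columns only.
csv_text <- "id,1,2,3
1480010,208,69,180
1480183,208,69,352
1480210,208,69,352
1482422,208,471,69
1482662,336,369,75
1482887,471,74,180"

# check.names = FALSE keeps the numeric column names ("1", "2", ...) as they are.
seqs <- read.csv(text = csv_text, check.names = FALSE)

# Share of each value observed in column 1.
col1_share <- prop.table(table(seqs[["1"]]))

# Share of each column-2 value, given the column-1 value (each row sums to 1).
transitions <- prop.table(
  table(first = seqs[["1"]], second = seqs[["2"]]),
  margin = 1
)

print(col1_share)
print(transitions)
```

The same cross-tabulation applied to columns 2 and 3 would show the fork
described in the last item.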
>
> I'd like to identify these patterns by plotting two graphs:
>
> - A dendrogram showing the chance of each possible combination occurring.
> - A scatter plot identifying the possible clusters.
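
One possible starting point for the dendrogram is hierarchical clustering with
hclust(). The sketch below rests on assumptions that are not part of the
original data description: the dissimilarity is taken to be the share of
positions at which two sequences differ, the linkage is "average", and two
groups are requested from cutree(). All of these are illustrative choices, as
are the names codes, mismatch, tree and groups. Only six complete rows and five
columns of the sample are used, and rows with missing values would need
separate handling.

```r
# Illustrative subset of the sample: six complete rows, first five columns.
csv_text <- "id,1,2,3,4,5
1480010,208,69,180,465,465
1480183,208,69,352,476,531
1480210,208,69,352,465,476
1480234,208,69,180,465,241
1482887,471,74,180,584,390
1483275,471,74,180,497,208"

seqs <- read.csv(text = csv_text, check.names = FALSE)

# Matrix of codes, one row per object, labelled by id.
codes <- as.matrix(seqs[, -1])
rownames(codes) <- seqs$id

# Assumed dissimilarity: share of positions where two sequences differ.
n <- nrow(codes)
mismatch <- matrix(0, n, n, dimnames = list(rownames(codes), rownames(codes)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    mismatch[i, j] <- mean(codes[i, ] != codes[j, ])
  }
}

# Hierarchical clustering on that dissimilarity, drawn as a dendrogram.
tree <- hclust(as.dist(mismatch), method = "average")
plot(tree, main = "Sequences clustered by position-wise mismatch")

# Cut the tree into two groups (the number of groups is an assumption).
groups <- cutree(tree, k = 2)
print(groups)
```

The codes are treated here as unordered labels, so numeric distances such as
Euclidean distance are deliberately not used on them.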
>
> Does anybody have any idea on how to do something like this?
>
> Many thanks in advance,
>
>
> --
> *Pablo de Camargo Cerdeira*
> pablo@fgv.br
> pablo.cerdeira@gmail.com
> +55 (21) 3799-6065
>
>
--
*Pablo de Camargo Cerdeira*
pablo@fgv.br
pablo.cerdeira@gmail.com
+55 (21) 3799-6065