I am looking for a simple introduction to cluster analysis using R that would
be understandable to a novice in statistics. Or could someone perhaps help
me understand how to proceed with my analysis? I am very new to both statistics
and R, but am trying hard to avoid having to use SPSS like everyone around
me...
I have a dataset of people's opinions on different religious communities,
coded on a 5-point scale, and I want to see if those communities
can be grouped (clustered) in some way that would be illuminating for my
research purposes.
So, I have data that looks like this:
> describe(R12)
R12
18 Variables 1035 Observations
---------------------------------------------------------------------------
R12.1
n missing unique
416 619 5
More negative (51, 12%), More positive (112, 27%)
Completely negative (41, 10%), Completely positive (23, 6%)
Neutral (189, 45%)
<skip>
R12.12
n missing unique
451 584 5
More negative (111, 25%), More positive (43, 10%)
Completely negative (79, 18%), Completely positive (5, 1%)
Neutral (213, 47%)
<and so on>
So you can see there are a lot of NA's in this questionnaire, at times
more than half of the observations for a variable.
Here is also a correlation matrix (only part is displayed):
> x=cor(R12, use="pairwise.complete.obs")
> round(x,2)
R12.1 R12.2 R12.3 R12.4 R12.5 R12.6 R12.7 R12.8 R12.9 R12.10 R12.11
R12.1 1.00 0.57 0.57 0.61 0.57 0.48 0.43 0.38 0.52 0.58 0.58
R12.2 0.57 1.00 0.82 0.78 0.73 0.62 0.43 0.49 0.64 0.69 0.75
R12.3 0.57 0.82 1.00 0.89 0.90 0.73 0.54 0.57 0.70 0.77 0.78
R12.4 0.61 0.78 0.89 1.00 0.91 0.68 0.51 0.56 0.65 0.80 0.76
R12.5 0.57 0.73 0.90 0.91 1.00 0.73 0.53 0.55 0.68 0.78 0.74
R12.6 0.48 0.62 0.73 0.68 0.73 1.00 0.59 0.62 0.68 0.79 0.78
R12.7 0.43 0.43 0.54 0.51 0.53 0.59 1.00 0.62 0.55 0.65 0.65
R12.8 0.38 0.49 0.57 0.56 0.55 0.62 0.62 1.00 0.55 0.65 0.62
R12.9 0.52 0.64 0.70 0.65 0.68 0.68 0.55 0.55 1.00 0.79 0.82
R12.10 0.58 0.69 0.77 0.80 0.78 0.79 0.65 0.65 0.79 1.00 0.88
R12.11 0.58 0.75 0.78 0.76 0.74 0.78 0.65 0.62 0.82 0.88 1.00
R12.12 0.47 0.59 0.64 0.65 0.60 0.61 0.56 0.50 0.68 0.77 0.83
R12.13 0.62 0.69 0.77 0.70 0.74 0.76 0.65 0.61 0.78 0.81 0.82
R12.14 0.58 0.70 0.71 0.75 0.70 0.74 0.64 0.62 0.78 0.86 0.86
R12.15 0.58 0.61 0.72 0.72 0.71 0.72 0.64 0.59 0.73 0.83 0.79
R12.16 0.56 0.67 0.77 0.72 0.78 0.75 0.57 0.54 0.75 0.85 0.80
R12.17 0.61 0.69 0.79 0.77 0.75 0.73 0.56 0.57 0.74 0.82 0.80
R12.18 0.63 0.73 0.84 0.82 0.83 0.71 0.54 0.64 0.68 0.71 0.74
So you can see that the opinions are strongly correlated with each other. I
doubt clustering would be meaningful, but I still want to try.
How do I proceed with this?
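
To make the question concrete, here is a minimal sketch of the kind of first
attempt I have in mind: hierarchical clustering of the variables (the
communities), not of the respondents. The data are simulated; the `sim`
matrix and its six columns are made up for illustration and are not my real
R12. Using 1 - correlation as the distance, average linkage, and k = 2 are
all assumptions on my part, not recommendations.

```r
# Simulated stand-in for R12: 200 respondents, 6 items on a 1..5 scale.
set.seed(1)
n <- 200
base <- rnorm(n)                       # shared attitude, makes items correlate
sim <- sapply(1:6, function(i) {
  z <- 0.7 * base + rnorm(n)
  # cut the continuous score into five equally sized answer categories
  as.integer(cut(z, breaks = quantile(z, 0:5 / 5), include.lowest = TRUE))
})
colnames(sim) <- paste0("R12.", 1:6)
sim[sample(length(sim), 300)] <- NA    # knock out a quarter of the answers

# Same call as in my post: correlations from pairwise complete observations.
x <- cor(sim, use = "pairwise.complete.obs")

# Turn similarity into dissimilarity: highly correlated items are "close".
d <- as.dist(1 - x)

# Agglomerative clustering of the items (assumed linkage: average).
hc <- hclust(d, method = "average")
plot(hc)                               # dendrogram of the items

# Cut the tree into an (arbitrarily chosen) number of groups.
groups <- cutree(hc, k = 2)
print(groups)
```

With my real data I would replace `sim` by R12 (as numeric codes). Whether
this is a sensible approach with so many missing answers is exactly what I
am unsure about.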
--
Donatas Glodenis