Hi everyone,

I'm observing what I believe is weird behaviour when attempting to do
something very simple. I want a correlation matrix, but my matrix seems to
contain correlation values that are not found when the correlation is
computed on pairs:

> test2$P2
 [1] 2 2 4 4 1 3 2 4 3 3 2 3 4 1 2 2 4 3 4 1 2 3 2 1 3
> test2$HP_tot
 [1]  10  10  10  10  10  10  10  10 136 136 136 136 136 136 136 136 136 136  15
[20]  15  15  15  15  15  15
> c=cor(test2$P3,test2$HP_tot,method='spearman')
> c
[1] -0.2182876
> c=cor(test2,method='spearman')
Warning message:
In cor(test2, method = "spearman") : the standard deviation is zero
> write(c,file='out.csv')

From my spreadsheet, the corresponding cell of the matrix:
-0.25028783918741

Most cells are correct, but not that one.

If this is expected behaviour, I apologise for bothering you. I read the
documentation, but I do not know whether the matrix and the pairwise values
are computed with the same function (e.g., with respect to ties, i.e.
observations with equal values).

If this is not the intended behaviour: I noticed that it only occurs with a
relatively large matrix (I couldn't reproduce it on a simple two-column data
set). There might be a naming error.

> names(test2)
 [1] "ID"                   "NOMBRE"               "MAIL"
 [4] "Age"                  "SEXO"                 "Studies"
 [7] "Hours_Internet"       "Vision.Disabilities"  "Other.disabilities"
[10] "Technology_Knowledge" "Start_Time"           "End_Time"
[13] "Duration"             "P1"                   "P1Book"
[16] "P1DVD"                "P2"                   "P3"
[19] "P4"                   "P5"                   "P6"
[22] "P8"                   "P9"                   "P10"
[25] "P11"                  "P12"                  "P7"
[28] "SITE"                 "Errors"               "warnings"
[31] "Manual"               "Total"                "H_tot"
[34] "HP1.1"                "HP1.2"                "HP1.3"
[37] "HP1.4"                "HP_tot"               "HO1.1"
[40] "HO1.2"                "HO1.3"                "HO1.4"
[43] "HO_tot"               "HU1.1"                "HU1.2"
[46] "HU1.3"                "HU_tot"               "HR"
[49] "L_tot"                "LP1.1"                "LP1.2"
[52] "LP1.3"                "LP1.4"                "LP_tot"
[55] "LO1.1"                "LO1.2"                "LO1.3"
[58] "LO1.4"                "LO_tot"               "LU1.1"
[61] "LU1.2"                "LU1.3"                "LU_tot"
[64] "LR_tot"               "SP_tot"               "SP1.1"
[67] "SP1.2"                "SP1.3"                "SP1.4"
[70] "SP_tot.1"             "SO1.1"                "SO1.2"
[73] "SO1.3"                "SO1.4"                "SO_tot"
[76] "SU1.1"                "SU1.2"                "SU1.3"
[79] "SU_tot"               "SR"

Thank you in advance,
Stephane Vaucher
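A minimal sketch (not part of the original thread) of two checks related to the question above. It reuses the P2 and HP_tot values printed in the post; the data frame demo and the other object names are hypothetical. The post correlates P3, which is not printed, so this does not reproduce -0.2182876 or -0.25028783918741. With no missing values, the single-pair call, the matrix cell, and Pearson's r on average ranks should agree even with many ties. Separately, write() does not lay a matrix out as a grid, so a file written with it may not line up with the matrix when opened as a spreadsheet.

```r
# Sketch, not code from the thread. P2 and HP_tot are the values printed
# in the post; 'demo' is a hypothetical two-column stand-in for test2.
P2     <- c(2, 2, 4, 4, 1, 3, 2, 4, 3, 3, 2, 3, 4, 1, 2, 2, 4, 3, 4, 1, 2, 3, 2, 1, 3)
HP_tot <- rep(c(10, 136, 15), times = c(8, 10, 7))
demo   <- data.frame(P2 = P2, HP_tot = HP_tot)

# 1. With no missing values, the pairwise call, the matrix cell and
#    Pearson's r on average ranks (the default ties method of rank())
#    should agree up to rounding.
pair_r   <- cor(demo$P2, demo$HP_tot, method = "spearman")
matrix_r <- cor(demo, method = "spearman")["P2", "HP_tot"]
rank_r   <- cor(rank(demo$P2), rank(demo$HP_tot))
print(c(pair = pair_r, matrix = matrix_r, ranks = rank_r))

# 2. write() is not a CSV writer: it prints the values of a matrix column
#    by column, five per line by default, without row or column names.
m <- cor(demo, method = "spearman")
f_write <- tempfile(fileext = ".txt")
f_csv   <- tempfile(fileext = ".csv")
write(m, file = f_write)       # 4 values -> a single line
write.csv(m, file = f_csv)     # header plus one row per variable
print(readLines(f_write))
print(readLines(f_csv))
```

If the file is meant for a spreadsheet, write.csv() (or write.table()) keeps the row and column layout together with the variable names.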
Hi,

Does your data have missing values? I am not sure it would change
anything, but perhaps try adding the use argument:

cor(test2, method = "spearman", use = "pairwise.complete.obs")

or something like it. The default is use = "everything", and my
reasoning stems from this passage in the documentation (?cor):

  If 'use' is "everything", NAs will propagate conceptually,
  i.e., a resulting value will be NA whenever one of its
  contributing observations is NA.
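A small sketch, with a made-up data frame d rather than test2, of how the use argument changes a Spearman matrix when one column has a missing value. With "complete.obs", a row with an NA in any column is dropped for every pair and the ranks are recomputed on the remaining rows, so even a cell whose two columns have no NA can differ from the single-pair cor() call. With "pairwise.complete.obs", rows are dropped pair by pair, which matches the single-pair call.

```r
# Hypothetical toy data (not the poster's test2): z has one missing value.
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(2, 1, 4, 3, 6, 5),
                z = c(1, 3, NA, 2, 6, 5))

r_all <- cor(d, method = "spearman")                                 # default use = "everything"
r_cc  <- cor(d, method = "spearman", use = "complete.obs")           # drops row 3 for every pair
r_pw  <- cor(d, method = "spearman", use = "pairwise.complete.obs")  # drops rows per pair

print(r_all["x", "z"])                     # NA: the NA propagates into cells involving z
print(r_cc["x", "y"])                      # 0.8, computed on 5 rows only
print(r_pw["x", "y"])                      # 0.8285714, computed on all 6 rows
print(cor(d$x, d$y, method = "spearman"))  # 0.8285714, the single-pair call
```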
I do not think the names should make a difference (unless you're
talking about human error).
Best regards,
Josh
On Wed, Sep 8, 2010 at 12:35 PM, Stephane Vaucher
<vauchers at iro.umontreal.ca> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Did you try taking out P7, which is text? Moreover, if you get the message
'the standard deviation is zero', it means that at least one entire column
is constant. By definition, the covariance of a constant with a random
variable is 0, and a correlation divides that covariance by the product of
the two standard deviations, one of which is also 0. The result is
undefined, so cor() returns NA for those cells and warns that one or more of
your columns are constant. Applying the following to your data (which I
named expd instead), we get
sapply(expd[, -12], var)
          P1           P2           P3           P4           P5           P6
5.433333e-01 1.083333e+00 5.766667e-01 1.083333e+00 6.433333e-01 5.566667e-01
          P8           P9          P10          P11          P12         SITE
5.733333e-01 3.193333e+00 5.066667e-01 2.500000e-01 5.500000e+00 2.493333e+00
      Errors     warnings       Manual        Total        H_tot        HP1.1
9.072840e+03 2.081334e+04 7.433333e-01 3.823500e+04 3.880250e+03 2.676667e+00
       HP1.2        HP1.3        HP1.4       HP_tot        HO1.1        HO1.2
0.000000e+00 2.008440e+03 3.057067e+02 3.827250e+03 8.400000e-01 0.000000e+00
       HO1.3        HO1.4       HO_tot        HU1.1        HU1.2        HU1.3
0.000000e+00 0.000000e+00 8.400000e-01 0.000000e+00 2.100000e-01 2.266667e-01
      HU_tot           HR        L_tot        LP1.1        LP1.2        LP1.3
6.233333e-01 7.433333e-01 3.754610e+03 3.209333e+01 0.000000e+00 2.065010e+03
       LP1.4       LP_tot        LO1.1        LO1.2        LO1.3        LO1.4
2.246233e+02 3.590040e+03 3.684000e+01 0.000000e+00 0.000000e+00 2.840000e+00
      LO_tot        LU1.1        LU1.2        LU1.3       LU_tot       LR_tot
6.000000e+01 0.000000e+00 1.440000e+00 3.626667e+00 8.373333e+00 4.943333e+00
      SP_tot        SP1.1        SP1.2        SP1.3        SP1.4     SP_tot.1
6.911067e+02 4.225000e+01 0.000000e+00 1.009600e+02 4.161600e+02 3.071600e+02
       SO1.1        SO1.2        SO1.3        SO1.4       SO_tot        SU1.1
4.543333e+00 2.500000e-01 0.000000e+00 2.100000e-01 5.250000e+00 0.000000e+00
       SU1.2        SU1.3       SU_tot           SR
1.556667e+00 4.225000e+01 3.504000e+01 4.225000e+01
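A toy sketch (hypothetical data frame toy, not expd) of what a zero variance like the HP1.2 or HO1.2 entries above does: cor() returns NA in every cell that pairs the constant column with another one and emits the warning, while the other cells keep the values the single-pair call would give.

```r
# Hypothetical toy data: column k is constant, like HP1.2 or HO1.2 above.
toy <- data.frame(a = c(1, 2, 3, 4),
                  b = c(2, 1, 4, 3),
                  k = c(5, 5, 5, 5))

# Catch the warning (instead of letting it print at the end) and record it.
warned <- FALSE
r <- withCallingHandlers(
  cor(toy, method = "spearman"),
  warning = function(w) {
    warned <<- TRUE
    message("cor() warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(r)
# Cells pairing k with a or b are NA, since rho = cov / (sd * sd) and the
# standard deviation of k's ranks is 0. The a-b cell is unaffected.
print(c(matrix = r["a", "b"], pair = cor(toy$a, toy$b, method = "spearman")))
```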
Which columns are constant?
which(sapply(expd[, -12], var) < .Machine$double.eps)
HP1.2 HO1.2 HO1.3 HO1.4 HU1.1 LP1.2 LO1.2 LO1.3 LU1.1 SP1.2 SO1.3 SU1.1
   19    24    25    26    28    35    40    41    44    51    57    60
I suspect that in your real data set, there aren't so many constant columns,
but this is one way to check.
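One possible way, sketched under assumptions, to do both checks before building the matrix: keep only numeric columns (which drops a text column such as P7), drop columns whose variance is (near) zero, then call cor(). The data frame df is a hypothetical stand-in with made-up values, and whether pairwise deletion suits the real data is a judgement call.

```r
# Hypothetical stand-in for test2 with made-up values: one text column
# (like P7) and one constant column (like HP1.2).
df <- data.frame(P1     = c(1, 2, 2, 3, 1),
                 P7     = c("a", "b", "a", "c", "b"),
                 HP1.2  = c(0, 0, 0, 0, 0),
                 HP_tot = c(10, 136, 15, 10, 136),
                 stringsAsFactors = FALSE)

num   <- Filter(is.numeric, df)                      # drop text columns such as P7
v     <- vapply(num, var, numeric(1), na.rm = TRUE)  # per-column variance
const <- names(which(v < .Machine$double.eps))       # (near-)constant columns
print(const)

# Keep columns with a usable variance; NA variances (all-NA columns) are dropped too.
keep <- num[, !is.na(v) & v >= .Machine$double.eps, drop = FALSE]
r <- cor(keep, method = "spearman", use = "pairwise.complete.obs")
print(r)
```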
HTH,
Dennis