Lalitha Viswanathan
2015-May-03 09:19 UTC
[R] Request for functions to calculate correlated factors influencing an outcome.
Hi
I have a dataset of the type attached.
Here's my code thus far.
dataset <- data.frame(read.delim("data", sep="\t", header=TRUE));
newData <- subset(dataset, select = c(Price, Reliability, Mileage, Weight, Disp, HP));
cor(newData, method="pearson");
Results are
                 Price Reliability    Mileage     Weight       Disp         HP
Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
Reliability         NA           1         NA         NA         NA         NA
Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
It appears that Weight and Price, Weight and Disp, Weight and HP, Disp and HP,
and HP and Price are strongly correlated.
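A note on the NA entries in the matrix above: cor() defaults to use = "everything", so a column that contains missing values (here Reliability, which is NA for several cars in the attached data) gives NA against every other column. A minimal sketch, assuming a handful of rows copied from the attached data and the hypothetical name `cars`, of asking cor() to drop the incomplete pairs instead:

```r
# A few rows copied from the attached data (Reliability has missing values).
cars <- data.frame(
  Price       = c(8895, 7402, 6319, 5866, 11545, 10855, 18900, 24760),
  Reliability = c(4, 2, 4, NA, 1, NA, NA, 5),
  Mileage     = c(33, 33, 37, 34, 20, 26, 27, 20),
  Weight      = c(2560, 2345, 1845, 1900, 3320, 2840, 2670, 3265)
)

# Default (use = "everything"): any pair involving a column with NA gives NA.
cor_default <- cor(cars, method = "pearson")

# Pairwise deletion: each coefficient uses only the rows where both
# variables are present.
cor_pairwise <- cor(cars, method = "pearson", use = "pairwise.complete.obs")

print(round(cor_pairwise, 3))
```

With pairwise deletion the Reliability coefficients are based on fewer cars than the others, which is worth keeping in mind when comparing them.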
To find the statistical significance, I am trying

sample.correln <- cor.test(newData$Disp, newData$HP, method="kendall", exact=NULL)

        Kendall's rank correlation tau

data:  newx$Disp and newx$HP
z = 7.2192, p-value = 5.229e-13
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau
0.6563871
If I try the same with

sample.correln <- cor.test(newData$Disp, newData$HP, method="pearson", exact=NULL)

I get

Warning message:
In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
  Cannot compute exact p-value with ties
> sample.correln

        Spearman's rank correlation rho

data:  newx$Disp and newx$HP
S = 5716.8, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8411566
I am not sure how to interpret these values.
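On reading such output: the estimate (tau or rho) gives the direction and strength of the association, and the p-value tests the null hypothesis that the true coefficient is zero. The `exact` argument only matters for Kendall and Spearman; when values are tied R cannot compute an exact p-value and falls back to an approximation, which is what the warning reports. A minimal sketch, assuming a few Disp./HP pairs copied from the attached data and hypothetical variable names, of running the three tests and pulling out the numbers:

```r
# Displacement and horsepower for a few cars copied from the attached data.
disp <- c(97, 114, 81, 91, 113, 97, 305, 153, 302, 133, 181, 232)
hp   <- c(113, 90, 63, 92, 103, 82, 170, 100, 225, 110, 141, 140)

# Pearson: linear association; the `exact` argument is not used here.
pearson <- cor.test(disp, hp, method = "pearson")

# Rank-based tests: exact = FALSE requests the approximate p-value up front,
# which avoids the "Cannot compute exact p-value with ties" warning.
spearman <- cor.test(disp, hp, method = "spearman", exact = FALSE)
kendall  <- cor.test(disp, hp, method = "kendall",  exact = FALSE)

# Each result is an "htest" object; its pieces can be extracted by name.
results <- data.frame(
  method   = c("pearson", "spearman", "kendall"),
  estimate = unname(c(pearson$estimate, spearman$estimate, kendall$estimate)),
  p.value  = c(pearson$p.value, spearman$p.value, kendall$p.value)
)
print(results)
```

The three estimates are on different scales (Kendall's tau is typically smaller in magnitude than Spearman's rho for the same data), so they should not be compared with each other directly.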
Basically, I am trying to figure out which combination of factors
influences efficiency.
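One common next step for the question of which factors influence efficiency is a multiple linear regression. The following is only a sketch, assuming Mileage is the measure of efficiency and using a dozen rows copied from the attached data under the hypothetical name `cars_sub`. Because Weight, Disp and HP are strongly correlated with each other, the individual coefficients should be read with caution.

```r
# A dozen cars copied from the attached data.
cars_sub <- data.frame(
  Mileage = c(33, 33, 37, 32, 20, 27, 19, 30, 25, 21, 18, 20),
  Weight  = c(2560, 2345, 1845, 2260, 3320, 2885, 3310, 2695, 3065, 3110, 3855, 3850),
  HP      = c(113, 90, 63, 92, 170, 100, 225, 110, 141, 142, 170, 150)
)

# Model Mileage as a linear function of Weight and HP.
fit <- lm(Mileage ~ Weight + HP, data = cars_sub)

# Coefficient table: estimate, standard error, t value and p-value per term.
coefs <- summary(fit)$coefficients
print(coefs)

# Proportion of the variance in Mileage explained by the model.
r2 <- summary(fit)$r.squared
print(r2)
```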
Thanks
Lalitha
-------------- next part --------------
Price Country Reliability Mileage Type Weight Disp. HP
8895 USA 4 33 Small 2560 97 113
7402 USA 2 33 Small 2345 114 90
6319 Korea 4 37 Small 1845 81 63
6635 Japan/USA 5 32 Small 2260 91 92
6599 Japan 5 32 Small 2440 113 103
8672 Mexico 4 26 Small 2285 97 82
7399 Japan/USA 5 33 Small 2275 97 90
7254 Korea 1 28 Small 2350 98 74
9599 Japan 5 25 Small 2295 109 90
5866 Japan NA 34 Small 1900 73 73
8748 Japan/USA 5 29 Small 2390 97 102
6488 Japan 5 35 Small 2075 89 78
9995 Germany 3 26 Small 2330 109 100
11545 USA 1 20 Sporty 3320 305 170
9745 USA 1 27 Sporty 2885 153 100
12164 USA 1 19 Sporty 3310 302 225
11470 USA 3 30 Sporty 2695 133 110
9410 Japan 5 33 Sporty 2170 97 108
13945 Japan 5 27 Sporty 2710 125 140
13249 Japan 3 24 Sporty 2775 146 140
10855 USA NA 26 Sporty 2840 107 92
13071 Japan NA 28 Sporty 2485 109 97
18900 Germany NA 27 Compact 2670 121 108
10565 USA 2 23 Compact 2640 151 110
10320 USA 1 26 Compact 2655 133 95
10945 USA 4 25 Compact 3065 181 141
9483 USA 2 24 Compact 2750 141 98
12145 Japan/USA 5 26 Compact 2920 132 125
12459 Japan/USA 4 24 Compact 2780 133 110
10989 Japan 5 25 Compact 2745 122 102
17879 Japan 4 21 Compact 3110 181 142
11650 Japan 5 21 Compact 2920 146 138
9995 USA 2 23 Compact 2645 151 110
15930 France NA 24 Compact 2575 116 120
11499 Japan/USA 5 23 Compact 2935 135 130
11588 Japan/USA 5 27 Compact 2920 122 115
18450 Sweden 3 23 Compact 2985 141 114
24760 Japan 5 20 Medium 3265 163 160
13150 USA 3 21 Medium 2880 151 110
12495 USA 2 22 Medium 2975 153 150
16342 USA 3 22 Medium 3450 202 147
15350 USA 2 22 Medium 3145 180 150
13195 USA 3 22 Medium 3190 182 140
14980 USA 1 23 Medium 3610 232 140
9999 Korea NA 23 Medium 2885 143 110
23300 Japan 5 21 Medium 3480 180 158
17899 Japan 5 22 Medium 3200 180 160
13150 USA 2 21 Medium 2765 151 110
14495 USA NA 21 Medium 3220 189 135
21498 Japan 3 23 Medium 3480 180 190
16145 USA 3 23 Large 3325 231 165
14525 USA 1 18 Large 3855 305 170
17257 USA 3 20 Large 3850 302 150
13995 USA NA 18 Van 3195 151 110
15395 USA 3 18 Van 3735 202 150
12267 USA 3 18 Van 3665 182 145
14944 Japan 5 19 Van 3735 181 150
14929 Japan NA 20 Van 3415 143 107
13949 Japan NA 20 Van 3185 146 138
14799 Japan NA 19 Van 3690 146 106
Michael Dewey
2015-May-03 11:24 UTC
[R] Request for functions to calculate correlated factors influencing an outcome.
Dear Lalitha, see inline below

On 03/05/2015 10:19, Lalitha Viswanathan wrote:
> Hi
> I have a dataset of the type attached.
> Here's my code thus far.
> dataset <-data.frame(read.delim("data", sep="\t", header=TRUE));
> newData<-subset(dataset, select = c(Price, Reliability, Mileage, Weight,
> Disp, HP));

In fact, in the file the variable seems to be called "Disp." (with a trailing dot).

> cor(newData, method="pearson");
> Results are
>                  Price Reliability    Mileage     Weight       Disp         HP
> Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
> Reliability         NA           1         NA         NA         NA         NA
> Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
> Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
> Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
> HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
>
> It appears that Wt and Price, Wt and Disp, Wt and HP, Disp and HP, HP and
> Price are strongly correlated.
> To find the statistical significance,
> I am trying sample.correln<-cor.test(newData$Disp, newData$HP,
> method="kendall", exact=NULL)
> Kendall's rank correlation tau
>
> data: newx$Disp and newx$HP
> z = 7.2192, p-value = 5.229e-13
> alternative hypothesis: true tau is not equal to 0
> sample estimates:
> tau
> 0.6563871
>
> If I try the same with
> sample.correln<-cor.test(newData$Disp, newData$HP, method="pearson",
> exact=NULL)

When I try that it works fine. The real question is why, when you asked it for the Pearson coefficient, it decided to give you the Spearman, as the warning message below points out. I suspect you have done something else which you did not tell us about.

> I get Warning message:
> In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
> Cannot compute exact p-value with ties
>> sample.correln
>
> Spearman's rank correlation rho
>
> data: newx$Disp and newx$HP
> S = 5716.8, p-value < 2.2e-16
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
> rho
> 0.8411566
>
> I am not sure how to interpret these values.
> Basically, I am trying to figure out which combination of factors
> influences efficiency.
>
> Thanks
> Lalitha

--
Michael
http://www.dewey.myzen.co.uk/home.html
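
For anyone reproducing this, here is a minimal sketch of the check described above. It assumes the first rows of the attachment are tab-separated and pastes them inline through the `text` argument instead of reading the original "data" file; the name `raw` is hypothetical. read.delim() keeps the header "Disp." as the column name, so the column has to be referred to with the trailing dot.

```r
# First rows of the attachment, assumed tab-separated, standing in for "data".
raw <- paste(
  "Price\tCountry\tReliability\tMileage\tType\tWeight\tDisp.\tHP",
  "8895\tUSA\t4\t33\tSmall\t2560\t97\t113",
  "7402\tUSA\t2\t33\tSmall\t2345\t114\t90",
  "6319\tKorea\t4\t37\tSmall\t1845\t81\t63",
  "6635\tJapan/USA\t5\t32\tSmall\t2260\t91\t92",
  "6599\tJapan\t5\t32\tSmall\t2440\t113\t103",
  "8672\tMexico\t4\t26\tSmall\t2285\t97\t82",
  sep = "\n"
)

# read.delim() already returns a data frame, so no data.frame() wrapper is needed.
dataset <- read.delim(text = raw, header = TRUE)

# The displacement column is named "Disp." (with the dot), not "Disp".
newData <- subset(dataset,
                  select = c(Price, Reliability, Mileage, Weight, Disp., HP))

# With method = "pearson" the result should report a Pearson test, not Spearman.
sample.correln <- cor.test(newData$Disp., newData$HP, method = "pearson")
print(sample.correln$method)
```

If the printed method mentions Spearman instead, the call that produced the object was not the one shown, which is consistent with the `newx` name in the warning.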