Lalitha Viswanathan
2015-May-03 09:19 UTC
[R] Request for functions to calculate correlated factors influencing an outcome.
Hi,

I have a dataset of the type attached.
Here's my code thus far.

dataset <- data.frame(read.delim("data", sep="\t", header=TRUE));
newData <- subset(dataset, select = c(Price, Reliability, Mileage, Weight, Disp, HP));
cor(newData, method="pearson");

Results are

                 Price Reliability    Mileage     Weight       Disp         HP
Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
Reliability         NA           1         NA         NA         NA         NA
Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000

It appears that Wt and Price, Wt and Disp, Wt and HP, Disp and HP, HP and Price are strongly correlated.
To find the statistical significance, I am trying

sample.correln <- cor.test(newData$Disp, newData$HP, method="kendall", exact=NULL)

        Kendall's rank correlation tau

data:  newx$Disp and newx$HP
z = 7.2192, p-value = 5.229e-13
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau
0.6563871

If I try the same with

sample.correln <- cor.test(newData$Disp, newData$HP, method="pearson", exact=NULL)

I get

Warning message:
In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
  Cannot compute exact p-value with ties
> sample.correln

        Spearman's rank correlation rho

data:  newx$Disp and newx$HP
S = 5716.8, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8411566

I am not sure how to interpret these values.
Basically, I am trying to figure out which combination of factors influences efficiency.

Thanks
Lalitha

-------------- next part --------------
Price Country   Reliability Mileage Type    Weight Disp. HP
8895  USA       4           33      Small   2560   97    113
7402  USA       2           33      Small   2345   114   90
6319  Korea     4           37      Small   1845   81    63
6635  Japan/USA 5           32      Small   2260   91    92
6599  Japan     5           32      Small   2440   113   103
8672  Mexico    4           26      Small   2285   97    82
7399  Japan/USA 5           33      Small   2275   97    90
7254  Korea     1           28      Small   2350   98    74
9599  Japan     5           25      Small   2295   109   90
5866  Japan     NA          34      Small   1900   73    73
8748  Japan/USA 5           29      Small   2390   97    102
6488  Japan     5           35      Small   2075   89    78
9995  Germany   3           26      Small   2330   109   100
11545 USA       1           20      Sporty  3320   305   170
9745  USA       1           27      Sporty  2885   153   100
12164 USA       1           19      Sporty  3310   302   225
11470 USA       3           30      Sporty  2695   133   110
9410  Japan     5           33      Sporty  2170   97    108
13945 Japan     5           27      Sporty  2710   125   140
13249 Japan     3           24      Sporty  2775   146   140
10855 USA       NA          26      Sporty  2840   107   92
13071 Japan     NA          28      Sporty  2485   109   97
18900 Germany   NA          27      Compact 2670   121   108
10565 USA       2           23      Compact 2640   151   110
10320 USA       1           26      Compact 2655   133   95
10945 USA       4           25      Compact 3065   181   141
9483  USA       2           24      Compact 2750   141   98
12145 Japan/USA 5           26      Compact 2920   132   125
12459 Japan/USA 4           24      Compact 2780   133   110
10989 Japan     5           25      Compact 2745   122   102
17879 Japan     4           21      Compact 3110   181   142
11650 Japan     5           21      Compact 2920   146   138
9995  USA       2           23      Compact 2645   151   110
15930 France    NA          24      Compact 2575   116   120
11499 Japan/USA 5           23      Compact 2935   135   130
11588 Japan/USA 5           27      Compact 2920   122   115
18450 Sweden    3           23      Compact 2985   141   114
24760 Japan     5           20      Medium  3265   163   160
13150 USA       3           21      Medium  2880   151   110
12495 USA       2           22      Medium  2975   153   150
16342 USA       3           22      Medium  3450   202   147
15350 USA       2           22      Medium  3145   180   150
13195 USA       3           22      Medium  3190   182   140
14980 USA       1           23      Medium  3610   232   140
9999  Korea     NA          23      Medium  2885   143   110
23300 Japan     5           21      Medium  3480   180   158
17899 Japan     5           22      Medium  3200   180   160
13150 USA       2           21      Medium  2765   151   110
14495 USA       NA          21      Medium  3220   189   135
21498 Japan     3           23      Medium  3480   180   190
16145 USA       3           23      Large   3325   231   165
14525 USA       1           18      Large   3855   305   170
17257 USA       3           20      Large   3850   302   150
13995 USA       NA          18      Van     3195   151   110
15395 USA       3           18      Van     3735   202   150
12267 USA       3           18      Van     3665   182   145
14944 Japan     5           19      Van     3735   181   150
14929 Japan     NA          20      Van     3415   143   107
13949 Japan     NA          20      Van     3185   146   138
14799 Japan     NA          19      Van     3690   146   106
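A note on the NA entries for Reliability in the correlation matrix: Reliability is missing (NA) for several cars in the attachment, and cor() by default (use = "everything") returns NA for any pair that involves a column with missing values. The following is only a minimal sketch of that behaviour, using a handful of rows copied from the attachment rather than the full file; the object names cars and r_pair are made up for illustration and do not come from the post.

```r
# A few rows copied from the posted attachment (NOT the full data set).
# Country and Type are left out because cor() needs numeric columns.
cars <- read.table(header = TRUE, text = "
Price Reliability Mileage Weight Disp. HP
8895  4  33 2560  97 113
7402  2  33 2345 114  90
6319  4  37 1845  81  63
5866  NA 34 1900  73  73
11545 1  20 3320 305 170
10855 NA 26 2840 107  92
24760 5  20 3265 163 160
14525 1  18 3855 305 170
")

# Default (use = "everything"): a pair involving a column with NA gives NA.
r_default <- cor(cars)
r_default["Price", "Reliability"]

# Pairwise deletion: each coefficient is computed from the rows that are
# complete for that particular pair of columns.
r_pair <- cor(cars, use = "pairwise.complete.obs")
round(r_pair, 3)
```

With use = "pairwise.complete.obs" the coefficients involving Reliability are based on fewer cars than the others, so they are not directly comparable; use = "complete.obs" would instead drop every row with any NA before computing the whole matrix.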
Michael Dewey
2015-May-03 11:24 UTC
[R] Request for functions to calculate correlated factors influencing an outcome.
Dear Lalitha, see inline below

On 03/05/2015 10:19, Lalitha Viswanathan wrote:
> Hi
> I have a dataset of the type attached.
> Here's my code thus far.
> dataset <-data.frame(read.delim("data", sep="\t", header=TRUE));
> newData<-subset(dataset, select = c(Price, Reliability, Mileage, Weight,
> Disp, HP));

In fact, in the file the variable seems to be called "Disp." (with a trailing period), not "Disp".

> cor(newData, method="pearson");
> Results are
>                  Price Reliability    Mileage     Weight       Disp         HP
> Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
> Reliability         NA           1         NA         NA         NA         NA
> Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
> Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
> Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
> HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
>
> It appears that Wt and Price, Wt and Disp, Wt and HP, Disp and HP, HP and
> Price are strongly correlated.
> To find the statistical significance,
> I am trying sample.correln<-cor.test(newData$Disp, newData$HP,
> method="kendall", exact=NULL)
> Kendall's rank correlation tau
>
> data: newx$Disp and newx$HP
> z = 7.2192, p-value = 5.229e-13
> alternative hypothesis: true tau is not equal to 0
> sample estimates:
> tau
> 0.6563871
>
> If I try the same with
> sample.correln<-cor.test(newData$Disp, newData$HP, method="pearson",
> exact=NULL)

When I try that it works fine. The real question is why, when you asked for the Pearson coefficient, it gave you the Spearman one, as the warning message below shows. I suspect you have done something else which you did not tell us about.

> I get Warning message:
> In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
> Cannot compute exact p-value with ties
>> sample.correln
>
> Spearman's rank correlation rho
>
> data: newx$Disp and newx$HP
> S = 5716.8, p-value < 2.2e-16
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
> rho
> 0.8411566
>
> I am not sure how to interpret these values.
> Basically, I am trying to figure out which combination of factors
> influences efficiency.
>
> Thanks
> Lalitha

--
Michael
http://www.dewey.myzen.co.uk/home.html
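To make the two inline points concrete, here is a small sketch, not a reconstruction of what was actually run. It uses a few Disp./HP pairs copied from the attachment, and the object names cars, pearson and spearman are invented for the example. It shows that the column read from the file is named "Disp.", that $ nevertheless finds it through partial matching, and that the method argument alone decides which test cor.test() reports.

```r
# A few Disp./HP pairs copied from the posted attachment (NOT the full data).
cars <- read.table(header = TRUE, text = "
Disp. HP
97  113
114 90
81  63
73  73
305 170
107 92
163 160
305 170
")

# The column keeps its trailing period; [[ ]] needs the exact name.
names(cars)
disp <- cars[["Disp."]]

# `$` does partial matching on data frames, so cars$Disp returns the same column.
identical(cars$Disp, disp)

# The reported test follows the method argument.
pearson  <- cor.test(disp, cars$HP, method = "pearson")
# exact = FALSE asks for the asymptotic p-value, which avoids the
# "Cannot compute exact p-value with ties" warning when values repeat.
spearman <- cor.test(disp, cars$HP, method = "spearman", exact = FALSE)

pearson$method
spearman$method
```

If the printed method does not match the one requested, the call that produced the output was probably not the call shown, which is consistent with the data: line in the post referring to newx rather than newData.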