Lalitha Viswanathan
2015-Apr-27 17:50 UTC
[R] Fwd: Distribution to use to calculate p values
Hi,

I have a dataset as below:

Price  Country  Reliability  Mileage  Type   Weight  Disp.  HP
8895   USA      4            33       Small  2560    97     113
(Hundreds of rows)

I am trying to find the best possible distribution to use, to find p-values and compute which factors most influence efficiency.

Any starting points for the functions I could use, or similar examples I could follow, would be a start. I am a relative novice at R, having used it many years ago, and am now getting back to it. So I am looking for pointers.

Thanks
On Apr 27, 2015, at 10:50 AM, Lalitha Viswanathan wrote:

> Hi
> I have a dataset as below
> Price Country Reliability Mileage Type Weight Disp. HP
> 8895 USA 4 33 Small 2560 97 113
> (Hundreds of rows)
>
> I am trying to find the best possible distribution to use, to find p-values
> and compute which factors most influence efficiency.

"Finding p-values" is a task that requires research questions. You obviously have some sort of meaning attached to the word "efficiency" but have not stated what it is. This appears to be a request for a statistical tutorial on a topic that has not been described. (And if this is course homework, then it is off-topic for r-help.)

> Any starting points for the functions I could use, or similar examples I
> could follow, would be a start.
> I am a relative novice at R having used it many years ago and am now
> getting back to it.
> So looking for pointers
>
> Thanks

The Posting Guide (http://www.R-project.org/posting-guide.html) suggests that you create a small example in R code and describe your question more clearly (if it's not homework).

David Winsemius
Alameda, CA, USA
Hi Lalitha,

If you want to find a reasonable model distribution for your data, try plotting the histogram of the variable you want to predict and comparing it to the density curves of the distributions that you think will fit. So for example:

# plot a histogram of a uniform distribution
hist(seq(1,10,length.out=100))
# overlay a normal density function with the same mean
lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*30)

Not a very good fit, but:

hist(rnorm(100,5.5))
lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*90)

Much better. You can then perform a "goodness of fit" test if you need it to justify your choice of distribution. In most cases, you will have to find a "family" (an error distribution with its link function) to use in a generalized linear model (glm). Another approach is to use a non-parametric test if one gives an appropriate answer to your question.

Jim

On Tue, Apr 28, 2015 at 5:07 AM, David Winsemius <dwinsemius at comcast.net> wrote:

> "Finding p-values" is a task that requires research questions. You obviously have some sort of meaning attached to the word "efficiency" but have not stated what it is. This appears to be a request for a statistical tutorial on a topic that has not been described. (And if this is course homework, then it is off-topic for r-help.)
>
> The Posting Guide suggests that you create a small example in R code and describe your question more clearly (if it's not homework.)
>
> David Winsemius
> Alameda, CA, USA
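The three follow-up steps Jim mentions (a goodness-of-fit check, a glm with a chosen family, and a non-parametric alternative) might look roughly like the sketch below. It rests on assumptions that are not in the thread: the data are simulated, the column names are borrowed from the original post, and Mileage is treated as the "efficiency" response only for illustration, since the poster never defined that term. The Gaussian family and the Shapiro-Wilk and Kruskal-Wallis tests are one possible set of choices, not a recommendation for the real data.

```r
# Simulated stand-in for the car data (ASSUMPTION: values and the
# relationship between columns are invented for this illustration).
set.seed(42)
n <- 200
cars <- data.frame(
  Weight = round(runif(n, 1800, 4000)),
  HP     = round(runif(n, 60, 250)),
  Type   = factor(sample(c("Small", "Medium", "Large"), n, replace = TRUE))
)
# ASSUMPTION: Mileage plays the role of "efficiency" here.
cars$Mileage <- 50 - 0.008 * cars$Weight - 0.02 * cars$HP + rnorm(n, sd = 2)

# Fit a generalized linear model; gaussian() is the assumed family.
fit <- glm(Mileage ~ Weight + HP + Type, family = gaussian(), data = cars)

# Goodness of fit: test whether the residuals look normally distributed.
gof <- shapiro.test(residuals(fit))

# One p-value per model term, taken from the coefficient table.
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]

# Non-parametric alternative: does Mileage differ across car types?
kw <- kruskal.test(Mileage ~ Type, data = cars)

print(gof)
print(round(p_values, 4))
print(kw)
```

With real data, the simulated data frame would be replaced by the actual one (for example via read.table or read.csv), and the family would be chosen after looking at the histogram of the response as described above.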