Hi there, Sorry if this is a rather long post. I have a simple list of single-feature data points, and I would like to generate a probability that an unseen point comes from the same distribution. To do this I am trying to estimate the probability density of the list of points and use it to generate a probability for new unseen points. I have managed to use the R density function to generate the density estimate but have not been able to do anything with it, i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all? Thanks in advance, Brian.
Dear Brian, I suggest using the density() function to get an estimate of the pdf you are looking for (I believe it is unknown). You can then plot the points returned by density() using plot(), which gives you a graphical representation of the unknown pdf. From its shape you can try to work out what kind of pdf it might be (normal, gamma, Weibull, etc.). After that you can estimate the parameters of the pdf from your data with LS or ML methods. Then you can calculate the goodness of fit for each candidate pdf and use the best one. I hope this is of some help. Cordially, Vito Ricci
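A minimal sketch of the parametric route described in this reply, assuming maximum likelihood via fitdistr() from the MASS package and AIC as the goodness-of-fit criterion (the reply names neither; both are choices made here for illustration). The simulated vector x is a stand-in for the real data.

```r
library(MASS)  # recommended package shipped with R; provides fitdistr()

set.seed(1)
x <- rgamma(200, shape = 2, rate = 0.5)  # stand-in for the observed data

# Maximum-likelihood fits for a few candidate families
fits <- list(
  normal  = fitdistr(x, "normal"),
  gamma   = fitdistr(x, "gamma"),
  weibull = fitdistr(x, "weibull")
)

# Compare the candidates; a lower AIC indicates a better fit
# after penalising the number of parameters
aic  <- sapply(fits, AIC)
best <- names(which.min(aic))
print(aic)
print(best)
```

Once a family is chosen, its fitted parameters (fits[[best]]$estimate) can be passed to the matching d/p/q functions, for example pgamma() for tail probabilities of a new point.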
Try fitting it with a Johnson function -- see SuppDists. If you can fit it you will then be able to use the functions in SuppDists just as you can for any other distribution supported by R.
-- Bob Wheeler --- http://www.bobwheeler.com/ ECHIP, Inc. --- Randomness comes in bunches.
______________________________________________
R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Hi! The function density returns an object of class "density". This object has an x and a y component, which you can access with dd$x and dd$y. Use approx and runif, e.g.:

    dd <- density(rnorm(100, 3, 5))
    plot(dd)

Using the function approx (see ?approx) you can compute the density value for any x.

    # Rejection sampler; the argument x is a dummy so that sapply() can call it.
    mydist <- function(x, dd) {
      while (TRUE) {
        # propose a point uniformly over the range of the density grid
        tmp <- runif(1, min = min(dd$x), max = max(dd$x))
        # interpolated density value at the proposal
        lev <- approx(dd$x, dd$y, tmp)$y
        # accept with probability lev / max(dd$y)
        if (runif(1, 0, max(dd$y)) <= lev) {
          return(tmp)
        }
      }
    }
    x <- 0
    mydist(x, dd)
    res <- rep(0, 500)
    res <- sapply(res, mydist, dd)
    lines(density(res), col = 2)

/E.

On 9/15/2004 at 12:36 PM Brian Mac Namee wrote:
>>> [...] I have managed to use the R density function to generate the density estimate but have not been able to do anything with this - i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?

Dipl. bio-chem. Witold Eryk Wolski
MPI-Moleculare Genetic, Ihnestrasse 63-73, 14195 Berlin
tel: 0049-30-83875219
mail: witek96 at users.sourceforge.net, wolski at molgen.mpg.de
http://www.molgen.mpg.de/~wolski
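As a small illustration of the "density value for any x" remark in this reply, the sketch below wraps the interpolation in a function with approxfun(), the function-returning sibling of approx(). The names dens_at and new_points are made up for the example, and returning 0 outside the grid is an assumption made here, not something the reply specifies. Note that the result is a density height, not a probability.

```r
set.seed(1)
x  <- rnorm(100, mean = 3, sd = 5)  # stand-in for the observed data
dd <- density(x, n = 2048)

# Linear interpolation over the density grid;
# points outside the grid are assigned density 0 (an assumption)
dens_at <- approxfun(dd$x, dd$y, yleft = 0, yright = 0)

new_points <- c(3, 15, 40)          # hypothetical unseen points
vals <- dens_at(new_points)
print(vals)
```

Comparing vals across points shows which ones fall in well-populated regions of the estimate; turning that into a probability statement needs a reference distribution, such as a simulation from the estimated density.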
On 15-Sep-04 Brian Mac Namee wrote:
> [...] I have managed to use the R density function to generate the density estimate but have not been able to do anything with this - i.e. generate a probability that a new point comes from the same distribution. Is there a function to do this, or am I way off the mark using the density function at all?

It's not clear what you're really after, but it looks as though you may be wanting to sample from the distribution estimated by 'density'. A possible approach, which you could refine, is exemplified by

    x <- rnorm(1000)
    d <- density(x, n = 4096)
    # replace = TRUE: draw with replacement so the prob weights are respected
    y <- sample(d$x, size = 1000, replace = TRUE, prob = d$y)

Check performance with

    hist(y)

Looks OK to me! See "?density" and "?sample".

On an alternative interpretation, perhaps you want to first estimate the density based on data you already have, and then when you have got further data (but these would then be "seen" and not "unseen") come to a judgement about whether these new points are compatible with coming from the distribution you have estimated. A possible approach to this question (again susceptible to refinement) would be as follows.

1. Use a fine-grained grid for 'density', i.e. a large value for "n".
2. Replace each of the points in the new data by the nearest point in this grid. Call these values z1, z2, ... , zk corresponding to index values i1, i2, ... , ik in d$x.
3. Evaluate the probability P(z1,...,zk) from the density as the product of d$y[i] where i<-c(i1,...,ik). Better still, evaluate the logarithm of this. Call the result L.
4. Now simulate a large number of draws of k values from d on the lines of sample(d$x,size=k,replace=TRUE,prob=d$y) as above, and evaluate L for each of these.

Where is the value of L from (3) situated in the distribution of these values of L from (4)? If (say) only 1 per cent of the simulated values of L from "d" are less than the value of L from (3), then you have a basis for a test that your new data did not come from the distribution you have estimated from your old data, in that the new data are from the low-density part of the estimated distribution.

There are of course alternative ways to view this question. The value of "k" is relevant. In particular, if "k" is small (say 3 or 4) then the suggestion in (4) is probably the best way to approach it. However, if "k" is large then you can use a test on the lines of Kolmogorov-Smirnov with the reference distribution estimated as the cumulative distribution of d$y and the distribution being tested as the empirical cumulative distribution of your new data.

Even sharper focus is available if you are in a position to make a parametric model for your data, but your description does not suggest that this is the case.

Best wishes, Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 15-Sep-04 Time: 15:07:33
------------------------------ XFMail ------------------------------
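The numbered steps in Ted's reply can be sketched roughly as follows. This is one possible reading, not a definitive implementation: the helper log_lik, the new data values, the 2000 simulation runs and the use of replace = TRUE in sample() are all assumptions made for the example.

```r
set.seed(1)
old <- rnorm(1000)         # stand-in for the data already in hand
new <- c(2.9, 3.4, 3.1)    # hypothetical new points to assess

# Step 1: fine-grained grid
d <- density(old, n = 4096)

# Steps 2-3: snap each point to its nearest grid value,
# then sum the log densities at those grid indices
log_lik <- function(z, d) {
  idx <- vapply(z, function(v) which.min(abs(d$x - v)), integer(1))
  sum(log(d$y[idx]))
}
L_obs <- log_lik(new, d)

# Step 4: reference distribution of L for k draws from the estimate
k <- length(new)
L_sim <- replicate(
  2000,
  log_lik(sample(d$x, size = k, replace = TRUE, prob = d$y), d)
)

# Share of simulated L values at or below the observed one;
# a small share suggests the new points sit in low-density regions
p_low <- mean(L_sim <= L_obs)
print(p_low)
```

With only 2000 runs the smallest non-zero share that can be resolved is 1/2000, so more runs are needed if very small tail proportions matter.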