I apologize if this is redundant. I've been Googling, searching the archive and reading the help all morning and I am not getting closer to my goal. I have a series of data( xi, yi). It is not evenly sampled and it is messy (meaning that there is a lot of scatter in the data). I want to fit a normal distribution (i.e. a gaussian) to the data in order to find the center. (The data has a loose "normal" look to it.) I have done this with polynomial fitting in R with nls but want to try it with a Gaussian (I am a student astronomer and have not a lot of experience in statistics). In looking at the fitdistr function, it seems to take as input a bunch of x values and it will fit a gaussian to the histogram. That is not what I need to do, I want to fit a normal distribution to the x,y values and get out the parameters of the fit. I'm fooling with nls but it seems to want the data in some other format or something because it is complaining about "singular gradient". I'm sure this isn't hard and the answer must be out there somewhere but I can't find it. I appreciate any assistance. Cheers, Michael filepath <- system.file("data", infile , package="datasets") mm <-read.table(filepath) mm dmk <- data.frame( x=mm$V1, y=mm$V2) attach(dmk) plot(x,y) #ymk <-nls(y~c*x^2+b*x+a,start=list(a=1,b=1,c=1),trace=TRUE) ymk <-nls(y~a*exp(-x^2/sig),start=list(a=1,sig=1),trace=TRUE) co = coef(ymk) cmk <- list(a=co[[1]], b=co[[2]], c=co[[3]] )
Michael Koppelman wrote:> I apologize if this is redundant. I've been Googling, searching the > archive and reading the help all morning and I am not getting closer > to my goal. > > I have a series of data( xi, yi). It is not evenly sampled and it is > messy (meaning that there is a lot of scatter in the data). I want to > fit a normal distribution (i.e. a gaussian) to the data in order to > find the center. (The data has a loose "normal" look to it.) I have > done this with polynomial fitting in R with nls but want to try it > with a Gaussian (I am a student astronomer and have not a lot of > experience in statistics). > > In looking at the fitdistr function, it seems to take as input a > bunch of x values and it will fit a gaussian to the histogram. That > is not what I need to do, I want to fit a normal distribution to the > x,y values and get out the parameters of the fit. I'm fooling with > nls but it seems to want the data in some other format or something > because it is complaining about "singular gradient". > > I'm sure this isn't hard and the answer must be out there somewhere > but I can't find it. I appreciate any assistance.Fitting a distribution simply means estimating the parameters of that distribution. For a Gaussian distribution this problem is trivial. The parameters are the mean vector mu and the covariance matrix Sigma. The (optimal; maximum-likelihood-adjusted-to-be-unbiased) estimates of these are the obvious ones --- the sample mean and the sample covariance. In R you most easily get these by o cbinding your ``x'' and ``y'' vectors into matrix: > xy <- cbind(x,y) o applying mean() to this matrix: > mu.hat <- apply(xy,2,mean) o calling upon var(): > Sigma.hat <- var(xy) That is all there is to it. If all you really want is an estimate of the ``centre'' then this estimate is just mu.hat; you don't need to ``fit a distribution'' at all. From your description of the problem there may well be other issues --- lack of independence of your data, biased sample, outliers, skewness, god knows what. Such issues should be dealt with as carefully as possible. It may be the case that you should be doing some sort of robust estimation. Or it may be the case that this data set is hopeless and should be thrown away and a new data set collected in a manner that will actually yield some information. But ``fitting a distribution'' is *NOT* the issue. I don't mean to be rude, but this question illustrates the point that you really shouldn't be *doing* statistics unless you *understand* some statistics. At the very best you'll waste a great deal of time. cheers, Rolf Turner rolf at math.unb.ca
Thank you. Yes, I do feel that I am under-qualified to even ask questions of y'all. Plus I'm an astronomer, which doesn't help! ;) I'll try again. I have two columns of data, the first column (x) is a distance (or length or separation) and the second column (y) is a flux (or number of counts or brightness) at that distance. Thus, when you plot y vs. x you get a spatial profile which shows how bright this thing is as a function of position. (See the small, attached PNG file. You can see there is a vague gaussian shape to the data.) This is measured data from a bizarre technique which yields data that is not evenly-spaced in x and it does not represent a pure mathematical function (i.e. it is not a point spread function or something like that), it represents the actual, non-uniform shape of an astronomical object. We are making the assumption that the shape of this object can be roughly represented with a gaussian. I want to fit a gaussian to this with the purpose of determining systematically the "center" of the normal-like shape of the spatial feature. I have successfully done so in R with a polynomial but I can't figure out how to do it with a gaussian. Does that make sense? Thanks! Michael On Aug 25, 2006, at 2:04 PM, MARK LEEDS wrote:> hi : i'm not clear on what you mean but someone else might be ? if you > say ( x,y), then it sounds like you are talking about a bivariate > normal > distribution. to fit a regular one dimensional gaussian distribution, > you can't have 2 dimensional data. i honestly don't mean to sound > rude but > i think you should explain what you want to do more clearly > because I don't think > I am the only one that will be confused by what you said. > send out a clearer email and you will get quite a response because > there are a lot of really smart people ( compared to me ) on this > list that love to help. > it's an amazing list in that sense.-------------- next part -------------- A non-text attachment was scrubbed... Name: example.png Type: image/png Size: 15487 bytes Desc: not available Url : https://stat.ethz.ch/pipermail/r-help/attachments/20060825/4d0aedbb/attachment.png
Success! The line I needed was: gcoeffs <-nls(y~(a/b)*exp(-(x-c)^2/(2*b^2)),start=list (a=0.4,b=2,c=-10), trace=TRUE) I also needed to provide good guesses for a, b and c. The attached PNG should explain what I was going after, which is the line in the center of that curve. I am sorry for my confusing and perhaps incorrect usage of terminology. Cheers, Michael -------------- next part -------------- A non-text attachment was scrubbed... Name: example3.png Type: image/png Size: 21420 bytes Desc: not available Url : https://stat.ethz.ch/pipermail/r-help/attachments/20060826/b405c7dd/attachment.png
Whoops, I forgot to add, many thanks to all who replied publicly and privately. I am very appreciative of all comments and suggestions. Mark Leeds was especially kind in terms of clarifying why my description of the situation was so confusing. Cheers, Michael On Aug 26, 2006, at 2:51 PM, Michael Koppelman wrote:> Success! The line I needed was: > > gcoeffs <-nls(y~(a/b)*exp(-(x-c)^2/(2*b^2)),start=list > (a=0.4,b=2,c=-10), trace=TRUE) > > I also needed to provide good guesses for a, b and c. The attached > PNG should explain what I was going after, which is the line in the > center of that curve. I am sorry for my confusing and perhaps > incorrect usage of terminology. > > Cheers, > Michael