Hello list! I would like to fit a distribution to each of the peaks in a histogram, such as this: http://photos1.blogger.com/blogger/7029/2724/1600/DU145-Bax3-Bcl-xL.png . The peaks are identified using Petr Pikal peaks function ( http://finzi.psych.upenn.edu/R/Rhelp02a/archive/33097.html), but after that I am quite stuck. Any idea as to how I can: Fit a distribution to each peak Integrate the area between each two peaks, using the means and widths of the distributions fitted to the two peaks. I will be using the integrate function The histogram is based on approximately 15000 events, which makes Mclust and pam (which both delivers the information I need) less useful. The whole point of this exercise is to find the percentage of cells in peak 1, 2, 3, and so on, and between peak 1-2, peak 2-3, peak 3-4 and so on. Having more that 6 peaks does not appears likely. I am quite new to R and apologise if the solution is fairly basic. Thank you in advance for any help and suggestions Sincerely, Ulrik [[alternative HTML version deleted]]
Hi There are some packages for mass spectra processing (spectrino, caMassClass). I did not use them so I do not know how they suit your needs. However you can compute area (integrate) by these functions # uses information interactively from plot(x,y) # first it replots data between corners *replot(x,y)* # then it computes sum between x axis and y values - osum - # and between "baseline" and y values - cista - based # on locator positions integ<-function (x,y) { replot(x,y) meze<-locator(2) dm<-meze$x[1] hm<-meze$x[2] abline(v=c(dm,hm),col=2) vyber<-x<=hm&x>=dm f3 <- splinefun(x, y) osum<-integrate(f3, dm, hm)$value o1<-(y[x==min(x[vyber])]+y[x==max(x[vyber])])*(max(x[vyber])- min(x[vyber]))/2 cista<-osum-o1 return(c(osum,cista)) } # similar as integ but you has to supply upper and lower limits (dm, # hm) manually if you do not want to perform "integration" of whole # area under the curve. integ1<-function (x,y,dm=-Inf,hm=+Inf) { ifelse(dm==-Inf, dm<-min(x), dm<-dm) ifelse(hm==+Inf, hm<-max(x), hm<-hm) vyber<-x<=hm&x>=dm f3 <- splinefun(x, y) osum<-integrate(f3, dm, hm)$value o1<-(y[x==min(x[vyber])]+y[x==max(x[vyber])])*(max(x[vyber])- min(x[vyber]))/2 cista<-osum-o1 return(c(osum,cista)) } On 19 Jul 2006 at 11:58, Ulrik Stervbo wrote: Date sent: Wed, 19 Jul 2006 11:58:38 +0200 From: "Ulrik Stervbo" <ulriks at ruc.dk> To: r-help at stat.math.ethz.ch Subject: [R] Fitting a distribution to peaks in histogram> Hello list! > > I would like to fit a distribution to each of the peaks in a > histogram, such as this: > http://photos1.blogger.com/blogger/7029/2724/1600/DU145-Bax3-Bcl-xL.pn > g . > > The peaks are identified using Petr Pikal peaks function ( > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/33097.html), but after > that I am quite stuck. > > Any idea as to how I can: > Fit a distribution to each peak > Integrate the area between each two peaks, using the means and widths > of the distributions fitted to the two peaks. I will be using the > integrate function > > The histogram is based on approximately 15000 events, which makes > Mclust and pam (which both delivers the information I need) less > useful. > > The whole point of this exercise is to find the percentage of cells in > peak 1, 2, 3, and so on, and between peak 1-2, peak 2-3, peak 3-4 and > so on. Having more that 6 peaks does not appears likely. > > I am quite new to R and apologise if the solution is fairly basic. > > Thank you in advance for any help and suggestions > > Sincerely, > Ulrik > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html and provide commented, > minimal, self-contained, reproducible code.Petr Pikal petr.pikal at precheza.cz
> I would like to fit a distribution to each of the peaks in a histogram, such > as this: http://photos1.blogger.com/blogger/7029/2724/1600/DU145-Bax3-Bcl-xL.pngAs a first shot, I'd try fitting a mixture of gamma distributions (say 3), plus a constant term for the highest bin. You could do this using ML. If the number of peaks is truly unknown, this will be a little trickier but still possible and you could use the LRT to chose between them.> Integrate the area between each two peaks, using the means and widths of the > distributions fitted to the two peaks. I will be using the integrate > functionWhy do you want to do this?> > The histogram is based on approximately 15000 events, which makes Mclust and > pam (which both delivers the information I need) less useful.If you have unbinned data, it would be better (more precise/powerful) to use that. Regards, Hadley
On Wed, 19 Jul 2006, Ulrik Stervbo wrote: [much deleted]> > I am measureing the amount of DNA in cells, and I need to know the > percentage of cells in a part of the cell cycle; that the percentage > of cells in the first peak, in the second peak and so on. I want to > integrate the area between to two cells, because that apparently is > how its none (as far as I can tell from the literature) >I do not doubt that that is how some do it, but a better approach is takes account of errors in the values used to form the histogram. This is reviewed in T. Lynn Eudey Statistical Considerations in DNA Flow Cytometry Statistical Science 1996, Vol. 11, No. 4, 320{334 A classic approach for this purpose is John Mendelsohn; John Rice Deconvolution of Microfluorometric Histograms with B Splines Journal of the American Statistical Association. Vol. 77, No. 380, 1982. A FORTRAN program is mentioned there and the authors say that they will provide listings on request (but that WAS a long time ago). I don't know if this was ever implemented in S/R, but an email to Professor John Rice (UC Berkeley Dept of Stats) might help. HTH, Chuck Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717