On Fri, Mar 09, 2007 at 01:25:24PM +0100, Lukasz Komsta wrote:
> Dear useRs,
>
> I have a curve which is a mixture of Gaussian curves (for example a UV
> emission or absorption spectrum). Do you have any suggestions on how to
> implement a search for an optimal set of Gaussian peaks to fit the curve?
> I know that it is a very complex problem, but maybe it is possible
> to do? My first thought is to use nls() with very large model functions
> and to compare AIC values, but it is very difficult to suggest any
> starting points for the algorithm.
>
> Searching Google, I have found only descriptions of commercial software
> for doing such deconvolution (Origin, PeakFit), without any information
> about the algorithms used, and no ready-to-use function in any language.
>
> I have tried to use Mclust as a workaround for this problem, generating a
> large dataset for which the spectrum is the histogram and feeding it into
> Mclust. The results seem plausible, but this is a very ugly and imprecise
> method.
>
> Thanks for any help,
>
> Luke
>
I would try `nls'. We have used `nls' for fitting magnetic resonance spectra
consisting of roughly 10 Gaussian peaks. This works OK if the input data are
reasonable (not too noisy, peak amplitudes above the noise level, peak
distances not unreasonably smaller than the peak widths, i.e. peak overlap
such that the individual peaks are still more or less identifiable visually).
Of course you must invest effort in getting the start values right
(automatically or manually). If your data are good, you might get good start
values for the positions (the means of the Gaussians) with an approach that
was floating around the r-help list in 11/2005, which I adapted as follows:
peaks <- function(series, span = 3, what = c("max", "min"),
                  do.pad = TRUE, add.to.plot = FALSE, ...)
{
    if ((span <- as.integer(span)) %% 2 != 1)
        stop("'span' must be odd")
    if (!is.numeric(series))
        stop("`peaks' needs numeric input")
    what <- match.arg(what)
    ## accept a plain vector, a 2-row matrix, or a 2-column matrix
    if (is.null(dim(series)) || min(dim(series)) == 1) {
        series <- as.numeric(series)
        x <- seq(along = series)
        y <- series
    }
    else if (nrow(series) == 2) {
        x <- series[1, ]
        y <- series[2, ]
    }
    else if (ncol(series) == 2) {
        x <- series[, 1]
        y <- series[, 2]
    }
    if (span == 1)
        return(list(x = x, y = y, pos = rep(TRUE, length(y)),
                    span = span, what = what, do.pad = do.pad))
    ## minima are found as maxima of the sign-flipped series
    if (what == "min")
        z <- embed(-y, span)
    else z <- embed(y, span)
    s <- span %/% 2
    s1 <- s + 1
    ## a point is a peak if it is the maximum of its span-wide window
    v <- max.col(z, "first") == s1
    if (do.pad) {
        pad <- rep(FALSE, s)
        v <- c(pad, v, pad)
        idx <- v
    }
    else idx <- c(rep(FALSE, s), v)
    val <- list(x = x[idx], y = y[idx], pos = v, span = span,
                what = what, do.pad = do.pad)
    if (add.to.plot)
        points(val, ...)
    val
}
This looks for local maxima in the vector ("y values") or two-dimensional
array ("x/y matrix") `series' in a neighborhood of each point defined by
`span'. If you first plot your data and then call the above on them with
`add.to.plot = TRUE', the results of the peak search are added to your plot
(and you can modify this plotting via the `...' argument).
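In case the embed()/max.col() trick is unclear, here is a tiny self-contained
illustration of the idea on toy data (not from the original thread): each row
of embed(y, span) is one span-wide sliding window, and a point counts as a
peak when the window's maximum sits in the centre column.

```r
## toy data with local maxima at positions 3 and 7
y <- c(1, 3, 7, 3, 2, 5, 9, 5, 1)
span <- 3
s <- span %/% 2
z <- embed(y, span)                        # each row is one span-wide window
v <- max.col(z, "first") == s + 1          # TRUE where the centre column wins
pos <- c(rep(FALSE, s), v, rep(FALSE, s))  # pad back to length(y)
which(pos)                                 # 3 7
y[pos]                                     # 7 9
```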
Maybe this works for your data to get the peak position estimates (and, in
the next step, the amplitudes) right. Frequently the standard deviation
estimates can be set to some fixed value for any given experiment. And of
course distant parts of your spectrum won't have anything to do with each
other, so you can split up the fitting to help `nls' along a bit.
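To make the recipe concrete, here is a self-contained sketch on synthetic
data (all numbers are made up for illustration, not from the thread):
simulate a two-Gaussian mixture, then fit the sum with nls(), supplying
rough start values of the kind the peak search above would produce, with the
standard deviations started at a fixed plausible guess.

```r
set.seed(1)
x <- seq(0, 10, by = 0.02)
## synthetic "spectrum": two Gaussian peaks plus a little noise
y <- 2.0 * exp(-(x - 3)^2 / (2 * 0.4^2)) +
     1.0 * exp(-(x - 7)^2 / (2 * 0.6^2)) +
     rnorm(length(x), sd = 0.02)

## start values: m1, m2 roughly as a peak search would estimate them
fit <- nls(y ~ a1 * exp(-(x - m1)^2 / (2 * s1^2)) +
               a2 * exp(-(x - m2)^2 / (2 * s2^2)),
           start = list(a1 = 2.2, m1 = 3.1, s1 = 0.5,
                        a2 = 0.8, m2 = 6.9, s2 = 0.5))
round(coef(fit), 2)  # close to a1=2, m1=3, s1=0.4, a2=1, m2=7, s2=0.6
```

With start values this close, the default Gauss-Newton iteration converges
quickly; AIC(fit) can then be compared across fits with different numbers of
peaks, as suggested in the original question.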
joerg