thr3ads.net - R help - [R] finding peaks in a simple dataset with R [Nov 2005]


Martin Maechler

2005-Nov-23 13:35 UTC

[R] finding peaks in a simple dataset with R

I've been asked in private,
(and am replying BCC to the asker),
>> I saw your post on the R-help archives page about the possibility of 
>> porting a function from S-Plus called peaks() to R. I am looking for 
>> some way to locate peaks in a simple x,y data set, and thought that R 
>> might be the way to go.
"of course" it is the way to go, don't get lost be going
somewhere else  :-)
and try

    install.packages("fortune")
    fortune("go with R")

>> Any ideas would be a great help,

Using RSiteSearch("peaks") gives too many hits, among them
those you can get with the more targeted (regular expression) call

    RSiteSearch("/peaks\\b.*\\bfunction/")

where, in the 2nd hit,
    http://finzi.psych.upenn.edu/R/Rhelp02a/archive/33097.html
Petr Pikal gives a simple peaks() function, originally by Brian Ripley,
which makes clever use of embed() and max.col().
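
For readers who have not met the two helpers, the following is a minimal sketch of what embed() and max.col() do on a toy series. The vector x and the names centre_is_max and peak_pos are illustrative assumptions, not taken from the linked post.

```r
x <- c(1, 4, 1, 1, 6, 1, 5, 1, 1)

# embed() lays out sliding windows as rows. Within a row the newest value
# comes first, so for span 3 row i is (x[i + 2], x[i + 1], x[i]).
z <- embed(x, 3)
print(z[1:3, ])

# max.col() reports, for each row, the column that holds the row maximum.
# Column 2 is the centre of a 3-wide window, so "== 2" marks windows whose
# middle element is the largest one.
centre_is_max <- max.col(z, ties.method = "first") == 2

# Row i is centred on element i + 1, so shift by one to get positions in x.
peak_pos <- which(centre_is_max) + 1
print(peak_pos)
```

Here ties.method = "first" is used only to keep the sketch reproducible; the default for max.col() is to break ties at random.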

I wonder if we shouldn't polish that a bit and add to R's
standard 'utils' package.

Martin Maechler, ETH Zurich

Marc Kirchner

2005-Nov-23 14:33 UTC


[R] finding peaks in a simple dataset with R

> 
> I wonder if we shouldn't polish that a bit and add to R's
> standard 'utils' package.
> 
Hm, I figured out there are (at least) two versions out there, one being
the "original" idea and a modification. 

=== Petr Pikal in 2001 (based on Brian Ripley's idea) ===
peaks <- function(series, span=3) {
	z <- embed(series, span)
	result <- max.col(z) == 1 + span %/% 2
	result
}

versus

=== Petr Pikal in 2004 ===
peaks2 <- function(series, span=3) {
	z <- embed(series, span)
	s <- span %/% 2
	v <- max.col(z) == 1 + s
	result <- c(rep(FALSE,s),v)
	result <- result[1:(length(result)-s)]
	result
}

Comparison shows

> peaks(c(1,4,1,1,6,1,5,1,1),3)
[1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE

which is a logical vector for elements 2:(N-1) and

> peaks2(c(1,4,1,1,6,1,5,1,1),3)
[1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE

which is a logical vector for elements 1:(N-2).

As I would expect to "lose" (span-1)/2 elements on each side 
of the vector, to me the 2001 version feels more natural.
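
If a result of the same length as the input is wanted, one option is to pad the 2001-style result with FALSE on both sides. This is only a sketch: the helper name peaks_padded and the odd-span check are assumptions added for illustration, not part of either posted version.

```r
# Hypothetical helper: same idea as the 2001 version, but padded so that
# element i of the result refers to element i of the input.
peaks_padded <- function(series, span = 3) {
  stopifnot(span %% 2 == 1, length(series) >= span)
  s <- span %/% 2                       # elements lost on each side
  z <- embed(series, span)
  # ties.method = "first" is chosen here only to make the result reproducible
  centre <- max.col(z, ties.method = "first") == 1 + s
  c(rep(FALSE, s), centre, rep(FALSE, s))
}

x <- c(1, 4, 1, 1, 6, 1, 5, 1, 1)
p <- peaks_padded(x)
print(length(p) == length(x))
print(which(p))                          # positions of the peaks within x
```

With this layout, x[p] gives the peak values directly, at the cost of never reporting a peak in the first or last span %/% 2 elements.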

Also, both "suffer" from being non-deterministic in the
multiple-maxima case (the two 4s here):

> peaks(c(1,4,4,1,6,1,5,1,1),3)
[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
> peaks(c(1,4,4,1,6,1,5,1,1),3)
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE
> peaks(c(1,4,4,1,6,1,5,1,1),3)
[1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
> peaks(c(1,4,4,1,6,1,5,1,1),3)
[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

which also persists for span > 3 (without the 6 then, of course):

> peaks(c(1,4,4,1,1,1,5,1,1),5)
[1]  TRUE FALSE FALSE FALSE  TRUE
> peaks(c(1,4,4,1,1,1,5,1,1),5)
[1] FALSE FALSE FALSE FALSE  TRUE
> peaks(c(1,4,4,1,1,1,5,1,1),5)
[1]  TRUE FALSE FALSE FALSE  TRUE

This could (should?) be fixed by modifying the call to max.col(),
which by default breaks ties at random:
	result <- max.col(z, "first") == 1 + span %/% 2
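
As a quick check of this suggestion, the sketch below applies it to the tied series from above and confirms that repeated calls agree. The name peaks_first is a hypothetical label for the modified 2001 version.

```r
# 2001 version with the proposed change: ties go to the first maximal column
peaks_first <- function(series, span = 3) {
  z <- embed(series, span)
  max.col(z, ties.method = "first") == 1 + span %/% 2
}

x <- c(1, 4, 4, 1, 6, 1, 5, 1, 1)

# Repeated calls should now return identical results
runs <- replicate(20, peaks_first(x), simplify = FALSE)
print(all(vapply(runs, identical, logical(1), runs[[1]])))

# Positions in x (the result covers elements 2:(N-1), hence the + 1)
print(which(peaks_first(x)) + 1)
```

Because embed() puts the newest value in the first column, "first" reports the later of two tied neighbours (element 3 rather than element 2 here); "last" would report the earlier one instead.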

Just my two cents,
Marc

-- 
=======================================================
Dipl. Inform. Med. Marc Kirchner
Interdisciplinary Centre for Scientific Computing (IWR)
Multidimensional Image Processing
INF 368
University of Heidelberg
D-69120 Heidelberg
Tel: ++49-6221-54 87 97
Fax: ++49-6221-54 88 50
marc.kirchner at iwr.uni-heidelberg.de


Martin Maechler

2005-Nov-23 16:18 UTC


[R] go with R :-) {Re: finding peaks ..}

>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Wed, 23 Nov 2005 14:35:14 +0100 writes:

    MM> I've been asked in private,
    MM> (and am replying BCC to the asker),

    >>> I saw your post on the R-help archives page about the possibility of
    >>> porting a function from S-Plus called peaks() to R. I am looking for
    >>> some way to locate peaks in a simple x,y data set, and thought that R
    >>> might be the way to go.

    MM> "of course" it is the way to go, don't get lost be going
    MM> somewhere else  :-)
    MM> and try

    MM> install.packages("fortune")
    MM> fortune("go with R")

Ouch!  Two mistakes in such a short section... (thanks, Andy!)
Instead, it should have been


    >>> ..... and thought that R might be the way to go.

  "of course" it is the way to go, don't get lost by going
somewhere else  :-)

and try

   install.packages("fortune")
   library(fortune)
   fortune("go with R")

Martin

Tuszynski, Jaroslaw W.

2005-Nov-23 18:15 UTC


[R] finding peaks in a simple dataset with R

>> I am looking for some way to locate peaks in a simple x,y data set.

See my 'msc.peaks.find' function in 'caMassClass'; it has a simple
peak finding algorithm.

Jarek Tuszynski

Petr Pikal

2005-Nov-24 12:31 UTC


[R] finding peaks in a simple dataset with R

Hi Marc

I use this function for finding maxima in some spectral
data (e.g. from X-ray diffraction) and it has satisfied my
needs. The 2004 version was probably modified for reasons
to do with plotting my data, so it drops values
from the end rather than from both sides.

Peaks in those cases are different from occasional
noise spikes, which is why I did not notice this bug.
Thanks for your suggestion.
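
To illustrate the difference between real peaks and small spikes, the sketch below runs the same embed()/max.col() idea with two window widths on made-up x,y data. The data and the helper name peaks_at are assumptions for illustration only.

```r
# Hypothetical helper: returns the positions (indices into y) of points that
# are the maximum of a centred window of width 'span' (span must be odd).
peaks_at <- function(y, span = 3) {
  s <- span %/% 2
  z <- embed(y, span)
  which(max.col(z, ties.method = "first") == 1 + s) + s
}

# Made-up data: one large peak at x = 3.5 plus a few small bumps
x <- seq(2, by = 0.5, length.out = 9)
y <- c(1, 3, 2, 8, 3, 4, 1, 2, 1)

print(x[peaks_at(y, span = 3)])   # every local bump is reported
print(x[peaks_at(y, span = 5)])   # only the dominant peak survives
```

A wider span demands that a point dominate more neighbours, which is a crude way of ignoring small bumps; it is not a substitute for proper smoothing of noisy data.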

Best regards.

Petr



Petr Pikal
petr.pikal at precheza.cz

Tuszynski, Jaroslaw W.

2005-Nov-28 14:41 UTC


[R] finding peaks in a simple dataset with R

Try

  # work directly with data from the input files
  directory  = system.file("Test", package = "caMassClass")
  X = msc.rawMS.read.csv(directory, "IMAC_normal_.*csv")
  Peaks = msc.peaks.find(X) # Find Peaks
  cat(nrow(Peaks), "peaks were found in", Peaks[nrow(Peaks),2], "files.\n")
  stopifnot( nrow(Peaks)==424 )

on my data to see that everything works OK. Then I would convert your
"input.dat" to CSV format:

2.00, 233
2.04, 220
...
11.60, 540
12.00, 600   <-- a peak!
12.04, 450
...

On a Windows machine, you can do this by opening your file in Excel and saving
it as CSV, or possibly by using a text editor to replace ' ' with ', '. Then
the script

  X = msc.rawMS.read.csv('.', "Input.csv")
  Peaks = msc.peaks.find(X)
  cat(nrow(Peaks), "peaks were found in", Peaks[nrow(Peaks),2], "files.\n")

should work.

Another way is to try:

  X = read.table("input.dat", header=TRUE)
  Y = as.matrix(X[,2])   # one-column matrix; a plain vector cannot carry row names
  rownames(Y) = signif(X[,1], 6)
  Peaks = msc.peaks.find(Y)

which casts your data into the correct format, described in the documentation as:
"Spectrum data either in matrix format [nFeatures x nSamples] or in 3D array
format [nFeatures x nSamples x nCopies]. Row names (rownames(X)) store M/Z
mass of each row."
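
The reshaping step can be tried without any input file or extra package. The sketch below builds a one-sample matrix with positions as row names from an in-memory data frame; the column names pos and value and the label sample1 are hypothetical stand-ins, and the values are the ones visible in the original question. It does not call msc.peaks.find itself.

```r
# Stand-in for read.table("input.dat", header = TRUE)
X <- data.frame(pos   = c(2.00, 2.04, 11.60, 12.00, 12.04),
                value = c(233, 220, 540, 600, 450))

# A data frame is a list, not an atomic object, which is consistent with the
# "'x' must be atomic" error quoted in the original message below.
print(is.atomic(X))

# One-column numeric matrix [nFeatures x 1] with the positions as row names
Y <- matrix(X$value, ncol = 1,
            dimnames = list(as.character(signif(X$pos, 6)), "sample1"))
str(Y)
print(is.atomic(Y))
```

Whether msc.peaks.find accepts Y as built here still depends on the caMassClass version in use, so this should be read as a format check rather than a guaranteed fix.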

I hope one of those solutions works for you.

Good Luck.

Jarek Tuszynski

-----Original Message-----
From: dylan.beaudette at gmail.com [mailto:dylan.beaudette at gmail.com] 
Sent: Wednesday, November 23, 2005 5:47 PM
To: r-help at stat.math.ethz.ch
Cc: Tuszynski, Jaroslaw W.
Subject: Re: [R] finding peaks in a simple dataset with R

On Wednesday 23 November 2005 10:15 am, Tuszynski, Jaroslaw W. wrote:
> >> I am looking for some way to locate peaks in a simple x,y data set.
>
> See my 'msc.peaks.find' function in 'caMassClass', it has a simple
> peak finding algorithm.
>
> Jarek Tuszynski
>
Jarek,

Thanks for the tip. I was able to install the caMassClass package and all of
its dependencies. In addition, I was able to run the examples on the manual
pages.

However, the format of the input data to the 'msc.peaks.find' function is not
apparent to me. In its simplest form, my data looks something like this:

2.00 233
2.04 220
...
11.60 540
12.00 600   <-- a peak!
12.04 450
...

Here is an example R session, trying out the function you suggested:

#importing my data like this:
X <- read.table("input.dat", header=TRUE)

#from the example:
Peaks = msc.peaks.find(X)

#errors with:
Error in sort(x, partial = unique(c(lo, hi))) :
        'x' must be atomic

Also: I have tried one of the functions ('getPeaks') listed on the
'msc.peaks.find' manual page, however I am still having a problem with the
format of my data vs. what the function is expecting.

#importing my data like this:
X <- read.table("input.dat", header=TRUE)

#setup an output file for peak information
peakfile <- paste("peakinfo.csv", sep="/")

#run the analysis:
getPeaks(X,peakfile)

#errors with:
Error in area/max(area) : non-numeric argument to binary operator
In addition: Warning message:
no finite arguments to max; returning -Inf

any ideas would be greatly appreciated!

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341
