thr3ads.net - R help - [R] using spec.pgram [Jun 2008]

If this information is useful, please help other people find it:
Share via:

Anthony Mathelier

2008-Jun-09 18:52 UTC

[R] using spec.pgram

Hi everyone,

first of all, I would like to say that I am a newbie in R, so I apologize in
advance if my questions seem to be too easy for you.

Well, I'm looking for periodicity in histograms. I have histograms of
certain phenomenons and I'm asking whether a periodicity exists in these
data. So, I make a periodogram with the function spec.pgram. For instance,
if I have a histogram h, I call spec.pgram by spec.pgram (h, log="no",
taper=0.5). So, I have some peaks that appear and I would like to interpret
them but I do not know how they are computed and so what a peak with a value
of 10000 represents in comparison with a peak of value 600 with another
histogram.
I looked at the source code of the function spec.pgram to better understand
what is behind. But, when I apply the source code line by line, I've got a
problem. For instance, I make:>data = scan ("file.txt")
>h = hist (data, breaks=max(data)/5000)#then I apply the first two lines of the spec.pgram
function>series <- deparse(substitute(h$counts))
>x <- na.action(as.ts(h$counts))
>xNULL
I do not understand why when I apply the first two lines of the function I
have x which is equal to NULL (which make a mistake in the following lines
of the code) but if I apply the function directly with h$counts it gives me
a result.
So, if someone can explain to me what is the problem and/or how spec.pgram
exactly computes the periodogram and how to interpret it with my data, I
would be so grateful.
And subsidiary questions:
- Is it possible to have the commented source code of the function?
- I do not understand what is the function na.action in the second line of
spec.pgram, so if you can explain it to me.

Thanks in advance for your answers.
Best regards,

Anthony Mathelier

	[[alternative HTML version deleted]]

Matthieu Stigler

2008-Jun-10 12:49 UTC

head link

[R] using spec.pgram

Hello

I don't know exactly what you want to do but:

-why do you use in your example h$counts and not h? Furthermore helpl 
file says it should be a time series, why then rather not your time series?

-usually na.action will make the "default" action, which you can see
by
getOptions("na.action")

-here in this function it is provided in the function values na.action = 
na.fail so it will just remove the NA in the time series

-if you want to study a function, I advise you to copy it entirely, 
rename it and then just insert print(curiousobject...) in the function, 
this will allow you to let the function run and grasp the interessting 
objects, like:

study<-function (x, spans = NULL, kernel = NULL, taper = 0.1, pad = 0,   
fast = TRUE, demean = FALSE, detrend = TRUE, plot = TRUE,
    na.action = na.fail, ...)
{
    series <- deparse(substitute(x))
    x <- na.action(as.ts(x))
    print(x)
    xfreq <- frequency(x)
 ...}
study(sunspots)

-when you provide an example, instead of giving an external reference 
for the data, try to search a convenient internal data (accessed by 
data() ), so one will be able to reproduce your problems. Here you could 
use sunspots

-to obtain the commented code... I don't know it...

-good luck

Matthieu



> Hi everyone,
>
> first of all, I would like to say that I am a newbie in R, so I apologize
in
> advance if my questions seem to be too easy for you.
>
> Well, I'm looking for periodicity in histograms. I have histograms of
> certain phenomenons and I'm asking whether a periodicity exists in
these
> data. So, I make a periodogram with the function spec.pgram. For instance,
> if I have a histogram h, I call spec.pgram by spec.pgram (h,
log="no",
> taper=0.5). So, I have some peaks that appear and I would like to interpret
> them but I do not know how they are computed and so what a peak with a
value
> of 10000 represents in comparison with a peak of value 600 with another
> histogram.
> I looked at the source code of the function spec.pgram to better understand
> what is behind. But, when I apply the source code line by line, I've
got a
> problem. For instance, I make:
>   
>> >data = scan ("file.txt")
>> >h = hist (data, breaks=max(data)/5000)
>>     
> #then I apply the first two lines of the spec.pgram function
>   
>> >series <- deparse(substitute(h$counts))
>> >x <- na.action(as.ts(h$counts))
>> >x
>>     
> NULL
> I do not understand why when I apply the first two lines of the function I
> have x which is equal to NULL (which make a mistake in the following lines
> of the code) but if I apply the function directly with h$counts it gives me
> a result.
> So, if someone can explain to me what is the problem and/or how spec.pgram
> exactly computes the periodogram and how to interpret it with my data, I
> would be so grateful.
> And subsidiary questions:
> - Is it possible to have the commented source code of the function?
> - I do not understand what is the function na.action in the second line of
> spec.pgram, so if you can explain it to me.
>
> Thanks in advance for your answers.
> Best regards,
>
> Anthony Mathelier
>
> 	[[alternative HTML version deleted]]

Anthony Mathelier

2008-Jun-17 07:58 UTC

head link

[R] using spec.pgram

Even if the signal given by the histogram is not really a signal, it seems
that spec.pgram can give an interesting evaluation of how the genes are
spaced in the chromosome, like in the article.
So now, when I study a chromosome with 200 interesting genes, I would like
to compare the amplitude of the spectrum of the periodogram given by
spec.pgram (applied on the histogram of the distances between genes) with
another periodogram for a chromosome with 350 interesting genes. As the
spectrum seems to be calculated approximately like fft(x)^2/N (according to
the computation of pgram[] in spec.pgram but perhaps I am wrong) where x
stands for the signal and N for the number of observations in the signal, I
suppose I can compare to periodogram (and the values of the peaks) by
dividing the value of the spectrum by N.
I apply this kind of technique with a cosine signal like
this:> x= 1:1000
> cox = cos (x)
> spec.pgram (cox, log="no", taper=0.5)
> x= 1:100
> cox = cos (x)
> spec.pgram (cox, log="no", taper=0.5)and the two periodograms have an amplitude of the peak equals when we divide
it by the corresponding Ns.
Am I wrong if I use this technique to compare my periodograms?

Thanks in advance.
Best regards,

Anthony

On Mon, Jun 16, 2008 at 8:37 PM, stephen sefick <ssefick@gmail.com> wrote:
> OK so this is what I think-  The gaussian smoothing window is just like the
> smooth curve that i suggested that you draw on top of the histogram (try
> looking at ?density and on the R site search page).  They take this and
then
> make the density = 1.  I am not sure how this is done, but i am sure  that
> you could figure it out.  I believe that what they are doing is still
taking
> the area under the curve at discrete "periodicities" (I would not
call these
> periodicities because there is not really a periodic part of this signal-
> there is re-occurance, but not really periodicity) and that is what the
> "power spectrum" is revealing.  This is not a typical use of the
fourier
> transform, but may be valid.  this signal is not stationary So I would
> suggest using wavelet analysis, but it still does not seem be a classical
> signal analysis problem-  I would look at Price et al. in the reference
> section to see if there is presidence for this type of analysis.  But from
> my experience in time domain to frequency domain problems this does not fit
> the model of data that I have worked with, and therefore it may be a bias
on
> my part, but I would use the histogram as my justification for the
distances
> being significant.
> Good Luck
>
> Stephen
>
>
> On Mon, Jun 16, 2008 at 2:04 PM, stephen sefick <ssefick@gmail.com>
wrote:
>
>> i am reading the paper and trying to figure out what they are doing. 
At
>> this point in time it looks like what they are doing is using the value
at
>> the top of the histogram bar as the value at the distance on the
x-axis.
>> they then use the equivalent of spec.pgram.  My nearest approximation
of
>> what this does is that this analysis is integrating the area under the
curve
>> at a particular time.  It doesn't seem that there is any
periodicity in this
>> data because of the fact that there isn't a real signal here- it is
binned
>> by distance between the genes.  Not to say that spectral density is not
>> valid, but is is not a periodicity that this analysis is look at rather
an
>> amount of power (area under the curve).  I am not entirely sure that
this is
>> any more information than what is contained in the historgrams.  Take a
pen
>> and draw from the top of each box starting on the left-  This is the
>> "signal" that is being analyzed.  I need to read the rest of
the paper and
>> think about it a little bit more.  If you have any ideas- pass them
along.
>>
>> Stephen
>>
>>
>> On Mon, Jun 16, 2008 at 12:14 PM, Anthony Mathelier <
>> anthony.mathelier@gmail.com> wrote:
>>
>>> OK, it seems like I do not succeed in expressing what I do, or want
to
>>> do. So, I give you the example that bring me to this kind of
analysis. I
>>> wrote the paper "Chromosomal periodicity of evolutionary
conserved gene
>>> pairs" (which you can download at
>>> http://www.pnas.org/cgi/reprint/104/25/10559). In figure 2, they
have a
>>> histogram of distances between genes on a chromosome and they make
a
>>> discrete fourier transform analysis to exhibit a period of 117kb.
They
>>> explain how they did in the first paragraph of "Distributions
of distances
>>> and positions and fourier transform" (last page). I thought
that this kind
>>> of analysis was made by spec.pgram with a histogram. But perhaps I
am wrong
>>> because I really do not understand what they mean by "the
histogram was
>>> tranformed into a continuous probability density by using a
Gaussian
>>> smoothing window and normalizing the total density over the entire
genome to
>>> 1. A discrete Fourier transform of the data were computed from 0 to
1,000kb
>>> by using a Tukey window to taper the end (ratio of 0.5 for tapered
to
>>> untapered length.".
>>> I hope it explains better what I want to obtain from my distances.
>>> Best regards,
>>>
>>> Anthony
>>>
>>>
>>> On Mon, Jun 16, 2008 at 5:25 PM, stephen sefick
<ssefick@gmail.com>
>>> wrote:
>>>
>>>> To get some sort of frequency which in your case seem to be
cycles per
>>>> distance?  Is a valid use of a fourier transform as long as it
is a distance
>>>> that is measured in a way that would be analogous to a time
series-  In
>>>> other words if the distance proceeds from an origin in one
direction-
>>>> geophysicists do this often with the realization of an
earthquake picked up
>>>> by sensors that are a distance away from the origin of the
epicenter, but
>>>> they are looking for coherencies in the signal from one place
to the next in
>>>> the frequency domain seperated by distance- this is called beam
forming-
>>>> They use the raw signal-  by binning (making a histogram) the
data you are
>>>> loosing the signal-  you are looking at frequency of occurance
of certain
>>>> values not for the underlying periodicities of the data (in
time or
>>>> space).   You are fitting cos and isin function to you data to
see if there
>>>> is periodicity-  the power is the integration of the
convolution of this sin
>>>> and cosine function with your data-  It seems to me meaningless
to preform
>>>> this convolution agianst something that is not a signal (the
histogram).  If
>>>> you want to use a frequency domain technique you have to have a
frequency to
>>>> investigate-  a histogram does not have this-  I is a frequency
of occurance
>>>> by bin size which is NOT what you want (your would have
cycles/binlength
>>>> that doesn't make any sense to me) to do this analysis on- 
You want a
>>>> signal-  dissolved oxygen curve, sunspot record, etc. through
time, or
>>>> distance as stated above- you are looking for the frequency of
a waveform-
>>>> Anyway,  I may be misunderstanding- supply some code and
explain the data
>>>> otherwise this line of though- in my limited expertise- is a
dead end,  but
>>>> agian I still don't know what it is that you are, exactly,
trying to do- and
>>>> what your dataset constits.  I hope these ruminations help
>>>>
>>>> I recommend doing this analysis on the raw data- It doesn't
matter that
>>>> you don't have the same amount of data points- as long as
both sets of data
>>>> have circa ten times the length of (cycles/distance) what you
want to
>>>> detect-  If things in your case are spaced by one meter then
the lowest
>>>> cycle perdistance that you can reliably detect if 0.5 meters, 
this is all
>>>> speculation because you don't have a problem with
reproducible code, and we
>>>> have no idea what you are measuring or what your data looks
like-  without
>>>> this information there is no way that I can say one way or the
other that
>>>> you approach (suggested non-histogram) would be right or wrong.
>>>>
>>>> Stephen
>>>>
>>>>
>>>> On Mon, Jun 16, 2008 at 9:33 AM, Anthony Mathelier <
>>>> anthony.mathelier@gmail.com> wrote:
>>>>
>>>>> Perhaps I'm applying spec.pgram wrong as you said. I
will explain what
>>>>> I want, so you can tell me why I'm wrong and perhaps
what I have to do to do
>>>>> it well.
>>>>> I have some points in a 1-D space and I want to know if
they are spaced
>>>>> at a certain periodic distance. So, I computed all the
distances between
>>>>> points in my space. Then, I would like to know if a certain
distance
>>>>> (period), or multiples of a certain distance, is preferred
to space my data.
>>>>> I made a histogram of the distances and apply the
spec.pgram function to
>>>>> know the frequence (so the period) which is the most
important to space the
>>>>> original data.
>>>>> But, when I have to sets of data (without necessarily the
same number
>>>>> of observation in each set), I want to compare the
importance of the period
>>>>> given by spec.pgram between the sets. Could I normalize the
amplitude of the
>>>>> peaks given by spec.pgram?
>>>>> So, am I wrong to apply this methodology to exhibit a
periodic distance
>>>>> between my data? If, true, what could you recommend me to
do this?
>>>>> Thanks in advance for your answers.
>>>>> Best regards,
>>>>>
>>>>> Anthony
>>>>>
>>>>> On Tue, Jun 10, 2008 at 6:13 PM, stephen sefick
<ssefick@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I from a first thought I would say that you are apply
this wrong!  The
>>>>>> fourier transform convolves a function (cos(x)+isin(x) 
(this may not be the
>>>>>> exact formula but I don't have my books near)) to
the data and then
>>>>>> integrates over -1/2 to 1/2 takes the modulus and plots
this- the
>>>>>> periodogram.  The reason you preform a fourier
transform is to look at
>>>>>> recurring frequencies in the data, which are in the
time domain.  The
>>>>>> fourier transform converts the time series into the
frequency domain and
>>>>>> viola you have a peak into the hidden/recurring parts
of your signal.  From
>>>>>> your explaination your are applying this technique
wrong-  look at schumway,
>>>>>> MASS4, et al. books to get a handle on how this
technique is used.  If you
>>>>>> are to apply a time series analysis please use it on a
time series.  Maybe
>>>>>> your logic is not flawed but I don't see how a
histogram with its associated
>>>>>> binning is a better candidate for time series analysis
than the original
>>>>>> time series if at all.
>>>>>> good luck
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>> On Tue, Jun 10, 2008 at 8:49 AM, Matthieu Stigler <
>>>>>> Matthieu.Stigler@gmail.com> wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I don't know exactly what you want to do but:
>>>>>>>
>>>>>>> -why do you use in your example h$counts and not h?
Furthermore helpl
>>>>>>> file says it should be a time series, why then
rather not your time series?
>>>>>>>
>>>>>>> -usually na.action will make the
"default" action, which you can see
>>>>>>> by getOptions("na.action")
>>>>>>>
>>>>>>> -here in this function it is provided in the
function values
>>>>>>> na.action = na.fail so it will just remove the NA
in the time series
>>>>>>>
>>>>>>> -if you want to study a function, I advise you to
copy it entirely,
>>>>>>> rename it and then just insert
print(curiousobject...) in the function, this
>>>>>>> will allow you to let the function run and grasp
the interessting objects,
>>>>>>> like:
>>>>>>>
>>>>>>> study<-function (x, spans = NULL, kernel = NULL,
taper = 0.1, pad >>>>>>> 0,   fast = TRUE, demean = FALSE,
detrend = TRUE, plot = TRUE,
>>>>>>>   na.action = na.fail, ...)
>>>>>>> {
>>>>>>>   series <- deparse(substitute(x))
>>>>>>>   x <- na.action(as.ts(x))
>>>>>>>   print(x)
>>>>>>>   xfreq <- frequency(x)
>>>>>>> ...}
>>>>>>> study(sunspots)
>>>>>>>
>>>>>>> -when you provide an example, instead of giving an
external reference
>>>>>>> for the data, try to search a convenient internal
data (accessed by data()
>>>>>>> ), so one will be able to reproduce your problems.
Here you could use
>>>>>>> sunspots
>>>>>>>
>>>>>>> -to obtain the commented code... I don't know
it...
>>>>>>>
>>>>>>> -good luck
>>>>>>>
>>>>>>> Matthieu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  Hi everyone,
>>>>>>>>
>>>>>>>> first of all, I would like to say that I am a
newbie in R, so I
>>>>>>>> apologize in
>>>>>>>> advance if my questions seem to be too easy for
you.
>>>>>>>>
>>>>>>>> Well, I'm looking for periodicity in
histograms. I have histograms
>>>>>>>> of
>>>>>>>> certain phenomenons and I'm asking whether
a periodicity exists in
>>>>>>>> these
>>>>>>>> data. So, I make a periodogram with the
function spec.pgram. For
>>>>>>>> instance,
>>>>>>>> if I have a histogram h, I call spec.pgram by
spec.pgram (h,
>>>>>>>> log="no",
>>>>>>>> taper=0.5). So, I have some peaks that appear
and I would like to
>>>>>>>> interpret
>>>>>>>> them but I do not know how they are computed
and so what a peak with
>>>>>>>> a value
>>>>>>>> of 10000 represents in comparison with a peak
of value 600 with
>>>>>>>> another
>>>>>>>> histogram.
>>>>>>>> I looked at the source code of the function
spec.pgram to better
>>>>>>>> understand
>>>>>>>> what is behind. But, when I apply the source
code line by line, I've
>>>>>>>> got a
>>>>>>>> problem. For instance, I make:
>>>>>>>>
>>>>>>>>
>>>>>>>>> >data = scan ("file.txt")
>>>>>>>>> >h = hist (data, breaks=max(data)/5000)
>>>>>>>>>
>>>>>>>>>
>>>>>>>> #then I apply the first two lines of the
spec.pgram function
>>>>>>>>
>>>>>>>>
>>>>>>>>> >series <-
deparse(substitute(h$counts))
>>>>>>>>> >x <- na.action(as.ts(h$counts))
>>>>>>>>> >x
>>>>>>>>>
>>>>>>>>>
>>>>>>>> NULL
>>>>>>>> I do not understand why when I apply the first
two lines of the
>>>>>>>> function I
>>>>>>>> have x which is equal to NULL (which make a
mistake in the following
>>>>>>>> lines
>>>>>>>> of the code) but if I apply the function
directly with h$counts it
>>>>>>>> gives me
>>>>>>>> a result.
>>>>>>>> So, if someone can explain to me what is the
problem and/or how
>>>>>>>> spec.pgram
>>>>>>>> exactly computes the periodogram and how to
interpret it with my
>>>>>>>> data, I
>>>>>>>> would be so grateful.
>>>>>>>> And subsidiary questions:
>>>>>>>> - Is it possible to have the commented source
code of the function?
>>>>>>>> - I do not understand what is the function
na.action in the second
>>>>>>>> line of
>>>>>>>> spec.pgram, so if you can explain it to me.
>>>>>>>>
>>>>>>>> Thanks in advance for your answers.
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Anthony Mathelier
>>>>>>>>
>>>>>>>>        [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Let's not spend our time and resources thinking
about things that are
>>>>>> so little or so large that all they really do for us is
puff us up and make
>>>>>> us feel like gods. We are mammals, and have not
exhausted the annoying
>>>>>> little problems of being mammals.
>>>>>>
>>>>>> -K. Mullis
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Let's not spend our time and resources thinking about
things that are so
>>>> little or so large that all they really do for us is puff us up
and make us
>>>> feel like gods. We are mammals, and have not exhausted the
annoying little
>>>> problems of being mammals.
>>>>
>>>> -K. Mullis
>>>>
>>>
>>>
>>
>>
>> --
>> Let's not spend our time and resources thinking about things that
are so
>> little or so large that all they really do for us is puff us up and
make us
>> feel like gods. We are mammals, and have not exhausted the annoying
little
>> problems of being mammals.
>>
>> -K. Mullis
>>
>
>
>
> --
> Let's not spend our time and resources thinking about things that are
so
> little or so large that all they really do for us is puff us up and make us
> feel like gods. We are mammals, and have not exhausted the annoying little
> problems of being mammals.
>
> -K. Mullis
>
	[[alternative HTML version deleted]]

Seemingly Similar Threads

Search for more reasonably related threads

R help - Jun 2008 - using spec.pgram

[R] using spec.pgram

[R] using spec.pgram

[R] using spec.pgram

Seemingly Similar Threads