Even if the curve given by the histogram is not really a signal, it seems
that spec.pgram can give an interesting picture of how the genes are
spaced along the chromosome, as in the article.
So now, when I study a chromosome with 200 interesting genes, I would like
to compare the amplitude of the periodogram given by spec.pgram (applied
to the histogram of the distances between genes) with that of
another periodogram for a chromosome with 350 interesting genes. As the
spectrum seems to be calculated approximately as |fft(x)|^2/N (according to
the computation of pgram[] in spec.pgram, but perhaps I am wrong), where x
stands for the signal and N for the number of observations in the signal, I
suppose I can compare two periodograms (and the values of their peaks) by
dividing the value of the spectrum by N.
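A rough check of this scaling, written out under simplifying assumptions that are not taken from the spec.pgram documentation: no taper, no detrending, sampling frequency 1, and a cosine of amplitude A whose frequency k_0/N falls exactly on a Fourier frequency with 0 < k_0 < N/2.

```latex
X_k = \sum_{t=0}^{N-1} x_t \, e^{-2\pi i k t / N},
\qquad
I(k) = \frac{|X_k|^2}{N}
\\[6pt]
x_t = A \cos\!\left(\frac{2\pi k_0 t}{N}\right)
\;\Longrightarrow\;
|X_{k_0}| = \frac{A N}{2},
\qquad
I(k_0) = \frac{A^2 N}{4},
\qquad
\frac{I(k_0)}{N} = \frac{A^2}{4}
```

Under these assumptions the peak grows linearly with N, and dividing it by N leaves a quantity that depends only on the amplitude A. With a taper, or with a frequency that falls between two Fourier frequencies, this holds only approximately.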
I tried this technique with a cosine signal, like this:
> x = 1:1000
> cox = cos(x)
> spec.pgram(cox, log="no", taper=0.5)
> x = 1:100
> cox = cos(x)
> spec.pgram(cox, log="no", taper=0.5)
and the peaks of the two periodograms have the same amplitude once each is
divided by the corresponding N.
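The comparison can be checked numerically. The sketch below is only an illustration: the helper peak_of_cosine() is a made-up name, and the frequency 0.1 is chosen (an assumption, unlike cos(x) above) so that it falls exactly on a Fourier frequency for both lengths. taper = 0 and detrend = FALSE are used so that the result should match |fft(x)|^2/N directly.

```r
# Hypothetical helper: peak of the raw periodogram of a unit-amplitude
# cosine of length n. freq * n must be an integer for the peak to sit
# exactly on a Fourier frequency (true for n = 100 and n = 1000).
peak_of_cosine <- function(n, freq = 0.1) {
  t  <- 0:(n - 1)
  x  <- cos(2 * pi * freq * t)
  pg <- spec.pgram(x, taper = 0, detrend = FALSE, plot = FALSE)
  max(pg$spec)
}

p_long  <- peak_of_cosine(1000)
p_short <- peak_of_cosine(100)

# The raw peaks differ by the ratio of the lengths ...
ratio_raw <- p_long / p_short

# ... and agree once each peak is divided by its own N
ratio_normalised <- (p_long / 1000) / (p_short / 100)

# Cross-check of the short series against |fft(x)|^2 / N
x_short    <- cos(2 * pi * 0.1 * (0:99))
direct_max <- max(Mod(fft(x_short))[2:51]^2) / 100

print(c(raw = ratio_raw, normalised = ratio_normalised))
```

For a frequency that lies between two Fourier frequencies, as with cos(x), some power leaks into neighbouring frequencies, so the normalised peaks should be expected to agree only approximately.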
Am I wrong if I use this technique to compare my periodograms?
Thanks in advance.
Best regards,
Anthony
On Mon, Jun 16, 2008 at 8:37 PM, stephen sefick <ssefick@gmail.com> wrote:
> OK, so this is what I think: the Gaussian smoothing window is just like the
> smooth curve that I suggested you draw on top of the histogram (try
> looking at ?density and on the R site search page). They take this and then
> normalize it so that the total density is 1. I am not sure how this is done,
> but I am sure that you could figure it out. I believe that what they are
> doing is still taking the area under the curve at discrete "periodicities"
> (I would not call these periodicities, because there is not really a
> periodic part to this signal; there is recurrence, but not really
> periodicity), and that is what the "power spectrum" is revealing. This is
> not a typical use of the Fourier transform, but it may be valid. This signal
> is not stationary, so I would suggest using wavelet analysis, but it still
> does not seem to be a classical signal analysis problem. I would look at
> Price et al. in the reference section to see if there is precedent for this
> type of analysis. But from my experience with time-domain to
> frequency-domain problems, this does not fit the model of data that I have
> worked with, and therefore it may be a bias on my part, but I would use the
> histogram as my justification for the distances being significant.
> Good luck
>
> Stephen
>
>
> On Mon, Jun 16, 2008 at 2:04 PM, stephen sefick <ssefick@gmail.com> wrote:
>
>> I am reading the paper and trying to figure out what they are doing. At
>> this point it looks like they are using the value at the top of each
>> histogram bar as the value at that distance on the x-axis. They then use
>> the equivalent of spec.pgram. My nearest approximation of what this does
>> is that the analysis is integrating the area under the curve at a
>> particular time. It doesn't seem that there is any periodicity in this
>> data, because there isn't a real signal here: it is binned by distance
>> between the genes. That is not to say that spectral density is not valid,
>> but it is not a periodicity that this analysis is looking at; rather, it
>> is an amount of power (area under the curve). I am not entirely sure that
>> this is any more information than what is contained in the histograms.
>> Take a pen and draw a line along the top of each box, starting on the
>> left. This is the "signal" that is being analyzed. I need to read the rest
>> of the paper and think about it a little bit more. If you have any ideas,
>> pass them along.
>>
>> Stephen
>>
>>
>> On Mon, Jun 16, 2008 at 12:14 PM, Anthony Mathelier <
>> anthony.mathelier@gmail.com> wrote:
>>
>>> OK, it seems that I have not succeeded in expressing what I do, or want
>>> to do. So, I will give you the example that brought me to this kind of
>>> analysis. I read the paper "Chromosomal periodicity of evolutionarily
>>> conserved gene pairs" (which you can download at
>>> http://www.pnas.org/cgi/reprint/104/25/10559). In figure 2, they have a
>>> histogram of distances between genes on a chromosome, and they perform a
>>> discrete Fourier transform analysis to exhibit a period of 117 kb. They
>>> explain how they did it in the first paragraph of "Distributions of
>>> distances and positions and Fourier transform" (last page). I thought
>>> that this kind of analysis was done by applying spec.pgram to a
>>> histogram. But perhaps I am wrong, because I really do not understand
>>> what they mean by "the histogram was transformed into a continuous
>>> probability density by using a Gaussian smoothing window and normalizing
>>> the total density over the entire genome to 1. A discrete Fourier
>>> transform of the data were computed from 0 to 1,000kb by using a Tukey
>>> window to taper the end (ratio of 0.5 for tapered to untapered length)."
>>> I hope this explains better what I want to obtain from my distances.
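One possible reading of the procedure quoted above, sketched in R on simulated positions (not the paper's data). The spacing of 117 units, the noise level, the bandwidth of 5, the 50 to 500 search band, and taper = 0.5 (the value used in the spec.pgram calls elsewhere in this thread) are all assumptions made for illustration; the paper's exact smoothing and taper settings may differ.

```r
set.seed(1)

# Simulated 1-D positions: multiples of an assumed period plus jitter
period    <- 117
positions <- sort(period * sample(0:400, 200) + rnorm(200, sd = 3))

# All pairwise distances, restricted to the 0..1000 window
d <- as.vector(dist(positions))
d <- d[d <= 1000]

# Gaussian smoothing on a regular grid, then normalise the total to 1
dens <- density(d, bw = 5, kernel = "gaussian", from = 0, to = 1000, n = 1001)
y    <- dens$y / sum(dens$y)
step <- dens$x[2] - dens$x[1]          # grid spacing (1 unit here)

# Periodogram of the smoothed curve, with a split cosine bell taper
pg <- spec.pgram(ts(y, deltat = step), taper = 0.5, plot = FALSE)

# Strongest peak among periods between 50 and 500 units
per         <- 1 / pg$freq
band        <- per >= 50 & per <= 500
best_period <- per[band][which.max(pg$spec[band])]

print(best_period)
```

The frequency grid is coarse for a series of this length, so the recovered period can only land on one of a few values near the simulated spacing rather than on it exactly.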
>>> Best regards,
>>>
>>> Anthony
>>>
>>>
>>> On Mon, Jun 16, 2008 at 5:25 PM, stephen sefick <ssefick@gmail.com> wrote:
>>>
>>>> To get some sort of frequency, which in your case seems to be cycles
>>>> per distance, is a valid use of a Fourier transform as long as the
>>>> distance is measured in a way that is analogous to a time series, in
>>>> other words if the distance proceeds from an origin in one direction.
>>>> Geophysicists do this often with the record of an earthquake picked up
>>>> by sensors at some distance from the epicenter, but they are looking
>>>> for coherence in the signal from one place to the next in the frequency
>>>> domain, separated by distance; this is called beamforming. They use the
>>>> raw signal. By binning the data (making a histogram) you are losing the
>>>> signal: you are looking at the frequency of occurrence of certain
>>>> values, not at the underlying periodicities of the data (in time or
>>>> space). You are fitting cos and i*sin functions to your data to see if
>>>> there is periodicity; the power is the integration of the convolution
>>>> of these sine and cosine functions with your data. It seems to me
>>>> meaningless to perform this convolution against something that is not a
>>>> signal (the histogram). If you want to use a frequency-domain technique
>>>> you have to have a frequency to investigate, and a histogram does not
>>>> have this. It is a frequency of occurrence by bin size, which is NOT
>>>> what you want to do this analysis on (you would have cycles/binlength,
>>>> which doesn't make any sense to me). You want a signal (dissolved
>>>> oxygen curve, sunspot record, etc.) through time, or distance as stated
>>>> above; you are looking for the frequency of a waveform. Anyway, I may
>>>> be misunderstanding. Supply some code and explain the data; otherwise
>>>> this line of thought, in my limited expertise, is a dead end, but again
>>>> I still don't know exactly what it is that you are trying to do, and
>>>> what your dataset consists of. I hope these ruminations help.
>>>>
>>>> I recommend doing this analysis on the raw data. It doesn't matter that
>>>> you don't have the same number of data points, as long as both sets of
>>>> data are circa ten times as long as the period that you want to detect.
>>>> If things in your case are spaced by one meter, then the highest
>>>> frequency that you can reliably detect is 0.5 cycles per meter (a
>>>> period of 2 meters). This is all speculation because you have not
>>>> posted a problem with reproducible code, and we have no idea what you
>>>> are measuring or what your data look like. Without this information
>>>> there is no way that I can say one way or the other whether your
>>>> approach (or the suggested non-histogram one) would be right or wrong.
>>>>
>>>> Stephen
>>>>
>>>>
>>>> On Mon, Jun 16, 2008 at 9:33 AM, Anthony Mathelier <
>>>> anthony.mathelier@gmail.com> wrote:
>>>>
>>>>> Perhaps I'm applying spec.pgram wrongly, as you said. I will explain
>>>>> what I want, so you can tell me why I'm wrong and perhaps what I have
>>>>> to do to do it well.
>>>>> I have some points in a 1-D space and I want to know if they are
>>>>> spaced at a certain periodic distance. So, I computed all the
>>>>> distances between points in my space. Then, I would like to know if a
>>>>> certain distance (period), or multiples of a certain distance, is
>>>>> preferred in the spacing of my data.
>>>>> I made a histogram of the distances and applied the spec.pgram
>>>>> function to find the frequency (and so the period) which is the most
>>>>> important in the spacing of the original data.
>>>>> But, when I have two sets of data (without necessarily the same number
>>>>> of observations in each set), I want to compare the importance of the
>>>>> period given by spec.pgram between the sets. Could I normalize the
>>>>> amplitude of the peaks given by spec.pgram?
>>>>> So, am I wrong to apply this methodology to exhibit a periodic
>>>>> distance between my data? If so, what would you recommend I do
>>>>> instead?
>>>>> Thanks in advance for your answers.
>>>>> Best regards,
>>>>>
>>>>> Anthony
>>>>>
>>>>> On Tue, Jun 10, 2008 at 6:13 PM, stephen sefick <ssefick@gmail.com> wrote:
>>>>>
>>>>>> At first thought I would say that you are applying this wrongly! The
>>>>>> Fourier transform convolves a function (cos(x) + i*sin(x); this may
>>>>>> not be the exact formula, but I don't have my books near) with the
>>>>>> data and then integrates over -1/2 to 1/2, takes the squared modulus
>>>>>> and plots this: the periodogram. The reason you perform a Fourier
>>>>>> transform is to look at recurring frequencies in the data, which are
>>>>>> in the time domain. The Fourier transform converts the time series
>>>>>> into the frequency domain and voilà, you have a peek into the
>>>>>> hidden/recurring parts of your signal. From your explanation you are
>>>>>> applying this technique wrongly. Look at Shumway, MASS4, and similar
>>>>>> books to get a handle on how this technique is used. If you are to
>>>>>> apply a time series analysis, please use it on a time series. Maybe
>>>>>> your logic is not flawed, but I don't see how a histogram with its
>>>>>> associated binning is a better candidate for time series analysis
>>>>>> than the original time series, if at all.
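The description above (the data multiplied by cos and i*sin terms, summed, and the squared modulus taken) can be compared with what fft() returns. This is only a sketch on the built-in sunspot.year series; the division by n follows the |fft(x)|^2/N form discussed in this thread, and the variable names are made up for the example.

```r
x <- as.numeric(sunspot.year)   # yearly sunspot numbers shipped with R
x <- x - mean(x)                # remove the mean so frequency 0 carries nothing
n <- length(x)
t <- 0:(n - 1)

# Raw periodogram from the FFT at frequencies 1/n, 2/n, ..., up to 1/2
half  <- 1:(n %/% 2)
power <- Mod(fft(x))[half + 1]^2 / n
k     <- half[which.max(power)]          # index of the strongest frequency

# The same coefficient "by hand": the data times (cos - i*sin), summed
basis       <- cos(2 * pi * k * t / n) - 1i * sin(2 * pi * k * t / n)
coef_manual <- sum(x * basis)
coef_fft    <- fft(x)[k + 1]

peak_period <- n / k                     # years per cycle at the strongest peak
print(peak_period)
```

The manual sum and the FFT coefficient should agree up to rounding error, which shows that the periodogram value at a given frequency measures how strongly the series correlates with a sine and cosine of that frequency.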
>>>>>> good luck
>>>>>>
>>>>>> Stephen
>>>>>>
>>>>>> On Tue, Jun 10, 2008 at 8:49 AM, Matthieu Stigler <
>>>>>> Matthieu.Stigler@gmail.com> wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I don't know exactly what you want to do but:
>>>>>>>
>>>>>>> -why do you use h$counts in your example and not h? Furthermore,
>>>>>>> the help file says it should be a time series, so why not rather
>>>>>>> use your time series?
>>>>>>>
>>>>>>> -usually na.action will take the "default" action, which you can
>>>>>>> see with getOption("na.action")
>>>>>>>
>>>>>>> -here in this function it is provided in the function arguments as
>>>>>>> na.action = na.fail, so it will just stop with an error if there is
>>>>>>> an NA in the time series
>>>>>>>
>>>>>>> -if you want to study a function, I advise you to copy it entirely,
>>>>>>> rename it and then just insert print(curiousobject...) in the
>>>>>>> function; this will allow you to let the function run and grasp the
>>>>>>> interesting objects, like:
>>>>>>>
>>>>>>> study <- function (x, spans = NULL, kernel = NULL, taper = 0.1,
>>>>>>> pad = 0, fast = TRUE, demean = FALSE, detrend = TRUE, plot = TRUE,
>>>>>>> na.action = na.fail, ...)
>>>>>>> {
>>>>>>> series <- deparse(substitute(x))
>>>>>>> x <- na.action(as.ts(x))
>>>>>>> print(x)
>>>>>>> xfreq <- frequency(x)
>>>>>>> ...}
>>>>>>> study(sunspots)
>>>>>>>
>>>>>>> -when you provide an example, instead of giving an external
>>>>>>> reference for the data, try to find a convenient built-in data set
>>>>>>> (accessed by data()), so one will be able to reproduce your
>>>>>>> problems. Here you could use sunspots
>>>>>>>
>>>>>>> -to obtain the commented code... I don't know how...
>>>>>>>
>>>>>>> -good luck
>>>>>>>
>>>>>>> Matthieu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> first of all, I would like to say that I am a newbie in R, so I
>>>>>>>> apologize in advance if my questions seem too easy for you.
>>>>>>>>
>>>>>>>> Well, I'm looking for periodicity in histograms. I have histograms
>>>>>>>> of certain phenomena and I'm asking whether a periodicity exists
>>>>>>>> in these data. So, I make a periodogram with the function
>>>>>>>> spec.pgram. For instance, if I have a histogram h, I call
>>>>>>>> spec.pgram by spec.pgram (h, log="no", taper=0.5). So, I have some
>>>>>>>> peaks that appear and I would like to interpret them, but I do not
>>>>>>>> know how they are computed and so what a peak with a value of
>>>>>>>> 10000 represents in comparison with a peak of value 600 in another
>>>>>>>> histogram.
>>>>>>>> I looked at the source code of the function spec.pgram to better
>>>>>>>> understand what is behind it. But when I apply the source code
>>>>>>>> line by line, I get a problem. For instance, I do:
>>>>>>>>
>>>>>>>>
>>>>>>>>> > data = scan("file.txt")
>>>>>>>>> > h = hist(data, breaks=max(data)/5000)
>>>>>>>>>
>>>>>>>> # then I apply the first two lines of the spec.pgram function
>>>>>>>>
>>>>>>>>> > series <- deparse(substitute(h$counts))
>>>>>>>>> > x <- na.action(as.ts(h$counts))
>>>>>>>>> > x
>>>>>>>>>
>>>>>>>> NULL
>>>>>>>> I do not understand why, when I apply the first two lines of the
>>>>>>>> function, I get x equal to NULL (which causes an error in the
>>>>>>>> following lines of the code), but if I apply the function directly
>>>>>>>> to h$counts it gives me a result.
>>>>>>>> So, if someone can explain to me what the problem is and/or how
>>>>>>>> spec.pgram exactly computes the periodogram and how to interpret
>>>>>>>> it with my data, I would be so grateful.
>>>>>>>> And subsidiary questions:
>>>>>>>> - Is it possible to have the commented source code of the function?
>>>>>>>> - I do not understand what the function na.action does in the
>>>>>>>> second line of spec.pgram, so please explain it to me if you can.
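A likely explanation for the NULL, sketched on made-up counts (the vector below is a stand-in for h$counts, not real data). Inside spec.pgram, na.action is the function's argument, which defaults to na.fail. At the prompt, the same name refers to the stats function na.action(), which extracts an "na.action" attribute from an object; a plain series has no such attribute, hence NULL.

```r
counts <- c(5, 3, 8, 2, 7, 4)   # made-up stand-in for h$counts

# At the prompt: stats::na.action() looks up the "na.action" attribute,
# which a freshly created series does not have
at_prompt <- na.action(as.ts(counts))

# Inside spec.pgram: the argument na.action defaults to na.fail, which
# returns the series unchanged when it contains no NA ...
inside <- na.fail(as.ts(counts))

# ... and stops with an error when it does
with_na   <- try(na.fail(as.ts(c(counts, NA))), silent = TRUE)
has_error <- inherits(with_na, "try-error")

print(at_prompt)
print(inside)
print(has_error)
```

To reproduce the function's behaviour line by line, the second line would therefore be run as x <- na.fail(as.ts(h$counts)).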
>>>>>>>>
>>>>>>>> Thanks in advance for your answers.
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Anthony Mathelier
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Let's not spend our time and resources thinking about things that
>>>>>> are so little or so large that all they really do for us is puff us
>>>>>> up and make us feel like gods. We are mammals, and have not
>>>>>> exhausted the annoying little problems of being mammals.
>>>>>>
>>>>>> -K. Mullis
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
>
>