thr3ads.net - R help - [R] Changing the binning of collected data [Apr 2009]

Lorenzo Isella

2009-Apr-21 17:43 UTC

[R] Changing the binning of collected data

Dear All,
Apologies if this is too simple for this list.
Let us assume that you have an instrument measuring particle distributions.
The output is a set of counts {n_i} corresponding to a set of average
sizes {d_i}.
The set of {d_i} ranges from d_i_min to d_i_max, spaced either linearly
or logarithmically.
There is no access to further detailed information about the
distribution of the measured sizes, but at least you know enough to
plot n(d_i) (number of counts as a function of particle size).
If you can fit the {n_i} to a known distribution (e.g. normal or
lognormal), then you can choose a new set of average sizes, {D_i} and
plot the corresponding n_i(D_i).
But what if the initial observations {n_i} do not follow a known
distribution and you still want to calculate n(D_i)?
Off the top of my head, I think that whatever I do must conserve the
original total number of observations N = \sum_i n_i, but this does not
constrain the problem very much.
Any suggestion is welcome.
Many thanks

Lorenzo
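
One possible way to approach the question, sketched here under a strong assumption: if the counts are taken to be spread uniformly within each original bin, the cumulative count can be interpolated linearly at the new bin edges and differenced. This conserves N as long as the new edges cover the original range. The function name rebin_counts, the bin edges, and the counts below are all hypothetical; the instrument reports only average sizes {d_i}, so the edges themselves would have to be assumed (for example, midpoints between neighbouring d_i).

```r
# Re-bin counts onto new bin edges, assuming counts are spread uniformly
# within each original bin (an assumption, not something the data confirm).
rebin_counts <- function(counts, old_edges, new_edges) {
  stopifnot(length(old_edges) == length(counts) + 1)
  # Cumulative count at each original edge: 0 at the left end, N at the right.
  cum_old <- c(0, cumsum(counts))
  # Linear interpolation of the cumulative count; rule = 2 holds the end
  # values constant outside the original range.
  cum_new <- approx(old_edges, cum_old, xout = new_edges, rule = 2)$y
  diff(cum_new)
}

old_edges <- seq(0, 10, by = 2)      # hypothetical original bin edges
counts    <- c(5, 20, 40, 25, 10)    # hypothetical counts n_i
new_edges <- seq(0, 10, by = 2.5)    # hypothetical new bin edges

new_counts <- rebin_counts(counts, old_edges, new_edges)
isTRUE(all.equal(sum(new_counts), sum(counts)))  # total N is conserved
```

For logarithmically spaced sizes, the same idea could be applied to log(d) instead of d. The re-binned counts are generally not integers, and the result depends on the within-bin uniformity assumption.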

Luc Villandre

2009-Apr-21 18:40 UTC

Hi Lorenzo,

I think it would help if you provided a few example 
datasets/tables. Right now, I can't quite pin down your problem.

When binning data, the cut() function tends to be very useful. To fit 
common univariate distributions to a given dataset, you should take a 
look at the fitdistr() function in the MASS package.
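
A minimal sketch of the two functions mentioned above, using simulated data rather than anything from the original post. Note that fitdistr() expects the raw observations, not binned counts, so it applies directly only when the individual sizes are available; the variable names and the lognormal parameters here are illustrative assumptions.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
# Simulated raw particle sizes; purely illustrative, not the poster's data.
sizes <- rlnorm(500, meanlog = 1, sdlog = 0.4)

# cut() assigns each observation to a bin; table() then counts per bin.
edges  <- seq(0, ceiling(max(sizes)), by = 1)
binned <- cut(sizes, breaks = edges, include.lowest = TRUE)
counts <- table(binned)

# fitdistr() estimates distribution parameters by maximum likelihood.
fit <- fitdistr(sizes, "lognormal")
fit$estimate  # named vector with meanlog and sdlog
```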

If this doesn't answer your question, please try to explain in detail 
how your problem relates to R.

Best of luck,

Luc

Jim Lemon

2009-Apr-22 11:44 UTC

Hi Lorenzo,
You should probably be aware that both the position and spacing of 
category boundaries can have a large effect on parameter location tests 
carried out on the categorized data. See:

Wainer, H., Gessaroli, M. & Verdi, M. (2006) Finding what is not there 
through the unfortunate binning of results: The Mendel effect. 
Chance, 19(1): 49-52.

Lemon, J. (2009) On the perils of categorizing responses. Tutorials in 
Quantitative Methods for Psychology, 5(1): 35-39.

Jim
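
The point about boundary placement can be illustrated with a small simulation. This is only a sketch on made-up data: the same sample is binned twice with identical bin widths but shifted boundaries, and the mean is then estimated from bin midpoints and counts. The helper name binned_mean and the chosen breaks are hypothetical.

```r
set.seed(42)
x <- rnorm(200, mean = 5, sd = 1.5)  # simulated measurements, illustrative only

# Mean estimated from binned data: weight each bin midpoint by its count.
binned_mean <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  sum(h$mids * h$counts) / sum(h$counts)
}

breaks_a <- seq(-5, 15, by = 2)   # bin boundaries at odd values
breaks_b <- seq(-4, 16, by = 2)   # same width, shifted by one unit

# Compare the raw mean with the two estimates recovered from binned data.
c(raw = mean(x), a = binned_mean(x, breaks_a), b = binned_mean(x, breaks_b))
```

Any difference between the two binned estimates comes purely from where the boundaries fall, since the underlying sample is the same in both cases.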
