thr3ads.net - R help - [R] Q: Mean, median and confidence intervals with functions "summary" & "boxplot.stats" [Aug 2007]

Tom Willems

2007-Aug-30 09:00 UTC

[R] Q: Mean, median and confidence intervals with functions "summary" & "boxplot.stats"

An embedded text with an unspecified character set was
scrubbed from the message ...
Name: not available
URL: https://stat.ethz.ch/pipermail/r-help/attachments/20070830/e557d2a7/attachment.ksh

Uwe Ligges

2007-Aug-30 09:57 UTC

Tom Willems wrote:
> Dear R users,
>
> My question is: "How can my mean be outside the confidence interval?!"
>
> I think I have the answer, but I would like to hear some other
> ideas on it.
>
> First, my data are not continuous but categorical: each value is a titre
> calculated from a dilution series.
> The data are stored as a column of values and a column indicating the
> phase of the trial.
> Theoretically a value can range from 0 to 4, but in practice only
> certain values occur, and they repeat.
> So they are frequencies.
>
> This is why I believe that it is better to work with a median than with a
> mean, because it represents the cluster of values which occur most.
> Below I give only one example, but the mean falls below the lower
> confidence limit several times across different tests.
>
> Does my answer seem reasonable, or should I perhaps use another method?
> Any suggestions?
>
>         summary_1d  = summary(subset(eda_data, phase=='1' & test=='test 1',
>                                      select=lg_value), na.rm = T)
>         conf_1d  = boxplot.stats(subset(eda_data, phase=='1' & test=='test 1',
>                                         select=lg_value))
>
>         Mean     Median    95% Confidence Int.    StDev.    Variance
>         1.198    1.681     1.441 to 1.922         0.931     0.866

I do not understand which "confidence" has been calculated? Based on 
which assumptions / data? Is it pointwise or not? We need much more 
information - and if you think it is a problem with R or usage of R 
functions, then please give us a reproducible example.

Uwe Ligges


> Kind regards,
> Tom W.
> 
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

S Ellison

2007-Aug-30 12:17 UTC

If you look at ?boxplot.stats, you will find that the confidence interval it
reports is centred on the median:
"The notches (if requested) extend to '+/-1.58 IQR/sqrt(n)'."

If you have skewed data it is very possible (as you have found) that the mean is
outside median+/-1.58 IQR/sqrt(n).

All that is happening is that the majority of the data are around 1 or 2 and
you have a substantial number near zero. Result: mean much lower than median.
And with a high n, the boxplot notch is very narrow and excludes the mean.
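
As a rough illustration of the point above, here is a minimal sketch with
made-up values (the vector x is hypothetical, not the data from the original
post). It recomputes the notch by hand as median +/- 1.58 IQR/sqrt(n), using
the hinges from fivenum(), and compares it with the conf component returned by
boxplot.stats().

```r
# Hypothetical discrete, left-skewed sample: most values at 1.5 or 2,
# plus a sizeable group at 0 (assumed data, for illustration only)
x <- c(rep(0, 20), rep(1.5, 40), rep(2, 40))

n    <- length(x)
five <- fivenum(x)            # min, lower hinge, median, upper hinge, max
iqr  <- five[4] - five[2]     # hinge spread, as used by boxplot.stats()

# Notch limits by hand: median +/- 1.58 * IQR / sqrt(n)
notch <- five[3] + c(-1.58, 1.58) * iqr / sqrt(n)

bs     <- boxplot.stats(x)    # bs$conf holds the same notch limits
mean_x <- mean(x)

cat("mean:", mean_x, " median:", five[3], "\n")
cat("notch by hand:     ", notch, "\n")
cat("boxplot.stats conf:", bs$conf, "\n")
cat("mean below lower notch limit:", mean_x < notch[1], "\n")
```

The group of zeros pulls the mean down, while the median and the hinges stay
among the values at 1.5 and 2, so the notch stays narrow and the mean falls
below it.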

But it does sound very much as if you are doing something questionable at best.
I would not trust IQR as a dispersion measure on discrete data with few possible
values even if they were on an interval scale; too much risk of getting the same
IQR for many different distributions. On an ordinal scale it is worse; the only
points that are valid at all are the valid scale values, so a CI that uses
intermediate values is formally meaningless (what is a shoe size of 7.2, for
example? Answer: Not a shoe size at all). It is of course entirely meaningless
to talk about an IQR on a categorical scale.
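
To make the "same IQR for many different distributions" concern concrete, here
is a small sketch with two invented samples on a 0 to 4 scale (a and b are
hypothetical and chosen only for illustration). They have clearly different
shapes, yet boxplot.stats() returns the same notch for both, whereas a plain
frequency table shows the difference straight away.

```r
# Two hypothetical samples on a 0-4 scale (assumed data, for illustration only)
a <- c(rep(1, 30), rep(2, 40), rep(3, 30))                        # one peak at 2
b <- c(rep(0, 5), rep(1, 40), rep(2, 10), rep(3, 40), rep(4, 5))  # peaks at 1 and 3

conf_a <- boxplot.stats(a)$conf
conf_b <- boxplot.stats(b)$conf

# Same median and same hinge spread, hence the same notch
identical_notch <- isTRUE(all.equal(conf_a, conf_b))
cat("notch a:", conf_a, "\n")
cat("notch b:", conf_b, "\n")
cat("notches agree:", identical_notch, "\n")

# Frequency tables over the full scale show what the notch hides
freq_a <- table(factor(a, levels = 0:4))
freq_b <- table(factor(b, levels = 0:4))
print(freq_a)
print(freq_b)
```

For data with only a handful of possible values, the frequency table is
usually the more honest summary.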

It sounds like boxplot.stats is an inappropriate tool for summarising your data.
