If there's been an answer to this, I've missed it.
Here's my take.
Antje wrote:
> Hi there,
>
> I was wondering if anybody can explain to me why the boxplot ends up
> with different results in the following case:
>
> I have some integer data as a vector and I compare the stats of boxplot
> with the same data divided by a factor.
>
> I've attached a csv file with both data present (d1, d2). The factor is
> 34.16667.
>
> If I run the boxplot function on d1 I get the following stats:
>
> 0.848...
> 0.907...
> 0.936...
> 0.965...
> 1.024...
>
> For d2 I get these stats:
>
> 29
> 31
> 32
> 33
> 36
>
>
> If I convert the stats of d1 with the factor, I get
>
> 29
> 31
> 32
> 33
> 35
>
> Obviously different for the upper whisker. But why???
>
> Antje
Antje:
Three comments:
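1. I think your 'factor' is actually 205/6, not 34.16667
(34.16667 is just 205/6 rounded to seven significant digits).
2. This looks like another case of FAQ 7.31
("Why doesn't R think these numbers are equal?"):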
# Let's take your d2 and create d1; I'll call them x and y:
x <- rep(c(29:38, 40), c(7, 24, 50, 71, 24, 12, 14, 7, 13, 5, 1))
y <- x * 6 / 205
# x is your d2, sorted
# y is your d1, sorted
# The critical values are x[202:203] and y[202:203],
# where x steps from 35 to 36:
x[201:204]
#[1] 35 35 36 36
# The boxplot stats are:
sx <- boxplot.stats(x)$stats
sy <- boxplot.stats(y)$stats
# Calculate potential extent of upper whisker:
ux <- sx[4] + (sx[4] - sx[2]) * 1.5 #36
uy <- sy[4] + (sy[4] - sy[2]) * 1.5 #1.053658536585366
# Is y[203] <= uy?
y[203] <= uy
#[1] FALSE #!!!
y[202] <= uy
#[1] TRUE
# For x:
x[203] <= ux
#[1] TRUE
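And there's your answer: for y the whisker
goes to y[202], not y[203], due to the inevitable
imprecision in machine calculation. In exact arithmetic
y[203] and uy would both be 36 * 6 / 205, but uy is
assembled from the hinges and ends up a hair smaller,
so y[203] falls just outside the fence and is treated
as an outlier.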
3. Last comment: I would not use boxplots for data like this.
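
To make comment 2 a bit more concrete, here is a small sketch that prints
the two numbers at full precision and then compares them with a tolerance.
It reuses x, y and uy from above; the names gap and sx_scaled are mine and
purely illustrative, and the last step (computing the stats on the integer
data and rescaling afterwards) is only one possible workaround, not
something boxplot() does for you. I'm assuming an ordinary IEEE 754
double-precision platform, as in the output shown above.

```r
# Rebuild the data as in comment 2 (x is d2, y is d1).
x <- rep(c(29:38, 40), c(7, 24, 50, 71, 24, 12, 14, 7, 13, 5, 1))
y <- x * 6 / 205

# Upper fence for y, computed from the hinges as before.
sy <- boxplot.stats(y)$stats
uy <- sy[4] + (sy[4] - sy[2]) * 1.5

# Show both numbers with 17 significant digits; the default
# 7-digit printing hides the difference.
print(c(fence = uy, value = y[203]), digits = 17)

# The gap is positive (which is why y[203] <= uy was FALSE) but tiny.
gap <- y[203] - uy
gap > 0
abs(gap) < 1e-12

# A tolerance-based comparison treats the two as equal.
isTRUE(all.equal(y[203], uy))

# Possible workaround (illustrative): take the stats of the
# integer-valued data and rescale them afterwards.
sx_scaled <- boxplot.stats(x)$stats * 6 / 205
sx_scaled
```

With the rescaled stats the upper whisker corresponds to 36 again, in
agreement with what boxplot gives for d2.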
-Peter Ehlers
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.