Hi,
I think you're not quite understanding what's going on with hist(). Reread
the help, and take a look at this small example. The solution I'd use is the
last item.
> x <- rep(1:10, times=1:10)
> table(x)
x
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4  5  6  7  8  9 10
>
> hist(x, plot=FALSE, right=TRUE)$counts
[1]  3  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, right=TRUE)$breaks
 [1]  1  2  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, right=TRUE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>
> hist(x, plot=FALSE, right=FALSE)$counts
[1]  1  2  3  4  5  6  7  8 19
> hist(x, plot=FALSE, right=FALSE)$breaks
 [1]  1  2  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, right=FALSE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
>
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
 [1]  1  2  3  4  5  6  7  8  9 10
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
> hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
 [1]  1  2  3  4  5  6  7  8  9 10
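
For your hour data, here is a minimal sketch of the same idea. I don't have
your crashes data frame, so `hour` below is a made-up stand-in (hour h occurs
h+1 times), and breaks=0:23 is my assumption to reproduce the breaks you
reported. With right=FALSE, the default include.lowest=TRUE closes the last
bin, so it becomes [22, 23] and picks up both hours. A single number such as
breaks=24 doesn't help because it is only a suggestion that gets rounded to
"pretty" break values. Breaks at the half-integers give each hour its own bin.

```r
# Hypothetical stand-in for crashes$hour: hour h (0..23) occurs h + 1 times.
hour <- rep(0:23, times = 1:24)

# Integer breaks 0..23 (assumed, to mirror the reported $breaks) give 23 bins.
# With right = FALSE and include.lowest = TRUE (the default), the last bin is
# the closed interval [22, 23], so hours 22 and 23 are counted together.
h_merged <- hist(hour, plot = FALSE, breaks = 0:23, right = FALSE)
length(h_merged$counts)     # 23
tail(h_merged$counts, 1)    # 47, i.e. 23 (hour 22) + 24 (hour 23)

# Breaks at the half-integers centre one bin on each hour: 24 bins.
h_fixed <- hist(hour, plot = FALSE, breaks = seq(-0.5, 23.5, by = 1))
length(h_fixed$counts)      # 24
all(h_fixed$counts == as.vector(table(hour)))   # TRUE
h_fixed$mids                # 0, 1, ..., 23
```

With your data, that would be something like
hist(crashes$hour, breaks=seq(-0.5, 23.5, by=1)).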
Sarah
On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre <aren at arencambre.com> wrote:
> I have two large datasets (156K and 2.06M records). Each row has the
> hour that an event happened, represented by an integer from 0 to 23.
>
> R's histogram is combining some data.
>
> Here's the command I ran to get the histogram:
>> histinfo <- hist(crashes$hour, right=FALSE)
>
> Here's histinfo:
>> histinfo
> $breaks
>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>
> $counts
>  [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
> [16]  9758 11301 11745  9943  7494  6272  6220 11669
>
> $intensities
>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>
> $density
>  [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
>  [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
> [17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911
>
> $mids
>  [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5
> [19] 18.5 19.5 20.5 21.5 22.5
>
> $xname
> [1] "crashes$hour"
>
> $equidist
> [1] TRUE
>
> attr(,"class")
> [1] "histogram"
>
> Note how the last value in counts is 11669. It's relevant to the
> output of table(crashes$hour):
>      0     1     2     3     4     5     6     7     8     9    10    11    12    13    14
>   4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
>     15    16    17    18    19    20    21    22    23
>   9758 11301 11745  9943  7494  6272  6220  6000  5669
>
> Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is
> that correct for the histogram to combine hours 22 and 23? Since I
> specified right = FALSE, I figured there's no way 23 would be combined
> with 22?
>
> Adding breaks=24 to the hist makes no difference; it's still stuck at
> 23 breaks. I also tried breaks=25 and 23 and several other values, in
> case I am misinterpreting breaks's meaning, but none of them make a
> difference.
>
> I imagine this is a n00b question, so my apologies if this is obvious.
>
> Aren
>
--
Sarah Goslee
functionaldiversity.org