Dear R users,
I am using UK census data on travel to work. The authorities have provided a
breakdown in each area by mode (car, bicycle etc.) and distance travelled
(0–2 km, 2–5 km etc.). After processing, the data for Sheffield look like
this (https://files.one.ubuntu.com/ej2VtVbJTEaelvMRlsocRg):
dshef <- read.table("distmodesheff.csv", sep = ",", header = TRUE)
print(dshef)
     Dist  Tr Bici  Met  Pas  Foot   Bus   Car
1     2 >  45  571  491 2125 16644  4469 13494
2   2 – 5  80 1136 2540 4738  3659 17290 30212
3  5 – 10 217  466 2335 3994  1041 12963 35221
4 10 – 20 191   76  491 1333   332  2439 16322
5 20 – 30 168    6   25  235    41   175  3711
6 30 – 40  78    6    3  122    20    74  2179
7 40 – 60 349    6   21  261    96   333  3501
8    60 < 332   62  125  369   534   433  3276
9   Other 148   40   79  905   388   622  6481
It's interesting to look at the different distributions of different
transport modes:
attach(dshef)
# One row per transport mode, one column per distance band
rs <- rbind(Tr, Bici, Met, Pas, Foot, Bus, Car)
barplot(rs, beside = TRUE, names.arg = Dist, col = rainbow(7), legend.text = TRUE)
http://r.789695.n4.nabble.com/file/n3758198/1.png
This is brilliant, and creates output similar to that of OO calc:
http://r.789695.n4.nabble.com/file/n3758198/egraphmini.jpg
However, as you can see, the pre-made categories (0–2 km etc.) are
unevenly spaced bins of a continuous variable. That calls for a histogram,
in which frequency is represented by the area of each bar rather than its
height. For the vector Car, for example, what I am looking for is something
like this:
# Expand the counts into one arbitrary representative distance per band
n <- c(rep(1.5, Car[1]), rep(3, Car[2]), rep(7.7, Car[3]), rep(15, Car[4]),
       rep(25, Car[5]), rep(35, Car[6]), rep(50, Car[7]), rep(100, Car[8]))
hist(n, breaks=c(0,2,5,10,20,30,40,60,200))
http://r.789695.n4.nabble.com/file/n3758198/2.png
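A possible shortcut, as a minimal sketch only: since the counts and break points are already known, a "histogram" object can be assembled by hand and passed to plot(), with no need to expand the counts with rep(). The Car counts are copied from the table above, the 200 km upper limit for the open-ended band is the same arbitrary choice as in the hist() call, and the names breaks, counts and h are purely illustrative.

```r
# Break points in km; 200 is an arbitrary cap for the open-ended "60 <" band.
breaks <- c(0, 2, 5, 10, 20, 30, 40, 60, 200)
# Car commuters per band, copied from the table ("Other" row left out).
counts <- c(13494, 30212, 35221, 16322, 3711, 2179, 3501, 3276)

# Assemble the components that hist() would normally compute itself.
h <- structure(
  list(
    breaks   = breaks,
    counts   = counts,
    density  = counts / (sum(counts) * diff(breaks)),  # area of each bar = share
    mids     = (head(breaks, -1) + tail(breaks, -1)) / 2,
    xname    = "distance travelled by car (km)",
    equidist = FALSE
  ),
  class = "histogram"
)

# freq = FALSE plots densities, so bar area (not height) reflects frequency.
plot(h, freq = FALSE, xlab = "Distance (km)")
```

Because the bars are drawn from densities, the bar areas sum to one whatever the bin widths are.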
This produces a histogram, but it's a tedious and ugly way of getting there.
It also does not allow for trend-line analysis of the likely distribution
of the continuous variable distance: lines(density(n)), for example, results
in peaks around my arbitrary values.
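One way around the artificial peaks, offered as a rough sketch rather than a recommended method: the cumulative proportions are known exactly at the bin edges, so a monotone spline can be drawn through them and differentiated to give a smooth curve. This assumes that a Fritsch-Carlson monotone interpolation (splinefun() with method = "monoH.FC") is an acceptable smoother, and again uses the arbitrary 200 km cap. The names cum, cdf and dens are illustrative.

```r
breaks <- c(0, 2, 5, 10, 20, 30, 40, 60, 200)   # 200 km cap is arbitrary
counts <- c(13494, 30212, 35221, 16322, 3711, 2179, 3501, 3276)  # Car column

# Cumulative share of commuters at each bin edge (0 at 0 km, 1 at the cap).
cum <- c(0, cumsum(counts)) / sum(counts)

# Monotone interpolation of the cumulative shares; monotonicity keeps the
# implied density from going negative.
cdf <- splinefun(breaks, cum, method = "monoH.FC")

# The first derivative of the interpolated CDF is a smooth density estimate.
x <- seq(0, 60, length.out = 601)
dens <- cdf(x, deriv = 1)

plot(x, dens, type = "l", xlab = "Distance (km)", ylab = "Density")
```

The curve depends only on the bin edges and counts, not on any representative value chosen inside a bin. Its shape between edges is still an artefact of the interpolation method.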
Has anyone else encountered similar issues? I've searched high and low but
can find no solution other than creating a barplot with variable widths:
http://r.789695.n4.nabble.com/Histogram-using-frequency-data-td827927.html
Any ideas about how to resolve this issue would be greatly appreciated.
Eventually I hope to model the distribution of distances travelled in order
to estimate the mean distance within each bin.
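To illustrate the kind of model I have in mind, here is a sketch that fits a distribution to the binned counts by maximum likelihood and then computes the mean distance within each bin. The log-normal is purely an assumption made for illustration and may well not suit these data. The last bin is treated as open-ended, and the names negloglik, fit and bin_means are hypothetical.

```r
# Car commuters per band, copied from the table ("Other" row left out).
counts <- c(13494, 30212, 35221, 16322, 3711, 2179, 3501, 3276)
# Band edges in km; the final band is open-ended (60 km and over).
breaks <- c(0, 2, 5, 10, 20, 30, 40, 60, Inf)

# Negative multinomial log-likelihood of the binned counts under a
# log-normal distance distribution (ASSUMED, not derived from the data).
negloglik <- function(par) {
  p <- diff(plnorm(breaks, meanlog = par[1], sdlog = exp(par[2])))
  -sum(counts * log(pmax(p, 1e-300)))  # guard against log(0)
}

# sdlog is optimised on the log scale so that it stays positive.
fit <- optim(c(log(7), 0), negloglik)
mu <- fit$par[1]
s  <- exp(fit$par[2])

# Mean within each band, E[X | a < X <= b], using the closed form of the
# log-normal partial expectation:
#   E[X; X <= b] = exp(mu + s^2 / 2) * plnorm(b, mu + s^2, s)
bin_means <- exp(mu + s^2 / 2) *
  diff(plnorm(breaks, mu + s^2, s)) / diff(plnorm(breaks, mu, s))

print(round(bin_means, 2))
```

Comparing the fitted bin probabilities, diff(plnorm(breaks, mu, s)), with the observed shares, counts / sum(counts), would give a first indication of whether the assumed distribution is reasonable.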
Many thanks,
Robin
--
View this message in context:
http://r.789695.n4.nabble.com/Histogram-from-frequency-data-in-pre-made-bins-tp3758198p3758198.html
Sent from the R help mailing list archive at Nabble.com.