Louise Mair
2011-Jan-27 14:58 UTC
[R] creating categorical frequency tables from continuous data
Hello,

I am working with a dataset which essentially has only one column: a list of distances in metres, accurate to several decimal places, e.g.

distance
1000
6403.124
1000
1414.214
1414.214
1000

I want to organise this into a frequency table, grouping into categories of 0-999, 1000-1999, 2000-2999, etc. I'd also like the rows where there are no data points in that category to contain 0, so that I can plot a histogram with a linear x axis and statistically analyse differences between datasets.

I have tried table(), which doesn't group the data the way I'd like, and I've also tried cut() but couldn't make it work. Ideally I'd like the output to look something like this:

distance    frequency
0-999       0
1000-1999   3
2000-2999   0
...

Any suggestions that are an improvement on doing it manually, please?

Thanks in advance!
Louise
Sascha Vieweg
2011-Jan-27 15:31 UTC
[R] creating categorical frequency tables from continuous data
On 11-01-27 14:58, Louise Mair wrote:
> Ideally I'd like the output to look something like this...
>
> distance frequency
> 0-999 0
> 1000-1999 3
> 2000-2999 0
> ...

Could be a starting point for testing:

# simulated distances for testing
x <- abs(rnorm(500, 5000, 3000))
# bin edges every 1000 m; the labels are the upper edge of each bin
br <- seq(0, 20000, 1000)
summary(cut(x, br, labels = br[-1], include.lowest = TRUE, ordered_result = TRUE))

Look at ?cut to find out more.

Good luck,
*S*

--
Sascha Vieweg, saschaview at gmail.com
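The call above leaves right at its default of TRUE, so each bin is closed on the right: a distance of exactly 1000 falls in the first bin (labelled 1000) rather than in 1000-1999. A minimal sketch comparing the two settings on Louise's six example distances (the object names right_closed and left_closed are illustrative only):

```r
# Louise's example distances (metres), taken from the thread
x <- c(1000, 6403.124, 1000, 1414.214, 1414.214, 1000)
br <- seq(0, 7000, by = 1000)

# Default right = TRUE: bins are [0,1000], (1000,2000], ...
# (include.lowest = TRUE closes the first bin at 0),
# so the three values of exactly 1000 land in the first bin.
right_closed <- table(cut(x, br, include.lowest = TRUE, dig.lab = 4))

# right = FALSE: bins are [0,1000), [1000,2000), ...
# which matches the 0-999, 1000-1999 grouping asked for.
left_closed <- table(cut(x, br, right = FALSE, dig.lab = 4))

print(right_closed)
print(left_closed)
```

Either way, cut() returns a factor whose levels include the empty bins, so table() and summary() report them as 0.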
Dennis Murphy
2011-Jan-27 15:32 UTC
[R] creating categorical frequency tables from continuous data
Hi:

Using your example data as the template,

x <- c(1000, 6403.124, 1000, 1414.214, 1414.214, 1000)
# dig.lab = 4 prints the break points as 1000 rather than 1e+03;
# right = FALSE makes the bins closed on the left: [0,1000), [1000,2000), ...
u <- cut(x, breaks = c(0, seq(1000, 7000, by = 1000)), dig.lab = 4, right = FALSE)
as.data.frame(table(u))

            u Freq
1    [0,1000)    0
2 [1000,2000)    5
3 [2000,3000)    0
4 [3000,4000)    0
5 [4000,5000)    0
6 [5000,6000)    0
7 [6000,7000)    1

It makes no sense to use labels like 0 - 999, 1000 - 1999, etc. when your values are continuous. If you round or truncate them, you can define cut() the same way as above and use the labels argument to format the labels as you wish, something like

labs <- c('0-999', '1000-1999', '2000-2999', '3000-3999', '4000-4999',
          '5000-5999', '6000-6999')
v <- cut(x, breaks = c(0, seq(1000, 7000, by = 1000)), labels = labs, right = FALSE)
as.data.frame(table(v))

HTH,
Dennis
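The labs vector above is typed by hand for breaks up to 7000. A sketch, assuming 1000 m bins starting at 0 and non-negative distances, that derives the breaks and the 0-999 style labels from the data instead (width and upper are hypothetical names introduced here). It also checks that hist() with the same breaks and right = FALSE gives the same counts, which covers the histogram Louise wanted to plot:

```r
# Louise's example distances (metres), taken from the thread
x <- c(1000, 6403.124, 1000, 1414.214, 1414.214, 1000)
width <- 1000  # bin width in metres (assumed; change as needed)

# Upper limit rounded up to a whole bin so every value is covered;
# the + 1 keeps a maximum that is an exact multiple inside [upper - width, upper)
upper <- (floor(max(x) / width) + 1) * width
br <- seq(0, upper, by = width)

# Labels such as "0-999", "1000-1999" built from consecutive breaks
labs <- paste(head(br, -1), tail(br, -1) - 1, sep = "-")

v <- cut(x, breaks = br, labels = labs, right = FALSE)
freq <- as.data.frame(table(distance = v), responseName = "frequency")
print(freq)

# hist() with the same breaks and right = FALSE counts the same bins;
# drop plot = FALSE to draw the histogram on a linear x axis
h <- hist(x, breaks = br, right = FALSE, plot = FALSE)
stopifnot(all(h$counts == freq$frequency))
```

To compare two datasets, computing br once from the larger of the two maxima gives both tables the same rows, so the counts line up bin for bin.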