Hi,

I have a data frame that looks something like this:

sex     language   count
male    english    0
male    english    0
female  english    32
male    spanish    154
female  english    11
female  norweigan  7

and so on. What I want to do is sort these into categories, for instance one category where count >= 0 & count < 10, and so on.

I want my data to turn out looking something like:

male    english  0-10   1324
male    english  11-20  756
.....
male    spanish  0-10   354
...
female  english  0-10   1557
...

and so on, where the right-hand column is the number of people in each category.

Up until now I've been subsetting the data frame into each category and then counting the number of rows in each subset. However, I now have a large number of different factor combinations, which makes this process tedious.

Any help would be appreciated!
Chris
On 03/26/2010 08:41 PM, Christoffer Karlsson wrote:
> Any help would be appreciated!

Hi Chris,

As luck would have it, I have been working on a very similar problem: graphically representing multi-level summaries. What you could do is create a new factor variable with the cut() function (say, "countcut"), then call the by() function like this:

by(mydf$count, list(mydf$sex, mydf$language, mydf$countcut), NROW)

You will not get the format you have specified, but you will get the numbers, which can then be reformatted.

Jim
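A minimal sketch of the cut() plus by() approach, assuming a data frame named mydf built from the six sample rows in the question (mydf and countcut are illustrative names, and the bin width of 10 is an assumption). Rows are counted with NROW() rather than sum(), since summing the sex factor is not meaningful.

```r
# Sample rows from the question; the name mydf is illustrative
mydf <- data.frame(
  sex      = factor(c("male", "male", "female", "male", "female", "female")),
  language = factor(c("english", "english", "english", "spanish", "english", "norweigan")),
  count    = c(0, 0, 32, 154, 11, 7)
)

# Bin edges 0, 10, 20, ... extending one width past the largest count
breaks <- seq(0, max(mydf$count) %/% 10 * 10 + 10, by = 10)

# right = FALSE gives left-closed bins [0,10), [10,20), ...
mydf$countcut <- cut(mydf$count, breaks = breaks, right = FALSE)

# Count rows in every sex x language x bin cell. NROW() returns the
# number of rows whether by() hands each subset over as a vector or
# as a one-column data frame. Empty cells come back as NA.
res <- by(mydf$count, list(mydf$sex, mydf$language, mydf$countcut), NROW)
```

Printing res lists every combination of the three factors; with a sample this small, most cells are NA.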
Hi

r-help-bounces at r-project.org wrote on 26.03.2010 10:41:29:

> ... for instance one category where count>=0 & count<10 and so on

Break your counts into the desired levels; see ?cut

cut(1:100, breaks=10)

> I want my data to turn out looking something like:
> male english 0-10 1324
> ...

Then aggregate your data, where cutted.count is the factor returned by cut() and stored as a column of your.data:

with(your.data, aggregate(count, list(sex, language, cutted.count), length))

Regards
Petr
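A sketch of how Petr's cut() plus aggregate() suggestion might be pushed closer to the layout Chris asked for, assuming whole-number counts and a bin width of 10. The explicit labels ("0-9", "10-19", ...) and the column names sex, language, bin and n are choices made here for illustration, not part of the original reply.

```r
# Sample rows from the question; your.data mirrors the name in Petr's reply
your.data <- data.frame(
  sex      = c("male", "male", "female", "male", "female", "female"),
  language = c("english", "english", "english", "spanish", "english", "norweigan"),
  count    = c(0, 0, 32, 154, 11, 7)
)

# Bin edges 0, 10, 20, ... up to one width past the largest count
breaks <- seq(0, max(your.data$count) %/% 10 * 10 + 10, by = 10)

# Labels such as "0-9", "10-19" (only sensible for whole-number counts)
labels <- paste(head(breaks, -1), tail(breaks, -1) - 1, sep = "-")
your.data$cutted.count <- cut(your.data$count, breaks = breaks,
                              labels = labels, right = FALSE)

# Naming the grouping list gives readable column names in the result;
# length() counts the rows that fall into each group
counts <- with(your.data,
               aggregate(count,
                         list(sex = sex, language = language, bin = cutted.count),
                         length))
names(counts)[4] <- "n"
counts
```

aggregate() only returns groups that actually occur in the data, so combinations with no people simply do not appear.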
Christoffer Karlsson wrote:
> Any help would be appreciated!
> Chris

You can quickly assign a category to each row in your data frame with the cut() function:

testData <- structure(list(
  sex = structure(c(2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L),
                  .Label = c("female", "male"), class = "factor"),
  language = structure(c(1L, 1L, 1L, 3L, 1L, 2L, 3L, 3L, 1L),
                       .Label = c("english", "norweigan", "spanish"), class = "factor"),
  count = c(0L, 0L, 32L, 154L, 11L, 7L, 3L, 5L, 2L)),
  .Names = c("sex", "language", "count"),
  class = "data.frame", row.names = c(NA, -9L))

binMax <- ceiling( max(testData$count) / 10 ) * 10
binBreaks <- seq( 0, binMax, by = 10 )
testData$bin <- cut( testData$count, binBreaks, include.lowest = TRUE )

And then, as Petr said:

with( testData, aggregate(count, list(sex, language, bin), length))

Hope this helps!

-Charlie

-----
Charlie Sharpsteen
Undergraduate -- Environmental Resources Engineering
Humboldt State University
--
View this message in context: http://n4.nabble.com/Creating-a-vector-of-categories-tp1691911p1692028.html
Sent from the R help mailing list archive at Nabble.com.
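An alternative sketch using table() instead of aggregate(), a different base-R technique from the one in the thread. It rebuilds Charlie's testData with data.frame() and bins it left-closed as in his follow-up correction; keeping include.lowest = TRUE alongside right = FALSE is an assumption added here so that a count equal to binMax is not lost. as.data.frame() on a table gives one row per combination with a Freq column, which is close to the requested layout.

```r
# Charlie's example data, rebuilt with data.frame() for readability
testData <- data.frame(
  sex      = factor(c("male", "male", "female", "male", "female",
                      "female", "male", "female", "male")),
  language = factor(c("english", "english", "english", "spanish", "english",
                      "norweigan", "spanish", "spanish", "english")),
  count    = c(0, 0, 32, 154, 11, 7, 3, 5, 2)
)

binMax    <- ceiling(max(testData$count) / 10) * 10
binBreaks <- seq(0, binMax, by = 10)

# Left-closed bins [0,10), [10,20), ...; include.lowest = TRUE closes the
# last bin so a count equal to binMax is kept instead of becoming NA
testData$bin <- cut(testData$count, binBreaks, right = FALSE, include.lowest = TRUE)

# table() cross-tabulates the three factors; as.data.frame() turns the
# table into long format with one row per combination and a Freq column
counts <- as.data.frame(table(sex      = testData$sex,
                              language = testData$language,
                              bin      = testData$bin))

# Unlike aggregate(), table() keeps empty combinations (Freq == 0);
# drop them if only observed categories are wanted
observed <- subset(counts, Freq > 0)
observed
```

Keeping the zero rows can be useful when every sex/language/bin combination should appear in the final summary, even if nobody falls into it.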
Sharpie wrote:
> testData$bin <- cut( testData$count, binBreaks, include.lowest = TRUE )

I also made a slight mistake: you will want to replace include.lowest = TRUE with right = FALSE in the call to cut() to preserve the greater-than-or-equal boundary at the lower end of each bin.

Sorry if that caused any confusion!

-Charlie
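To see what the correction changes, here is a small sketch comparing cut()'s boundary rules on a few hand-picked values (x and breaks are illustrative, not from the thread). It also shows one side effect: with right = FALSE and no include.lowest, a value equal to the top break falls outside every bin and becomes NA. That cannot happen with Charlie's testData, because binMax (160) is above the largest count (154), but it can when the largest count is an exact multiple of 10.

```r
x      <- c(0, 9, 10, 19, 20)
breaks <- c(0, 10, 20)

# Default right = TRUE: bins (0,10] and (10,20]; include.lowest = TRUE
# pulls 0 into the first bin, which is then labelled [0,10]
default_bins <- cut(x, breaks, include.lowest = TRUE)

# right = FALSE: bins [0,10) and [10,20); include.lowest = TRUE now
# closes the top bin instead, so 20 lands in [10,20]
left_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

# right = FALSE alone: 20 is not inside [10,20), so it becomes NA
no_top <- cut(x, breaks, right = FALSE)

data.frame(x, default_bins, left_closed, no_top)
```

Note that 10 falls in the first bin under the default rule but in the second bin once right = FALSE, which is the ">= lower boundary" behaviour the correction is after.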