michael watson (IAH-C)
2005-Jan-20 13:57 UTC
[R] Subsetting a data frame by a factor, using the level that occurs the most times
I think that title makes sense... I hope it does...

I have a data frame, one of the columns of which is a factor. I want the rows of data that correspond to the level in that factor which occurs the most times.

I can get a list by doing:

    by(data,data$pattern,subset)

And go through each element of the list counting the rows, to find the maximum....

BUT I can't help thinking there's a more elegant way of doing this....

The second part is figuring out the rows which have the maximum number of consecutive patterns which are the same... Now that I would love some help with... :-)

Thanks
Mick
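[Editorial sketch, not part of the original post.] The second part of the question is not answered later in the thread. One possible reading is "find the longest run of identical consecutive values in the factor and return those rows". A minimal sketch of that reading, using base R's rle(), could look like the following. The data frame `df` and its `id` column are hypothetical; only the column name `pattern` comes from the question.

```r
# Hypothetical example data; only the column name `pattern` is from the post.
df <- data.frame(id = 1:8,
                 pattern = factor(c("a", "b", "b", "a", "a", "a", "b", "b")))

# rle() needs an atomic vector, so convert the factor to character first.
runs <- rle(as.character(df$pattern))

# Index of the longest run; which.max() returns the first one on ties.
longest <- which.max(runs$lengths)

# Translate the run index back into row positions.
end   <- cumsum(runs$lengths)[longest]
start <- end - runs$lengths[longest] + 1

longest_run_rows <- df[start:end, ]   # rows 4 to 6 in this example
```

If several runs share the maximum length, this sketch keeps only the first; whether that is acceptable depends on what is wanted in such cases.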
Chuck Cleland
2005-Jan-20 14:08 UTC
[R] Subsetting a data frame by a factor, using the level that occurs the most times
    newdata <- subset(mydata, mydata$myfact == names(which.max(table(mydata$myfact))))

michael watson (IAH-C) wrote:
> I have a data frame, one of the columns of which is a factor. I want
> the rows of data that correspond to the level in that factor which
> occurs the most times.

--
Chuck Cleland, Ph.D.
NDRI, Inc.
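[Editorial sketch.] To make the one-liner above easier to follow, here it is split into steps on a small made-up data frame. The names `mydata` and `myfact` mirror the reply; the data, the column `x`, and the intermediate names `counts` and `top_level` are assumptions for illustration only.

```r
# Hypothetical data; `mydata` and `myfact` mirror the names used in the reply.
mydata <- data.frame(x = 1:6,
                     myfact = factor(c("a", "b", "b", "c", "b", "a")))

counts    <- table(mydata$myfact)       # frequency of each level
top_level <- names(which.max(counts))   # name of the most frequent level

# Inside subset() the column can be referred to by its bare name.
newdata <- subset(mydata, myfact == top_level)
```

Here `top_level` is "b" and `newdata` holds the three rows where `myfact` is "b".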
Sean Davis
2005-Jan-20 14:16 UTC
[R] Subsetting a data frame by a factor, using the level that occurs the most times
On Jan 20, 2005, at 8:57 AM, michael watson (IAH-C) wrote:
> I can get a list by doing:
>
> by(data,data$pattern,subset)

See ?split.

    data.split <- split(data, data$pattern)

> And go through each element of the list counting the rows, to find the
> maximum....

    sort(sapply(data.split, nrow))

> BUT I can't help thinking there's a more elegant way of doing this....

We'll see what the other responses are....

Sean
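[Editorial sketch.] The split()-based route above stops at the sorted group sizes. One way to carry it through to the requested rows is shown below. The data frame `dat` is made up (named `dat` rather than `data` to avoid masking the base function), and the final which.max() step is an addition, not part of Sean's reply.

```r
# Hypothetical data; the column name `pattern` follows the original question.
dat <- data.frame(x = 1:6,
                  pattern = factor(c("a", "b", "b", "c", "b", "a")))

# One data frame per level of the factor.
dat.split <- split(dat, dat$pattern)

# Number of rows in each piece, named by level.
sizes <- sapply(dat.split, nrow)

# Pull out the piece with the most rows (the first one, if there is a tie).
biggest <- dat.split[[which.max(sizes)]]
```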
Douglas Bates
2005-Jan-20 14:32 UTC
[R] Subsetting a data frame by a factor, using the level that occurs the most times
michael watson (IAH-C) wrote:
> I have a data frame, one of the columns of which is a factor. I want
> the rows of data that correspond to the level in that factor which
> occurs the most times.

So first you want to determine the mode (in the sense of the most frequently occurring value) of the factor. One way to do this is

    names(which.max(table(fac)))

Use this comparison for the subset as

    subset(data, pattern == names(which.max(table(pattern))))
Liaw, Andy
2005-Jan-20 16:12 UTC
[R] Subsetting a data frame by a factor, using the level that occurs the most times
> From: Douglas Bates
>
> So first you want to determine the mode (in the sense of the most
> frequently occurring value) of the factor. One way to do this is
>
> names(which.max(table(fac)))
>
> Use this comparison for the subset as
>
> subset(data, pattern == names(which.max(table(pattern))))

Just be careful that if there are ties (i.e., more than one level having the max) which.max() will randomly pick one of them. That may or may not be what's desired. If that is a possibility, Mick will need to think what he wants in such cases.

Andy
Liaw, Andy
2005-Jan-20 17:40 UTC
[R] Subsetting a data frame by a factor, using the level that occurs the most times
> From: Douglas Bates
>
> Liaw, Andy wrote:
> > Just be careful that if there are ties (i.e., more than one level
> > having the max) which.max() will randomly pick one of them. That may
> > or may not be what's desired. If that is a possibility, Mick will
> > need to think what he wants in such cases.
>
> According to the documentation it picks the first one. Also, that's
> what Martin Maechler told me and he wrote the code so I trust him on
> that. I figure that if you have to trust someone to be meticulous and
> precise then a German-speaking Swiss is a good choice.

My apologies! I got it mixed up with max.col, which does the tie-breaking.

Andy
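[Editorial sketch.] The tie behaviour discussed above can be checked on a small example. The factor `fac` and the names `first_mode` and `all_modes` are made up for illustration. The last step, keeping every tied level, is only one possible answer to the question of what is wanted when there are ties.

```r
# Hypothetical factor in which "a" and "b" are tied for most frequent.
fac <- factor(c("a", "b", "a", "b", "c"))
tab <- table(fac)

# which.max() returns the position of the first maximum, so this is "a".
first_mode <- names(which.max(tab))

# To keep every tied level instead, compare each count with the maximum.
all_modes <- names(tab)[tab == max(tab)]

# Rows belonging to any of the tied levels.
keep <- fac %in% all_modes
```

With this data `first_mode` is "a" and `all_modes` is c("a", "b").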