Dear list,

I have a vector (array, table row, whatever is best) of frequency values for categories (or bins), and I need to find the median category. It is trivial to do by hand, but I was wondering whether there is an elegant way to do it in R.

The obvious median(vector) returns the median frequency of the bins, and that is not what I want. For example:

      freq
cat1      1
cat2     10
cat3    100
cat4   1000
cat5  10000

I want it to return cat5 instead of cat3.

Thanks a lot,
Martin
Martin Tomko wrote:
> I have a vector (array, table row, whatever is best) of frequency values
> for categories (or bins), and I need to find the median category.
>
> I want it to return cat5, instead of cat3.

df <- data.frame(binname = as.factor(paste("cat", 1:5, sep="")),
                 freq = c(1,10,100,1000,10000))

df
  binname  freq
1    cat1     1
2    cat2    10
3    cat3   100
4    cat4  1000
5    cat5 10000

with(df, levels(binname)[median(rep(as.numeric(binname), freq))])
[1] "cat5"

> Thanks a lot
> Martin
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
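For readers new to R, the one-liner in the reply above can be unpacked into separate steps. This is only a sketch of how the pieces appear to fit together; the intermediate names (codes, expanded, m) are made up here for illustration and are not part of the original reply.

```r
# Same data as in the reply above
df <- data.frame(binname = factor(paste0("cat", 1:5)),
                 freq    = c(1, 10, 100, 1000, 10000))

# Step 1: a factor stores integer codes; as.numeric() exposes them
codes <- as.numeric(df$binname)      # 1 2 3 4 5

# Step 2: rep() repeats each code as many times as its frequency,
# recreating the raw observations (11111 values in total)
expanded <- rep(codes, df$freq)

# Step 3: the median of the expanded codes is the code of the median bin
m <- median(expanded)

# Step 4: levels() returns the category labels; index them by that code
levels(df$binname)[m]
```

Note that factor levels are sorted alphabetically by default, so with ten or more bins named this way ("cat10" sorts before "cat2") the levels would probably need to be given explicitly.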
Thank you, Chuck.

Would you mind commenting a bit on the code? It is not all clear to me. How would you go about retrieving only the numeric value (not the category name)?

I am just starting with R, and the functionality of rep (replicate) and levels is not quite clear. I tried the documentation, but am none the wiser.

What if I had a vector

v <- c(1,10,100,1000,10000)

and wanted to perform it on that?

Thanks a lot,
Martin

Chuck Cleland wrote:
> with(df, levels(binname)[median(rep(as.numeric(binname), freq))])
> [1] "cat5"

--
Martin Tomko
Postdoctoral Research Assistant
Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland
email: martin.tomko at geo.uzh.ch
site: http://www.geo.uzh.ch/~mtomko
mob: +41-788 629 558
tel: +41-44-6355256
fax: +41-44-6356848
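One possible way to handle the plain-vector case asked about above, sketched under the assumption that the position of each element is what identifies the bin. The name idx is introduced here for illustration only.

```r
# Frequencies only, with no category names attached
v <- c(1, 10, 100, 1000, 10000)

# seq_along(v) gives the bin positions 1..5; rep() expands each position
# by its frequency, and median() then picks the middle observation
idx <- median(rep(seq_along(v), v))

idx        # position of the median bin
v[idx]     # frequency stored in that bin
```

One caveat: when the total count is even and the two middle observations fall in different bins, median() returns a value halfway between two positions (for example 1.5), so the result may need rounding before it is used as an index.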
Alternatively, pick the first bin whose cumulative frequency reaches half of the total:

levels(df$binname)[which(cumsum(df$freq) >= 0.5 * sum(df$freq))[1]]

--- Chuck Cleland <ccleland at optonline.net> wrote:
> with(df, levels(binname)[median(rep(as.numeric(binname), freq))])
> [1] "cat5"
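The cumulative-sum idea can also be wrapped in a small function that returns the bin position rather than its label. This is a minimal sketch; median_bin is a hypothetical name, not a base R function, and it assumes the frequencies are non-negative and listed in bin order.

```r
# median_bin() is a hypothetical helper, not part of base R
median_bin <- function(freq) {
  # Running total of observations up to and including each bin
  cum <- cumsum(freq)
  # First bin whose running total reaches half of all observations
  which(cum >= sum(freq) / 2)[1]
}

median_bin(c(1, 10, 100, 1000, 10000))   # 5
median_bin(c(40, 30, 30))                # 2: no single bin holds half the data
```

Unlike the rep() approach, this never builds a vector with one element per observation, which may matter when the counts are very large.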