Hi R-users,

I have a simple question for heavy R users. If I have a data frame like this

  dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
                    var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
  dfr <- dfr[order(dfr$categ),]

and I want to score the values in the variable named "var3" according to this logic:

1. the highest value of var3 within a category (variable named "categ") -> "high"
2. the second highest value -> "mid"
3. lowest value -> "low"

This would be the output of this reasoning:

  dfr$score <- factor(c("high","mid","low","low","high","mid","mid","low",
                        "high","mid","low","low","high","mid","low","low"))
  dfr

The question is how do I do this programmatically in R (e.g. if I have 2000 rows in my dfr)?

I appreciate your help!

Cheers,
Lauri
try this:

  dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
                    var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
  dfr <- dfr[order(dfr$categ), ]

  ## relies on dfr being sorted by categ (see order() above)
  dfr$score <- unlist(tapply(dfr$var3, dfr$categ, function (x) {
      ## distinct values of this category, largest first
      sn <- sort(unique(x), decreasing = TRUE)
      ## max(0, ...) guards against a category with a single distinct value
      labs <- c("high", "mid", rep("low", max(0, length(sn) - 2)))
      labs[match(x, sn)]
  }))

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Lauri Nikkinen" <lauri.nikkinen at iki.fi>
To: <r-help at stat.math.ethz.ch>
Sent: Friday, May 18, 2007 3:15 PM
Subject: [R] Simple programming question
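A side note on the unlist(tapply(...)) assignment above: it lines up with the rows only because dfr was sorted by categ first. Below is a minimal sketch of an order-independent variant, using split() and unsplit() from base R. The helper name score_group is hypothetical and not part of the original reply.

```r
# Same example data as in the question, deliberately left unsorted.
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

# Hypothetical helper: label the values of one category by distinct rank.
score_group <- function(x) {
  sn <- sort(unique(x), decreasing = TRUE)   # distinct values, largest first
  labs <- c("high", "mid", "low")
  labs[pmin(3, match(x, sn))]                # rank 3 and below all map to "low"
}

# split() cuts var3 by category; unsplit() puts every label back at the
# row it came from, so no prior sorting by categ is needed.
dfr$score <- unsplit(lapply(split(dfr$var3, dfr$categ), score_group),
                     dfr$categ)
dfr
```

Sorting the result with dfr[order(dfr$categ), ] should give the same labels as the expected output in the question.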
Try this. f assigns 3, 2 and 1 to the highest, the second highest and all remaining values within a category. ave applies f to each category. Finally we convert the result to a factor.

  f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
  factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))
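To see what f computes, here is a step-by-step trace on category B of the example data (var3 = 7, 4, 4, 2). The intermediate names pos, capped and code are only for illustration and do not appear in the reply.

```r
x <- c(7, 4, 4, 2)                             # var3 values of category B

pos    <- match(x, sort(x, decreasing = TRUE)) # position in sorted vector: 1 2 2 4
capped <- pmin(3, pos)                         # cap positions at 3:        1 2 2 3
code   <- 4 - capped                           # flip so the top value is 3: 3 2 2 1

# Codes 1, 2, 3 index the labels "low", "mid", "high".
c("low", "mid", "high")[code]                  # "high" "mid" "mid" "low"
```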
According to your post you are assuming that there are only 3 unique values for var3 within each category. But categories C and D have 4 unique values for var3.

  split(dfr, dfr$categ)
  ...
  $C
     id categ var3 score
  3   3     C    6  high
  7   7     C    5   mid
  11 11     C    3   low
  15 15     C    1   low
  ...

If you meant something different, then just change myfun() below.

  gmax <- function(x, rnk=1){
    ## generalized maximum with rnk=1 being the biggest value (i.e. max)
    return( sort( unique(x), decreasing=TRUE )[rnk] )
  }

  myfun <- function(x){
    ## highest distinct value -> "high", second highest -> "mid", rest -> "low"
    ifelse( x==gmax(x,1), "high", ifelse( x==gmax(x,2), "mid", "low" ) )
  }

  out <- lapply( split(dfr$var3, dfr$categ), myfun )
  data.frame( dfr, my.score = unsplit(out, dfr$categ) )

Regards, Adai
There was a problem in the first line in the case that the highest number is not unique within a category. In this example it's not apparent since that never occurs. At any rate, it should be:

  f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
  factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))

Also note that the factor labels were arranged so that "low", "mid" and "high" correspond to levels 1, 2 and 3 respectively.

On 5/18/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
> factor(ave(dfr$var3, dfr$categ, FUN = f), lab = c("low", "mid", "high"))
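The difference between the two versions of f only shows up when the top value is tied. Here is a small sketch on a made-up category (the vector x and the names f_old and f_new are assumptions for illustration, not from the thread):

```r
x <- c(8, 8, 5, 3)   # hypothetical category whose highest value is tied

# First version: positions in the full sorted vector.
f_old <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
# Corrected version: positions among the distinct values only.
f_new <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))

f_old(x)   # 3 3 1 1: the 5 sits at position 3, so it falls to "low"
f_new(x)   # 3 3 2 1: the 5 is the second distinct value, so it is "mid"

# levels = 1:3 is an addition in this sketch: it keeps the labels aligned
# even if one of the codes (say 2) never occurs in the data.
score <- factor(f_new(x), levels = 1:3, labels = c("low", "mid", "high"))
score
```

Without an explicit levels argument, factor() derives the levels from the values present, so supplying three labels would fail if only two distinct codes occurred.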