Hi R-users,
I have a simple question for heavy R users. Suppose I have a data frame like this
dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ),]
and I want to score values or points in variable named "var3" following this
kind of logic:
1. the highest value of var3 within category (variable named "categ") -> "high"
2. the second highest value -> "mid"
3. lowest value -> "low"
This would be the output of this reasoning:
dfr$score <- factor(c("high","mid","low","low","high","mid","mid","low",
                      "high","mid","low","low","high","mid","low","low"))
dfr
The question is how to do this programmatically in R (e.g. if I have 2000
rows in my dfr).
I appreciate your help!
Cheers,
Lauri
Try this:
dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ), ]
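# note: unlist() returns the groups in order A, B, C, D, so this relies on
# dfr being sorted by categ (done above)
dfr$score <- unlist(tapply(dfr$var3, dfr$categ, function (x) {
    # distinct values in this category, largest first
    sn <- sort(unique(x), decreasing = TRUE)
    # one label per distinct value: "high", "mid", then "low" for the rest
    labs <- c("high", "mid", rep("low", max(0, length(sn) - 2)))
    labs[match(x, sn)]
}))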
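As a quick sanity check, the sketch below recomputes the labels with the same tapply() approach and compares them with the expected scores from the original question. The names `score` and `expected` are only illustrative, and the max(0, ...) guard is an added assumption so that a category with fewer than two distinct values does not make rep() fail.

```r
# Rebuild the example data and sort by category, as in the question.
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ), ]

# Label each value by its rank among the distinct values of its category.
score <- unlist(tapply(dfr$var3, dfr$categ, function(x) {
    sn <- sort(unique(x), decreasing = TRUE)
    labs <- c("high", "mid", rep("low", max(0, length(sn) - 2)))
    labs[match(x, sn)]
}))

# Expected labels, copied from the question (rows sorted by categ).
expected <- c("high","mid","low","low","high","mid","mid","low",
              "high","mid","low","low","high","mid","low","low")

# unlist() attaches names such as "A1"; drop them before comparing.
identical(unname(score), expected)
```

The comparison only makes sense because dfr is sorted by categ first. On unsorted data the unlisted result would come back grouped by category and would no longer line up with the rows.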
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
----- Original Message -----
From: "Lauri Nikkinen" <lauri.nikkinen at iki.fi>
To: <r-help at stat.math.ethz.ch>
Sent: Friday, May 18, 2007 3:15 PM
Subject: [R] Simple programming question
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Try this. f assigns 3, 2 and 1 to the highest, the second highest and all
lower values within a category. ave applies f to each category. Finally we
convert the result to a factor.
f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))
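To make the arithmetic inside f easier to follow, here is a small worked sketch on the var3 values of category B alone. The intermediate names `pos`, `capped`, `codes` and `labs` are only illustrative, and passing levels = 1:3 explicitly is an added assumption so the labels line up even when a code is absent from a small vector.

```r
f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))

x <- c(7, 4, 4, 2)                            # var3 for category B

# Position of each value in the decreasing sort (first match wins).
pos <- match(x, sort(x, decreasing = TRUE))   # 1 2 2 4
# Everything from position 3 onwards is treated alike.
capped <- pmin(3, pos)                        # 1 2 2 3
# Flip the scale so that larger codes mean higher values.
codes <- 4 - capped                           # 3 2 2 1, same as f(x)

labs <- as.character(factor(codes, levels = 1:3,
                            labels = c("low", "mid", "high")))
labs                                          # "high" "mid" "mid" "low"
```

So f returns 3 for the highest value, 2 for the second highest and 1 for everything else, which is why the factor labels are listed as low, mid, high.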
According to your post you are assuming that there are only 3 unique
values for var3 within each category. But categories C and D have 4 unique
values for var3.
split(dfr, dfr$categ)
...
$C
   id categ var3 score
3   3     C    6  high
7   7     C    5   mid
11 11     C    3   low
15 15     C    1   low
...
If you meant something different, then just change myfun() below.
gmax <- function(x, rnk=1){
  ## generalized maximum, with rnk=1 being the biggest value (i.e. max)
  return( sort( unique(x), decreasing=TRUE )[rnk] )
}
myfun <- function(x){ ifelse( x==gmax(x,1), "high",
                      ifelse( x==gmax(x,2), "mid",
                              "low" ) ) }
out <- lapply( split(dfr$var3, dfr$categ), myfun )
data.frame( dfr, my.score = unsplit(out, dfr$categ) )
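One property of the split()/unsplit() route is that it should not depend on the rows being sorted by categ. The sketch below tries it on the unsorted data, using copies of gmax() and myfun() from this message. The "mid" label follows the original question, and writing the result into a my.score column is only one way to attach it.

```r
# Example data, deliberately NOT sorted by categ.
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

# rnk-th largest distinct value (rnk = 1 is the ordinary maximum).
gmax <- function(x, rnk = 1) sort(unique(x), decreasing = TRUE)[rnk]

myfun <- function(x) ifelse(x == gmax(x, 1), "high",
                     ifelse(x == gmax(x, 2), "mid", "low"))

# Score each category separately, then put the pieces back in row order.
out <- lapply(split(dfr$var3, dfr$categ), myfun)
dfr$my.score <- unsplit(out, dfr$categ)

# Category B has var3 = 7, 4, 4, 2 (ids 2, 6, 10, 14).
dfr[dfr$categ == "B", ]
```

unsplit() uses the same grouping vector that split() used, so each label lands back on the row it was computed from.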
Regards, Adai
There was a problem in the first line in the case that the highest number
is not unique within a category. In this example it's not apparent since
that never occurs. At any rate, it should be:
f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))
Also note that the factor labels were arranged so that "low", "mid" and
"high" correspond to levels 1, 2 and 3, respectively.
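The difference between the two versions of f only shows up when the maximum is tied. A minimal sketch on a made-up vector (the names `f_old` and `f_new` and the values in `x` are hypothetical, chosen only to trigger the tie):

```r
# First version: positions are taken in the full sorted vector.
f_old <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
# Corrected version: positions are taken among the distinct values.
f_new <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))

x <- c(7, 7, 4)   # hypothetical category with a tied maximum

f_old(x)          # 3 3 1: the 4 sits at position 3, so it is scored "low"
f_new(x)          # 3 3 2: the 4 is the second distinct value, so "mid"
```

Without unique(), the duplicated 7 takes up positions 1 and 2 in the sorted vector, which pushes the second highest value down to position 3.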