thr3ads.net - R help - [R] Simple programming question [May 2007]


Lauri Nikkinen

2007-May-18 13:15 UTC

[R] Simple programming question

Hi R-users,

I have a simple question for R heavy users. If I have a data frame like this


dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ),]

and I want to score values or points in variable named "var3" following this
kind of logic:

1. the highest value of var3 within category (variable named "categ") -> "high"
2. the second highest value -> "mid"
3. lowest value -> "low"

This would be the output of this reasoning:

dfr$score <-
factor(c("high","mid","low","low","high","mid","mid","low","high","mid","low","low","high","mid","low","low"))
dfr

The question is how I do this programmatically in R (i.e. if I have 2000
rows in my dfr)?

I appreciate your help!

Cheers,
Lauri


Dimitris Rizopoulos

2007-May-18 13:37 UTC


try this:

dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
    var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ), ]

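## Label the distinct values within each category: highest -> "high",
## second highest -> "mid", everything lower -> "low".
## Note: unlist(tapply(...)) returns the groups in category order, so this
## relies on dfr already being sorted by categ (as done above).
dfr$score <- unlist(tapply(dfr$var3, dfr$categ, function (x) {
    sn <- sort(unique(x), decreasing = TRUE)
    ## max(0, ...) guards categories with fewer than two distinct values
    labs <- c("high", "mid", rep("low", max(0, length(sn) - 2)))
    labs[match(x, sn)]
}))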
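
A note on row order: the approach above assumes the data frame is sorted by categ. The following is a minimal sketch, not part of the original reply, of one way to get the same labels without that assumption by using split() and unsplit(). The helper name score_one is hypothetical, and the data frame is deliberately left unsorted.

```r
# Same data as in the question, but NOT sorted by categ
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

# score_one() is a hypothetical helper wrapping the labelling logic
score_one <- function(x) {
  sn <- sort(unique(x), decreasing = TRUE)
  labs <- c("high", "mid", rep("low", max(0, length(sn) - 2)))
  labs[match(x, sn)]
}

# split() groups var3 by category; unsplit() puts the labels back
# into the original row order, whatever that order is
dfr$score <- unsplit(lapply(split(dfr$var3, dfr$categ), score_one), dfr$categ)

dfr[dfr$categ == "B", ]   # var3 7, 4, 4, 2 -> high, mid, mid, low
```

Because unsplit() restores the original positions, the result does not depend on any earlier call to order().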


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm



Gabor Grothendieck

2007-May-18 14:09 UTC


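Try this.  Within a category, f ranks the values (1 = highest, 2 = second
highest, 3 = anything lower) and then flips the ranks, so that the highest
value gets 3, the second highest gets 2 and everything else gets 1.  ave
applies f to each category.  Finally we convert it to a factor.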

f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))
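
To see what f does step by step, here is a small trace on the var3 values of category B from the question (7, 4, 4, 2). This is only an illustrative sketch; the intermediate variable names (sorted, pos, capped, score) are not from the original post.

```r
x <- c(7, 4, 4, 2)                    # var3 values of category B

sorted <- sort(x, decreasing = TRUE)  # 7 4 4 2
pos    <- match(x, sorted)            # 1 2 2 4  (first position of each value)
capped <- pmin(3, pos)                # 1 2 2 3  (anything past 3rd becomes 3)
score  <- 4 - capped                  # 3 2 2 1  (flip so that highest = 3)

factor(score, labels = c("low", "mid", "high"))
# high mid mid low
```

The flip in the last step is what lets the labels be given in the order "low", "mid", "high".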




Adaikalavan Ramasamy

2007-May-18 14:23 UTC


According to your post you are assuming that there are only 3 unique
values for var3 within each category. But categories C and D each have 4
unique values for var3.

    split(dfr, dfr$categ)
    ...
    $C
       id categ var3 score
    3   3     C    6  high
    7   7     C    5   mid
    11 11     C    3   low
    15 15     C    1   low
    ...

If you meant something different, then just change myfun() below.


  gmax <- function(x, rnk=1){
   ## generalized maximum, with rnk=1 being the biggest value (i.e. max)
   return( sort( unique(x), decreasing=TRUE )[rnk] )
  }

  myfun <- function(x){ ifelse( x==gmax(x,1), "high",
                                ifelse( x==gmax(x,2), "mid", "low" ) ) }

  out   <- lapply( split(dfr$var3, dfr$categ), myfun )

  data.frame( dfr, my.score = unsplit(out, dfr$categ) )
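
As a quick sanity check, the split/unsplit approach can be compared with the hand-written scores from the original question. The sketch below is self-contained and uses the label "mid" from the question; the names expected and my.score are introduced here only for illustration.

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ), ]

# Scores written out by hand in the original question
expected <- c("high","mid","low","low","high","mid","mid","low",
              "high","mid","low","low","high","mid","low","low")

# n-th largest distinct value (rnk = 1 is the maximum)
gmax  <- function(x, rnk = 1) sort(unique(x), decreasing = TRUE)[rnk]
myfun <- function(x) ifelse(x == gmax(x, 1), "high",
                     ifelse(x == gmax(x, 2), "mid", "low"))

out      <- lapply(split(dfr$var3, dfr$categ), myfun)
my.score <- unsplit(out, dfr$categ)

all(my.score == expected)   # TRUE if the computed scores match the question
```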

Regards, Adai




Gabor Grothendieck

2007-May-18 14:31 UTC


There was a problem in the first line in the case that the highest number
is not unique within a category.  In this example it's not apparent, since
that never occurs.  At any rate, it should be:

f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))

Also note that the factor labels were arranged so that "low", "mid" and
"high" correspond to levels 1, 2 and 3 respectively.
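
The difference between the two versions of f only shows up when the highest value is tied. A small sketch with made-up values (8, 8, 5, 3), which do not occur in the example data, makes this visible; the names f_first and f_fixed are hypothetical and simply distinguish the two versions.

```r
# First version: positions are looked up in the sorted values, ties included
f_first <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))

# Corrected version: positions are looked up in the sorted distinct values
f_fixed <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))

x <- c(8, 8, 5, 3)   # made-up category where the maximum is tied

f_first(x)   # 3 3 1 1 : 5 sits in position 3 of (8, 8, 5, 3), so it scores "low"
f_fixed(x)   # 3 3 2 1 : 5 is the second distinct value, so it scores "mid"

# Giving levels explicitly keeps the label mapping fixed even if a
# score happens to be absent from the data
factor(f_fixed(x), levels = 1:3, labels = c("low", "mid", "high"))
# high high mid low
```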

