Dear List,

creating factors in a given non-default order is notoriously difficult to
explain in a course. Students love the ifelse construct given below most, but
I remember some comment from Martin Mächler (?) that ifelse should be banned
from courses.

Any better idea? Not necessarily short; easy to remember is what matters.

Dieter

data = c(1,7,10,50,70)
levs = c("Pre","Post")

# Typical C-Programmer style
factor(levs[as.integer(data > 10) + 1], levels=levs)

# Easiest to understand
factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs)

--
View this message in context: http://www.nabble.com/Condition-to-factor-%28easy-to-remember%29-tp25676411p25676411.html
Sent from the R help mailing list archive at Nabble.com.
On Sep 30, 2009, at 3:43 AM, Dieter Menne wrote:

> Dear List,
>
> creating factors in a given non-default order is notoriously difficult to
> explain in a course. Students love the ifelse construct given below most,
> but I remember some comment from Martin Mächler (?) that ifelse should be
> banned from courses.
>
> Any better idea? Not necessarily short; easy to remember is what matters.
>
> Dieter
>
> data = c(1,7,10,50,70)
> levs = c("Pre","Post")
>
> # Typical C-Programmer style
> factor(levs[as.integer(data > 10) + 1], levels=levs)

I agree with your observation that many people express a preference for the
ifelse version. I had the same sort of comment on some of my Excel code (not
in a statistical application) a couple of days ago.

In your code the as.integer function is superfluous, and you could argue that
it might even be easier to understand for the Boolean-challenged masses if you
substituted as.logical(). It would also be superfluous, but it might convey
the message that the programmer _knew_ that the "+" operation is capable of
doing the necessary coercion.

> # Easiest to understand
> factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs)

--
Boole Rules

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
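A minimal sketch of the coercion point, using the data and levs from the original post. The names idx_explicit, idx_implicit and f are introduced here for illustration only. Arithmetic on a logical vector treats FALSE as 0 and TRUE as 1, so the explicit cast should make no difference to the index:

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

idx_explicit <- as.integer(data > 10) + 1   # cast first, then add
idx_implicit <- (data > 10) + 1             # "+" coerces FALSE/TRUE to 0/1 itself

identical(idx_explicit, idx_implicit)       # TRUE

# The index picks levs[1] or levs[2]; levels = levs fixes the level order
f <- factor(levs[(data > 10) + 1], levels = levs)
f
```

The parentheses around data > 10 matter only for readability here; "+" binds more tightly than ">", so dropping them would compare data against 11 instead.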
David Winsemius wrote:

>> # Typical C-Programmer style
>> factor(levs[as.integer(data > 10) + 1], levels=levs)
>
> In your code the as.integer function is superfluous

Oops... done too much C# lately, getting invalid cast challenged.

Dieter
1. A common way of doing this is cut:

  > cut(data, c(-Inf, 10, Inf), labels = levs, right = TRUE)
  [1] Pre  Pre  Pre  Post Post
  Levels: Pre Post

We don't actually need right = TRUE, as it's the default, but if you omit it
it can be hard to remember whether the right end of each interval is included
or excluded, so I would recommend including it as a matter of course. Slightly
less safe, but if you knew the values were integral, another approach that
would allow dropping the right= argument would be to use 10.5 as the
breakpoint, in which case the setting of right= does not matter anyway.

2. Similar to cut is findInterval, so the subscripting of your first solution
could be done via findInterval:

  > levs[ findInterval(data, c(-Inf, 10), rightmost.closed = TRUE) ]
  [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

Note that rightmost.closed closes only the last interval, which is enough here
because 10 is the last breakpoint. The same comment regarding 10.5 applies.
I've omitted the factor(...) part to focus on the difference, and have done
the same in the remaining examples.

3. Either of the following could replace the ifelse. Both work by vectorizing
an ordinary if, but sapply is the more common way to do it, so it is likely
preferable from the viewpoint of clarity.

  > # 3a
  > sapply(data, function(x) if (x <= 10) levs[1] else levs[2])
  [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

  > # 3b
  > Vectorize(function(x) if (x <= 10) levs[1] else levs[2])(data)
  [1] "Pre"  "Pre"  "Pre"  "Post" "Post"

4. The subscripting in your first solution could be done like this, which is a
bit longer but arguably easier to understand:

  > levs[ 1 * (data <= 10) + 2 * (data > 10) ]
  [1] "Pre"  "Pre"  "Pre"  "Post" "Post"
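The boundary value 10 is where cut and findInterval can disagree, so a small check may help. This is a sketch using the data and levs from the original post. The names f_right, f_left, i_default, i_closed and i_leftopen are made up for illustration, and the left.open argument is assumed to be available (it exists in current R versions but not in very old ones).

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")
brk  <- c(-Inf, 10, Inf)

# right = TRUE: intervals are (-Inf, 10] and (10, Inf], so 10 falls in "Pre"
f_right <- cut(data, breaks = brk, labels = levs, right = TRUE)

# right = FALSE: intervals are [-Inf, 10) and [10, Inf), so 10 falls in "Post"
f_left <- cut(data, breaks = brk, labels = levs, right = FALSE)

# findInterval uses left-closed intervals by default, so 10 lands in interval 2
i_default <- findInterval(data, c(-Inf, 10))

# rightmost.closed = TRUE closes only the last interval (here it ends at 10)
i_closed <- findInterval(data, c(-Inf, 10), rightmost.closed = TRUE)

# left.open = TRUE switches every interval to (a, b], matching cut's default
i_leftopen <- findInterval(data, c(-Inf, 10), left.open = TRUE)

data.frame(data, f_right, f_left, i_default, i_closed, i_leftopen)
```

With more than one finite breakpoint, rightmost.closed would affect only the final one, so left.open = TRUE is probably the closer analogue of cut(..., right = TRUE).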
An extremely verbose, but (in my view) easy to understand approach is:

data.f <- data
data.f[which(data <= 10)] <- levs[1]
data.f[which(data > 10)] <- levs[2]
# levels = levs keeps the Pre/Post order; without it the levels are sorted alphabetically
data.f <- factor(data.f, levels = levs)

-Ista

--
Ista Zahn
Graduate student
University of Rochester
http://yourpsyche.org
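One thing worth checking with any approach that builds a character vector first: factor() sorts the levels alphabetically unless levels= is supplied. The sketch below uses the data and levs from the original post. The names lab, f_default and f_ordered are illustrative only.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# Build the labels step by step with logical indexing
lab <- character(length(data))
lab[data <= 10] <- levs[1]
lab[data > 10]  <- levs[2]

f_default <- factor(lab)                 # levels sorted alphabetically
f_ordered <- factor(lab, levels = levs)  # levels in the requested order

levels(f_default)   # "Post" "Pre"
levels(f_ordered)   # "Pre"  "Post"
```

The printed values look the same for both factors; the difference shows up in levels(), in as.integer(), and in anything that depends on level order, such as table() output or model contrasts.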
On Wed, Sep 30, 2009 at 2:43 AM, Dieter Menne
<dieter.menne at menne-biomed.de> wrote:

> Dear List,
>
> creating factors in a given non-default order is notoriously difficult to
> explain in a course. Students love the ifelse construct given below most,
> but I remember some comment from Martin Mächler (?) that ifelse should be
> banned from courses.
>
> Any better idea? Not necessarily short; easy to remember is what matters.
>
> Dieter
>
> data = c(1,7,10,50,70)
> levs = c("Pre","Post")
>
> # Typical C-Programmer style
> factor(levs[as.integer(data > 10) + 1], levels=levs)
>
> # Easiest to understand
> factor(ifelse(data <= 10, levs[1], levs[2]), levels=levs)

Why not

  > factor(data > 10, labels = c("Pre", "Post"))
  [1] Pre  Pre  Pre  Post Post
  Levels: Pre Post

All you have to remember is that FALSE comes before TRUE.
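A small sketch of the "FALSE comes before TRUE" rule, plus a variant that names the logical levels explicitly. The vector early and the names f and f_early are hypothetical, added only to show what happens when one of the two outcomes is absent from the data.

```r
data <- c(1, 7, 10, 50, 70)
levs <- c("Pre", "Post")

# Logical values sort as FALSE, TRUE, so the labels line up as Pre, Post
f <- factor(data > 10, labels = levs)

# Naming the levels keeps both labels even if only one outcome occurs
early <- c(1, 7, 10)   # hypothetical data with no value above 10
f_early <- factor(early > 10, levels = c(FALSE, TRUE), labels = levs)

levels(f_early)   # "Pre" "Post"
```

Without levels = c(FALSE, TRUE), the call on early would find only one level and stop with an error about the length of labels, so the explicit form may be the safer one to teach.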