Dear list,

What is a better way, relative to the one below, to keep the order of factor levels created from cut()? Notice that I'm simply pasting letters onto the levels before converting to character, so as to keep the desired order of levels. This is not very elegant... I'm converting to character so that I can call the helper function with vapply() from the main function.

Removing the line

    levels(xc) <- paste(letters[1:nlevels(xc)], levels(xc), sep=":")

would result in factor levels that are not ordered according to x1.

    set.seed(1)
    df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

    main_fun <- function(data) {
      # one character vector per column, each of length nrow(data)
      data.frame(vapply(data, helper_fun, character(nrow(data))))
    }

    helper_fun <- function(x) {
      xc <- cut(x, breaks = unique(quantile(x, seq(0, 1, 1/10), na.rm = TRUE)),
                include.lowest = TRUE)
      levels(xc) <- paste(letters[1:nlevels(xc)], levels(xc), sep=":")
      as.character(xc)
    }

    res <- main_fun(df)

    > levels(res$x1)
     [1] "a:[-3.01,-1.34]"    "b:(-1.34,-0.882]"   "c:(-0.882,-0.511]"  "d:(-0.511,-0.296]"  "e:(-0.296,-0.0353]"
     [6] "f:(-0.0353,0.245]"  "g:(0.245,0.536]"    "h:(0.536,0.854]"    "i:(0.854,1.32]"     "j:(1.32,3.81]"

Thanks
Leo.

_______________________________________________________________________
If you received this email in error, please advise the sender (by return email or otherwise) immediately. You have consented to receive the attached electronically at the above-noted email address; please retain a copy of this confirmation for future reference.
Don't use vapply() here - use lapply() instead and then leave cut's output alone.

vapply() will combine its outputs to create a character matrix and data.frame will pull apart the character matrix into its columns. Skipping the matrix intermediary solves lots of issues.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jan 11, 2016 at 11:24 AM, Guelman, Leo <leo.guelman at rbc.com> wrote:
> Dear list,
>
> What is a better way relative to the one below to keep the order of factor
> levels created from cut()?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
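
The point about the matrix intermediary can be checked directly. The following is only a sketch on the data from the original post; the helper name bin_deciles is made up for illustration and is not from the thread. It shows that vapply() yields a character matrix, that rebuilding a factor from those strings reorders the levels as text, and that lapply() keeps the order produced by cut().

```r
set.seed(1)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

# Hypothetical helper: decile binning that returns cut()'s factor untouched.
bin_deciles <- function(x) {
  cut(x, breaks = unique(quantile(x, seq(0, 1, 1/10), na.rm = TRUE)),
      include.lowest = TRUE)
}

# vapply() needs atomic results, so the factor has to become character;
# the per-column results are then bound into a single character matrix.
m <- vapply(df, function(x) as.character(bin_deciles(x)), character(nrow(df)))
is.matrix(m)
typeof(m)

# A factor rebuilt from the strings gets its levels by sorting the labels
# as text, which is not the numeric order of the intervals.
rebuilt <- factor(m[, 1])
identical(levels(rebuilt), levels(bin_deciles(df$x1)))

# lapply() returns a list of factors, so data.frame() keeps cut()'s order.
res <- data.frame(lapply(df, bin_deciles))
identical(levels(res$x1), levels(bin_deciles(df$x1)))
```

With the lapply() version there is no need to paste letters onto the labels, because the levels never pass through a character vector.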
Here's a solution with dplyr:

    my_cut <- function(x){
      breaks <- quantile(x, seq(0, 1, by = 0.1))
      y <- cut(x, breaks = breaks, include.lowest = TRUE)
      levels(y) <- paste(head(letters, length(breaks) - 1), levels(y), sep = ": ")
      return(y)
    }
    library(dplyr)
    mutate_each(df, funs = funs(my_cut))

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2016-01-11 20:34 GMT+01:00 William Dunlap via R-help <r-help at r-project.org>:
> Don't use vapply() here - use lapply() instead and then leave cut's output
> alone.
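
mutate_each() and funs() have since been deprecated in dplyr. As a sketch only, and assuming dplyr 1.0.0 or later is installed (an assumption, not something stated in the thread), the same call can be written with across(). The data are those from the original post and my_cut is the function above.

```r
library(dplyr)

set.seed(1)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

# Same helper as in the message above: decile bins with a letter prefix.
my_cut <- function(x) {
  breaks <- quantile(x, seq(0, 1, by = 0.1))
  y <- cut(x, breaks = breaks, include.lowest = TRUE)
  levels(y) <- paste(head(letters, length(breaks) - 1), levels(y), sep = ": ")
  y
}

# Apply my_cut to every column; each column stays a factor.
res <- mutate(df, across(everything(), my_cut))
levels(res$x1)
```

Because each column is replaced by the factor itself, the level order comes straight from cut() and the letter prefix is only cosmetic here.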
I left out the example:

> set.seed(1)
> df <- data.frame(x1 = rpois(1000,4), x2 = rpois(1000,8))
> helper_fun <- function(x) {
+     cut(x, breaks = unique(quantile(x, seq(0, 1, 1/10), na.rm = TRUE)),
+         include.lowest = TRUE)
+ }
> df2 <- data.frame(lapply(df, helper_fun))
> lapply(df2, levels)
$x1
[1] "[0,2]"  "(2,3]"  "(3,4]"  "(4,5]"  "(5,6]"  "(6,7]"  "(7,14]"

$x2
[1] "[1,4]"   "(4,5]"   "(5,6]"   "(6,7]"   "(7,8]"   "(8,9]"   "(9,10]"
[8] "(10,12]" "(12,18]"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Jan 11, 2016 at 11:34 AM, William Dunlap <wdunlap at tibco.com> wrote:
> Don't use vapply() here - use lapply() instead and then leave cut's output
> alone.
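
The example uses Poisson counts, where several deciles can fall on the same integer, which is presumably why the x1 column ends up with 7 levels rather than 10. The sketch below, on the same kind of data, illustrates what the unique() call around quantile() guards against; the variable names are chosen for illustration only.

```r
set.seed(1)
x <- rpois(1000, 4)

breaks <- quantile(x, seq(0, 1, 1/10), na.rm = TRUE)

# With count data, neighbouring deciles can be the same integer.
has_ties <- anyDuplicated(breaks) > 0

# cut() refuses duplicated breaks; capture the error instead of stopping.
raw_result <- tryCatch(
  cut(x, breaks = breaks, include.lowest = TRUE),
  error = function(e) e
)
inherits(raw_result, "error")

# Dropping the duplicates gives fewer, wider bins and a valid factor.
ok <- cut(x, breaks = unique(breaks), include.lowest = TRUE)
nlevels(ok) == length(unique(breaks)) - 1
```

A helper that omits unique() works on continuous data such as rnorm() draws but may stop with an error on tied data like these counts.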