Michael Rennie
2007-Feb-01 17:13 UTC
[R] Losing factor levels when moving variables from one context to another
Hi there,

I'm currently trying to figure out how to keep my "factor" levels for a variable when moving it from one data frame or matrix to another.

Example below:

vec1<-(rep("10",5))
vec2<-(rep("30",5))
vec3<-(rep("80",5))
vecs<-c(vec1, vec2, vec3)

resp<-rnorm(15,2)

dat<-as.data.frame(cbind(resp, vecs))
dat$vecs<-factor(dat$vecs)
dat

R returns:

                  resp vecs
1     1.57606068767956   10
2     2.30271782269308   10
3     2.39874788444542   10
4    0.963987738423353   10
5     2.03620782454740   10
6  -0.0706713324725649   30
7     1.49001721222926   30
8     2.00587718501980   30
9    0.450576585429981   30
10    2.87120375367357   30
11    2.25575058079324   80
12    2.03471288724508   80
13    2.67432066972984   80
14    1.74102136279177   80
15    2.29827581276955   80

and now:

newvar<-(rnorm(15,4))
newdat<-as.data.frame(cbind(newvar, dat$vecs))
newdat

R returns:

     newvar V2
1  4.300788  1
2  5.295951  1
3  5.099849  1
4  3.211045  1
5  3.703554  1
6  3.693826  2
7  5.314679  2
8  4.222270  2
9  3.534515  2
10 4.037401  2
11 4.476808  3
12 4.842449  3
13 3.109677  3
14 4.752961  3
15 4.445216  3

I seem to have lost everything I once had associated with "vecs", and it's turned my actual values into arbitrary groupings.

I assume this has something to do with the behaviour of factors? Does anyone have any suggestions on how to get my original levels, etc., back?

Cheers,

Mike

Michael Rennie
Ph.D. Candidate, University of Toronto at Mississauga
3359 Mississauga Rd. N.
Mississauga, ON L5L 1C6
Ph: 905-828-5452 Fax: 905-828-3792
www.utm.utoronto.ca/~w3rennie
Chuck Cleland
2007-Feb-01 17:34 UTC
[R] Losing factor levels when moving variables from one context to another
Michael Rennie wrote:
> I'm currently trying to figure out how to keep my "factor" levels for a
> variable when moving it from one data frame or matrix to another.
> [...]
> I seem to have lost everything I once had associated with "vecs", and it's
> turned my actual values into arbitrary groupings.
>
> I assume this has something to do with the behaviour of factors? Does
> anyone have any suggestions on how to get my original levels, etc., back?

It has more to do with the behavior of cbind(). Construct the data frame with data.frame() rather than the combination of as.data.frame() and cbind().
For example:

vec1 <- (rep("10",2))
vec2 <- (rep("30",2))
vec3 <- (rep("80",2))
vecs <- c(vec1, vec2, vec3)
resp <- rnorm(6,2)
dat <- data.frame(resp, vecs)
dat$vecs <- factor(dat$vecs)
dat

      resp vecs
1 2.795851   10
2 3.673296   10
3 1.731921   30
4 1.172945   30
5 2.427164   80
6 1.470758   80

newvar <- (rnorm(6,4))
newdat <- data.frame(newvar, dat$vecs)
newdat

    newvar dat.vecs
1 6.389386       10
2 3.453535       10
3 3.807821       30
4 6.067712       30
5 4.978724       80
6 3.015975       80

?data.frame

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Marc Schwartz
2007-Feb-01 17:51 UTC
[R] Losing factor levels when moving variables from one context to another
On Thu, 2007-02-01 at 12:13 -0500, Michael Rennie wrote:
> I'm currently trying to figure out how to keep my "factor" levels for a
> variable when moving it from one data frame or matrix to another.
> [...]
> I seem to have lost everything I once had associated with "vecs", and it's
> turned my actual values into arbitrary groupings.
>
> I assume this has something to do with the behaviour of factors? Does
> anyone have any suggestions on how to get my original levels, etc., back?
>
> Cheers,
>
> Mike

Mike,

The problem (specific to your example) is that you are using as.data.frame() and cbind(), which will first coerce the columns to a common data type, create a matrix and then coerce the matrix to a data frame.

Thus, in the second case, your factor dat$vecs is first being coerced to its numeric equivalent values (the internal integer codes 1, 2, 3), rather than being retained as a factor, since a matrix can contain only one data type and the first column is numeric.
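The coercion described above can be seen directly in a minimal sketch. The data mirror the example in the thread; the names `m`, `recovered` and `numeric_vals` are illustrative only and do not come from the original posts. It also shows one possible way to map the integer codes back to the original levels, assuming the original factor is still available.

```r
# Data mirroring the thread's example; vecs is made a factor explicitly
vecs <- factor(rep(c("10", "30", "80"), each = 5))
set.seed(2)
newvar <- rnorm(15, 4)

# cbind() must return a single-type matrix, so the factor is
# reduced to its internal integer codes
m <- cbind(newvar, vecs)
is.numeric(m)         # TRUE: the whole matrix is numeric
unique(m[, "vecs"])   # the codes, not the labels "10", "30", "80"

# If the codes are all you have left, index the original levels with them
recovered <- factor(levels(vecs)[m[, "vecs"]], levels = levels(vecs))

# To get the labels as actual numbers, go through character first
numeric_vals <- as.numeric(as.character(vecs))
```

Note that calling as.numeric() directly on the factor would return the codes 1, 2, 3 rather than 10, 30, 80, which is why the sketch converts to character first.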
Try this instead:

vec1<-(rep("10", 5))
vec2<-(rep("30", 5))
vec3<-(rep("80", 5))
vecs<-c(vec1, vec2, vec3)

set.seed(1)
resp<-rnorm(15, 2)

dat <- data.frame(resp, vecs)

> str(dat)
'data.frame':   15 obs. of  2 variables:
 $ resp: num  1.37 2.18 1.16 3.60 2.33 ...
 $ vecs: Factor w/ 3 levels "10","30","80": 1 1 1 1 1 2 2 2 2 2 ...

set.seed(2)
newvar <- rnorm(15, 4)

newdat <- data.frame(newvar, dat$vecs)

> str(newdat)
'data.frame':   15 obs. of  2 variables:
 $ newvar  : num  3.10 4.18 5.59 2.87 3.92 ...
 $ dat.vecs: Factor w/ 3 levels "10","30","80": 1 1 1 1 1 2 2 2 2 2 ...

> all(levels(newdat$dat.vecs) == levels(dat$vecs))
[1] TRUE

BTW, there may very well be times when you are combining two factors together and need to ensure that the factor levels either are intentionally different or need to "relevel" the combined factors into common levels. See the Warning and other information in ?factor.

This would be critical, for example, if you are combining data sets to then run modeling functions on the combined data sets.

HTH,

Marc Schwartz
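The point about releveling combined factors can be illustrated with a small sketch. The factors `f1` and `f2` and the names `common` and `combined` are hypothetical and only for illustration. One caveat for the data.frame() calls above: in R 4.0.0 and later, data.frame() keeps character vectors as character unless stringsAsFactors = TRUE is given, so the sketch calls factor() explicitly.

```r
# Two hypothetical factors whose level sets only partly overlap
f1 <- factor(c("10", "30"))
f2 <- factor(c("30", "80"))

# The integer codes are not comparable across the two factors:
# code 1 means "10" in f1 but "30" in f2
as.integer(f1)
as.integer(f2)

# Combine via the labels and re-factor with an explicit common level set
common   <- union(levels(f1), levels(f2))
combined <- factor(c(as.character(f1), as.character(f2)), levels = common)

levels(combined)
table(combined)
```

Going through as.character() and naming the levels explicitly avoids relying on how a particular R version combines factors, which matters when the merged data are then passed to modeling functions.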