Tal Galili
2012-Oct-22 13:46 UTC
[R] What is behind class coercion of a factor into a character
Hello all,

Please review the following simple code:

# make a factor:
x <- factor(c("one", "two"))
# what should be the output of the following expression?
c(x, "3")  # <=== ????
# I expected it to be the same as the output of:
c(as.character(x), "3")
# But in fact, the output is what we would get had we run the next line:
c(as.character(as.numeric(x)), "3")
# p.s.: c(x, 3) would of course behave differently...

I imagine the above behavior is a "feature" (not a bug), but I am curious about the rationale behind it. Is it for computational efficiency, or does it handle some particular use case?

Thanks,
Tal
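A minimal sketch of what is going on, using only base R: a factor is stored as integer codes plus `levels` and `class` attributes, and when `c()` mixes it with a character string, it is the codes, not the labels, that end up coerced to character. The variable names (`mixed`, `labelled`, `via_codes`) are illustrative only.

```r
# A factor is an integer vector carrying "levels" and "class" attributes.
x <- factor(c("one", "two"))
print(attributes(x))
print(as.integer(x))   # the underlying codes 1, 2

# Mixing the factor with a character string: the integer codes are
# used and then coerced to character, giving c("1", "2", "3").
mixed <- c(x, "3")
print(mixed)

# Converting to character first keeps the labels: c("one", "two", "3").
labelled <- c(as.character(x), "3")
print(labelled)

# The same result as c(x, "3"), built explicitly from the codes.
via_codes <- c(as.character(as.integer(x)), "3")
print(identical(mixed, via_codes))
```

In short, `as.character()` is what maps codes back to labels; `c()` never does that step for you when a non-factor is in the mix.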
Bert Gunter
2012-Oct-22 13:58 UTC
[R] What is behind class coercion of a factor into a character
Tal:

There was a recent discussion on this list about this (Sam Steingold was the OP, IIRC). The explanation is in ?c, in particular:

"c is sometimes used for its side effect of removing attributes except names, for example to turn an array into a vector."

Hence the factor's attributes (its class and levels) are removed, leaving only the underlying integer codes, and those codes are what get coerced to character alongside "3". That is what you saw.

As regards its "rationale," you may find Bill Dunlap's comments on "c()'s unfortunate history" relevant. The problem with factors is "what should concatenation do, anyway?" If a <- factor(c("x", "y")) and b <- factor(c("y", "z")), what should c(a, b) be? There is no reason to assume that the "y" in a means the same thing as the "y" in b!

Cheers,
Bert

-- 
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
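Both points in the reply can be checked in base R. The sketch below shows `c()` dropping the `dim` attribute of a matrix, and one common workaround for combining factors whose level sets differ: go through character and rebuild the factor, so that levels are matched by label rather than by integer code. This workaround is an illustration, not something proposed in the thread, and it bakes in exactly the assumption Bert warns about (that both "y" labels mean the same thing). Note also that R 4.1.0 added a `c()` method for factors, so what `c(a, b)` returns depends on the R version; the sketch avoids relying on it.

```r
# c() drops attributes other than names: a 2 x 2 matrix becomes a plain vector.
m <- matrix(1:4, nrow = 2)
v <- c(m)
print(dim(v))   # NULL: the dim attribute is gone

# Two factors whose level sets overlap only in "y".
a <- factor(c("x", "y"))
b <- factor(c("y", "z"))

# Both have integer codes 1, 2, so concatenating the codes gives 1 2 1 2,
# where code 1 means "x" in a but "y" in b.
print(c(as.integer(a), as.integer(b)))

# Combine by label instead: convert to character, then rebuild the factor.
# This treats the "y" in a and the "y" in b as the same level, an
# assumption the user has to be willing to make.
ab <- factor(c(as.character(a), as.character(b)))
print(ab)
print(levels(ab))
```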
Apparently Analogous Threads
- Unexplained behavior of level names when using ordered factors in lm?
- question of merging two dataframes
- how to concatenate factor vectors?
- Why can't "apply" be used with "as.factor" on a data.frame ?
- Why do we have to turn factors into characters for various functions?