typeof applied to a factor always seems to return "integer",
independently of the type of the levels.
This has a strange side effect.
When a variable is "imported" into a data frame,
its type changes.
Character variables are automatically converted
to factors when imported into data frames.

Here is an example:

> v1<-1:3
> v2<-c("a","b","c")
> df<-data.frame(v1,v2)
> typeof(v2)
[1] "character"
> typeof(df$v2)
[1] "integer"

It is somewhat surprising that
the types of v2 and df$v2 are different.

The answer is to do
levels(df$v2)[df$v2]
but that is somewhat involved.

Should the types not be identical, and typeof applied to factors
return the type of the levels?

--
Erich Neuwirth, Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-38624 Fax: +43-1-4277-9386
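The levels(df$v2)[df$v2] idiom above can be compared with plain coercion. This is a minimal sketch, not part of the original post; it assumes stringsAsFactors = TRUE is passed explicitly, because newer R versions no longer convert character columns by default, and the names via_levels and via_coerce are made up for illustration.

```r
# Assumption: stringsAsFactors = TRUE reproduces the conversion described
# in the post regardless of the default of the R version in use.
v1 <- 1:3
v2 <- c("a", "b", "c")
df <- data.frame(v1, v2, stringsAsFactors = TRUE)

# Index the character vector of levels by the factor's integer codes.
via_levels <- levels(df$v2)[df$v2]

# Coerce the factor directly; this should give the same labels.
via_coerce <- as.character(df$v2)

print(identical(via_levels, via_coerce))
print(typeof(via_coerce))
```

Both routes return the labels as a character vector; as.character is simply the shorter spelling.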
?data.frame says:

Details:

     A data frame is a list of variables of the same length with unique
     row names, given class '"data.frame"'.

     'data.frame' converts each of its arguments to a data frame by
     calling 'as.data.frame(optional=TRUE)'.  As that is a generic
     function, methods can be written to change the behaviour of
     arguments according to their classes: R comes with many such
     methods.  Character variables passed to 'data.frame' are converted
     to factor columns unless protected by 'I'.  ...

(Note that last sentence.) I believe that answers your question.

Best,
Andy

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
In some cases it makes sense to store "character" variables as factors
(integers with labels) since this can take up much less memory. If you
really want to store `v2' as character, just do

data.frame(v1, I(v2))

-roger
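A small sketch of the effect of I() follows, assuming the same v1 and v2 as in the original post. The data frame names converted and protected are made up for illustration, the column is named explicitly as v2, and stringsAsFactors = TRUE is passed so that the comparison does not depend on the default of the R version in use.

```r
v1 <- 1:3
v2 <- c("a", "b", "c")

# Assumption: request factor conversion explicitly so that both calls
# behave the same way on old and new R versions.
converted <- data.frame(v1, v2, stringsAsFactors = TRUE)
protected <- data.frame(v1, v2 = I(v2), stringsAsFactors = TRUE)

print(typeof(converted$v2))  # type of the factor's codes
print(typeof(protected$v2))  # type of the protected column
print(class(protected$v2))   # I() tags the column with class "AsIs"
```

The protected column keeps its character storage and only gains the "AsIs" class marker.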
On Wed, 8 Sep 2004, Erich Neuwirth wrote:

> typeof applied to a factor always seems to return "integer",
> independently of the type of the levels.

typeof is telling you the internal structure. From ?factor

     'factor' returns an object of class '"factor"' which has a set of
     integer codes the length of 'x' with a '"levels"' attribute of
     mode 'character'.

(Despite that, we don't enforce this and people have managed to create
factors with non-integer numeric codes.)

Now ?typeof says

     'typeof' determines the (R internal) type or storage mode of any
     object

and that is the "integer" as the codes are stored in an INTSXP.

BTW, factors were an internal type long ago, and were one of the two
unnamed types which appear in output from memory.profile().

> This has a strange side effect.

It's a very well documented feature of data.frame, as others have pointed
out.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
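The distinction between storage type, class and levels described above can be inspected directly. This is only an illustrative sketch; the factor f and the variable codes are made-up names, not taken from the thread.

```r
f <- factor(c("a", "b", "c", "a"))

print(typeof(f))          # storage type of the codes
print(class(f))           # class attribute set by factor()
print(typeof(levels(f)))  # storage type of the labels

# Drop the class and attributes to look at the bare codes.
codes <- as.integer(f)
print(codes)
```

typeof reports how the codes are stored, while class and the "levels" attribute carry the information that makes the object a factor.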
The simple answer to my problem
(are the values of a vector numeric or not)
is is.numeric, and that is enough for what I need right now.

But this way I do not get an answer discriminating
between integers and doubles.

What is the canonical way of getting the type of the values of a vector?
Is there a better way than

valtype<-function(x)
  typeof(ifelse(is.factor(x),levels(x),x))
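One possible alternative, offered as a sketch rather than as the canonical answer: branch with a plain if/else instead of ifelse, so that nothing is subset or recycled before typeof is applied. The name value_type is hypothetical and not a base R function.

```r
# value_type is a hypothetical helper, not part of base R.
# For a factor it reports the type of the levels, otherwise the type of x.
value_type <- function(x) {
  if (is.factor(x)) typeof(levels(x)) else typeof(x)
}

print(value_type(1:3))
print(value_type(c(1.5, 2)))
print(value_type(c("a", "b")))
print(value_type(factor(c("a", "b"))))
```

Since factor() stores its levels as a character vector, the factor branch reports "character" for any factor built that way, including one created from numbers.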