Dear all,

I have been trying for a long time to understand exactly what the factor type is, and especially how it works, but it seems too difficult for me. I have read paragraphs about it, and I think I understand quite well what it is, but I still can't figure out how to deal with it. Two things in particular are mysteries to me:

1st: when I make a data frame from vectors (with the as.data.frame() or data.frame() commands), some "columns" of the data frame (which were vectors) become factors and some don't, but I haven't found an explanation of which ones become factors and which ones don't. (I know I can use I() to avoid the conversion to factor, but I don't think avoiding the factor type just because I don't know how to deal with it is a good solution.)

2nd: I can't manage to work with factors, so when I have some, I convert them to vectors (with levels()), but I think I am missing the power and usefulness of the factor type?

Any help GREATLY appreciated,
best regards,
Florence.
On 10/6/2005 9:14 AM, Florence Combes wrote:
> 1st: when I make a data frame from vectors (with the as.data.frame() or
> data.frame() commands), some "columns" of the data frame (which were
> vectors) become factors and some don't, but I haven't found an explanation
> of which ones become factors and which ones don't.
> (I know I can use I() to avoid the conversion to factor, but I don't think
> avoiding the factor type just because I don't know how to deal with it is
> a good solution.)

This is described in the ?data.frame man page: "Character variables passed to 'data.frame' are converted to factor columns unless protected by 'I'."

> 2nd: I can't manage to work with factors, so when I have some, I convert
> them to vectors (with levels()), but I think I am missing the power and
> usefulness of the factor type?

levels() is not the conversion you want. That lists all the levels, but it doesn't tell you how they correspond to individual observations. For example,

  > df <- data.frame(x=1:3, y=c('a','b','a'))
  > df
    x y
  1 1 a
  2 2 b
  3 3 a
  > levels(df$y)
  [1] "a" "b"

If you need to convert back to character values, use as.character():

  > as.character(df$y)
  [1] "a" "b" "a"

For many purposes, you can ignore the fact that your data is stored as a factor instead of a character vector. There are a few differences:

1. You can't compare the levels of a factor unless you declared it to be ordered:

  > df$y[1] > df$y[2]
  [1] NA
  Warning message:
  > not meaningful for factors in: Ops.factor(df$y[1], df$y[2])

but

  > df$y <- ordered(df$y)
  > df$y[1] > df$y[2]
  [1] FALSE

However, you need to watch out here: the comparison is done by the order of the levels, not an alphabetic comparison of their names:

  > levels(df$y) <- c("before", "after")
  > df
    x      y
  1 1 before
  2 2  after
  3 3 before
  > df$y[1] > df$y[2]
  [1] FALSE

2. as.integer() works differently on factors: it gets the position in the levels vector. For example,

  > as.integer(df$y)
  [1] 1 2 1
  > as.integer(as.character(df$y))
  [1] NA NA NA
  Warning message:
  NAs introduced by coercion

There are other differences, but these are the two main ones that are likely to cause you trouble.

Duncan Murdoch
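To try Duncan's examples in a current R session, they can be rebuilt as a small script. One assumption to flag: since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so character columns are no longer converted automatically; the sketch passes stringsAsFactors = TRUE to reproduce the conversion described in the 2005 man page. The variable names (df_protected, cmp_ordered and so on) are only illustrative.

```r
# A minimal sketch reproducing the examples above in a current R session.
# Assumption: R >= 4.0.0, where data.frame() defaults to
# stringsAsFactors = FALSE, so factor conversion is requested explicitly.
df <- data.frame(x = 1:3, y = c("a", "b", "a"), stringsAsFactors = TRUE)

# I() protects a character vector from conversion to a factor.
df_protected <- data.frame(x = 1:3, y = I(c("a", "b", "a")),
                           stringsAsFactors = TRUE)

# levels() lists the distinct values; as.character() gives one value
# per observation.
lv <- levels(df$y)            # "a" "b"
chars <- as.character(df$y)   # "a" "b" "a"

# For an unordered factor, '>' returns NA and warns.
cmp_unordered <- suppressWarnings(df$y[1] > df$y[2])   # NA

# For an ordered factor, '>' compares positions in levels(), not labels.
df$y <- ordered(df$y)
cmp_ordered <- df$y[1] > df$y[2]        # FALSE: level 1 vs level 2
levels(df$y) <- c("before", "after")    # relabel; the order is unchanged
cmp_relabelled <- df$y[1] > df$y[2]     # still FALSE
cmp_strings <- "before" > "after"       # TRUE as plain strings

# as.integer() on a factor returns the level codes, whereas converting
# the labels themselves gives NA (with a warning, suppressed here).
codes <- as.integer(df$y)                                        # 1 2 1
from_labels <- suppressWarnings(as.integer(as.character(df$y)))  # NA NA NA
```

The last two comparisons are the point of Duncan's warning: relabelling the levels of an ordered factor keeps their order, so the factor comparison still returns FALSE while the plain string comparison returns TRUE.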
> > > 2nd: I can't manage to work with factors, so when I have some, I convert
> > > them to vectors (with levels()), but I think I am missing the power and
> > > usefulness of the factor type?
> >
> > levels() is not the conversion you want.

In fact I use 'as.numeric(levels(f))[f]' (from the ?factor description).

> > That lists all the levels, but it doesn't tell you how they correspond to
> > individual observations. For example,
> >
> >   > df <- data.frame(x=1:3, y=c('a','b','a'))
> >   > df
> >     x y
> >   1 1 a
> >   2 2 b
> >   3 3 a
> >   > levels(df$y)
> >   [1] "a" "b"
> >
> > If you need to convert back to character values, use as.character():
> >
> >   > as.character(df$y)
> >   [1] "a" "b" "a"

Got it.

> > 1. You can't compare the levels of a factor unless you declared it to
> > be ordered:
> >
> >   > df$y[1] > df$y[2]
> >   [1] NA
> >   Warning message:
> >   > not meaningful for factors in: Ops.factor(df$y[1], df$y[2])
> >
> > but
> >
> >   > df$y <- ordered(df$y)
> >   > df$y[1] > df$y[2]
> >   [1] FALSE
> >
> > However, you need to watch out here: the comparison is done by the order
> > of the levels

I am sorry, I don't understand this. Here, are you comparing the position of "a" in the factor with the position of "b"?

> > , not an alphabetic comparison of their names:
> >
> >   > levels(df$y) <- c("before", "after")
> >   > df
> >     x      y
> >   1 1 before
> >   2 2  after
> >   3 3 before
> >   > df$y[1] > df$y[2]
> >   [1] FALSE

Best regards,
Florence.
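On Florence's two remarks: the ?factor help page recommends as.numeric(levels(f))[f] for recovering the numbers from a factor whose labels are numeric; for character labels such as "a" and "b" it would only give NA, so as.character() is the right conversion there. And for an ordered factor the comparison does use each value's position in levels(), which can be set explicitly with the levels argument of factor(). A minimal sketch; the vectors f and size are made-up examples, and the level order given to size is chosen purely for illustration.

```r
# Hypothetical factor whose labels look like numbers.
f <- factor(c("10", "5", "10", "20"))

# Levels are sorted as character strings, so "5" comes last.
f_levels <- levels(f)                      # "10" "20" "5"

# as.integer() returns positions in levels(f), not the numbers.
f_codes <- as.integer(f)                   # 1 3 1 2

# Idiom recommended in ?factor: convert the levels once, then index by f
# (indexing with a factor uses its integer codes).
f_values <- as.numeric(levels(f))[f]       # 10 5 10 20
# Same result, but converts every element instead of each level once.
f_values2 <- as.numeric(as.character(f))   # 10 5 10 20

# Ordered factor with an explicit (illustrative) level order.
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size_codes <- as.integer(size)             # 1 3 2
cmp_factor <- size[2] > size[3]            # TRUE: position 3 > position 2
cmp_labels <- "large" > "medium"           # FALSE: alphabetical comparison
```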