Hello all,

I am not sure whether it actually is a bug, but it is not the behaviour I would expect. Please consider this:

> Sibships
 [1] Patient_2400 Patient_2400 Patient_345  Patient_345  Patient_8901
 [6] Patient_8901 Patient_4008 Patient_4008 Patient_7991 Patient_7991
[11] Patient_8353 Patient_8353 Patient_1212 Patient_1212 Patient_2168
[16] Patient_2168 Patient_2760 Patient_2760 Patient_4726 Patient_4726
[21] Patient_6699 Patient_6699 Patient_7641 Patient_7641 Patient_8263
[26] Patient_8263 Patient_1389 Patient_1389 Patient_1618 Patient_1618
[31] Patient_2410 Patient_2410 Patient_2612 Patient_2612 Patient_2721
[36] Patient_2721 Patient_5053 Patient_5053 Patient_8458 Patient_8458
[41] Patient_211  Patient_211  Patient_9004 Patient_9004 Patient_3423
[46] Patient_3423 Patient_7413 Patient_7413 Patient_7815 Patient_7815
[51] Patient_9232 Patient_9232 Patient_2267 Patient_2267 Patient_468
[56] Patient_468
28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232

> Comparison_Indices
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

> Sibships[Comparison_Indices]
[1] Patient_2400 Patient_2400 Patient_345  Patient_345  Patient_8901
[6] Patient_8901 Patient_7413 Patient_7413
28 Levels: Patient_1212 Patient_1389 Patient_1618 Patient_211 ... Patient_9232

The problem with this last command is that I would expect 4 levels, because only 8 of the "Comparison_Indices" are TRUE, which corresponds to 4 sibships. So: levels() does not take the indices into account. Or, stated otherwise: if you take a subset of a vector, the levels() are not properly updated (in my opinion).

What I additionally found is the following:

> small_test <- factor(x=c("a", "b", "c"))
> typeof(small_test)
[1] "integer"

The same happens to the Sibships that I defined as a factor. Why is it of type integer?

This is the version output:

> version
               _
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          6.1
year           2007
month          11
day            26
svn rev        43537
language       R
version.string R version 2.6.1 (2007-11-26)

So: should I submit a Bug report?

Regards,

Dr. Philip de Groot
Wageningen University

Groot, Philip de wrote:
> [...]
> So: should I submit a Bug report?

No. This is all completely as designed.

Factors are internally integers (group codes), with a levels attribute that says what the codes mean. If you want the full story, use dput(small_test) or class(small_test) or str(small_test).

And subsetting a factor retains the original factor levels. To drop unused levels, just use factor(f[index]) or f[index, drop=TRUE]. The opposite behaviour can be even more annoying/dangerous because it leads to empty cells dropping out of tables and bars disappearing from barplots.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

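The points in the reply above can be checked on a small example. This is only a sketch: `sibs` and `keep` are hypothetical stand-ins for the poster's Sibships and Comparison_Indices, not the original data.

```r
# Hypothetical miniature of the Sibships factor (three sibships, two patients each)
sibs <- factor(c("Patient_2400", "Patient_2400", "Patient_345", "Patient_345",
                 "Patient_8901", "Patient_8901"))
keep <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# typeof() reports the storage of the group codes; class() reports the object type
typeof(sibs)    # "integer"
class(sibs)     # "factor"
unclass(sibs)   # the integer codes together with the "levels" attribute

# Plain subsetting keeps every level of the original factor
sub_keep <- sibs[keep]

# The two ways of dropping unused levels named in the reply
sub_drop1 <- factor(sibs[keep])
sub_drop2 <- sibs[keep, drop = TRUE]

nlevels(sub_keep)    # 3
nlevels(sub_drop1)   # 1

# Because levels are retained, table() still reports the empty cells as 0
table(sub_keep)
```

The last line illustrates why the default is useful: with the levels retained, sibships that have no observations in the subset still appear in the table with a count of zero instead of vanishing.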
This is not a bug; it is deliberately designed this way. There are circumstances when you want to drop levels on subsetting and other circumstances where you don't, so the default behaviour can't make everyone happy.

However, there is an option to get the behaviour you want:

> x<-as.factor(LETTERS)
> levels(x[1])
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> levels(x[1,drop=TRUE])
[1] "A"

On Mon, 28 Jan 2008, Groot, Philip de wrote:
> [...]
> So: should I submit a Bug report?

Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
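
Applied to the data in the original post, the drop=TRUE option gives the 4 levels the poster expected. The sketch below rebuilds Sibships and Comparison_Indices from the printed output in the question; the helper vector `ids` is introduced here only for that purpose. The final call to droplevels() is not mentioned in the thread: it is an assumption that a newer R than the 2.6.1 used here is available, since that function was added to base R in a later release.

```r
# Patient numbers in the order they appear in the printed Sibships vector
ids <- c(2400, 345, 8901, 4008, 7991, 8353, 1212, 2168, 2760, 4726,
         6699, 7641, 8263, 1389, 1618, 2410, 2612, 2721, 5053, 8458,
         211, 9004, 3423, 7413, 7815, 9232, 2267, 468)

# Each sibship contributes two consecutive entries
Sibships <- factor(rep(paste("Patient", ids, sep = "_"), each = 2))

# TRUE at positions 1-6 and 47-48, as in the printed Comparison_Indices
Comparison_Indices <- rep(FALSE, length(Sibships))
Comparison_Indices[c(1:6, 47:48)] <- TRUE

kept    <- Sibships[Comparison_Indices]               # default: all levels retained
dropped <- Sibships[Comparison_Indices, drop = TRUE]  # unused levels removed

nlevels(kept)      # 28
nlevels(dropped)   # 4
levels(dropped)

# Not from the thread: in later versions of R, droplevels() does the same job
dropped2 <- droplevels(kept)
```

Which form to use depends on what follows: keep the full set of levels when the subset feeds a table or barplot that should show empty groups, and drop them when only the groups actually present matter.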