I have an ordering and factor problem to which there must be a simple solution! The version is R 2.0.1 (2004-11-15) on a Linux platform.

A data frame H is read in from a .csv file using read.csv with as.is=TRUE.

Another data frame HN is constructed from data and I want to compare two columns both named ss of the (sorted) data frames that are the same length.

The problem is that HN$ss is always treated as a factor whatever I do while H$ss is treated as an integer, which is what I want. Somewhere R is making an implicit transformation but I can't see how to correct it.

The data are all integers in the range 1:13 - in fact with no gaps. If I tabulate from H:

> table(H$ss)

   1    2    3    4    5    6    7    8    9   10   11   12   13
 176  176  176  176  176  176  341 8726 8784 8777 8773 8749 8747

and for HN:

> table(HN$ss)

   1   10   11   12   13    2    3    4    5    6    7    8    9
 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784

At some time while constructing HN, I have to make it a character matrix - otherwise gsub doesn't work when removing surplus blanks for example - but I have turned it back into a data frame in the end.

If I check the modes, both data frames are lists and both columns are numeric - HN is not reported as a factor. Yet it appears to be treated as a factor, for example:

> table(formatC(H$ss,dig=0,width=2,format="f",flag="0"))

  01   02   03   04   05   06   07   08   09   10   11   12   13
 176  176  176  176  176  176  341 8726 8784 8777 8773 8749 8747

> table(formatC(HN$ss,dig=0,width=2,format="f",flag="0"))

yet:

   1   10   11   12   13    2    3    4    5    6    7    8    9
 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784
Warning messages:
1: "+" not meaningful for factors in: Ops.factor(x, ifelse(x == 0, 1, 0))
2: "<" not meaningful for factors in: Ops.factor(x, 0)

I have tried as.numeric but then I get the factor level rather than name returned:

> table(formatC(as.numeric(HN$ss),dig=0,width=2,format="f",flag="0"))

  01   02   03   04   05   06   07   08   09   10   11   12   13
 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784

which obviously is a tabulation of the internal levels rather than the data.

TIA

John

John Logsdon                            "Try to make things as simple
Quantex Research Ltd, Manchester UK      as possible but not simpler"
j.logsdon at quantex-research.com        a.einstein at relativity.org
+44(0)161 445 4951/G:+44(0)7717758675    www.quantex-research.com
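
A note on the mode check described in the question: a factor is stored as integer codes, so mode() reports it as numeric even though it is a factor. The sketch below illustrates this with a made-up vector `ss` (a hypothetical stand-in, not the poster's data); class() or is.factor() is probably the more telling check.

```r
# Made-up stand-in for HN$ss: integers that were held as character strings
ss <- factor(c("1", "10", "2", "13", "2", "10"))

mode(ss)        # "numeric": the factor is stored as integer codes
class(ss)       # "factor"
is.factor(ss)   # TRUE

# The levels are sorted as character strings, which is why the
# tabulation comes out in the order 1, 10, 13, 2 rather than 1, 2, 10, 13
levels(ss)
table(ss)
```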
John,

at ?factor, you can see:

"Be careful only to compare factors with the same set of levels (in the same order). In particular, 'as.numeric' applied to a factor is meaningless, and may happen by implicit coercion. To "revert" a factor 'f' to its original numeric values, 'as.numeric(levels(f))[f]' is recommended and slightly more efficient than 'as.numeric(as.character(f))'."

'as.numeric(levels(f))[f]' worked well for me in a similar situation, i.e. to get the numeric values back from a factor.

But see also the I() "option" of the data.frame() function, which lets you avoid getting a factor (from a character vector only) if that is not what you want. From ?data.frame:

"Objects passed to 'data.frame' should have the same number of rows, but atomic vectors, factors and character vectors protected by 'I' will be recycled a whole number of times if necessary."

See this example:

--------------------------------------------------
> v1<-c(1,2,3)
> v2<-c("a","b","c")
> df.A<-data.frame(v1,v2)
> str(df.A)
`data.frame':   3 obs. of  2 variables:
 $ v1: num  1 2 3
 $ v2: Factor w/ 3 levels "a","b","c": 1 2 3
> df.B<-data.frame(v1,I(v2))
> str(df.B)
`data.frame':   3 obs. of  2 variables:
 $ v1: num  1 2 3
 $ v2:Class 'AsIs'  chr [1:3] "a" "b" "c"
-------------------------------------------------

Hope this helps,

Florence.

On 11/25/05, John Logsdon <j.logsdon@quantex-research.com> wrote:
> I have an ordering and factor problem to which there must be a simple
> solution! The version is R 2.0.1 (2004-11-15) on A Linux platform.
> [...]
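
The two idioms quoted from ?factor can be tried on a small example. This is only a sketch: `f` is a made-up factor standing in for HN$ss, and the names `codes`, `vals1` and `vals2` are introduced here for illustration.

```r
# Made-up stand-in for HN$ss: numbers that became a factor via character strings
f <- factor(c("1", "10", "2", "13", "2", "10"))

# as.numeric() on a factor returns the internal level codes, not the data
codes <- as.numeric(f)

# Recommended in ?factor: convert the levels once, then index by the factor
vals1 <- as.numeric(levels(f))[f]

# Equivalent, but converts every element rather than every level
vals2 <- as.numeric(as.character(f))

codes   # 1 2 4 3 4 2
vals1   # 1 10 2 13 2 10
vals2   # 1 10 2 13 2 10
```

Applied to the data in the question, this would presumably be HN$ss <- as.numeric(levels(HN$ss))[HN$ss], after which table(HN$ss) should sort numerically.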
Also have a look at ?is.ordered; it may be more useful than my previous mail.

Florence.
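
Another option is to avoid creating the factor at all when the character matrix is turned back into a data frame. The sketch below rests on several assumptions: the matrix `m` and its column names are made up, and the `stringsAsFactors` argument requires a more recent R than the 2.0.1 mentioned in the question (on that version, I() as shown earlier in the thread is the closer equivalent). It ends with the checks that ?is.factor and ?is.ordered describe.

```r
# Made-up character matrix with surplus blanks (hypothetical intermediate step)
m <- matrix(c(" 1", "10 ", " 2", "a ", " b", "c"), ncol = 2,
            dimnames = list(NULL, c("ss", "id")))

# Strip the blanks; assigning into m[] keeps the matrix shape
m[] <- gsub(" +", "", m)

# Keep strings as strings instead of letting them become factors
HN <- as.data.frame(m, stringsAsFactors = FALSE)

# Convert columns that look numeric back to numbers, leave the rest as character
HN[] <- lapply(HN, type.convert, as.is = TRUE)

sapply(HN, class)   # column classes after conversion
is.factor(HN$ss)    # FALSE
is.ordered(HN$ss)   # FALSE
```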