gordon.harrington@uni.edu
2001-Jan-31 09:11 UTC
[R] R 1.2.1 - read.table - factors problem or is it a data.frame problem
Patrick Connolly refers to the read.table help manual page to show how to coerce input columns to character or to numeric. Indeed coercion with a logical vector will set the mode regardless of the column content. He also notes one can set factors with factor().

However, the problem encountered is not one of setting factors but of unsetting them. The manual states that variables of mode or type character will become factors. My data input efforts showed no relationship between type and factor. With no evident reason, most character variables did not become factors while many real variables did. It is a bit disconcerting to get an output with thousands of floating point factor levels or error messages that one's data are of the wrong mode for any analysis whatsoever.

How does one unset mode assignment of factor and how does one avoid the problem of automatic misassignment with other datasets?

Gordon

> |> R-1.2.1 Suse 7.0 binary
> |>
> |> > fooframe <- read.table("foo", header=FALSE, as.is=c(1:22,398),
> |>                        col.names=foo.colheads)
> |>
> |> cols 1-9 are alphabetic, 10-22 and 398 are numbers but unordered categorical
> |> 23-375 are numeric with and without decimal points
> |>
> |> As I read the description the "as.is" index numbers should force those columns
> |> to be "character" and "factor". However only the 1-9 alpha become "character"
> |> but they did not become "factor". Everything else shows mode "numeric" but
>
> Here is your explanation:
>
>    as.is: the default behavior of `read.table' is to convert
>           non-numeric variables to factors. The variable `as.is'
>           controls this conversion. Its value is either a vector of
>           logicals (values are recycled if necessary), or a vector of
>           numeric indices which specify which columns should be left as
>           character strings.
>
> Since your column 10, etc are not character, as.is will not have an
> effect on them. I think it is simple enough to convert numeric
> columns into factors (as distinct from continuous variables) with
> factor().
>
> |> "is.factor" distributes TRUE to various variables in no pattern discernible to
> |> me either in distribution or in the data content of the columns. (I tried
> |> giving as.is a type vector but that just made everything "numeric" with no
> |> pattern to factors.) No "as.is" parameter still leaves the odd distribution of
> |> factors.
> |>
> |> The main effects are that for some statistical functions on data subsets, one
> |> is warned one cannot perform the operations on categorical data while others
> |> stop for NA's. There are no NA's in the dataset! Running "unique" on each
> |> variate and collecting outside the frame shows adequate dispersion for analysis
> |> with no zero variances. "cor" will only run "pairwise" though "complete.cases"
> |> finds no NA's.
> |>
> |> What am I missing?
>
> My guess is that something unplanned is happening when you try as.is
> on numeric columns.

Gordon M. Harrington             Mail:  3720 Village Place, #6308
Professor Emeritus                      Waterloo, IA 50702-5848
University of Northern Iowa      Phone: 319-291-8535
gordon.harrington at uni.edu     Fax:   319-291-8491
dryfly at aya.yale.edu                  319-291-8324
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
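One way to investigate the kind of unexpected factor assignment described in this post is to list the class of every column and then look at which factor levels fail to parse as numbers. The following is only a minimal sketch under stated assumptions: the toy data, the names `txt`, `df`, `classes` and `bad`, and the use of the `text=` argument (available in current versions of R, not necessarily in R 1.2.1) are illustrative and do not come from the thread.

```r
# Toy input standing in for the poster's file "foo" (hypothetical data).
txt <- "a 1 2.5
b 2 3.5
c 1 n/a"

# as.is = 1 protects only column 1 from conversion; any other column that
# contains a non-numeric entry is still turned into a factor.
df <- read.table(text = txt, header = FALSE, as.is = 1)

# Class of every column: shows which ones ended up as factors.
classes <- sapply(df, class)
print(classes)

# For each factor column, list the levels that do not parse as numbers.
# These are the entries that blocked numeric conversion of that column.
is_fac <- sapply(df, is.factor)
bad <- lapply(df[is_fac], function(f) {
  lev <- levels(f)
  lev[is.na(suppressWarnings(as.numeric(lev)))]
})
print(bad)
```

In this toy case the third column becomes a factor because of the single entry "n/a"; on a real file, the levels reported in `bad` would be the first place to look for stray codes or typing errors.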
Thomas Lumley
2001-Jan-31 15:41 UTC
[R] R 1.2.1 - read.table - factors problem or is it a data.frame problem
On Wed, 31 Jan 2001 gordon.harrington at uni.edu wrote:

> However, the problem encountered is not one of setting factors but of unsetting
> them. The manual states that variables of mode or type character will become
> factors. My data input efforts showed no relationship between type and factor.
> With no evident reason, most character variables did not become factors while
> many real variables did. It is a bit disconcerting to get an output with
> thousands of floating point factor levels or error messages that one's data are
> of the wrong mode for any analysis whatsoever.
>
> How does one unset mode assignment of factor and how does one avoid the problem
> of automatic misassignment with other datasets?

You can convert a factor to the correct numeric values with
   as.numeric(as.character(the.factor))

We don't have enough information to tell what happened in your case, but in my experience the most common reason for a numeric variable to read as a factor has been misspecifying the missing value codes in the na.strings argument. This argument lists the strings that should be converted to NAs; any other non-numeric string will trigger a conversion to factor.

	-thomas

Thomas Lumley                    Asst. Professor, Biostatistics
tlumley at u.washington.edu      University of Washington, Seattle
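The two points in this reply can be illustrated with a short sketch. It assumes toy data and a hypothetical missing-value code "."; the variable names and the `text=` argument (present in current versions of R) are not from the thread.

```r
# A factor whose levels happen to look like numbers.
f <- factor(c("3.24", "-57.23", "10"))

# as.numeric() applied directly returns the internal integer level codes,
# not the values the levels spell out.
codes <- as.numeric(f)

# Going through as.character() first recovers the actual numbers.
values <- as.numeric(as.character(f))
print(values)

# Toy file in which "." marks a missing value (hypothetical code).
txt <- "x y
1.5 a
. b
2.5 c"

# Without na.strings, "." is just a non-numeric string, so column x
# cannot be read as numeric.
d1 <- read.table(text = txt, header = TRUE)

# Declaring "." as a missing-value code lets x come in as numeric with an NA.
d2 <- read.table(text = txt, header = TRUE, na.strings = c("NA", "."))
print(sapply(d2, class))
```

Whether the non-numeric column in `d1` arrives as a factor or as a character vector depends on the R version and its defaults; in either case it is not numeric until the missing-value code is declared.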
Heberto Ghezzo
2001-Feb-01 12:49 UTC
[R] R 1.2.1 - read.table - factors problem or is it a data.frame problem
I have had some problems with read.table and floats turning up as factors. In my case the cause was not a blank in the file but a unary minus! With input such as 3.24,-57.23,... the 3.24 is read as numeric but the -57.23 becomes a factor. Yes, I turned it back into a numeric with as.numeric(as.character(..)), but I think it would be better to modify the read.table/read.csv code somehow. Thanks anyway.

R. Heberto Ghezzo Ph.D.
Meakins-Christie Labs
McGill University
Montreal - Canada
heberto at meakins.lan.mcgill.ca
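In current versions of R, one way to avoid silent conversion altogether is to declare the expected column types up front with the `colClasses` argument, so that a malformed entry stops the read with an error instead of quietly producing a factor. This is only a sketch under assumptions: the data are invented, the malformed token "- 57.23" is a hypothetical stand-in (the thread does not establish what was actually in the file), and `colClasses` may not have been available in R 1.2.1.

```r
# Well-formed comma-separated input with negative numbers (toy data).
csv <- "3.24,-57.23
1.00,-2.50"

# colClasses = "numeric" is recycled across all columns; no type guessing occurs.
d <- read.csv(text = csv, header = FALSE, colClasses = "numeric")
print(sapply(d, class))

# Hypothetical malformed input: a space between the minus sign and the digits.
bad_csv <- "3.24,- 57.23
1.00,-2.50"

# With declared numeric columns, the malformed field raises an error
# rather than turning the column into a factor or character vector.
failed <- inherits(
  tryCatch(read.csv(text = bad_csv, header = FALSE, colClasses = "numeric"),
           error = function(e) e),
  "error"
)
print(failed)
```

The trade-off is that the read fails outright on the first bad field, which is usually preferable when every column is known in advance to be numeric.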