Peter Tait
2006-Mar-16 03:37 UTC
[R] excluding factor levels with read.table() and colClasses=
Hi,

I am reading a "|"-delimited text file into R using read.table(). I am using colClasses= to specify some variables as factors. Some of these variables include missing values coded as "NA". Unfortunately, the R code I am using (pasted below) includes "NA" as one of the factor levels.

Is it possible to remove the "NA" level from a factor within read.table()? If not, what is the most efficient way of doing this?

inrange<-read.table("C://...",header=T,sep="|",colClasses=c( id="factor"))

Thanks for your help.
Peter
Dieter Menne
2006-Mar-16 08:08 UTC
[R] excluding factor levels with read.table() and colClasses=
Peter Tait <petertait <at> sympatico.ca> writes:

> Is it possible to remove the "NA" level from a factor within
> read.table()? If not what is the most efficient way of doing this?
>
> inrange<-read.table("C://...",header=T,sep="|",colClasses=c( id="factor"))

See the na.strings parameter of read.table().

Dieter
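A minimal sketch of what the na.strings suggestion looks like in practice. The inline data and the column name grp are hypothetical stand-ins for the poster's file, which is not shown in the thread; the text= argument is used only to keep the example self-contained.

```r
# Hypothetical "|"-delimited data with two missing-value codes: "NA" and "."
txt <- "id|grp
1|a
2|NA
3|.
4|b"

# na.strings lists the strings to be read as missing values.
dat <- read.table(text = txt, header = TRUE, sep = "|",
                  na.strings = c("NA", "."),
                  colClasses = c(id = "factor", grp = "factor"))

levels(dat$grp)       # only "a" and "b"; the missing codes are not levels
sum(is.na(dat$grp))   # 2
```

Note that this sketch has no padding around the field values; each field matches an entry of na.strings exactly.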
Peter Tait
2006-Mar-17 02:57 UTC
[R] excluding factor levels with read.table() and colClasses=
Hi,

I did try the code with the na.strings option, but it did not work. The factor bmicat still contains "NA" as one of its levels. Can read.table() exclude "NA" values from the variables it reads from test.txt? If not, what is the best way to remove these unwanted levels from a factor when programming a function?

Thanks
Peter

> inrange<-read.table("C://test.txt", header=T, sep="|", na.strings=c("NA","."), colClasses=c(bmicat="factor"))
> summary(inrange)
    bmicat
 <23   : 294
 >28   :1482
 23-28 :1043
 NA    :  13
> levels(bmicat)
[1] "<23 "   ">28 "   "23-28 " "NA "
> contrasts(bmicat)
        >28  23-28  NA
 <23      0      0   0
 >28      1      0   0
 23-28    0      1   0
 NA       0      0   1
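On the question of cleaning up a factor after it has been read: a rough sketch, not something proposed in the thread. The helper name drop_na_levels and its na_codes argument are hypothetical. It relies on the fact that assigning NA to a level label turns every observation at that level into a missing value, and it trims whitespace first because the levels printed above carry a trailing space ("NA ").

```r
# Hypothetical helper: trim whitespace from the level labels, then turn any
# label listed in na_codes into a real missing value.
drop_na_levels <- function(f, na_codes = c("NA", ".")) {
  lev <- trimws(levels(f))          # "NA " becomes "NA"
  lev[lev %in% na_codes] <- NA      # mark unwanted labels as missing
  levels(f) <- lev                  # levels set to NA are dropped
  f
}

# Small example mimicking the levels shown in the message above.
f <- factor(c("<23 ", ">28 ", "23-28 ", "NA ", ">28 "))
g <- drop_na_levels(f)
levels(g)
sum(is.na(g))
```

Trimming also changes the remaining labels (for example "<23 " becomes "<23"), which may or may not be wanted in a given analysis.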
Gabor Grothendieck
2006-Mar-17 03:21 UTC
[R] excluding factor levels with read.table() and colClasses=
Can you provide a reproducible example along the lines of the following, which, as seen below, does work (on R 2.2.1, Windows XP):

> x <- head(letters)
> x[2:3] <- c("NA", ".")
> x
[1] "a"  "NA" "."  "d"  "e"  "f"
> DF <- read.table(textConnection(x), na.strings = c("NA", "."))
> DF
    V1
1    a
2 <NA>
3 <NA>
4    d
5    e
6    f
> levels(DF[[1]])
[1] "a" "d" "e" "f"
Peter Tait
2006-Mar-17 03:52 UTC
[R] excluding factor levels with read.table() and colClasses=
Hi Gabor,

In your example x is not a factor; I don't know if this matters.

> x <- head(letters)
> is.factor(x)
[1] FALSE

Here is an example of my problem. The file C://test.txt contains:

id|bmicat|cat
1 |NA |.
2 |<23 |a
3 |>28 |b
4 |NA |c

> test<-read.table("C://test.txt",header=T,sep="|",na.strings=c("NA","."),colClasses=c(id="factor", bmicat="factor", cat="factor"))
> summary(test)
 id     bmicat   cat
 1 :1   <23 :1   a   :1
 2 :1   >28 :1   b   :1
 3 :1   NA  :2   c   :1
 4 :1            NA's:1
> levels(test$bmicat)
[1] "<23 " ">28 " "NA "
> levels(test$cat)
[1] "a" "b" "c"

I tried reading this file without the cat variable, and read.table() recognized the "NA" properly. Adding the cat variable and its other code for missing values (".") seems to confuse read.table().

Thanks for your help.
Peter
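The thread as archived does not settle the cause. One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis, is that the bmicat fields carry a trailing space ("NA " rather than "NA") and so do not match any entry in na.strings, whereas "." in the cat column has no padding and does match. The sketch below reproduces the posted file inline and compares read.table() with and without its strip.white argument; the wrapper read_test is a hypothetical name used only for this illustration.

```r
# The posted file contents, supplied inline instead of via C://test.txt.
txt <- "id|bmicat|cat
1 |NA |.
2 |<23 |a
3 |>28 |b
4 |NA |c"

# Hypothetical wrapper so the two calls differ only in strip.white.
read_test <- function(strip) {
  read.table(text = txt, header = TRUE, sep = "|",
             na.strings = c("NA", "."), strip.white = strip,
             colClasses = c(id = "factor", bmicat = "factor", cat = "factor"))
}

padded  <- read_test(FALSE)  # default: padding is kept, "NA " stays a level
trimmed <- read_test(TRUE)   # padding stripped before the na.strings test

levels(padded$bmicat)        # includes "NA " with its trailing space
levels(trimmed$bmicat)       # "NA" is no longer a level
sum(is.na(trimmed$bmicat))   # 2
```

If the real file is padded the same way, adding strip.white=TRUE to the original call may be enough; this has not been checked against the poster's actual data.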