thr3ads.net - R help - [R] excluding factor levels with read.table() and colClasses= [Mar 2006]

Peter Tait

2006-Mar-16 03:37 UTC

[R] excluding factor levels with read.table() and colClasses=

Hi,

I am reading a "|"-delimited text file into R using read.table(). I am
using colClasses= to specify some variables as factors. Some of these
variables include missing values coded as "NA". Unfortunately, the R
code I am using (pasted below) includes "NA" as one of the factor
levels. Is it possible to remove the "NA" level from a factor within
read.table()? If not, what is the most efficient way of doing this?
 
inrange <- read.table("C://...", header=TRUE, sep="|",
                      colClasses=c(id="factor"))

Thanks for your help.
Peter

Dieter Menne

2006-Mar-16 08:08 UTC

Peter Tait <petertait <at> sympatico.ca> writes:
> Is it possible to remove the "NA" level from a factor within
> read.table()? If not, what is the most efficient way of doing this?
>
> inrange<-read.table("C://...",header=T,sep="|",colClasses=c(
> id="factor"))

See the na.strings parameter of read.table().

Dieter
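
Editor's sketch of the na.strings suggestion above. The inline data and the column name grp are hypothetical stand-ins for the poster's file, and the text= argument of read.table() assumes a reasonably recent R. When the fields match the na.strings entries exactly, they should be read as missing and never become factor levels:

```r
# Hypothetical "|"-delimited data; the fields "NA" and "." have no
# surrounding whitespace, so they match na.strings exactly.
txt <- c("id|grp",
         "1|a",
         "2|NA",
         "3|.",
         "4|b")

df <- read.table(text = txt, header = TRUE, sep = "|",
                 na.strings = c("NA", "."),
                 colClasses = c(id = "factor", grp = "factor"))

levels(df$grp)      # only the real categories remain as levels
sum(is.na(df$grp))  # the "NA" and "." rows are counted as missing
```

Note that this only works when the field text is exactly one of the na.strings values; a field such as "NA " (with a trailing space) is a different string.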

Peter Tait

2006-Mar-17 02:57 UTC

Hi,
I did try the code with the na.strings option, but it did not work. The
factor bmicat still contains "NA" as one of its levels. Can read.table()
exclude "NA" values from the variables it reads from test.txt? If not,
what is the best way to remove these unwanted levels from a factor when
programming a function?
Thanks
Peter

> inrange<-read.table("C://test.txt", header=T, sep="|",
    na.strings=c("NA","."), colClasses=c(bmicat="factor"))
> summary(inrange)
    bmicat
 <23   : 294
 >28   :1482
 23-28 :1043
 NA    :  13
> levels(bmicat)
[1] "<23 "   ">28 "   "23-28 " "NA "
> contrasts(bmicat)
       >28  23-28  NA
<23       0      0   0
>28       1      0   0
23-28     0      1   0
NA        0      0   1
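
Editor's sketch for the second half of the question (removing an unwanted level after the data have been read). The helper name drop_na_level and its na_codes argument are hypothetical, not part of base R. It assumes the unwanted entries differ from "NA" only by surrounding whitespace, and it trims whitespace from all other levels as a side effect:

```r
# Hypothetical helper: convert a factor back to character, trim
# whitespace, turn the missing-value codes into real NA values, and
# rebuild the factor so those codes no longer appear as levels.
drop_na_level <- function(f, na_codes = c("NA", ".")) {
  x <- trimws(as.character(f))
  x[x %in% na_codes] <- NA
  factor(x)
}

# Small made-up factor with the same kind of levels as in the post.
bmicat <- factor(c("<23 ", ">28 ", "23-28 ", "NA ", "NA "))
cleaned <- drop_na_level(bmicat)

levels(cleaned)       # "NA" is no longer among the levels
sum(is.na(cleaned))   # the two former "NA " entries are now missing
```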

Gabor Grothendieck

2006-Mar-17 03:21 UTC

Can you provide a reproducible example along the lines of the
following which, as seen below, does work (on R 2.2.1 Windows XP):

> x <- head(letters)
> x[2:3] <- c("NA", ".")
> x
[1] "a"  "NA" "."  "d"  "e"  "f"
> DF <- read.table(textConnection(x), na.strings = c("NA", "."))
> DF
    V1
1    a
2 <NA>
3 <NA>
4    d
5    e
6    f
> levels(DF[[1]])
[1] "a" "d" "e" "f"


On 3/16/06, Peter Tait <petertait at sympatico.ca> wrote:
> Hi,
> I did try the code with the na.strings option but it did not work. The
> factor bmicat still contains "NA" as one of its levels. Can read.table()
> exclude "NA" values from the variables it reads from test.txt? If not
> what is the best way to remove these unwanted levels from a factor when
> programming a function?
> Thanks
> Peter
>
> >inrange<-read.table("C://test.txt", header=T, sep="|",
> na.strings=c("NA","."), colClasses=c(bmicat="factor"))
> >summary(inrange)
> bmicat
>  <23   : 294
> >28   :1482
> 23-28 :1043
> NA    :  13
> > levels(bmicat)
> [1] "<23 "   ">28 "   "23-28 " "NA "
> > contrasts(bmicat)
>       >28  23-28  NA
> <23       0      0   0
> >28       1      0   0
> 23-28     0      1   0
> NA        0      0   1

Peter Tait

2006-Mar-17 03:52 UTC

Hi Gabor,

In your example x is not a factor; I don't know if this matters.
 > x <- head(letters)
 > is.factor(x)
[1] FALSE

Here is an example of my problem:
The file C://test.txt contains
id|bmicat|cat
1 |NA |.
2 |<23 |a
3 |>28 |b
4 |NA |c

 > test<-read.table("C://test.txt",header=T,sep="|",na.strings=c("NA","."),
     colClasses=c(id="factor", bmicat="factor", cat="factor"))
 > summary(test)
  id     bmicat    cat
 1 :1   <23 :1   a   :1
 2 :1   >28 :1   b   :1
 3 :1   NA  :2   c   :1
 4 :1            NA's:1
 > levels(test$bmicat)
[1] "<23 " ">28 " "NA "
 > levels(test$cat)
[1] "a" "b" "c"

I tried to read this file without the cat variable, and read.table()
recognized the "NA" properly. Adding the cat variable and its other
code for missing values (".") seems to confuse read.table().

Thanks for your help.
Peter
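
Editor's sketch based on the example file above, offered as one likely explanation rather than a confirmed diagnosis. In the sample data the bmicat fields are followed by a space before the next "|", so the value read is "NA " rather than "NA", and it no longer matches na.strings. The documented strip.white argument of read.table() removes leading and trailing whitespace from unquoted character fields when sep is given. The data are supplied inline through the text= argument (an assumption in place of C://test.txt):

```r
# The file contents from the post, supplied inline instead of C://test.txt.
txt <- c("id|bmicat|cat",
         "1 |NA |.",
         "2 |<23 |a",
         "3 |>28 |b",
         "4 |NA |c")
classes <- c(id = "factor", bmicat = "factor", cat = "factor")

# As in the post: "NA " (trailing space) survives as a factor level.
raw <- read.table(text = txt, header = TRUE, sep = "|",
                  na.strings = c("NA", "."), colClasses = classes)
levels(raw$bmicat)

# With strip.white = TRUE the field becomes "NA" before the na.strings
# comparison, so it is read as a missing value instead of a level.
clean <- read.table(text = txt, header = TRUE, sep = "|",
                    na.strings = c("NA", "."), strip.white = TRUE,
                    colClasses = classes)
levels(clean$bmicat)
summary(clean)
```

This would also be consistent with the observation that the "NA" values were recognized once cat was left out: if bmicat were the last column, its fields would have no trailing space before the end of the line.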
