Hi,

The colClasses seem to be suppressing 'NA' values. How do I fix this?

The R script and the first 5 lines of output are below.

File "test2.dat" has blanks that are read as "NA" when I do not use 'colClasses', but as blanks when I use 'colClasses'.

temp.df <- read.fwf("test2.dat", width=c(10,1,1,1,1,2,2,3,3,1),
    col.names=c("psu","losewt","maintain","fewcal","phyact","age","income","weight",
                "wtdesire","gender"),
    colClasses=c("factor","factor","factor","factor","factor","numeric","factor",
                 "numeric","numeric","factor"),
    nrows=270000, comment.char="")

temp.df
psu losewt maintain fewcal phyact age income weight wtdesire gender
1 2003009323 2 2 52 05 220 220 1
2 2003005181 2 1 2 2 58 08 165 145 2
3 2003015942 2 1 4 1 76 05 142 130 2
4 2003011406 2 1 3 1 43 03 110 110 2
5 2003006786 1 4 1 49 06 178 145 2

Why am I not getting missing values when I use 'colClasses'?
Because by default blank fields aren't considered to be missing in factors, but they are in integer vectors.

> f1 <- factor(c(1,2,"",3,4))
> f1
[1] 1 2   3 4
Levels:  1 2 3 4

I think you can fix this by specifying na.strings=c("NA","")

On 26/09/06, Anupam Tyagi <AnupTyagi@yahoo.com> wrote:
> The colClasses seem to be suppressing 'NA' values. How do I fix this?
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP
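The factor behaviour described in the reply above can be checked directly on a vector. The following is only a minimal sketch in base R; the use of `exclude = ""` and the `levels<-` recoding are illustrative assumptions, not something proposed in the thread, and they apply to data that are already in memory rather than to the file import itself.

```r
# A blank string is an ordinary level in a factor, not a missing value.
f1 <- factor(c(1, 2, "", 3, 4))
levels(f1)        # the empty string "" appears among the levels
anyNA(f1)         # FALSE: nothing is treated as missing

# Assumption (not from the thread): exclude "" when the factor is built,
# so the blank element becomes NA instead of a level.
f2 <- factor(c(1, 2, "", 3, 4), exclude = "")
is.na(f2)         # FALSE FALSE TRUE FALSE FALSE

# Alternative for an existing factor: a level recoded to NA turns the
# corresponding elements into missing values.
f3 <- f1
levels(f3)[levels(f3) == ""] <- NA
levels(f3)        # "1" "2" "3" "4"
```

Recoding after the fact leaves the import call untouched, while the `na.strings` route suggested above handles the blanks at read time.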
Anupam Tyagi wrote:
> The colClasses seem to be suppressing 'NA' values. How do I fix this?
>
> R script and first 5 lines of output is below.
>
> File "test2.dat" has blanks that are read as "NA" when I do not use
> 'colClasses', but as blanks when I use 'colClasses'.

Well, you say it should be a factor, hence " " is taken as a level. Otherwise you have to specify na.strings = " ".

Uwe Ligges
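A small self-contained check of the `na.strings` suggestion might look like the sketch below. The three-line file and the column names `id`, `a` and `b` are made up for illustration and stand in for "test2.dat", which is not available here.

```r
# Hypothetical fixed-width data: id (2 chars), a (1 char), b (1 char).
tmp <- tempfile(fileext = ".dat")
writeLines(c("0121",
             "02 1",
             "031 "), tmp)

# With colClasses = "factor", the blank in column a is kept as a level.
raw <- read.fwf(tmp, widths = c(2, 1, 1),
                col.names = c("id", "a", "b"),
                colClasses = c("factor", "factor", "numeric"))
levels(raw$a)     # includes " " alongside "1" and "2"

# Declaring the one-space blank as missing turns it into NA.
fixed <- read.fwf(tmp, widths = c(2, 1, 1),
                  col.names = c("id", "a", "b"),
                  colClasses = c("factor", "factor", "numeric"),
                  na.strings = c("NA", " "))
is.na(fixed$a)    # FALSE TRUE FALSE
is.na(fixed$b)    # FALSE FALSE TRUE (blank numeric fields are NA either way)

unlink(tmp)
```

Note that a blank in a wider column (for example the two-character `income` field) consists of several spaces, so it would probably need its own entry in `na.strings`. Passing `strip.white = TRUE` together with `na.strings = c("NA", "")` should cover all widths, since the help page for `scan` says the `na.strings` test happens after white space is stripped.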