I'm seeing what looks to me like odd behaviour when I use na.omit on a simple "length" function, as follows.

> sno
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
> a
 [1] 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
> b
 [1]  1  1  0  1  1  1  0  0 NA  0  0  0 NA  0  1 NA  0  1  0  0  0  0 NA  0  0  0  0 NA  0 NA  0  1  0  0

#NA refers to no data available.

> df=data.frame(sno,a,b)
# I'm pasting the sorted data frame below:
> sortdf=df[order(a,b),]
> sortdf
   sno a  b
3    3 0  0
7    7 0  0
8    8 0  0
10  10 0  0
11  11 0  0
12  12 0  0
14  14 0  0
17  17 0  0
20  20 0  0
21  21 0  0
22  22 0  0
24  24 0  0
25  25 0  0
26  26 0  0
27  27 0  0
29  29 0  0
31  31 0  0
33  33 0  0
34  34 0  0
1    1 0  1
4    4 0  1
9    9 0 NA
13  13 0 NA
23  23 0 NA
28  28 0 NA
30  30 0 NA
19  19 1  0
2    2 1  1
5    5 1  1
6    6 1  1
15  15 1  1
18  18 1  1
32  32 1  1
16  16 1 NA

#Now I wish to count how many records have a=1 AND b=0. From the lower section of that sorted data frame we see the answer is 1 (record # 19). But instead I'm seeing 2. Probably counting record # 16 also.

> na.omit(length(sno[a==1 & b==0]))
[1] 2

I'd be grateful to anyone who can point out what I'm doing wrong.

Regards.
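Following the posting guide's request for self-contained code, the printed vectors above can be rebuilt as a script. This is a sketch transcribed from the printed output (it is not the poster's own code); it reproduces the count of 2 and shows where the extra element comes from:

```r
# Data transcribed from the printed output above (not the poster's original script)
sno <- 1:34
a <- c(0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
b <- c(1, 1, 0, 1, 1, 1, 0, 0, NA, 0, 0, 0, NA, 0, 1, NA, 0,
       1, 0, 0, 0, 0, NA, 0, 0, 0, 0, NA, 0, NA, 0, 1, 0, 0)

# The condition is NA (not FALSE) for record 16, where a == 1 but b is missing
cond <- a == 1 & b == 0
which(is.na(cond))           # 16

# Indexing with that NA yields an NA element, so two elements come back
sno[cond]                    # NA 19

# na.omit() only sees the scalar returned by length(), so nothing is dropped
na.omit(length(sno[cond]))   # 2
```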
This looks buggish to me (or at least non-intuitive), but I am almost sure there is an explanation for why the b==0 condition includes the NAs. You can find a way to circumvent it in the last two lines of the example below.

a=c(1,1,1,0,0,0)
b=c(1,NA,0,1,NA,0)
sno=rnorm(6)
na.omit(length(sno[a==1 & b==0]))
sno[a==1 & b==0]
length(sno[a==1 & b==0])
which(a==1&b==0)
sno[which(a==1&b==0)]

Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Viju Moses
Sent: Tuesday, October 06, 2009 12:52 AM
To: r-help at r-project.org
Subject: [R] Problem with na.omit when using length()
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
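The explanation Daniel suspected comes from how R propagates NA through comparisons and logical indexing: NA == 0 is NA, TRUE & NA is NA, FALSE & NA is FALSE, and indexing with a logical NA returns an NA element instead of dropping it. A minimal sketch using his vectors, except that sno is set to 1:6 rather than rnorm(6) (an assumption made only so the printed values are reproducible):

```r
# Daniel's vectors, with sno = 1:6 instead of rnorm(6) for deterministic output
a <- c(1, 1, 1, 0, 0, 0)
b <- c(1, NA, 0, 1, NA, 0)
sno <- 1:6

# Element-wise: NA == 0 is NA; TRUE & NA is NA; FALSE & NA is FALSE
cond <- a == 1 & b == 0
cond                    # FALSE NA TRUE FALSE FALSE FALSE

# Logical indexing keeps an NA slot for every NA in the index
sno[cond]               # NA 3
length(sno[cond])       # 2

# which() returns only the positions that are definitely TRUE
which(cond)             # 3
sno[which(cond)]        # 3
```

This is why the which() workaround in the last two lines of Daniel's example gives the expected single match.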
Just put na.omit() inside length() if you intend to omit the NA elements of the vector. Otherwise you are trying to omit the NAs of the value returned by length(), which is the scalar 2:

length(na.omit(sno[a==1 & b==0]))

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-6609
Web: http://yihui.name
Department of Statistics, Iowa State University
3211 Snedecor Hall, Ames, IA

In reply to Viju Moses <vijumoses at gmail.com>, Mon, Oct 5, 2009 at 11:51 PM.
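Yihui's fix counts correctly because the NA element is dropped before length() is called. A sketch comparing it with a few other base-R ways to count matching rows; the six-row data frame reuses the small a and b vectors from Daniel's example, with sno set to 1:6 as an illustrative assumption:

```r
# Hypothetical toy data: a and b from Daniel's example, sno = 1:6
df <- data.frame(sno = 1:6,
                 a = c(1, 1, 1, 0, 0, 0),
                 b = c(1, NA, 0, 1, NA, 0))

# 1. Yihui's fix: drop the NA elements before taking the length
length(na.omit(df$sno[df$a == 1 & df$b == 0]))   # 1

# 2. Sum the logical vector, ignoring NA (each TRUE counts as 1)
sum(df$a == 1 & df$b == 0, na.rm = TRUE)          # 1

# 3. which() keeps only the definitely TRUE positions
length(which(df$a == 1 & df$b == 0))              # 1

# 4. subset() treats NA in the condition as FALSE
nrow(subset(df, a == 1 & b == 0))                 # 1
```

The first option also drops rows whose sno is itself NA, so it can undercount if the indexed column has missing values; the other three depend only on the condition.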