Jenifer Larson-Hall
2006-Jul-10 19:24 UTC
[R] Counting observations split by a factor when there are NAs in the data
I am a very novice R user, a social scientist (linguist) who is trying to learn to use R after being very familiar with SPSS. Please be kind!

My concern:
I cannot figure out a way to get an accurate count of observations of one column of data split by a factor when there are NAs in the data.

I know how to use commands like tapply and summaryBy to obtain other summary statistics I am interested in, such as the following:

  tapply(RLWTEST, list(STATUS), mean, na.rm=T)
  summaryBy(RLWTEST~STATUS, data=lh.forgotten, FUN=c(mean, sd, min, max), na.rm=T)

However, with tapply I know I cannot use length to get a count where there are NAs. summaryBy appears to work the same way. I do know how to get a count of the entire column using sum:

  sum(!is.na(lh.forgotten$RLWTEST))

However, this does not give me a count split up by my factor (STATUS). I have looked through Daalgard (2002) and Verzani (2005), and have searched the help files, but with no luck.

Thank you in advance for your help. I love R and am interested in making it more accessible to social scientist types like me. I know it can do everything SPSS can and more, but sometimes the very simplest things seem to be a lot harder in R.

Jenifer

Dr. Jenifer Larson-Hall
Assistant Professor of Linguistics
University of North Texas
(940)369-8950
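The problem described in this post can be reproduced with a small sketch. The data frame below reuses the names lh.forgotten, STATUS and RLWTEST from the post, but its values are invented purely for illustration; the real data are not shown in the thread.

```r
# Toy stand-in for lh.forgotten (values are made up; the real data are not in the post)
lh.forgotten <- data.frame(
  STATUS  = factor(c("A", "A", "A", "B", "B")),
  RLWTEST = c(10, NA, 12, NA, 9)
)

# length() counts every element in each group, missing or not,
# so this gives group sizes rather than counts of observed values
n_all <- with(lh.forgotten, tapply(RLWTEST, STATUS, length))

# The whole-column count from the post: non-missing values, not split by STATUS
n_total <- sum(!is.na(lh.forgotten$RLWTEST))
```

Here n_all includes the NA entries in each group, while n_total excludes them but is a single number for the whole column, which is the gap the question is about.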
Peter Dalgaard
2006-Jul-10 19:47 UTC
[R] Counting observations split by a factor when there are NAs in the data
"Jenifer Larson-Hall" <jenifer at unt.edu> writes:> I am a very novice R user, a social scientist (linguist) who is trying > to learn to use R after being very familiar with SPSS. Please be kind! > > My concern: > I cannot figure out a way to get an accurate count of observations of > one column of data split by a factor when there are NAs in the data. > > I know how to use commands like tapply and summaryBy to obtain other > summary statistics I am interested in, such as the following: > tapply(RLWTEST, list(STATUS), mean, na.rm=T) > summaryBy(RLWTEST~STATUS, data=lh.forgotten, FUN=c(mean, sd, min, max), > na.rm=T) > > However, with tapply I know I cannot use length to get a count where > there are NAs. summaryBy appears to work the same way. I do know how to > get a count of the entire column using sum: > sum(!is.na(lh.forgotten$RLWTEST)) > > However, this does not give me a count split up by my factor (STATUS). I > have looked through Daalgard (2002) and Verzani (2005), and have^^^^^^^^ Ahem!....> searched the help files, but with no luck.How about with(lh.forgotten, tapply(!is.na(RLWTEST), STATUS, sum) ) or maybe just table(STATUS[!is.na(RLWTEST)])> Thank you in advance for your help. I love R and am interested in making > it more accessible to social scientist types like me. I know it can do > everything SPSS can and more, but sometimes the very simplest things > seem to be a lot harder in R.-- O__ ---- Peter Dalgaard ?ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907