Dear R users,

I've got a simple question but somehow I can't find the solution:

I have a data frame with columns 1-5 containing one set of integer values, and columns 6-10 containing another set of integer values. Columns 6-10 contain NAs in some places.

I now want to calculate
(1) the number of values in each row of columns 6-10 that are NA
(2) the sum of all values in columns 1-5 for which there are no missing values in the corresponding cells of columns 6-10.

Example (let's call the data frame "data"):

Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10
   1    2    5    2    3   NA    5   NA    1     4
   3    1    4    5    2    6   NA    4   NA     1

The result would then be (for the first row)
(1) "There were 2 NAs in columns 6-10."
(2) "The mean of columns 1-5 was 2+2+3=7" (because there were NAs in the 1st and 3rd positions of columns 6-10)

So far, I know how to calculate the rowSums for the data frame, but I don't know how to condition these on the values of columns 6-10:

    rowSums(data[,1:5])  # that's straightforward
    apply(data[,6:10],1,function(x)sum(is.na(x)))  # this also works fine

But I don't know how to select just the desired values of columns 1-5 (as described above).

Can anyone help me? Thanks a lot in advance!

Best regards
Christoph
----- Original Message -----
From: "Christoph Scherber" <Christoph.Scherber at uni-jena.de>
To: <r-help at stat.math.ethz.ch>
Sent: Monday, May 02, 2005 10:52 AM
Subject: [R] "apply" question

> [...]
> So far, I know how to calculate the rowSums for the data.frame, but I
> don't know how to condition these on the values of columns 6-10
>
> rowSums(data[,1:5]) #that's straightforward
> apply(data[,6:10],1,function(x)sum(is.na(x))) #this also works fine
>
> But I don't know how to select just the desired values of columns 1-5 (as
> described above)

    tmp <- rowSums(data[apply(data[,6:10],1,function(x) sum(is.na(x)))==0,1:5])

Now, tmp contains only the row sums for the rows with no NAs in the other columns.

Sean
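
A small sketch of what the row filter above selects, assuming columns 6-10 are the ones meant. The third row is hypothetical (it is not in the original example) and is added only so that one row has no NAs; the name `complete` is also illustrative.

```r
# Example data from the question plus one hypothetical complete row.
data <- data.frame(rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
                         c(3, 1, 4, 5, 2, 6, NA, 4, NA, 1),
                         c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)))  # hypothetical row
names(data) <- paste0("Col", 1:10)

# TRUE for rows whose columns 6-10 contain no NA at all.
complete <- apply(data[, 6:10], 1, function(x) sum(is.na(x))) == 0

# Row sums of columns 1-5, restricted to those complete rows.
tmp <- rowSums(data[complete, 1:5])

print(complete)
print(tmp)
```

This drops whole rows that have any NA in columns 6-10; it does not leave out individual cells within a row, which is what part (2) of the question asks for.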
Try:

    > ## Number of NAs in columns 6-10.
    > colSums(is.na(data[6:10]))
     Col6  Col7  Col8  Col9 Col10
        1     1     1     1     0
    >
    > ## Number of NAs in each row of columns 6-10.
    > rowSums(is.na(data[6:10]))
    1 2
    2 2
    >
    > ## Row sums of columns 1-5, omitting cells whose counterparts in cols 6-10 are NA.
    > rowSums(data[,1:5] * !is.na(data[,6:10]))
    1 2
    7 9

If all entries are numeric, it'd be easier to use matrices instead of data frames.

HTH,
Andy

> From: Christoph Scherber
>
> [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
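
The matrix suggestion could look roughly like the sketch below. It assumes the example data from the question; the object names `m`, `vals`, `keep`, `na_count` and `masked_sum` are illustrative and do not come from the thread.

```r
# Example data from the question, stored as a numeric matrix.
m <- rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
           c(3, 1, 4, 5, 2, 6, NA, 4, NA, 1))
colnames(m) <- paste0("Col", 1:10)

vals <- m[, 1:5]            # values to be summed
keep <- !is.na(m[, 6:10])   # TRUE where the partner cell is observed

na_count   <- rowSums(!keep)        # NAs per row in columns 6-10
masked_sum <- rowSums(vals * keep)  # TRUE/FALSE act as 1/0 in the product

print(na_count)
print(masked_sum)
```

Multiplying by the logical mask only works as intended when columns 1-5 have no NAs of their own, as in the question, since NA * 0 is still NA.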
you could try something like this:

    dat <- rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
                 c(3, 1, 4, 5, 2, 6, NA, 4, NA, 1))

    ## (1) number of NAs per row in columns 6-10
    rowSums(is.na(dat[, 6:10]))

    ## (2) set cells of columns 1-5 to NA wherever columns 6-10 are NA,
    ##     then sum (or average) while ignoring NAs
    dat. <- dat[, 1:5]
    dat.[is.na(dat[, 6:10])] <- NA
    rowSums(dat., na.rm=TRUE)
    rowMeans(dat., na.rm=TRUE)

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Christoph Scherber" <Christoph.Scherber at uni-jena.de>
To: <r-help at stat.math.ethz.ch>
Sent: Monday, May 02, 2005 4:52 PM
Subject: [R] "apply" question

> [...]
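
The NA-masking steps above can be wrapped into a small function, sketched here under the assumption that both blocks have the same dimensions. The helper name `masked_row_stats` and its arguments are hypothetical, not something from the thread or from R itself.

```r
# Hypothetical helper: blank out cells of `values` whose partner cell in
# `mask` is NA, then summarise each row.
masked_row_stats <- function(values, mask) {
  values <- as.matrix(values)
  mask   <- as.matrix(mask)
  stopifnot(identical(dim(values), dim(mask)))
  values[is.na(mask)] <- NA
  data.frame(n_na = rowSums(is.na(mask)),              # NAs per row in mask
             sum  = rowSums(values, na.rm = TRUE),     # sum of kept cells
             mean = rowMeans(values, na.rm = TRUE))    # mean of kept cells
}

dat <- rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
             c(3, 1, 4, 5, 2, 6, NA, 4, NA, 1))
res <- masked_row_stats(dat[, 1:5], dat[, 6:10])
print(res)
```

Because the masked cells become NA rather than 0, rowMeans divides by the number of kept cells, so the mean for the first row is 7/3 rather than 7/5.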
On 5/2/05, Christoph Scherber <Christoph.Scherber at uni-jena.de> wrote:
> Dear R users,
>
> I've got a simple question but somehow I can't find the solution:
>
> I have a data frame with columns 1-5 containing one set of integer
> values, and columns 6-10 containing another set of integer values.
> Columns 6-10 contain NA's at some places.
>
> I now want to calculate
> (1) the number of values in each row of columns 6-10 that were NA's

Supposing our data is called DF,

    rowSums(is.na(DF[,6:10]))

> (2) the sum of all values on columns 1-5 for which there were no missing
> values in the corresponding cells of columns 6-10.

In the expression below, 1 + 0 * DF[,6:10] is like DF[,6:10] except that all non-NAs are replaced by 1. Multiplying DF[,1:5] by that effectively replaces each element in DF[,1:5] with an NA if the corresponding element of DF[,6:10] is an NA.

    rowSums( DF[,1:5] * (1 + 0 * DF[,6:10]), na.rm = TRUE )

> Example: (let's call the data frame "data")
>
> Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10
>    1    2    5    2    3   NA    5   NA    1     4
>    3    1    4    5    2    6   NA    4   NA     1
>
> The result would then be (for the first row)
> (1) "There were 2 NA's in columns 6-10."
> (2) The mean of Columns 1-5 was 2+2+3=7" (because there were NA's in the
> 1st and 3rd position in rows 6-10)

I guess you meant sum when you referred to mean in (2). If you really do want the mean, replace rowSums with rowMeans in the expression given above in the answer to (2).
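
To see what the 1 + 0 * DF[,6:10] term does, here is a short sketch on the example data from the question. The intermediate names `mask`, `sums` and `means` are illustrative only.

```r
# Example data from the question as a data frame called DF.
DF <- data.frame(rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
                       c(3, 1, 4, 5, 2, 6, NA, 4, NA, 1)))
names(DF) <- paste0("Col", 1:10)

# 0 * x is 0 for finite x and NA for NA, so adding 1 gives 1 or NA.
mask <- 1 + 0 * DF[, 6:10]
print(mask)

# Multiplying by the mask turns unwanted cells into NA; na.rm drops them.
sums  <- rowSums(DF[, 1:5] * mask, na.rm = TRUE)
means <- rowMeans(DF[, 1:5] * mask, na.rm = TRUE)
print(sums)
print(means)
```

The trick relies on the values in columns 6-10 being finite: 0 * Inf is NaN in R, so an infinite entry would be treated as missing as well.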