thr3ads.net - R help - [R] "apply" question [May 2005]

Christoph Scherber

2005-May-02 14:52 UTC

[R] "apply" question

Dear R users,

I've got a simple question but somehow I can't find the solution:

I have a data frame with columns 1-5 containing one set of integer 
values, and columns 6-10 containing another set of integer values. 
Columns 6-10 contain NAs in some places.

I now want to calculate
(1) the number of values in each row of columns 6-10 that were NAs
(2) the sum of all values in columns 1-5 for which there were no missing
values in the corresponding cells of columns 6-10.


Example: (let's call the data frame "data")

Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9   Col10
1      2      5      2      3      NA     5      NA     1      4
3      1      4      5      2      6      NA     4      NA     1

The result would then be (for the first row)
(1) "There were 2 NAs in columns 6-10."
(2) "The mean of columns 1-5 was 2+2+3=7" (because there were NAs in the
1st and 3rd positions of columns 6-10)

So far, I know how to calculate the rowSums for the data frame, but I
don't know how to condition these on the values of columns 6-10:

rowSums(data[,1:5]) # that's straightforward
apply(data[,6:10],1,function(x)sum(is.na(x))) # this also works fine

But I don't know how to select just the desired values of columns 1-5
(as described above).


Can anyone help me? Thanks a lot in advance!

Best regards
Christoph

Sean Davis

2005-May-02 14:58 UTC

tmp <- rowSums(data[apply(data[,6:10],1,function(x) sum(is.na(x)))==0,1:5])

Now tmp contains the row sums only for those rows that have no NAs in
columns 6-10.
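
A quick check on the example data from the question may help here. This is only a sketch: it assumes the data frame is called `data`, uses the column names from the original post, selects columns 6:10, and the name `has_na` is purely illustrative. Because the filter works on whole rows, a row is dropped as soon as any of columns 6-10 is NA, which is different from skipping individual cells.

```r
# Example data from the question (column names as in the original post).
data <- data.frame(
  Col1 = c(1, 3), Col2 = c(2, 1), Col3 = c(5, 4), Col4 = c(2, 5), Col5 = c(3, 2),
  Col6 = c(NA, 6), Col7 = c(5, NA), Col8 = c(NA, 4), Col9 = c(1, NA), Col10 = c(4, 1)
)

# TRUE for every row that has at least one NA in columns 6-10.
has_na <- rowSums(is.na(data[, 6:10])) > 0

# Row sums of columns 1-5, restricted to rows with no NA at all in columns 6-10.
tmp <- rowSums(data[!has_na, 1:5])

print(has_na)       # both example rows contain NAs
print(length(tmp))  # so no row survives the filter
```

On this two-row example both rows contain NAs, so `tmp` comes out empty; on a larger data set it would hold the sums for the complete rows only.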

Sean

Liaw, Andy

2005-May-02 15:06 UTC

Try:

> ## Number of NAs in each of columns 6-10.
> colSums(is.na(data[6:10]))
 Col6  Col7  Col8  Col9 Col10 
    1     1     1     1     0 
> ## Number of NAs in each row of columns 6-10.
> rowSums(is.na(data[6:10]))
1 2 
2 2 
> ## Row sums of columns 1-5, omitting cells whose partner in cols 6-10 is NA.
> rowSums(data[,1:5] * !is.na(data[,6:10]))
1 2 
7 9 

If all entries are numeric, it'd be easier to use matrices instead of data
frames.
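
A minimal sketch of the matrix variant, assuming the example rows from the question are typed in by hand (the names `m`, `left`, `right`, `na_count` and `masked_sum` are illustrative and not from the original post):

```r
# Example data from the question, stored as a numeric matrix instead of a data frame.
m <- rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
           c(3, 1, 4, 5, 2,  6, NA, 4, NA, 1))

left  <- m[, 1:5]    # values to be summed
right <- m[, 6:10]   # columns that may contain NA

# Number of NAs per row in columns 6-10.
na_count <- rowSums(is.na(right))

# Multiplying by the logical mask turns every cell whose partner is NA into 0,
# so those cells drop out of the row sum.
masked_sum <- rowSums(left * (!is.na(right)))

print(na_count)    # 2 2
print(masked_sum)  # 7 9
```

With a matrix, the arithmetic between the two blocks is plain element-wise multiplication, and no data frame method dispatch is involved.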

HTH,
Andy

Dimitris Rizopoulos

2005-May-02 15:13 UTC

you could try something like this:

dat <- rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
             c(3, 1, 4, 5, 2, 6, NA, 4, NA, 1))
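## (1) number of NAs per row in columns 6-10
rowSums(is.na(dat[, 6:10]))

## (2) set the cells of columns 1-5 to NA wherever columns 6-10 have an NA
dat. <- dat[, 1:5]
dat.[is.na(dat[, 6:10])] <- NA
rowSums(dat., na.rm=TRUE)   # sums over the remaining cells
rowMeans(dat., na.rm=TRUE)  # means over the remaining cells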
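
One practical difference between setting the unwanted cells to NA and multiplying them by zero shows up only for means, not for sums. A small sketch on the same example matrix, where the names `keep`, `by_na` and `by_zero` are illustrative assumptions:

```r
dat <- rbind(c(1, 2, 5, 2, 3, NA, 5, NA, 1, 4),
             c(3, 1, 4, 5, 2,  6, NA, 4, NA, 1))
keep <- !is.na(dat[, 6:10])   # TRUE where the partner cell in columns 6-10 is present

# Variant A: unwanted cells become NA and are dropped by na.rm = TRUE.
by_na <- dat[, 1:5]
by_na[!keep] <- NA

# Variant B: unwanted cells become 0 and still count as observations.
by_zero <- dat[, 1:5] * keep

print(rowSums(by_na, na.rm = TRUE))   # 7 9
print(rowSums(by_zero))               # 7 9
print(rowMeans(by_na, na.rm = TRUE))  # 7/3 and 9/3: divides by the cells kept
print(rowMeans(by_zero))              # 7/5 and 9/5: divides by all five cells
```

If the mean over only the retained cells is wanted, the NA-based variant is the one that gives it.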


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm



Gabor Grothendieck

2005-May-02 15:19 UTC

On 5/2/05, Christoph Scherber <Christoph.Scherber at uni-jena.de> wrote:
> I now want to calculate
> (1) the number of values in each row of columns 6-10 that were NAs

Supposing our data is called DF,

rowSums(is.na(DF[,6:10]))

> (2) the sum of all values on columns 1-5 for which there were no missing
> values in the corresponding cells of columns 6-10.

In the expression below, 1 + 0 * DF[,6:10] is like DF[,6:10] except that
all non-NAs are replaced by 1.  Multiplying DF[,1:5] by that
effectively replaces each element in DF[,1:5] with an NA if
the corresponding element of DF[,6:10] is an NA.

rowSums( DF[,1:5] * (1 + 0 * DF[,6:10]), na.rm = TRUE )

> The result would then be (for the first row)
> (1) "There were 2 NAs in columns 6-10."
> (2) The mean of Columns 1-5 was 2+2+3=7" (because there were NAs in the
> 1st and 3rd position in rows 6-10)

I guess you meant sum when you referred to mean in (2).  If you really
do want the mean, replace rowSums with rowMeans in the expression
given above in the answer to (2).
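
To see what the 1 + 0 * x trick does, here is a small sketch on the first example row, written as plain vectors (the names `x`, `y` and `mask` are illustrative; the sketch assumes all values are finite, since 0 * Inf is NaN in R):

```r
x <- c(NA, 5, NA, 1, 4)   # columns 6-10 of the first example row
y <- c(1, 2, 5, 2, 3)     # columns 1-5 of the first example row

# 0 * NA is NA and 0 * (any finite number) is 0, so adding 1 gives NA or 1.
mask <- 1 + 0 * x
print(mask)               # NA 1 NA 1 1

# Multiplying propagates the NAs into y; na.rm = TRUE then skips them.
print(y * mask)                     # NA 2 NA 2 3
print(sum(y * mask, na.rm = TRUE))  # 7
```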
