thr3ads.net - R help - [R] Using sapply to build a count matrix [Jul 2009]

Murray Cooper

2009-Jul-02 02:15 UTC

[R] Using sapply to build a count matrix

Dear All,

I am new to R and slowly learning how to use the system.

The following code is an exercise I was trying.
The intent is to generate 10 random samples of size 5 from
a vector containing the integers 1:10 and 2 missing values. I then
want to build a matrix that shows, for each sample, the frequency
of missing values (NA). My solution, using sapply, is at the end.

If anyone has the time and/or interest to critique my method I'd
be very grateful. I'm especially interested in knowing if there is
a better way to solve this problem.

> (x<-replicate(10,sample(c(1:10,rep(NA,2)),5)))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3   NA    3    4    2   10   NA    4    5     4
[2,]    5    7    7    3    9    2    8   NA    7     9
[3,]   NA    8    1    5   NA    7   10    2   NA     6
[4,]    2   NA    6   10    8    4    4    7    4     7
[5,]    7    9   10    8    3    6    1   NA    9    NA
> # Since table will return only a single item of value FALSE
> # if there are no missing values (NA) in a sample, sapply
> # will return a list and not a matrix.
> # So to get a matrix, the factor function needs to be used
> # to identify possible results (FALSE, TRUE) for the table
> # function.
> sapply(1:10,function(i) table(factor(is.na(x[,i]),c(FALSE,TRUE))))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
FALSE    4    3    5    5    4    5    4    3    4     4
TRUE     1    2    0    0    1    0    1    2    1     1

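The point made in the comments above, that table() needs fixed factor levels before sapply() can return a matrix, can be checked directly. The following is only a sketch: it re-enters the matrix printed in the post by hand (row by row) so that the result does not depend on a random draw, and the names loose and fixed are illustrative, not from the original post.

```r
# The matrix printed in the post, typed in row by row.
x <- rbind(c( 3, NA,  3,  4,  2, 10, NA,  4,  5,  4),
           c( 5,  7,  7,  3,  9,  2,  8, NA,  7,  9),
           c(NA,  8,  1,  5, NA,  7, 10,  2, NA,  6),
           c( 2, NA,  6, 10,  8,  4,  4,  7,  4,  7),
           c( 7,  9, 10,  8,  3,  6,  1, NA,  9, NA))

# Without fixed levels, a column with no NA yields a length-1 table,
# so the per-column results differ in length and cannot be simplified.
loose <- sapply(seq_len(ncol(x)), function(i) table(is.na(x[, i])))
is.list(loose)

# With both levels declared, every table has length 2 and
# sapply() simplifies the results to a 2 x 10 matrix.
fixed <- sapply(seq_len(ncol(x)), function(i)
  table(factor(is.na(x[, i]), levels = c(FALSE, TRUE))))
dim(fixed)
```

Using seq_len(ncol(x)) rather than a hard-coded 1:10 is a small design choice so that the same call works for any number of samples.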
Thanks for your thoughts.

Murray M Cooper, Ph.D.
Richland Statistics
9800 N 24th St
Richland, MI, USA 49083
Mail: richstat at earthlink.net

Jorge Ivan Velez

2009-Jul-02 02:29 UTC

Dear Murray,
Here is one way: create a function that draws a sample of size k from any
vector (e.g., x) and then counts the NA values in that sample. Then
replicate the procedure as many times as you want.

# The function
foo <- function(x, k = 5) {
  xsample <- sample(x, k)   # draw k values without replacement
  sum(is.na(xsample))       # count the NAs in the sample
}

# Vector of data
y <- c(1:10, NA, NA)

# The replication (10 replicates are used; results vary from run to run)
replicate(10, foo(y))
#  [1] 1 2 1 1 0 0 1 0 1 1
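
As a rough sanity check on this approach, the simulated counts can be compared with what sampling without replacement predicts: with 2 NAs among 12 values and samples of size 5, the expected number of NAs per sample is 5 * 2 / 12. The sketch below is illustrative only; the seed value and the number of replicates are arbitrary assumptions, and foo is restated so the block runs on its own.

```r
set.seed(42)  # arbitrary seed, used only to make the run repeatable

# Same idea as foo above: sample k values and count the NAs.
foo <- function(x, k = 5) sum(is.na(sample(x, k)))
y <- c(1:10, NA, NA)

# Many replicates, so the average settles near its expected value.
counts <- replicate(10000, foo(y))

table(factor(counts, levels = 0:2))  # how often 0, 1 or 2 NAs were drawn
mean(counts)                         # should be close to 5 * 2 / 12
```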

HTH,

Jorge



Marc Schwartz

2009-Jul-02 02:38 UTC

Murray, if I correctly understand what you want as an end result, then:

 > x
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3   NA    3    4    2   10   NA    4    5     4
[2,]    5    7    7    3    9    2    8   NA    7     9
[3,]   NA    8    1    5   NA    7   10    2   NA     6
[4,]    2   NA    6   10    8    4    4    7    4     7
[5,]    7    9   10    8    3    6    1   NA    9    NA


 > colSums(is.na(x))
  [1] 1 2 0 0 1 0 1 2 1 1


To take that in stages:

 > is.na(x)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
[1,] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
[3,]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
[4,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE


The above gives you TRUE or FALSE at each position in the matrix:
TRUE where the value is NA.

The colSums() function is implemented in C for speed and calculates
the sum of the values in each column. Since TRUE is treated as 1 and
FALSE as 0 in arithmetic, applying colSums() to the intermediate
result above gives a column-by-column count of the NA values.

 > as.numeric(TRUE)
[1] 1

 > as.numeric(FALSE)
[1] 0


See ?colSums for more information and the sister function rowSums().
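
If the two-row FALSE/TRUE layout from the original post is still wanted, it can be rebuilt from the colSums() result, since each column holds nrow(x) values in total. This is a sketch under the assumption that the matrix is the one printed earlier in the thread (re-entered by hand here); the names na_per_col, counts and na_per_row are illustrative.

```r
# The matrix printed in the thread, typed in row by row.
x <- rbind(c( 3, NA,  3,  4,  2, 10, NA,  4,  5,  4),
           c( 5,  7,  7,  3,  9,  2,  8, NA,  7,  9),
           c(NA,  8,  1,  5, NA,  7, 10,  2, NA,  6),
           c( 2, NA,  6, 10,  8,  4,  4,  7,  4,  7),
           c( 7,  9, 10,  8,  3,  6,  1, NA,  9, NA))

# NA count per column (per sample).
na_per_col <- colSums(is.na(x))

# Non-missing count is the column length minus the NA count;
# stacking both gives the same layout as the sapply()/table() result.
counts <- rbind("FALSE" = nrow(x) - na_per_col, "TRUE" = na_per_col)
counts

# The sister function works the same way across rows.
na_per_row <- rowSums(is.na(x))
na_per_row
```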

HTH,

Marc Schwartz
