thr3ads.net - R help - [R] persuade tabulate function to count NAs in a data frame [Mar 2011]


Bodnar Laszlo EB_HU

2011-Mar-19 14:58 UTC

[R] persuade tabulate function to count NAs in a data frame

Hi,

I'd like to ask another question. It is about data frames, NAs and the
tabulate function.

I have this data frame, which I already used in one of my previous questions.
It is intentionally this simple; my real 'df' data frame is much bigger, and
I don't want to burden anyone with huge data sets. So, my data:

id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,4)
df <- data.frame(id,a,b,c,d,e)
df

I have managed to calculate the distributions of the numbers occurring in
columns 'a' to 'e', grouped by the id numbers in column 'id'. It works fine,
see:

matrix(matrix(unlist(lapply(df[, -1], function(x)
    tapply(x, df$id, tabulate, nbins = nlevels(factor(df[, 2]))))[[1]])),
  ncol = 3, nrow = 3, byrow = TRUE)
matrix(matrix(unlist(lapply(df[, -1], function(x)
    tapply(x, df$id, tabulate, nbins = nlevels(factor(df[, 3]))))[[2]])),
  ncol = 3, nrow = 3, byrow = TRUE)
matrix(matrix(unlist(lapply(df[, -1], function(x)
    tapply(x, df$id, tabulate, nbins = nlevels(factor(df[, 4]))))[[3]])),
  ncol = 3, nrow = 3, byrow = TRUE)
matrix(matrix(unlist(lapply(df[, -1], function(x)
    tapply(x, df$id, tabulate, nbins = nlevels(factor(df[, 5]))))[[4]])),
  ncol = 3, nrow = 3, byrow = TRUE)
matrix(matrix(unlist(lapply(df[, -1], function(x)
    tapply(x, df$id, tabulate, nbins = nlevels(factor(df[, 6]))))[[5]])),
  ncol = 4, nrow = 3, byrow = TRUE)
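
As a side note, the five near-identical calls above can probably be collapsed into a single lapply(). The following is only a sketch, not the poster's code: it assumes every column holds positive integers coded 1..max, and the names `counts` and `nb` are made up for illustration.

```r
# NA-free data from the post
id <- rep(c(1, 2, 3), each = 10)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,4)
df <- data.frame(id, a, b, c, d, e)

# One count matrix per column: rows are ids, columns are the values 1..max.
# Assumption: values are coded 1..max, so max(x) can serve as nbins.
counts <- lapply(df[-1], function(x) {
  nb <- max(x)
  m <- t(sapply(split(x, df$id), tabulate, nbins = nb))
  colnames(m) <- seq_len(nb)
  m
})

counts$a
# id 1: 3 0 7
# id 2: 4 3 3
# id 3: 4 1 5
```

Each list element is then a matrix with one row per id, so no hard-coded ncol/nrow is needed and column 'e' simply gets a fourth column.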

Now my problem is: what if my data frame contains NA values here and there,
and I want the built-in tabulate function to count these NAs as well? That
is, I want to know how many NAs occur in each group.

Here's my modified data frame with the NAs:
id <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <- c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,NA,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,NA,1,4)
df <- data.frame(id,a,b,c,d,e)
df

At first I tried something like this (the only change is the added
exclude=NULL argument):
unlist(lapply(df[, -1], function(x)
  tapply(x, df$id, tabulate, nbins = nlevels(factor(df[, 2], exclude = NULL))))[[1]])

At least my code recognizes that column 'a' has 4 different levels
(1,2,3,NA) and not only three (1,2,3). Check it here:
nlevels(factor(df[,2],exclude=NULL))

But you can see in the result that it still does not count the NAs. It
returns
3  0  6  0(!)  4  3  3  0  4  1  5  0

Instead of the correct:
3  0  6  1(!)  4  3  3  0  4  1  5  0

Or in case of:
unlist(lapply(df[, -1], function(x)
  tapply(x, df$id, tabulate, nbins = nlevels(factor(df[, 4], exclude = NULL))))[[3]])

It says
2  4  4  0  2  3  4  0(!)  1  5  4  0

Instead of the correct
2  4  4  0  2  3  4  1(!)  1  5  4  0
etc.
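
A likely reason for the zeros, shown as a minimal sketch: tabulate() only counts the integers 1..nbins and silently skips NA, so exclude=NULL merely raises nbins and adds an empty fourth bin. One possible workaround, assuming it is acceptable to go through a factor, is to give NA its own level with addNA() and tabulate the integer codes. The vector below is column 'a' for id 1; the names `x`, `f`, `without_na` and `with_na` are illustrative only.

```r
x <- c(NA, 1, 3, 3, 1, 3, 3, 3, 3, 1)    # column 'a', rows with id == 1

# NA is skipped, so the 4th bin stays empty
without_na <- tabulate(x, nbins = 4)
without_na
# 3 0 6 0

# Give NA its own (last) level, then count the integer codes 1..4
f <- addNA(factor(x, levels = 1:3))
with_na <- tabulate(as.integer(f), nbins = nlevels(f))
with_na
# 3 0 6 1
```

Fixing the levels to 1:3 keeps the bins aligned across groups even when a group happens to lack one of the values.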

Does anyone have an idea how to "persuade" the tabulate function to count
NAs? Is it possible at all?
Thanks very much and have a pleasant weekend,
Laszlo


Gavin Simpson

2011-Mar-19 15:28 UTC


[R] persuade tabulate function to count NAs in a data frame

On Sat, 2011-03-19 at 15:58 +0100, Bodnar Laszlo EB_HU wrote:
> Hi,

I'll top-post as the original Q is very lengthy:

tabs <- lapply(df[, 2:6],
               function(x, id) {
                 t(table(addNA(x), id, useNA = "ifany"))
               },
               df$id)

is one way of doing what you want. More details are here:

http://stackoverflow.com/questions/5362702/persuading-tabulate-function-to-count-nas-in-a-data-frame-in-r

where you also posted your Q.
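
To see what this returns on the NA-containing data frame from the question, here is a small self-contained sketch. The flattening step at the end (as.vector(t(...))) and the name `flat_a` are additions for illustration, not part of the answer above; they reproduce the per-id ordering that the original post expected.

```r
# NA-containing data from the question
id <- rep(c(1, 2, 3), each = 10)
a <- c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <- c(1,3,2,3,2,1,2,3,3,2,2,3,NA,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <- c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <- c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,NA,1,4)
df <- data.frame(id, a, b, c, d, e)

# addNA() turns each column into a factor with an explicit NA level,
# so table() counts NA like any other value; t() puts ids in the rows.
tabs <- lapply(df[, 2:6],
               function(x, id) {
                 t(table(addNA(x), id, useNA = "ifany"))
               },
               df$id)

tabs$a    # rows: id 1..3, columns: 1, 2, 3, <NA>

# Flatten row by row to get the vector the question asked for
flat_a <- as.vector(t(tabs$a))
flat_a
# 3 0 6 1 4 3 3 0 4 1 5 0
```

Because addNA() always adds the NA level, columns without any missing values (such as 'b') still get an NA column, filled with zeros.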

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Jim Lemon

2011-Mar-19 21:42 UTC


[R] persuade tabulate function to count NAs in a data frame

On 03/20/2011 01:58 AM, Bodnar Laszlo EB_HU wrote:
> Hi,
>
> I'd like to ask you a question again. It is basically about data
> frames, NAs and tabulate function.
>
Hi Bodnar,
The "freq" function in the prettyR package might do what you want.

Jim
