thr3ads.net - R help - [R] query about counting rows of a dataframe [Nov 2011]

Stefano Sofia

2011-Nov-03 16:28 UTC

[R] query about counting rows of a dataframe

Dear R users,
I have got the following data frame, called my_df:

   gender day_birth month_birth year_birth labour
1       F        22          10       2001      1
2       M        29          10       2001      2
3       M         1          11       2001      1
4       F         3          11       2001      1
5       M         3          11       2001      2
6       F         4          11       2001      1
7       F         4          11       2001      2
8       F         5          12       2001      2
9       M        22          14       2001      2
10      F        29          13       2001      2
...

I need to count data in different ways:

1. count the births for each day (having 0 when necessary) independently from
the value of the "labour" column

2. count the births for each day (having 0 when necessary), split by the value
of "labour" (which can have two values, 1 or 2)

3. count the births for each day of all the years (i.e. the 22nd of October of
all the years present in the data frame) independently from the value of
"labour"

4. count the births for each day of all the years (i.e. the 22nd of October of
all the years present in the data frame), divided by the value of
"labour"

I tried with the command

table(my_df$year_birth, my_df$month_birth, my_df$day_birth)

which partially answers question number 1 (I am not able to get a 0 for the
days on which there were no births).

Is there a smart way to do that without invoking too many loops?

thank you for your help
Stefano Sofia



David Winsemius

2011-Nov-03 21:40 UTC


On Nov 3, 2011, at 12:28 PM, Stefano Sofia wrote:
> Dear R users,
> I have got the following data frame, called my_df:
>
>    gender day_birth month_birth year_birth labour
> 1       F        22          10       2001      1
> 2       M        29          10       2001      2
> 3       M         1          11       2001      1
> 4       F         3          11       2001      1
> 5       M         3          11       2001      2
> 6       F         4          11       2001      1
> 7       F         4          11       2001      2
> 8       F         5          12       2001      2
> 9       M        22          14       2001      2
> 10      F        29          13       2001      2
> ...
>
> I need to count data in different ways:
>
> 1. count the births for each day (having 0 when necessary)  
> independently from the value of the "labour" column
xtabs sometimes gives better results. If you want all 31 days, then make
day_birth a factor with levels=1:31.

 > xtabs(  ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  2  0  0  0
        4   0  2  0  0  0
        5   0  0  1  0  0
        22  1  0  0  0  1
        29  1  0  0  1  0
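
The factor suggestion above can be sketched as follows. This is only a minimal illustration: the data frame is rebuilt from the ten rows shown in the original post under the poster's name my_df (the replies call it dat), and the column name day_f is an assumption introduced here.

```r
# Example data copied from the original post (first ten rows only).
my_df <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Declaring all 31 levels makes xtabs() report unobserved days as 0.
# day_f is a hypothetical helper column, not part of the original data.
my_df$day_f <- factor(my_df$day_birth, levels = 1:31)

counts <- xtabs(~ day_f + month_birth + year_birth, data = my_df)

dim(counts)                  # days x months x years
counts["2", "10", "2001"]    # a day on which nobody was born
counts["3", "11", "2001"]    # a day with two births
```

The same idea would apply to month_birth (levels = 1:12) if empty months should appear as well.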
>
> 2. count the births for each day (having 0 when necessary), split
> by the value of "labour" (which can have two values, 1 or 2)
I cannot figure out what is being asked here. What should be done with the two
values? Just count them? This would give a partitioned count:

 > xtabs( labour==1 ~ day_birth + month_birth , data=dat)
          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  1  0  0  0
        4   0  1  0  0  0
        5   0  0  0  0  0
        22  1  0  0  0  0
        29  0  0  0  0  0
 > xtabs( labour==2 ~ day_birth + month_birth , data=dat)
          month_birth
day_birth 10 11 12 13 14
        1   0  0  0  0  0
        3   0  1  0  0  0
        4   0  1  0  0  0
        5   0  0  1  0  0
        22  0  0  0  0  1
        29  1  0  0  1  0
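
If the intent is simply one count table per value of labour, a possible alternative to the two separate calls is to add labour as a third dimension. This is a sketch under that assumption, again using the ten posted rows and the poster's name my_df; the name by_labour is introduced here for illustration.

```r
# Example data copied from the original post (first ten rows only).
my_df <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# One three-way table: day x month x labour.
by_labour <- xtabs(~ day_birth + month_birth + labour, data = my_df)

by_labour[, , "1"]   # slice for labour == 1
by_labour[, , "2"]   # slice for labour == 2
```

Each slice has the same day and month layout, so the two can be compared or combined cell by cell.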

>
> 3. count the births for each day of all the years (i.e. the 22nd of  
> October of all the years present in the data frame) independently  
> from the value of "labour"
If I understand correctly:

 > xtabs(  ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  2  0  0  0
        4   0  2  0  0  0
        5   0  0  1  0  0
        22  1  0  0  0  1
        29  1  0  0  1  0
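
Another possible reading of question 3 is that births on the same calendar day should be pooled over all years. Under that assumption, a sketch would drop year_birth from the formula. The posted sample contains only 2001, so one row for 2002 is added below; that row is hypothetical and exists only to make the pooling visible.

```r
# Example data copied from the original post (first ten rows only).
my_df <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# HYPOTHETICAL extra row (not in the original post): a birth on 22 Oct 2002.
my_df <- rbind(
  my_df,
  data.frame(gender = "M", day_birth = 22, month_birth = 10,
             year_birth = 2002, labour = 1)
)

# Per-year counts, as in the reply above.
by_year <- xtabs(~ day_birth + month_birth + year_birth, data = my_df)

# Pooled over years: leave year_birth out of the formula.
pooled <- xtabs(~ day_birth + month_birth, data = my_df)

# Summing the per-year table over its third dimension gives the same counts.
pooled_check <- margin.table(by_year, c(1, 2))

pooled["22", "10"]   # both 22 October births, whatever the year
```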
>
> 4. count the births for each day of all the years (i.e. the 22nd of  
> October of all the years present in the data frame), divided by the  
> value of "labour"
Again, this is confusing. Do you mean to use separate tables for labour==1 and
labour==2? Perhaps some context explaining what these values represent would
help. Some of us are "concrete". The results of xtabs are tables and can be
divided like matrices.
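
As an illustration of dividing one table by another, the sketch below computes, for each day, the share of births with labour == 1. Whether this is what "divided by the value of labour" means is an assumption; the names by_labour, total and share are introduced here only for the example.

```r
# Example data copied from the original post (first ten rows only).
my_df <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Counts per day and month, split by labour, and the overall counts.
by_labour <- xtabs(~ day_birth + month_birth + labour, data = my_df)
total     <- xtabs(~ day_birth + month_birth, data = my_df)

# Element-wise division works because both objects have the same
# day x month layout. Cells with no births give 0/0, which is NaN.
share <- by_labour[, , "1"] / total

share["3", "11"]    # half of the births on 3 November had labour == 1
share["1", "10"]    # NaN: no births on 1 October
```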
>
> I tried with the command
>
> table(my_df$year_birth, my_df$month_birth, my_df$day_birth)
>
> which partially answers question number 1 (I am not able to get a 0
> for the days on which there were no births).
>
> Is there a smart way to do that without invoking too many loops?
>
> thank you for your help

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Jean V Adams

2011-Nov-03 21:54 UTC


Stefano Sofia wrote on 11/03/2011 11:28:08 AM:
> Dear R users,
> I have got the following data frame, called my_df:
> 
>    gender day_birth month_birth year_birth labour
> 1       F        22          10       2001      1
> 2       M        29          10       2001      2
> 3       M         1          11       2001      1
> 4       F         3          11       2001      1
> 5       M         3          11       2001      2
> 6       F         4          11       2001      1
> 7       F         4          11       2001      2
> 8       F         5          12       2001      2
> 9       M        22          14       2001      2
> 10      F        29          13       2001      2
> ...
> 
> I need to count data in different ways:
> 
> 1. count the births for each day (having 0 when necessary) 
> independently from the value of the "labour" column
> 
> 2. count the births for each day (having 0 when necessary), split
> by the value of "labour" (which can have two values, 1 or 2)
> 
> 3. count the births for each day of all the years (i.e. the 22nd of 
> October of all the years present in the data frame) independently 
> from the value of "labour"
> 
> 4. count the births for each day of all the years (i.e. the 22nd of 
> October of all the years present in the data frame), divided by the 
> value of "labour"
> 
> I tried with the command
> 
> table(my_df$year_birth, my_df$month_birth, my_df$day_birth)
> 
> which partially answers question number 1 (I am not able to get a 0
> for the days on which there were no births).
> 
> Is there a smart way to do that without invoking too many loops?
> 
> thank you for your help
> Stefano Sofia
> 
I'm having a hard time understanding what you're trying to calculate. Can
you show us what the results would look like from the example data you
shared?

Jean