Dear R users,
I have got the following data frame, called my_df:

   gender day_birth month_birth year_birth labour
1       F        22          10       2001      1
2       M        29          10       2001      2
3       M         1          11       2001      1
4       F         3          11       2001      1
5       M         3          11       2001      2
6       F         4          11       2001      1
7       F         4          11       2001      2
8       F         5          12       2001      2
9       M        22          14       2001      2
10      F        29          13       2001      2
...

I need to count the data in different ways:

1. count the births for each day (with 0 where necessary), independently of the value of the "labour" column

2. count the births for each day (with 0 where necessary), divided by the value of "labour" (which can have two values, 1 or 2)

3. count the births for each day across all the years (e.g. the 22nd of October of all the years present in the data frame), independently of the value of "labour"

4. count the births for each day across all the years (e.g. the 22nd of October of all the years present in the data frame), divided by the value of "labour"

I tried the command

table(my_df$year_birth, my_df$month_birth, my_df$day_birth)

which partially answers question number 1 (I am not able to get a 0 for the days with no births).

Is there a smart way to do this without invoking too many loops?

Thank you for your help
Stefano Sofia
On Nov 3, 2011, at 12:28 PM, Stefano Sofia wrote:

> Dear R users,
> I have got the following data frame, called my_df:
>
>    gender day_birth month_birth year_birth labour
> 1       F        22          10       2001      1
> 2       M        29          10       2001      2
> 3       M         1          11       2001      1
> 4       F         3          11       2001      1
> 5       M         3          11       2001      2
> 6       F         4          11       2001      1
> 7       F         4          11       2001      2
> 8       F         5          12       2001      2
> 9       M        22          14       2001      2
> 10      F        29          13       2001      2
> ...
>
> I need to count the data in different ways:
>
> 1. count the births for each day (with 0 where necessary),
> independently of the value of the "labour" column

xtabs sometimes gives better results. (If you want all 31 days, then make day_birth a factor with levels=1:31.)

> xtabs( ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

         month_birth
day_birth 10 11 12 13 14
       1   0  1  0  0  0
       3   0  2  0  0  0
       4   0  2  0  0  0
       5   0  0  1  0  0
       22  1  0  0  0  1
       29  1  0  0  1  0

> 2. count the births for each day (with 0 where necessary), divided
> by the value of "labour" (which can have two values, 1 or 2)

I cannot figure out what is being asked here. What should be done with the two values? Just count them? This would give a partitioned count:

> xtabs( labour==1 ~ day_birth + month_birth , data=dat)
         month_birth
day_birth 10 11 12 13 14
       1   0  1  0  0  0
       3   0  1  0  0  0
       4   0  1  0  0  0
       5   0  0  0  0  0
       22  1  0  0  0  0
       29  0  0  0  0  0

> xtabs( labour==2 ~ day_birth + month_birth , data=dat)
         month_birth
day_birth 10 11 12 13 14
       1   0  0  0  0  0
       3   0  1  0  0  0
       4   0  1  0  0  0
       5   0  0  1  0  0
       22  0  0  0  0  1
       29  1  0  0  1  0

> 3. count the births for each day across all the years (e.g. the 22nd of
> October of all the years present in the data frame), independently
> of the value of "labour"

If I understand correctly:

> xtabs( ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

         month_birth
day_birth 10 11 12 13 14
       1   0  1  0  0  0
       3   0  2  0  0  0
       4   0  2  0  0  0
       5   0  0  1  0  0
       22  1  0  0  0  1
       29  1  0  0  1  0

> 4. count the births for each day across all the years (e.g. the 22nd of
> October of all the years present in the data frame), divided by the
> value of "labour"

Again, this is confusing. Do you mean to use separate tables for labour==1 and labour==2? Perhaps some context explaining what these values represent would help. Some of us are "concrete". The results of xtabs are tables and can be divided like matrices.

> I tried the command
>
> table(my_df$year_birth, my_df$month_birth, my_df$day_birth)
>
> which partially answers question number 1 (I am not able to get a
> 0 for the days with no births).
>
> Is there a smart way to do this without invoking too many loops?
>
> Thank you for your help

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
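The suggestion to make day_birth a factor with levels=1:31 can be sketched as follows. This is only one possible reading of requests 1 and 2, not a confirmed answer: it assumes that "divided by labour" means "tabulated separately for each labour value", it rebuilds only the ten rows shown in the post, and the names day_f, by_day and by_day_labour are made up for illustration.

```r
# Recreate the ten example rows from the original post.
my_df <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Declare all 31 days as factor levels so that days without births
# appear in the table with a count of 0 (day_f is a hypothetical name).
my_df$day_f <- factor(my_df$day_birth, levels = 1:31)

# Request 1 (assumed reading): births per day, month and year, ignoring labour.
by_day <- xtabs(~ day_f + month_birth + year_birth, data = my_df)

# Request 2 (assumed reading): the same counts with labour as a fourth dimension.
by_day_labour <- xtabs(~ day_f + month_birth + year_birth + labour,
                       data = my_df)

# One slice per labour value; summing over labour recovers by_day.
by_day_labour[, , , "1"]
by_day_labour[, , , "2"]
```

Only the days are padded with zeros here. The months are left as observed, because the example data contain the values 13 and 14, which a factor with levels=1:12 would silently turn into NA and drop from the table.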
Stefano Sofia wrote on 11/03/2011 11:28:08 AM:

> Dear R users,
> I have got the following data frame, called my_df:
>
>    gender day_birth month_birth year_birth labour
> 1       F        22          10       2001      1
> 2       M        29          10       2001      2
> 3       M         1          11       2001      1
> 4       F         3          11       2001      1
> 5       M         3          11       2001      2
> 6       F         4          11       2001      1
> 7       F         4          11       2001      2
> 8       F         5          12       2001      2
> 9       M        22          14       2001      2
> 10      F        29          13       2001      2
> ...
>
> I need to count the data in different ways:
>
> 1. count the births for each day (with 0 where necessary),
> independently of the value of the "labour" column
>
> 2. count the births for each day (with 0 where necessary), divided
> by the value of "labour" (which can have two values, 1 or 2)
>
> 3. count the births for each day across all the years (e.g. the 22nd of
> October of all the years present in the data frame), independently
> of the value of "labour"
>
> 4. count the births for each day across all the years (e.g. the 22nd of
> October of all the years present in the data frame), divided by the
> value of "labour"
>
> I tried the command
>
> table(my_df$year_birth, my_df$month_birth, my_df$day_birth)
>
> which partially answers question number 1 (I am not able to get a
> 0 for the days with no births).
>
> Is there a smart way to do this without invoking too many loops?
>
> Thank you for your help
> Stefano Sofia

I'm having a hard time understanding what you're trying to calculate. Can you show us what the results would look like for the example data you shared?

Jean
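To make the question about the expected results concrete, here is a rough sketch of what requests 3 and 4 might produce. It is a guess at the intent rather than a confirmed answer: it assumes that "each day of all the years" means pooling counts over year_birth, and that "divided by labour" means one count per labour value. The names day_f, pooled, pooled_labour and long are hypothetical.

```r
# Recreate the ten example rows from the original post.
my_df <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# All 31 days as factor levels, so days without births are counted as 0.
my_df$day_f <- factor(my_df$day_birth, levels = 1:31)

# Request 3 (assumed reading): leaving year_birth out of the formula
# pools the counts for each day/month combination over all years.
pooled <- xtabs(~ day_f + month_birth, data = my_df)

# Request 4 (assumed reading): the pooled counts, one layer per labour value.
pooled_labour <- xtabs(~ day_f + month_birth + labour, data = my_df)

# A long format with one row per day/month combination, zeros included,
# may be easier to inspect than a wide table.
long <- as.data.frame(pooled)
head(long)
```

With only the year 2001 in the example, the pooled table holds the same counts as the per-year table; the difference would only show once the data frame contains several years.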