thr3ads.net - R help - [R] R - Populate Another Variable Based on Multiple Conditions

Jeff Newmiller

2016-Jul-03 19:09 UTC

[R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

Typo on the second line

result <- (   result0
          %>% select( -admin_period1 )
          %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
                        , by = c( ID="ID", admin_period="admin_period1" )
                        )
          %>% mutate( ddays = end - start )
          )
-- 
Sent from my phone. Please excuse my brevity.
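
For readers who want to try the corrected pipeline, here is a minimal sketch on toy data. The data values are hypothetical, the dates are assumed to be parsed with as.Date() so that min() and subtraction work on real dates, and the ungroup() call is an addition that only makes the grouping explicit.

```r
library(dplyr)

# Toy data (hypothetical values), with dates stored as Date rather than text.
drug_study <- data.frame(
  ID         = c("A", "A", "A", "A", "A", "B", "B", "B"),
  date       = as.Date(c("2007-05-11", "2007-05-16", "2007-08-13", "2007-09-18",
                         "2007-11-12", "2007-05-15", "2007-06-01", "2007-08-15")),
  drug_admin = c("Y", "", "Y", "", "Y", "Y", "", "Y"),
  stringsAsFactors = FALSE
)

# Running count of "Y" rows within each ID: 0 before the first
# administration, then 1, 2, ... for each successive period.
drug_study$admin_period <- ave("Y" == drug_study$drug_admin,
                               drug_study$ID, FUN = cumsum)

# One row per ID and period, holding the first date of that period.
result0 <- drug_study %>%
  filter(admin_period != 0) %>%
  group_by(ID, admin_period) %>%
  summarise(start = min(date)) %>%
  ungroup() %>%
  mutate(admin_period1 = admin_period - 1)

# Pair each period with the start of the next period, which serves as its end.
# The last period of each ID has no successor and drops out of the inner join.
result <- result0 %>%
  select(-admin_period1) %>%
  inner_join(result0 %>% select(ID, admin_period1, end = start),
             by = c("ID" = "ID", "admin_period" = "admin_period1")) %>%
  mutate(ddays = end - start)

print(result)
```

Dropping admin_period1 (rather than admin_period) from the left-hand table is what keeps the join column available, which is the point of the correction.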

On July 3, 2016 11:55:14 AM PDT, Kevin Wamae <KWamae at kemri-wellcome.org> wrote:
>Hi Jeff, "like it's Excel", I don't follow. Pardon me for any mix-up.
>
>Thanks for the code. After running it, this is the error I get.
>
>Error: cannot join on columns 'admin_period' x 'admin_period1': index out of bounds
>
>Regards
>-------------------------------------------------------------------------------
>Kevin Wame | Ph.D. Student (IDeAL)
>KEMRI-Wellcome Trust Collaborative Research Programme
>Centre for Geographic Medicine Research
>P.O. Box 230-80108, Kilifi, Kenya
> 
>
>On 7/3/16, 9:34 PM, "Jeff Newmiller" <jdnewmil at dcn.davis.ca.us> wrote:
>
>I still get the impression from your mixing of information types that
>you are thinking like this is Excel.
>
>Perhaps something like
>
>drug_study$admin_period <- ave( "Y" == drug_study$drug_admin, drug_study$ID, FUN=cumsum )
>library(dplyr)
>result0 <- (   drug_study
>          %>% filter( 0 != admin_period )
>          %>% group_by( ID, admin_period )
>          %>% summarise( start = min( date ) )
>          %>% mutate( admin_period1 = admin_period - 1 )
>          )
>result <- (   result0
>          %>% select( -admin_period )
>          %>% inner_join( result0 %>% select( ID, admin_period1, end=start )
>                        , by = c( ID="ID", admin_period="admin_period1" )
>                        )
>          %>% mutate( ddays = end - start )
>          )
>-- 
>Sent from my phone. Please excuse my brevity.
>
>On July 3, 2016 10:24:51 AM PDT, Kevin Wamae <KWamae at kemri-wellcome.org> wrote:
>>Hi Jeff, it's been an uphill task working with the dataset and I am not
>>the first to complain. Nonetheless, data cleaning is ongoing and since
>>I cannot wait for that to get done, I decided to make the most of what
>>the dataset looks like at this time. It appears the process may take a
>>while.
>>
>>Thanks for the script. From the output, I noticed that "result"
>>contains the first and last date for each individual and does not
>>take into account the variable "drug_admin".
>>
>>ID	    start		end
>>J1/3	    1/5/09	12/25/10
>>R1/3	    1/4/07	12/15/08
>>R10/1	    1/4/07	3/5/12
>>
>>My aim is to pick the date, for example in 2007, where drug_admin == "Y"
>>as my start, and the date in the subsequent year (2008 in this case)
>>where drug_admin == "Y" as my end. Then I should populate the variable
>>"study_id" with "start" up to the entry just above the one whose date
>>matches "end", as the output below shows (I hope its structure is
>>maintained, as I have copied it from RStudio). The goal for now is to
>>get the difference in days between "date" and "study_id", and still
>>keep the "study_id" column as I might use it later.
>>
>>From the output, it can be seen that for this individual the dates run
>>from 2007 to 2008. However, for some individuals the dates run from
>>2008-2009, 2009-2010 and so on. Therefore, I need the script to
>>deal with all the years, as the dates range from 2001 to 2016.
>>
>>ID    date      drug_admin  year  month  study_id
>>R1/3  5/11/07   Y           2007  5      5/11/07
>>R1/3  5/16/07               2007  5      5/11/07
>>R1/3  5/22/07               2007  5      5/11/07
>>R1/3  5/28/07               2007  5      5/11/07
>>R1/3  6/5/07                2007  6      5/11/07
>>R1/3  6/11/07               2007  6      5/11/07
>>R1/3  6/18/07               2007  6      5/11/07
>>R1/3  6/25/07               2007  6      5/11/07
>>R1/3  7/2/07                2007  7      5/11/07
>>R1/3  7/16/07               2007  7      5/11/07
>>R1/3  7/29/07               2007  7      5/11/07
>>R1/3  8/2/07                2007  8      5/11/07
>>R1/3  8/7/07                2007  8      5/11/07
>>R1/3  8/13/07               2007  8      5/11/07
>>R1/3  9/18/07               2007  9      5/11/07
>>R1/3  9/24/07               2007  9      5/11/07
>>R1/3  10/6/07               2007  10     5/11/07
>>R1/3  10/8/07               2007  10     5/11/07
>>R1/3  10/15/07              2007  10     5/11/07
>>R1/3  10/22/07              2007  10     5/11/07
>>R1/3  10/29/07              2007  10     5/11/07
>>R1/3  11/8/07               2007  11     5/11/07
>>R1/3  11/12/07              2007  11     5/11/07
>>R1/3  11/19/07              2007  11     5/11/07
>>R1/3  11/29/07              2007  11     5/11/07
>>R1/3  12/6/07               2007  12     5/11/07
>>R1/3  12/10/07              2007  12     5/11/07
>>R1/3  12/21/07              2007  12     5/11/07
>>R1/3  1/7/08                2008  1      5/11/07
>>R1/3  1/14/08               2008  1      5/11/07
>>R1/3  1/21/08               2008  1      5/11/07
>>R1/3  1/28/08               2008  1      5/11/07
>>R1/3  2/4/08    Y           2008  2
>>
>>
>>Regards
>>-------------------------------------------------------------------------------
>>Kevin Wame 
>>
>>###############################################################
>>
>>###############################################################
>>
>>
>>
>>On 7/3/16, 7:05 PM, "Jeff Newmiller" <jdnewmil at dcn.davis.ca.us> wrote:
>>
>>result <- setNames( data.frame( aggregate( date~ID, data=drug_study, FUN=min )
>>                              , aggregate( date~ID, data=drug_study, FUN=max )[2] )
>>                  , c( "ID", "start", "end" ) )
>>
>>
>>______________________________________________________________________
>>
>>This e-mail contains information which is confidential. It is intended
>>only for the use of the named recipient. If you have received this
>>e-mail in error, please let us know by replying to the sender, and
>>immediately delete it from your system.  Please note, that in these
>>circumstances, the use, disclosure, distribution or copying of this
>>information is strictly prohibited. KEMRI-Wellcome Trust Programme
>>cannot accept any responsibility for the  accuracy or completeness of
>>this message as it has been transmitted over a public network. Although
>>the Programme has taken reasonable precautions to ensure no viruses are
>>present in emails, it cannot accept responsibility for any loss or
>>damage arising from the use of the email or attachments. Any views
>>expressed in this message are those of the individual sender, except
>>where the sender specifically states them to be the views of
>>KEMRI-Wellcome Trust Programme.
>>______________________________________________________________________

Kevin Wamae

2016-Jul-03 19:13 UTC

[R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

Thanks Jeff, let me try it on the larger dataset.

Regards
-------------------------------------------------------------------------------
Kevin Wame 
 




Kevin Wamae

2016-Jul-03 20:08 UTC

[R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

Hi Jeff, it works well on a dataset with 100000 rows and I figure it will
work well with the "real" dataset. You've been of great help and I am starting
to make headway.

It creates a new data frame (result), shown below, that doesn't quite have the
result I want.

ID      admin_period  start     end       ddays
J1/3    1             5/11/07   8/13/07   94
J1/3    2             8/13/07   11/12/07  91
J1/3    3             11/12/07  2/4/08    84
J1/3    4             2/4/08    5/5/08    91
J1/3    5             5/5/08    5/4/09    364
J1/3    6             5/4/09    5/17/10   378
J1/3    7             5/17/10   5/16/11   364
J10/1   1             5/11/07   8/13/07   94
J10/1   2             8/13/07   11/12/07  91
J10/1   3             11/12/07  2/4/08    84
J10/1   4             2/4/08    5/5/08    91
J10/1   5             5/5/08    5/8/09    368
J10/1   6             5/8/09    5/17/10   374
J10/1   7             5/17/10   5/16/11   364
J102/1  1             5/15/07   8/15/07   92
J102/1  2             8/15/07   11/13/07  90
J102/1  3             11/13/07  2/5/08    84
J102/1  4             2/5/08    5/6/08    91
J102/1  5             5/6/08    5/5/09    364
J102/1  6             5/5/09    5/19/10   379

My supervisor doesn't want me to create a new dataset; she's afraid I might lose
some data, and I cannot fight that.

Like you mentioned earlier, I might be mixing things up.

After consultation with my supervisor, this is what we've agreed. For every
individual, given the start and end date, create a new column (say, diff_days)
and, for every row that falls within the range from start to end, compute the
difference between the date in that row and the start date and store it in the
diff_days column. Below is an example of the result. As can be seen, 5/11/2007
is the start while 2/4/2008 is the end. diff_days has been populated
excluding the end date because that date is the start of the study period in
2008 that continues into 2009; thus from 2/4/2008 I should compute
diff_days until 2009, and so on (I hope this makes sense).

ID    date       drug_admin  year  month  diff_days
R1/3  5/11/2007  Y           2007  5      0
R1/3  5/16/2007              2007  5      6
R1/3  5/22/2007              2007  5      11
R1/3  5/28/2007              2007  5      17
R1/3  1/14/2008              2008  1      248
R1/3  1/21/2008              2008  1      255
R1/3  1/28/2008              2008  1      263
R1/3  2/4/2008   Y           2008  2
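
One possible way to get such a column without building a separate data frame is sketched below in base R. This is an illustration, not code from the thread: the toy values are hypothetical, it assumes the dates have been parsed with as.Date(), and it treats every drug_admin == "Y" row as the start of a new period (so that row gets 0). Rows that precede an individual's first "Y" are left as NA.

```r
# Toy data (hypothetical values) in the same layout as the example above.
drug_study <- data.frame(
  ID         = c(rep("R1/3", 5), rep("J1/3", 3)),
  date       = as.Date(c("5/11/2007", "5/16/2007", "1/28/2008", "2/4/2008",
                         "2/11/2008", "5/1/2007", "5/11/2007", "5/20/2007"),
                       format = "%m/%d/%Y"),
  drug_admin = c("Y", "", "", "Y", "", "", "Y", ""),
  stringsAsFactors = FALSE
)

# Sort so that the first row of each period is its start date.
drug_study <- drug_study[order(drug_study$ID, drug_study$date), ]

# Running count of "Y" rows per individual: 0 before the first "Y",
# then 1, 2, ... for each period.
drug_study$admin_period <- ave(drug_study$drug_admin == "Y",
                               drug_study$ID, FUN = cumsum)

# Days elapsed since the first date within each (ID, admin_period) group.
drug_study$diff_days <- ave(as.numeric(drug_study$date),
                            drug_study$ID, drug_study$admin_period,
                            FUN = function(d) d - d[1])

# Rows before the first administration have no start date to measure from.
drug_study$diff_days[drug_study$admin_period == 0] <- NA

print(drug_study)
```

Because the columns are added to drug_study itself, no rows are dropped or moved to another object; the helper column admin_period can be removed afterwards if it is not wanted.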


Regards
-------------------------------------------------------------------------------
Kevin Wame 
 





Bert Gunter

2016-Jul-03 20:28 UTC

[R] R - Populate Another Variable Based on Multiple Conditions | For a Large Dataset

I haven't followed this thread closely, but if it's not too late, I
might suggest that you stop worrying about how you want your data
frame to look and start worrying about how you want to display/analyze
your data. As Jeff suggested, you and your supervisor are probably
being driven by paradigms from Excel, SPSS, or whatever that are
simply unnecessary for R. My guess would be that if you explained the
sort of analyses/plots you wish to do, you would find they can be done
fairly directly from your existing data. At the very least it would
give Jeff and other helpeRs a better idea of what you might need
rather than what you and your supervisor think you need.


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


