thr3ads.net - R help - [R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statements [Nov 2024]


Rui Barradas

2024-Nov-27 19:13 UTC


At 16:30 on 27/11/2024, Sorkin, John wrote:
> I am an old, long-time SAS programmer. I need to produce R code that
> processes a data frame in a manner equivalent to that produced by using a
> by statement in SAS together with an if first.ID statement and a retain statement:
> 
> I want to take data (olddata) that looks like this
> ID	Day
> 1	1
> 1	1
> 1	2
> 1	2
> 1	3
> 1	3
> 1	4
> 1	4
> 1	5
> 1	5
> 2	5
> 2	5
> 2	5
> 2	6
> 2	6
> 2	6
> 3	10
> 3	10
> 
> and make it look like this:
> (within each ID I am copying the first value of Day into a new variable,
> FirstDay, and propagating the FirstDay value through all rows that have the same
> ID):
> 
> ID	Day	FirstDay
> 1	1	1
> 1	1	1
> 1	2	1
> 1	2	1
> 1	3	1
> 1	3	1
> 1	4	1
> 1	4	1
> 1	5	1
> 1	5	1
> 2	5	5
> 2	5	5
> 2	5	5
> 2	6	5
> 2	6	5
> 2	6	5
> 3	10	10
> 3	10	10
> 
> SAS code that can do this is:
> 
> proc sort data=olddata;
>    by ID Day;
> run;
> 
> data newdata;
>    retain FirstDay;
>    set olddata;
>    by ID;
>    if first.ID then FirstDay=Day;
> run;
> 
> I have NO idea how to do this in R (so I can't post test code), but
> below is R code that creates olddata:
> 
> ID <- c(rep(1,10),rep(2,6),rep(3,2))
> date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
>            rep(5,3),rep(6,3),rep(10,2))
> date
> olddata <- data.frame(ID=ID,date=date)
> olddata
> 
> Any suggestions on how to do this would be appreciated. I have worked
> on this for more than 12 hours, and despite multiple web searches I have gotten
> nowhere.
> 
> Thanks
> John
> 
> 
> 
> 
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
> PI, Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
> 
> Division of Gerontology and Palliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382
> 
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello,

Isn't ?ave the simplest way?
The first one-liner assumes the dates are sorted in ascending order within each ID.


ave(olddata$date, olddata$ID, FUN = \(x) x[1L])
#>  [1]  1  1  1  1  1  1  1  1  1  1  5  5  5  5  5  5 10 10


If the dates are not sorted,


ave(olddata$date, olddata$ID, FUN = \(x) min(x))
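The caveat about sort order is easy to check. The sketch below is an illustration, not code from the thread: it reverses the rows of the example data so that dates descend within each ID, and the names `shuffled`, `first_by_position`, `first_by_min`, and `first_after_sort` are hypothetical. It uses `function(x)` rather than the `\(x)` shorthand, which requires R 4.1.0 or later.

```r
# Example data from the question (the column is named `date` there)
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Reverse the rows so the dates are no longer ascending within each ID
shuffled <- olddata[rev(seq_len(nrow(olddata))), ]

# x[1L] takes whichever date comes first in row order, here the largest one
first_by_position <- ave(shuffled$date, shuffled$ID, FUN = function(x) x[1L])

# min() gives the earliest date regardless of row order
first_by_min <- ave(shuffled$date, shuffled$ID, FUN = min)

# Sorting first (the analogue of `proc sort; by ID Day;`) makes x[1L] safe again
sorted <- shuffled[order(shuffled$ID, shuffled$date), ]
first_after_sort <- ave(sorted$date, sorted$ID, FUN = function(x) x[1L])

print(first_by_position)
print(first_by_min)
print(first_after_sort)
```

With the reversed rows, `x[1L]` returns 10, 6 and 5 for IDs 3, 2 and 1, while `min()` returns 10, 5 and 1. Sorting with `order(ID, date)` before calling `ave()` restores the intended result.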



Hope this helps,

Rui Barradas



Jeff Newmiller

2024-Nov-27 19:38 UTC



Was wondering when this would be suggested. But the question was about getting
the final dataframe...


# copy the data, then add FirstDay: the first date within each ID
newdata <- olddata
newdata$FirstDay <- ave(newdata$date, newdata$ID, FUN = \(x) x[1L])
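For readers coming from SAS, a literal row-by-row translation of the data step may be easier to map onto `retain` and `first.ID`. This is a minimal sketch, not code from the thread. It assumes the column is named `date` as in the question's R code, uses `!duplicated()` on sorted data as a stand-in for `first.ID`, and introduces the hypothetical names `first_ID`, `FirstDay_vec`, and `FirstDay_match`.

```r
# Example data from the question
ID   <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# proc sort data=olddata; by ID Day;
newdata <- olddata[order(olddata$ID, olddata$date), ]
rownames(newdata) <- NULL

# first.ID: TRUE on the first row of each ID group (needs data sorted by ID)
first_ID <- !duplicated(newdata$ID)

# retain FirstDay; if first.ID then FirstDay = Day;
# FirstDay keeps its value between iterations and is reset only at first.ID
newdata$FirstDay <- NA_real_
FirstDay <- NA_real_
for (i in seq_len(nrow(newdata))) {
  if (first_ID[i]) FirstDay <- newdata$date[i]
  newdata$FirstDay[i] <- FirstDay
}

# Vectorised equivalent: cumsum(first_ID) numbers the groups 1, 2, 3, ...
# and indexes into the vector of first dates, one per group
FirstDay_vec <- newdata$date[first_ID][cumsum(first_ID)]

# match() alternative: for each row, the position of the first row with that ID
FirstDay_match <- newdata$date[match(newdata$ID, newdata$ID)]

print(newdata)
```

The explicit loop is the closest analogue of `retain`, but it is typically slower on large data; the `cumsum()` indexing and the `match()` lookup produce the same column without a loop. Like `x[1L]`, both depend on row order, so they assume the data have been sorted first.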

On November 27, 2024 11:13:49 AM PST, Rui Barradas <ruipbarradas at sapo.pt> wrote:
