Sorkin, John
2024-Nov-27 16:30 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
I am an old, long time SAS programmer. I need to produce R code that processes a
dataframe in a manner that is equivalent to that produced by using a by
statement in SAS and an if first.day statement and a retain statement:
I want to take data (olddata) that looks like this
ID Day
1 1
1 1
1 2
1 2
1 3
1 3
1 4
1 4
1 5
1 5
2 5
2 5
2 5
2 6
2 6
2 6
3 10
3 10
and make it look like this:
(within each ID I am copying the first value of Day into a new variable,
FirstDay, and propagating the FirstDay value through all rows that have the same
ID):
ID Day FirstDay
1 1 1
1 1 1
1 2 1
1 2 1
1 3 1
1 3 1
1 4 1
1 4 1
1 5 1
1 5 1
2 5 5
2 5 5
2 5 5
2 6 5
2 6 5
2 6 5
3 10 3
3 10 3
SAS code that can do this is:
proc sort data=olddata;
by ID Day;
run;
data newdata;
retain FirstDay;
set olddata;
by ID;
if first.ID then FirstDay=Day;
run;
I have NO idea how to do this in R (so I can't post test code), but below I
have R code that creates olddata:
ID <- c(rep(1,10),rep(2,6),rep(3,2))
date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
rep(5,3),rep(6,3),rep(10,2))
date
olddata <- data.frame(ID=ID,date=date)
olddata
Any suggestions on how to do this would be appreciated. I have worked on
this for more than 12 hours and, despite multiple web searches, I have gotten nowhere.
Thanks
John
John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine
Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
Tom Woolman
2024-Nov-27 17:05 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
Check out the dplyr package, specifically the mutate function.

# Create new column based on existing column value
library(dplyr)
df <- df %>% mutate(FirstDay = ifelse(ID == 2, 5, NA))
df

Repeat as needed to capture all of the day/firstday combinations you want to account for.

Like everything else in R, there are probably at least a dozen other ways to do this, between base R and all of the library packages available.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Richard M. Heiberger
2024-Nov-27 17:22 UTC
[R] [External] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
I would use base R.
newdata <- cbind(olddata, FirstDay=olddata$date)
newdata$FirstDay <- with(newdata, {
  for (thisID in unique(ID))
    FirstDay[ID==thisID] <- FirstDay[ID==thisID][1]
  FirstDay
})
newdata
Note that both my solution and Olivier's have newdata$FirstDay[17:18] == 10,
which is what I think you intended.
Rich
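For readers coming from SAS, the loop above can also be written without a loop. What follows is only a sketch in base R, not part of Rich's answer: `!duplicated(ID)` plays the role of `first.ID`, and `match()` carries the first row's value to every row with the same ID. The name `is_first` is made up for illustration, and the data are rebuilt from the original post.

```r
# Rebuild the example data from the original post.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
           rep(5, 3), rep(6, 3), rep(10, 2))
)

# Analogue of SAS first.ID: TRUE on the first row seen for each ID.
is_first <- !duplicated(olddata$ID)

# For every row, find the position of its ID among the first rows and
# take the date stored on that first row (the "retain" step).
newdata <- olddata
newdata$FirstDay <- olddata$date[is_first][match(olddata$ID, olddata$ID[is_first])]
newdata
```

Like the SAS data step, this takes whatever row comes first within each ID, so it assumes the data are already sorted by ID and date.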
David Winsemius
2024-Nov-27 17:39 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
There's an R base function named, wait for it, ... `by`.

It returns a list that is the results of a function applied to the sub-dataframes indexed by whatever grouping variable you specify in the second argument. My memory told me that it needed to be presented as a list, which was why I chose to use the `[` function rather than `$` or `[[`.

by(olddata, olddata["ID"], FUN= function(x) { rep( x$ID[1], times=nrow(x) )})
#-------------------
ID: 1
 [1] 1 1 1 1 1 1 1 1 1 1
------------------------------------------------------------------------------------
ID: 2
[1] 2 2 2 2 2 2
------------------------------------------------------------------------------------
ID: 3
[1] 3 3

So all you need to do from there is unlist it and assign to the new named column.

#------------------
olddata$FirstDay <- unlist( by(olddata, olddata["ID"], FUN= function(x) { rep( x$ID[1], times=nrow(x) )}) )
olddata
#----------------------------
   ID date FirstDay
1   1    1        1
2   1    1        1
3   1    2        1
4   1    2        1
5   1    3        1
6   1    3        1
7   1    4        1
8   1    4        1
9   1    5        1
10  1    5        1
11  2    5        2
12  2    5        2
13  2    5        2
14  2    6        2
15  2    6        2
16  2    6        2
17  3   10        3
18  3   10        3

HTH
David.
David Winsemius
2024-Nov-27 17:44 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
My earlier approach incorrectly picked the first of the ID column rather than the first of the `date` column to be repeated within the indexed group, so here's the correct code:

> olddata$FirstDay <- unlist( by(olddata, olddata["ID"], FUN= function(x){ rep( x$date[1], times=nrow(x) )}) )
> olddata
   ID date FirstDay
1   1    1        1
2   1    1        1
3   1    2        1
4   1    2        1
5   1    3        1
6   1    3        1
7   1    4        1
8   1    4        1
9   1    5        1
10  1    5        1
11  2    5        5
12  2    5        5
13  2    5        5
14  2    6        5
15  2    6        5
16  2    6        5
17  3   10       10
18  3   10       10
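One caveat, offered as a sketch rather than a correction: `unlist(by(...))` returns its values in group order, so it lines up with the rows only when the data frame is already sorted by ID. If that cannot be assumed, base R's `split()` and `unsplit()` put each group's result back in the original row positions. The shuffled copy below is hypothetical, made up only to show that row order does not matter, and it uses `min()` because "first" is ambiguous once the rows are out of order.

```r
# Rebuild the example data from the original post.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
           rep(5, 3), rep(6, 3), rep(10, 2))
)

# Hypothetical shuffled copy: rows are no longer grouped by ID.
set.seed(1)
shuffled <- olddata[sample(nrow(olddata)), ]

# split() gives one vector of dates per ID; each is replaced by its
# earliest value repeated to the group's length.
pieces <- split(shuffled$date, shuffled$ID)
firsts <- lapply(pieces, function(x) rep(min(x), length(x)))

# unsplit() returns the values to the rows they came from.
shuffled$FirstDay <- unsplit(firsts, shuffled$ID)
shuffled
```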
@vi@e@gross m@iii@g oii gm@ii@com
2024-Nov-27 18:27 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
John,
If I understood you, you want to take the minimum value of Day for each
grouping by ID and add a new column to contain that. Right?
There are likely many ways to do this in base R, but I prefer the
dplyr/tidyverse package in which you can use group_by(ID) piped to
mutate(FirstDay = min(Day))
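Spelled out, that suggestion might look like the sketch below. It assumes the dplyr package is installed, and it uses the column name `date` because that is what the posted code for olddata creates (the tables in the question call it Day).

```r
library(dplyr)  # assumption: dplyr is installed

# Rebuild the example data from the original post.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
           rep(5, 3), rep(6, 3), rep(10, 2))
)

# Within each ID, store the smallest date on every row of that group.
newdata <- olddata %>%
  group_by(ID) %>%
  mutate(FirstDay = min(date)) %>%
  ungroup()
newdata
```

Using `min(date)` does not depend on the row order. `first(date)` would instead mimic SAS `first.ID` on data that are already sorted.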
Rui Barradas
2024-Nov-27 19:13 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
Hello,

Isn't ?ave the simplest way?
The first one-liner assumes the dates are sorted in ascending order.

ave(olddata$date, olddata$ID, FUN = \(x) x[1L])
#>  [1]  1  1  1  1  1  1  1  1  1  1  5  5  5  5  5  5 10 10

If the dates are not sorted,

ave(olddata$date, olddata$ID, FUN = \(x) min(x))

Hope this helps,

Rui Barradas
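To mirror the two SAS steps from the question more literally (proc sort, then `first.ID` with retain), the sort can be done first with `order()`. This is a sketch in base R under the same data as the original post. The reversed copy `unsorted` is hypothetical and exists only to show the sort doing some work.

```r
# Rebuild the example data from the original post.
olddata <- data.frame(
  ID   = c(rep(1, 10), rep(2, 6), rep(3, 2)),
  date = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
           rep(5, 3), rep(6, 3), rep(10, 2))
)

# Hypothetical unsorted input: the same rows in reverse order.
unsorted <- olddata[rev(seq_len(nrow(olddata))), ]

# Step 1 (proc sort; by ID Day): order the rows by ID, then by date.
sorted <- unsorted[order(unsorted$ID, unsorted$date), ]

# Step 2 (first.ID + retain): within each ID, repeat the first date.
sorted$FirstDay <- ave(sorted$date, sorted$ID, FUN = function(x) x[1L])
sorted
```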
Naresh Gurbuxani
2024-Nov-28 03:35 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
In addition to the many good solutions already provided, this solution uses
the data.table package.
library(data.table)
mydf <- data.frame(id = c(rep(1,10),rep(2,6),rep(3,2)), date =
c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2), rep(5,3),rep(6,3),rep(10,2)))
setDT(mydf)
mydf[, `:=`(firstdate = with(.SD, min(date))), by = .(id)]
setDF(mydf)