Rui Barradas
2024-Nov-27 19:13 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
At 16:30 on 27/11/2024, Sorkin, John wrote:
> I am an old, long-time SAS programmer. I need to produce R code that processes a dataframe in a manner equivalent to using a by statement in SAS together with an if first.day statement and a retain statement.
>
> I want to take data (olddata) that looks like this:
>
> ID  Day
> 1   1
> 1   1
> 1   2
> 1   2
> 1   3
> 1   3
> 1   4
> 1   4
> 1   5
> 1   5
> 2   5
> 2   5
> 2   5
> 2   6
> 2   6
> 2   6
> 3   10
> 3   10
>
> and make it look like this (within each ID I am copying the first value of Day into a new variable, FirstDay, and propagating the FirstDay value through all rows that have the same ID):
>
> ID  Day  FirstDay
> 1   1    1
> 1   1    1
> 1   2    1
> 1   2    1
> 1   3    1
> 1   3    1
> 1   4    1
> 1   4    1
> 1   5    1
> 1   5    1
> 2   5    5
> 2   5    5
> 2   5    5
> 2   6    5
> 2   6    5
> 2   6    5
> 3   10   10
> 3   10   10
>
> SAS code that can do this is:
>
> proc sort data=olddata;
>   by ID Day;
> run;
>
> data newdata;
>   retain FirstDay;
>   set olddata;
>   by ID;
>   if first.ID then FirstDay=Day;
> run;
>
> I have NO idea how to do this in R (so I can't post test code), but below I have R code that creates olddata:
>
> ID <- c(rep(1,10),rep(2,6),rep(3,2))
> date <- c(rep(1,2),rep(2,2),rep(3,2),rep(4,2),rep(5,2),
>           rep(5,3),rep(6,3),rep(10,2))
> date
> olddata <- data.frame(ID=ID,date=date)
> olddata
>
> Any suggestions on how to do this would be appreciated. I have worked on this for more than 12 hours, and despite multiple web searches I have gotten nowhere.
>
> Thanks
> John
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
> PI-Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
>
> Division of Gerontology and Palliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382

Hello,

Isn't ?ave the simplest way?
The first one-liner assumes the dates are sorted in ascending order within each ID.

ave(olddata$date, olddata$ID, FUN = \(x) x[1L])
#> [1] 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 10 10

If the dates are not sorted,

ave(olddata$date, olddata$ID, FUN = \(x) min(x))

Hope this helps,

Rui Barradas
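The difference between the two one-liners only shows up when the rows are out of order. The following is a minimal sketch, not part of the original reply: it rebuilds olddata from the poster's code and reverses the rows to get a hypothetical unsorted input (the names shuffled, first_in_row_order, smallest_per_id, and sorted are introduced here for illustration only).

```r
# Rebuild the poster's example data.
ID <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Hypothetical unsorted input: the same rows in reverse order.
shuffled <- olddata[rev(seq_len(nrow(olddata))), ]

# x[1L] takes whatever row happens to come first within each ID ...
first_in_row_order <- ave(shuffled$date, shuffled$ID, FUN = \(x) x[1L])

# ... while min() takes the smallest date per ID regardless of row order.
smallest_per_id <- ave(shuffled$date, shuffled$ID, FUN = min)

# Sorting by ID and date first (the counterpart of the SAS proc sort step)
# makes x[1L] agree with min().
sorted <- shuffled[order(shuffled$ID, shuffled$date), ]
first_after_sort <- ave(sorted$date, sorted$ID, FUN = \(x) x[1L])
```

On the reversed rows, first_in_row_order and smallest_per_id differ for IDs 1 and 2, so which one-liner is right depends on whether "first" means first in row order or earliest date.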
Jeff Newmiller
2024-Nov-27 19:38 UTC
[R] R Processing dataframe by group - equivalent to SAS by group processing with a first. and retain statments
Was wondering when this would be suggested. But the question was about getting the final dataframe...

newdata <- olddata
newdata$FirstDay <- ave(newdata$date, newdata$ID, FUN = \(x) x[1L])

On November 27, 2024 11:13:49 AM PST, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>Hello,
>
>Isn't ?ave the simplest way?
>The first one-liner assumes the dates are sorted in ascending order.
>
>ave(olddata$date, olddata$ID, FUN = \(x) x[1L])
>#> [1] 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 10 10
>
>If the dates are not sorted,
>
>ave(olddata$date, olddata$ID, FUN = \(x) min(x))
>
>Hope this helps,
>
>Rui Barradas

--
Sent from my phone. Please excuse my brevity.
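Putting the pieces of the thread together, the whole SAS program can be mirrored in a few lines of base R. This is a sketch under the assumption that the data are built exactly as in the original post; the match() variant and the name first_day_via_match are additions for illustration and were not suggested in the thread.

```r
# Data as created in the original post.
ID <- c(rep(1, 10), rep(2, 6), rep(3, 2))
date <- c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2), rep(5, 2),
          rep(5, 3), rep(6, 3), rep(10, 2))
olddata <- data.frame(ID = ID, date = date)

# Counterpart of `proc sort data=olddata; by ID Day;`
newdata <- olddata[order(olddata$ID, olddata$date), ]

# Counterpart of `retain FirstDay; if first.ID then FirstDay=Day;`
newdata$FirstDay <- ave(newdata$date, newdata$ID, FUN = \(x) x[1L])

# Alternative without ave(): match(ID, ID) gives, for every row, the index
# of the first row with the same ID, so indexing date by it retains that value.
first_day_via_match <- newdata$date[match(newdata$ID, newdata$ID)]

print(newdata)
```

Both approaches rely on the rows already being sorted by ID and date, which is why the order() step comes first, just as proc sort precedes the data step in SAS.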