Chris Beeley
2010-Aug-24 17:19 UTC
[R] How to remove rows based on frequency of factor and then difference date scores
Hello- A basic question which has nonetheless floored me entirely. I have a dataset which looks like this: Type ID Date Value A 1 16/09/2020 8 A 1 23/09/2010 9 B 3 18/8/2010 7 B 1 13/5/2010 6 There are two Types, which correspond to different individuals in different conditions, and loads of ID labels (1:50) corresponding to the different individuals in each condition, and measurements at different times (from 1 to 10 measurements) for each individual. I want to perform the following operations: 1) Delete all individuals for whom only one measurement is available. In the dataset above, you can see that I want to delete the row Type B ID 3, and Type B ID 1, but without deleting the Type A ID 1 data because there is more than one measurement for Type A ID 1 (but not for Type B ID1) 2) Produce difference scores for each of the Dates, so each individual (Type A ID1 and all the others for whom more than one measurement exists) starts at Date "1" and goes up in integers according to how many days have elapsed. I just know there's some incredibly cunning R-ish way of doing this but after many hours of fiddling I have had to admit defeat. I would be very grateful for any words of advice. Many thanks, Chris Beeley, Institute of Mental Health, UK
David Winsemius
2010-Aug-24 17:53 UTC
[R] How to remove rows based on frequency of factor and then difference date scores
On Aug 24, 2010, at 1:19 PM, Chris Beeley wrote:> Hello- > > A basic question which has nonetheless floored me entirely. I have a > dataset which looks like this: > > Type ID Date Value > A 1 16/09/2020 8 > A 1 23/09/2010 9 > B 3 18/8/2010 7 > B 1 13/5/2010 6 > > There are two Types, which correspond to different individuals in > different conditions, and loads of ID labels (1:50) corresponding to > the different individuals in each condition, and measurements at > different times (from 1 to 10 measurements) for each individual. > > I want to perform the following operations: > > 1) Delete all individuals for whom only one measurement is available. > In the dataset above, you can see that I want to delete the row Type B > ID 3, and Type B ID 1, but without deleting the Type A ID 1 data > because there is more than one measurement for Type A ID 1 (but not > for Type B ID1) > > 2) Produce difference scores for each of the Dates, so each individual > (Type A ID1 and all the others for whom more than one measurement > exists) starts at Date "1" and goes up in integers according to how > many days have elapsed. > > I just know there's some incredibly cunning R-ish way of doing this > but after many hours of fiddling I have had to admit defeat.Not sure about terribly cunning. Let's assume your dataframe was read in with stringsAsFactors=FALSE and is called txt.df: > txt.df$dt2 <- as.Date(txt.df$Date, format="%d/%m/%Y") > txt.df Type ID Date Value dt2 1 A 1 16/09/2020 8 2020-09-16 2 A 1 23/09/2010 9 2010-09-23 3 B 3 18/8/2010 7 2010-08-18 4 B 1 13/5/2010 6 2010-05-13 > txt.df$nn <- ave(txt.df$ID,txt.df$ID, FUN=length) > txt.df Type ID Date Value dt2 nn 1 A 1 16/09/2020 8 2020-09-16 3 2 A 1 23/09/2010 9 2010-09-23 3 3 B 3 18/8/2010 7 2010-08-18 1 4 B 1 13/5/2010 6 2010-05-13 3 > txt.df[ -which( txt.df$nn <=1), ] Type ID Date Value dt2 nn 1 A 1 16/09/2020 8 2020-09-16 3 2 A 1 23/09/2010 9 2010-09-23 3 4 B 1 13/5/2010 6 2010-05-13 3 # Task #1 accomplished > tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x) $`1` Time differences in days [1] 0 3646 3779 $`3` Time difference of 0 days > unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x) ) 11 12 13 3 0 3646 3779 0 > txt.df$diffdays <- unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x) ) > txt.df Type ID Date Value dt2 nn diffdays 1 A 1 16/09/2020 8 2020-09-16 3 0 2 A 1 23/09/2010 9 2010-09-23 3 3646 3 B 3 18/8/2010 7 2010-08-18 1 3779 4 B 1 13/5/2010 6 2010-05-13 3 0 >> > I would be very grateful for any words of advice. > > Many thanks, > Chris Beeley, > Institute of Mental Health, UK > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
Abhijit Dasgupta, PhD
2010-Aug-24 17:56 UTC
[R] How to remove rows based on frequency of factor and then difference date scores
An answer to 1) > x = data.frame(Type=c('A','A','B','B'), ID=c(1,1,3,1), Date = c('16/09/2010','23/09/2010','18/8/2010','13/5/2010'), Value=c(8,9,7,6)) > x Type ID Date Value 1 A 1 16/09/2010 8 2 A 1 23/09/2010 9 3 B 3 18/8/2010 7 4 B 1 13/5/2010 6 > x$Date = as.Date(x$Date,format='%d/%m/%Y') > library(plyr) > x$uniqueID = paste(x$Type, x$ID, sep='') > nobs = daply(x, ~uniqueID, nrow) > keep = names(nobs)[nobs>1] > newx = x[x$uniqueID %in% keep,] An answer to 2) > require(plyr) > ddply(newx, ~uniqueID, transform, newDate = as.numeric(Date - min(Date)+1)) On 08/24/2010 01:19 PM, Chris Beeley wrote:> Hello- > > A basic question which has nonetheless floored me entirely. I have a > dataset which looks like this: > > Type ID Date Value > A 1 16/09/2020 8 > A 1 23/09/2010 9 > B 3 18/8/2010 7 > B 1 13/5/2010 6 > > There are two Types, which correspond to different individuals in > different conditions, and loads of ID labels (1:50) corresponding to > the different individuals in each condition, and measurements at > different times (from 1 to 10 measurements) for each individual. > > I want to perform the following operations: > > 1) Delete all individuals for whom only one measurement is available. > In the dataset above, you can see that I want to delete the row Type B > ID 3, and Type B ID 1, but without deleting the Type A ID 1 data > because there is more than one measurement for Type A ID 1 (but not > for Type B ID1) > > 2) Produce difference scores for each of the Dates, so each individual > (Type A ID1 and all the others for whom more than one measurement > exists) starts at Date "1" and goes up in integers according to how > many days have elapsed. > > I just know there's some incredibly cunning R-ish way of doing this > but after many hours of fiddling I have had to admit defeat. > > I would be very grateful for any words of advice. > > Many thanks, > Chris Beeley, > Institute of Mental Health, UK > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Abhijit Dasgupta, PhD Director and Principal Statistician ARAASTAT Ph: 301.385.3067 E: adasgupta at araastat.com W: http://www.araastat.com