Hello- I have a dataset which basically looks like this: Location Sex Date Time Verbal Self harm Violence_objects Violence A 1 1-4-2007 1800 3 0 1 3 A 1 1-4-2007 1230 2 1 2 4 D 2 2-4-2007 1100 0 4 0 0 ... I've put a dput of the first section of the data at the end of this email. Basically I have these data for several days across all of the dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on until 31-12-2009. The last four variables which you can see at the end of the email are my dependent variables, they are different types of violent and self harming behaviour shown by patients in a psychiatric hospital. What I want to do is: A) sum each of the dependent variables for each of the dates (so e.g. in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and 3+4=7 for each of the variables) B) do this sum, but only in each location this time (location is the first variable)- so the sum for 1-4-2007 in location A, sum for 1-4-2007 in location B, and so on and so on. Because this is divided across locations, some dates will have no data going into them and will return 0 sums. Crucially I still want these dates to appear- so e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2 0 0, then 23-5-2008 0 0 0 0 again, and etc. I've had several abortive attempts and done some Googling but have got nowhere. I'd greatly appreciate any advice. Many thanks, Chris Beeley (Institute of Mental Health, UK) structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L, 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("", "A", "B", "C", "D", "E", "F"), class = "factor"), Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L, 2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L, 4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L), .Label = c("", "01/04/07", "02/04/07", "03/04/07" ), class = "factor"), Time = structure(c(1L, 28L, 1L, 1L, 1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L, 34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L, 10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L, 36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L, 29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L, 7L, 12L, 1L, 15L), .Label = c("", " ", "02:24:00", "03:44:00", "04:30:00", "07:00:00", "08:35:00", "09:20:00", "09:30:00", "10:00:00", "10:15:00", "10:45:00", "11:00:00", "11:20:00", "11:30:00", "11:35:00", "11:50:00", "12:00:00", "12:25:00", "12:30:00", "12:45:00", "15:00:00", "15:15:00", "15:30:00", "15:35:00", "17:15:00", "17:50:00", "18:00:00", "19:00:00", "19:30:00", "19:50:00", "20:00:00", "20:30:00", "20:55:00", "22:15:00", "22:30:00", "22:35:00", "22:40:00", "23:10:00" ), class = "factor"), verbal = c(NA, 3L, NA, NA, NA, 3L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L, NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L, 0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L, 2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA, 0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA, 1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L, NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA, 0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L, NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 3L, 0L, NA, 0L, 0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names = c("Location", "Sex", "Date", "Time", "verbal", "self.harm", "violence_objects", "violence"), class = "data.frame", row.names = c(NA, -73L))
Petr PIKAL
2010-Jul-02 15:55 UTC
[R] Odp: Problem with aggregating data across time points
Hi did you try aggregate?> aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)Group.1 verbal self.harm violence_objects violence 1 0 0 0 0 2 01/04/07 25 15 3 9 3 02/04/07 24 6 8 13 4 03/04/07 17 13 0 10> aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)BTW, why do you have empty rows? Regards Petr> Hello- > > I have a dataset which basically looks like this: > > Location Sex Date Time Verbal Self harm > Violence_objects Violence > A 1 1-4-2007 1800 3 0 > 1 3 > A 1 1-4-2007 1230 2 1 > 2 4 > D 2 2-4-2007 1100 0 4 > 0 0 > ... > > I've put a dput of the first section of the data at the end of this > email. Basically I have these data for several days across all of the > dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on > until 31-12-2009. The last four variables which you can see at the end > of the email are my dependent variables, they are different types of > violent and self harming behaviour shown by patients in a psychiatric > hospital. > > What I want to do is: > > A) sum each of the dependent variables for each of the dates (so e.g. > in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and > 3+4=7 for each of the variables) > > B) do this sum, but only in each location this time (location is the > first variable)- so the sum for 1-4-2007 in location A, sum for > 1-4-2007 in location B, and so on and so on. Because this is divided > across locations, some dates will have no data going into them and > will return 0 sums. Crucially I still want these dates to appear- so > e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2 > 0 0, then 23-5-2008 0 0 0 0 again, and etc. > > I've had several abortive attempts and done some Googling but have got > nowhere. I'd greatly appreciate any advice. > > Many thanks, > Chris Beeley > (Institute of Mental Health, UK) > > > structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L, > 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L, > 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L, > 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L, > 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L, > 4L, 4L, 4L), .Label = c("", "A", "B", "C", "D", "E", "F"), class ="factor"),> Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L, > 2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L, > 1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L, > 2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, > 1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L > ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, > 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, > 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L, > 4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L, > 4L, 4L, 4L, 4L), .Label = c("", "01/04/07", "02/04/07", "03/04/07" > ), class = "factor"), Time = structure(c(1L, 28L, 1L, 1L, > 1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L, > 34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L, > 10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L, > 36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L, > 29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L, > 7L, 12L, 1L, 15L), .Label = c("", " ", "02:24:00", "03:44:00", > "04:30:00", "07:00:00", "08:35:00", "09:20:00", "09:30:00", > "10:00:00", "10:15:00", "10:45:00", "11:00:00", "11:20:00", > "11:30:00", "11:35:00", "11:50:00", "12:00:00", "12:25:00", > "12:30:00", "12:45:00", "15:00:00", "15:15:00", "15:30:00", > "15:35:00", "17:15:00", "17:50:00", "18:00:00", "19:00:00", > "19:30:00", "19:50:00", "20:00:00", "20:30:00", "20:55:00", > "22:15:00", "22:30:00", "22:35:00", "22:40:00", "23:10:00" > ), class = "factor"), verbal = c(NA, 3L, NA, NA, NA, 3L, > 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L, > NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, > NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L, > 0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L, > 0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA, > NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L, > 2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA, > 0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, > NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA, > 1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA, > 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L, > NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, > 0L, 0L, NA, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, > 4L, 0L, 4L, NA, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, 0L, 0L, NA, > 0L, 0L, NA, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L), violence = c(NA, > 0L, NA, NA, NA, 0L, 1L, NA, NA, 3L, 0L, NA, 0L, 1L, 1L, 1L, > NA, 1L, 1L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, > 3L, 3L, NA, 2L, NA, NA, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, > 0L, 3L, 0L, NA, 0L, 0L, NA, 2L, 0L, NA, 0L, NA, 0L, 0L, NA, > 0L, 0L, NA, 0L, 0L, 0L, NA, 3L, 3L, 2L, 0L, 0L)), .Names =c("Location",> "Sex", "Date", "Time", "verbal", "self.harm", "violence_objects", > "violence"), class = "data.frame", row.names = c(NA, -73L)) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.
Allan Engelhardt
2010-Jul-02 16:09 UTC
[R] Problem with aggregating data across time points
On 02/07/10 16:21, Chris Beeley wrote:> Hello- > > I have a dataset which basically looks like this: > > Location Sex Date Time Verbal Self harm > Violence_objects Violence > A 1 1-4-2007 1800 3 0 > 1 3 > A 1 1-4-2007 1230 2 1 > 2 4 > D 2 2-4-2007 1100 0 4 > 0 0 > ... > > I've put a dput of the first section of the data at the end of this > email. [...] > > What I want to do is: > > A) sum each of the dependent variables for each of the dates (so e.g. > in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and > 3+4=7 for each of the variables) >If 'data' is the data at the end of your email, then> aggregate(cbind(verbal,self.harm,violence_objects,violence) ~ Date, data = data, FUN = sum)Date verbal self.harm violence_objects violence 1 01/04/07 25 15 3 9 2 02/04/07 24 6 8 13 3 03/04/07 17 13 0 10 is one approach. Read help("aggregate") and don't forget the na.action= argument.> B) do this sum, but only in each location this time (location is the > first variable)- so the sum for 1-4-2007 in location A, sum for > 1-4-2007 in location B, and so on and so on. Because this is divided >The basic approach could be> aggregate(cbind(verbal,self.harm,violence_objects,violence) ~ Date + Location, data = data, FUN = sum)Date Location verbal self.harm violence_objects violence 1 01/04/07 A 7 1 0 3 2 02/04/07 A 8 2 0 1 3 03/04/07 A 0 0 0 2 4 01/04/07 B 3 2 0 1 5 02/04/07 B 4 2 0 0 6 03/04/07 B 4 0 0 3 7 01/04/07 C 4 2 3 2 8 02/04/07 C 0 0 4 2 9 03/04/07 C 1 1 0 5 10 01/04/07 D 7 6 0 3 11 02/04/07 D 0 0 0 9 12 03/04/07 D 4 11 0 0 13 01/04/07 E 4 3 0 0 14 02/04/07 E 4 0 4 0 15 03/04/07 E 8 1 0 0 16 01/04/07 F 0 1 0 0 17 02/04/07 F 8 2 0 1> across locations, some dates will have no data going into them and > will return 0 sums. Crucially I still want these dates to appear- so > e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2 > 0 0, then 23-5-2008 0 0 0 0 again, and etc. >Why? But variations on> data2<- data[!(as.numeric(data$Date)==3& data$Location=="B"),] # For example > z<- with(data2, tapply(verbal, list(Date,Location), FUN=sum)) > z[is.na(z)]<- 0 > print(z)A B C D E F 0 0 0 0 0 0 0 01/04/07 0 7 3 4 7 4 0 02/04/07 0 8 0 0 0 4 8 03/04/07 0 0 4 1 4 8 0 will perhaps work for you. Hope this helps Allan
Possibly Parallel Threads
- data summary and some automated t.tests.
- Heatmap in R and/or ggplot2
- lattice --- different properties of lines corresponding to type=c("l", "a") respectively
- DPLYR Multiple Mutate Statements On Same DataFrame
- problem in plotting numeric x by POSIXt class with lattice