Dave Hewitt
2008-Jan-24 04:31 UTC
[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?
I need to do a somewhat complex rearrangement of an existing dataframe that involves an "on-the-fly" conditional summary as well. I've tried to make the example as simple as possible. Code to produce the object being acted on is at the end of the message. I hope this isn't too much for a single posting. Any guidance will be much appreciated. The input example looks like this: survey year count location var1 var2 1 1 2 A1 21.2 1 2 1 0 A1 15.6 1 3 1 29 A1 12.1 1 1 1 11 B3 9 2 3 1 5 B3 4.8 2 1 2 7 A2 20.1 4 2 2 2 A2 19.2 4 I need to rearrange it to look like this (the order of the columns is not important): location year survey1 survey2 survey3 var1summ var2 A1 1 1 0 2 16.3 1 B3 1 2 NA 1 6.9 2 A2 2 1 1 NA 19.7 4 The operations/conditions that I need are: (A) The observations in "count" for each survey at each location in a given year need to be reduced according to the following rules and then placed in new columns "survey1", "survey2", "survey3" : -- when count = 0, surveyX = 0 -- when 0 < count < 11, surveyX = 1 -- when 11 < count, surveyX = 2 (B) The observations in var1 need to be summarized (here, by taking the mean) across all surveys for a given location in a given year and placed in "var1summ" (C) Observations in var2 need to be summarized in a similar way, except that they don't change across surveys for a given location in a given year (so mean, median, etc. would work, but a more general solution would be nice for variables that are categorical) (D) Surveys (rows) can be missing for a given location in a given year. I'd like to retain a column for each survey in the new dataframe and denote surveys that were missing as NA. E.g., survey2 is missing for location B3 in year 1. ** Object creation: x <- structure(list(survey = c(1L, 2L, 3L, 1L, 3L, 1L, 2L), year = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), count = c(2L, 0L, 29L, 11L, 5L, 7L, 2L), location = structure(c(1L, 1L, 1L, 3L, 3L, 2L, 2L), .Label = c("A1", "A2", "B3"), class = "factor"), var1 = c(21.2, 15.6, 12.1, 9, 4.8, 20.1, 19.2), var2 = c(1L, 1L, 1L, 2L, 2L, 4L, 4L)), .Names = c("survey", "year", "count", "location", "var1", "var2"), class = "data.frame", row.names = c(NA, -7L)) Thanks, Dave Hewitt Fisheries Science Virginia Institute of Marine Science, Gloucester Point, VA, USA
Henrique Dallazuanna
2008-Jan-24 18:51 UTC
[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?
Try this: x$survey1[x$count==0] <- 0 x$survey1[x$count %in% 1:11] <- 1 x$survey1[x$count > 11] <- 2 x.wide <- reshape(x, v.names="survey1", idvar="location", timevar="survey1", direction="wide") x.wide <- x.wide[order(x.wide$location),] x.wide$var1 <- tapply(x$var1, x$location, mean) x.wide On 24/01/2008, Dave Hewitt <dhewitt at vims.edu> wrote:> I need to do a somewhat complex rearrangement of an existing dataframe that > involves an "on-the-fly" conditional summary as well. I've tried to make > the example as simple as possible. Code to produce the object being acted > on is at the end of the message. I hope this isn't too much for a single > posting. Any guidance will be much appreciated. > > The input example looks like this: > > survey year count location var1 var2 > 1 1 2 A1 21.2 1 > 2 1 0 A1 15.6 1 > 3 1 29 A1 12.1 1 > 1 1 11 B3 9 2 > 3 1 5 B3 4.8 2 > 1 2 7 A2 20.1 4 > 2 2 2 A2 19.2 4 > > I need to rearrange it to look like this (the order of the columns is not > important): > > location year survey1 survey2 survey3 var1summ var2 > A1 1 1 0 2 16.3 1 > B3 1 2 NA 1 6.9 2 > A2 2 1 1 NA 19.7 4 > > The operations/conditions that I need are: > > (A) The observations in "count" for each survey at each location in a given > year need to be reduced according to the following rules and then placed in > new columns "survey1", "survey2", "survey3" : > -- when count = 0, surveyX = 0 > -- when 0 < count < 11, surveyX = 1 > -- when 11 < count, surveyX = 2 > > (B) The observations in var1 need to be summarized (here, by taking the > mean) across all surveys for a given location in a given year and placed in > "var1summ" > > (C) Observations in var2 need to be summarized in a similar way, except > that they don't change across surveys for a given location in a given year > (so mean, median, etc. would work, but a more general solution would be > nice for variables that are categorical) > > (D) Surveys (rows) can be missing for a given location in a given year. I'd > like to retain a column for each survey in the new dataframe and denote > surveys that were missing as NA. E.g., survey2 is missing for location B3 > in year 1. > > > ** Object creation: > x <- structure(list(survey = c(1L, 2L, 3L, 1L, 3L, 1L, 2L), year = c(1L, > 1L, 1L, 1L, 1L, 2L, 2L), count = c(2L, 0L, 29L, 11L, 5L, 7L, 2L), location > = structure(c(1L, 1L, 1L, 3L, 3L, 2L, 2L), .Label = c("A1", "A2", "B3"), > class = "factor"), var1 = c(21.2, 15.6, 12.1, 9, 4.8, 20.1, 19.2), var2 > c(1L, 1L, 1L, 2L, 2L, 4L, 4L)), .Names = c("survey", "year", "count", > "location", "var1", "var2"), class = "data.frame", row.names = c(NA, -7L)) > > Thanks, > Dave Hewitt > Fisheries Science > Virginia Institute of Marine Science, Gloucester Point, VA, USA > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Henrique Dallazuanna Curitiba-Paran?-Brasil 25? 25' 40" S 49? 16' 22" O
David Hewitt
2008-Jan-24 21:55 UTC
[R] Re shaping a dataframe with conditional summary of columns - apply or reshape?
> Try this: > > x$survey1[x$count==0] <- 0 > x$survey1[x$count %in% 1:11] <- 1 > x$survey1[x$count > 11] <- 2 > x.wide <- reshape(x, v.names="survey1", idvar="location", > timevar="survey1", direction="wide") > x.wide <- x.wide[order(x.wide$location),] > x.wide$var1 <- tapply(x$var1, x$location, mean) > x.wide >Thanks a lot Henrique. Only needed two simple changes: (1) Made the range in the second assignment 1:10 and the operator in the second > 10 to conform with the rules I needed. (2) Changed "timevar" from "survey1" to "survey". Using "survey1" reordered the columns for the survey observations, and it was much easier to have survey1 first, survey2 second, and survey3 third. ----- David Hewitt Virginia Institute of Marine Science http://www.vims.edu/fish/students/dhewitt/ -- View this message in context: http://www.nabble.com/Reshaping-a-dataframe-with-conditional-summary-of-columns---apply-or-reshape--tp15058600p15075986.html Sent from the R help mailing list archive at Nabble.com.
David Hewitt
2008-Jan-24 22:08 UTC
[R] Re shaping a dataframe with conditional summary of columns - apply or reshape?
>> x$survey1[x$count==0] <- 0 >> x$survey1[x$count %in% 1:11] <- 1 >> x$survey1[x$count > 11] <- 2 >> x.wide <- reshape(x, v.names="survey1", idvar="location", >> timevar="survey1", direction="wide") >> x.wide <- x.wide[order(x.wide$location),] >> x.wide$var1 <- tapply(x$var1, x$location, mean) >> x.wide > > Thanks a lot Henrique. Only needed two simple changes: > (1) Made the range in the second assignment 1:10 and the operator in the > second > 10 to conform with the rules I needed. > (2) Changed "timevar" from "survey1" to "survey". Using "survey1" > reordered the columns for the survey observations, and it was much easier > to have survey1 first, survey2 second, and survey3 third. >Oops, spoke too soon. Should have put this in the example, but the locations can be replicated in subsequent years (observations from the same place through time), so reshape drops all but the first. Any ideas? ----- David Hewitt Virginia Institute of Marine Science http://www.vims.edu/fish/students/dhewitt/ -- View this message in context: http://www.nabble.com/Reshaping-a-dataframe-with-conditional-summary-of-columns---apply-or-reshape--tp15058600p15076251.html Sent from the R help mailing list archive at Nabble.com.