thr3ads.net - R help - [R] Reshaping a dataframe with conditional summary of columns

If this information is useful, please help other people find it:
Share via:

Dave Hewitt

2008-Jan-24 04:31 UTC

[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?

I need to do a somewhat complex rearrangement of an existing dataframe that 
involves an "on-the-fly" conditional summary as well. I've tried
to make
the example as simple as possible. Code to produce the object being acted 
on is at the end of the message. I hope this isn't too much for a single 
posting. Any guidance will be much appreciated.

The input example looks like this:

survey	year	count	location	var1	var2
1	1	2	A1	21.2	1
2	1	0	A1	15.6	1
3	1	29	A1	12.1	1
1	1	11	B3	9	2
3	1	5	B3	4.8	2
1	2	7	A2	20.1	4
2	2	2	A2	19.2	4

I need to rearrange it to look like this (the order of the columns is not 
important):

location	year	survey1	survey2	survey3	var1summ	var2
A1	1	1	0	2	16.3	1
B3	1	2	NA	1	6.9	2
A2	2	1	1	NA	19.7	4

The operations/conditions that I need are:

(A) The observations in "count" for each survey at each location in a
given
year need to be reduced according to the following rules and then placed in 
new columns "survey1", "survey2", "survey3" :
-- when  count = 0,  surveyX = 0
-- when  0 < count < 11,  surveyX = 1
-- when  11 < count,  surveyX = 2

(B) The observations in var1 need to be summarized (here, by taking the 
mean) across all surveys for a given location in a given year and placed in 
"var1summ"

(C) Observations in var2 need to be summarized in a similar way, except 
that they don't change across surveys for a given location in a given year 
(so mean, median, etc. would work, but a more general solution would be 
nice for variables that are categorical)

(D) Surveys (rows) can be missing for a given location in a given year. I'd 
like to retain a column for each survey in the new dataframe and denote 
surveys that were missing as NA. E.g., survey2 is missing for location B3 
in year 1.


** Object creation:
x <- structure(list(survey = c(1L, 2L, 3L, 1L, 3L, 1L, 2L), year = c(1L, 
1L, 1L, 1L, 1L, 2L, 2L), count = c(2L, 0L, 29L, 11L, 5L, 7L, 2L), location 
= structure(c(1L, 1L, 1L, 3L, 3L, 2L, 2L), .Label = c("A1",
"A2", "B3"),
class = "factor"), var1 = c(21.2, 15.6, 12.1, 9, 4.8, 20.1, 19.2),
var2 =
c(1L, 1L, 1L, 2L, 2L, 4L, 4L)), .Names = c("survey", "year",
"count",
"location", "var1", "var2"), class =
"data.frame", row.names = c(NA, -7L))

Thanks,
Dave Hewitt
Fisheries Science
Virginia Institute of Marine Science, Gloucester Point, VA, USA

Henrique Dallazuanna

2008-Jan-24 18:51 UTC

head link

[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?

Try this:

x$survey1[x$count==0] <- 0
x$survey1[x$count %in% 1:11] <- 1
x$survey1[x$count > 11] <- 2
 x.wide <- reshape(x, v.names="survey1",
idvar="location",
timevar="survey1", direction="wide")
x.wide <- x.wide[order(x.wide$location),]
x.wide$var1 <- tapply(x$var1, x$location, mean)
x.wide



On 24/01/2008, Dave Hewitt <dhewitt at vims.edu>
wrote:> I need to do a somewhat complex rearrangement of an existing dataframe that
> involves an "on-the-fly" conditional summary as well. I've
tried to make
> the example as simple as possible. Code to produce the object being acted
> on is at the end of the message. I hope this isn't too much for a
single
> posting. Any guidance will be much appreciated.
>
> The input example looks like this:
>
> survey  year    count   location        var1    var2
> 1       1       2       A1      21.2    1
> 2       1       0       A1      15.6    1
> 3       1       29      A1      12.1    1
> 1       1       11      B3      9       2
> 3       1       5       B3      4.8     2
> 1       2       7       A2      20.1    4
> 2       2       2       A2      19.2    4
>
> I need to rearrange it to look like this (the order of the columns is not
> important):
>
> location        year    survey1 survey2 survey3 var1summ        var2
> A1      1       1       0       2       16.3    1
> B3      1       2       NA      1       6.9     2
> A2      2       1       1       NA      19.7    4
>
> The operations/conditions that I need are:
>
> (A) The observations in "count" for each survey at each location
in a given
> year need to be reduced according to the following rules and then placed in
> new columns "survey1", "survey2", "survey3" :
> -- when  count = 0,  surveyX = 0
> -- when  0 < count < 11,  surveyX = 1
> -- when  11 < count,  surveyX = 2
>
> (B) The observations in var1 need to be summarized (here, by taking the
> mean) across all surveys for a given location in a given year and placed in
> "var1summ"
>
> (C) Observations in var2 need to be summarized in a similar way, except
> that they don't change across surveys for a given location in a given
year
> (so mean, median, etc. would work, but a more general solution would be
> nice for variables that are categorical)
>
> (D) Surveys (rows) can be missing for a given location in a given year.
I'd
> like to retain a column for each survey in the new dataframe and denote
> surveys that were missing as NA. E.g., survey2 is missing for location B3
> in year 1.
>
>
> ** Object creation:
> x <- structure(list(survey = c(1L, 2L, 3L, 1L, 3L, 1L, 2L), year = c(1L,
> 1L, 1L, 1L, 1L, 2L, 2L), count = c(2L, 0L, 29L, 11L, 5L, 7L, 2L), location
> = structure(c(1L, 1L, 1L, 3L, 3L, 2L, 2L), .Label = c("A1",
"A2", "B3"),
> class = "factor"), var1 = c(21.2, 15.6, 12.1, 9, 4.8, 20.1,
19.2), var2 > c(1L, 1L, 1L, 2L, 2L, 4L, 4L)), .Names = c("survey",
"year", "count",
> "location", "var1", "var2"), class =
"data.frame", row.names = c(NA, -7L))
>
> Thanks,
> Dave Hewitt
> Fisheries Science
> Virginia Institute of Marine Science, Gloucester Point, VA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Henrique Dallazuanna
Curitiba-Paran?-Brasil
25? 25' 40" S 49? 16' 22" O

David Hewitt

2008-Jan-24 21:55 UTC

head link

[R] Re shaping a dataframe with conditional summary of columns - apply or reshape?

> Try this:
> 
> x$survey1[x$count==0] <- 0
> x$survey1[x$count %in% 1:11] <- 1
> x$survey1[x$count > 11] <- 2
>  x.wide <- reshape(x, v.names="survey1",
idvar="location",
> timevar="survey1", direction="wide")
> x.wide <- x.wide[order(x.wide$location),]
> x.wide$var1 <- tapply(x$var1, x$location, mean)
> x.wide
> 
Thanks a lot Henrique. Only needed two simple changes:
(1) Made the range in the second assignment 1:10 and the operator in the
second > 10 to conform with the rules I needed.
(2) Changed "timevar" from "survey1" to "survey".
Using "survey1" reordered
the columns for the survey observations, and it was much easier to have
survey1 first, survey2 second, and survey3 third.


-----
David Hewitt
Virginia Institute of Marine Science
http://www.vims.edu/fish/students/dhewitt/
-- 
View this message in context:
http://www.nabble.com/Reshaping-a-dataframe-with-conditional-summary-of-columns---apply-or-reshape--tp15058600p15075986.html
Sent from the R help mailing list archive at Nabble.com.

David Hewitt

2008-Jan-24 22:08 UTC

head link

[R] Re shaping a dataframe with conditional summary of columns - apply or reshape?

>> x$survey1[x$count==0] <- 0
>> x$survey1[x$count %in% 1:11] <- 1
>> x$survey1[x$count > 11] <- 2
>>  x.wide <- reshape(x, v.names="survey1",
idvar="location",
>> timevar="survey1", direction="wide")
>> x.wide <- x.wide[order(x.wide$location),]
>> x.wide$var1 <- tapply(x$var1, x$location, mean)
>> x.wide
> 
> Thanks a lot Henrique. Only needed two simple changes:
> (1) Made the range in the second assignment 1:10 and the operator in the
> second > 10 to conform with the rules I needed.
> (2) Changed "timevar" from "survey1" to
"survey". Using "survey1"
> reordered the columns for the survey observations, and it was much easier
> to have survey1 first, survey2 second, and survey3 third.
> 
Oops, spoke too soon. Should have put this in the example, but the locations
can be replicated in subsequent years (observations from the same place
through time), so reshape drops all but the first. Any ideas?


-----
David Hewitt
Virginia Institute of Marine Science
http://www.vims.edu/fish/students/dhewitt/
-- 
View this message in context:
http://www.nabble.com/Reshaping-a-dataframe-with-conditional-summary-of-columns---apply-or-reshape--tp15058600p15076251.html
Sent from the R help mailing list archive at Nabble.com.

Possibly Parallel Threads

Search for more maybe matching threads

R help - Jan 2008 - Reshaping a dataframe with conditional summary of columns - apply or reshape?

[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?

[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?

[R] Re shaping a dataframe with conditional summary of columns - apply or reshape?

[R] Re shaping a dataframe with conditional summary of columns - apply or reshape?

Possibly Parallel Threads