thr3ads.net - R help - [R] getting data into correct format for summarizing ... reshape, aggregate, or... [Sep 2008]


stephen sefick

2008-Sep-15 16:14 UTC

[R] getting data into correct format for summarizing ... reshape, aggregate, or...

I would like to reformat this data frame into something from which I can
produce some descriptive statistics.  I have been playing around with
the reshape package, but maybe this is not the best way to proceed.  I
would like to use RiverMile and constituent as the grouping variables
to get the summary statistics:

198a    198b
mean   mean
sd       sd
...        ...

etc. for all of these.
I have tried reshape and aggregate and I am sure that I am missing something...

Below is a naive attempt at making a data frame with the columns in
the correct class; this can be improved too.  There are NAs in the
real data set, but I didn't know how to randomly intersperse NAs in a
created matrix.  I hope this makes sense.  If it doesn't, I will go
back to the drawing board and try to clarify this.
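On the question of scattering NAs through simulated data, one possible approach is to draw random positions with sample() and overwrite them. This is only a sketch: the seed and the count of 5 missing values are arbitrary choices made for illustration, not anything taken from the real data set.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
value <- rnorm(30)

# Pick 5 positions at random (the count is an arbitrary choice) and blank them.
na_idx <- sample(length(value), 5)
value[na_idx] <- NA

sum(is.na(value))                 # number of missing values
mean(value, na.rm = TRUE)         # summaries need na.rm = TRUE once NAs are present
```

With NAs in the data, mean() and sd() return NA unless na.rm = TRUE is passed, so any summary function used later probably needs that argument.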

value <- rnorm(30)
RiverMile <- c(rep(215, length.out=10), rep(202, length.out=10),
               rep(198, length.out=10))
constituent <- c(rep("a", length.out=5), rep("b", length.out=5),
                 rep("a", length.out=5), rep("b", length.out=5),
                 rep("a", length.out=5), rep("b", length.out=5))
df <- cbind(as.integer(RiverMile), as.factor(constituent), as.numeric(value))
df.1 <- as.data.frame(df)
df.1[,"V1"] <- as.integer(df.1[,"V1"])
df.1[,"V2"] <- as.factor(df.1[,"V2"])
df.1[,"V3"] <- as.numeric(df.1[,"V3"])
colnames(df.1) <- c("RiverMile", "constituent", "value")


-- 
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods. We are mammals, and have not exhausted the
annoying little problems of being mammals.

	-K. Mullis

Sebastian P. Luque

2008-Sep-15 16:39 UTC


On Mon, 15 Sep 2008 12:14:40 -0400,
"stephen sefick" <ssefick at gmail.com> wrote:
> I would like to reformat this data frame into something that I can
> produce some descriptive statistics.  I have been playing around with
> the reshape package and maybe this is not the best way to proceed.  I
> would like to use RiverMile and constituent as the grouping variables
> to get the summary statistics:
>
> 198a    198b
> mean   mean
> sd       sd
> ...        ...
>
> etc. for all of these.  I have tried reshape and aggregate and I am
> sure that I am missing something...
df <- data.frame(RiverMile=c(rep(215, 10), rep(202, 10), rep(198, 10)),
                 constituent=gl(2, 5, 30, labels=letters[1:2]),
                 value=rnorm(30))

by(df, list(df[[1]], df[[2]]), summary) # or build your summary function

?
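As a sketch of the "build your summary function" idea, by() can be given a small helper in place of summary. The helper name mean_sd is hypothetical, and na.rm = TRUE is an assumption added because the real data are said to contain NAs.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
df <- data.frame(RiverMile = c(rep(215, 10), rep(202, 10), rep(198, 10)),
                 constituent = gl(2, 5, 30, labels = letters[1:2]),
                 value = rnorm(30))

# Hypothetical helper: mean and sd of one group's values, ignoring NAs.
# by() passes each group to FUN as a data frame, hence d$value.
mean_sd <- function(d) c(mean = mean(d$value, na.rm = TRUE),
                         sd   = sd(d$value, na.rm = TRUE))

res <- by(df, list(RiverMile = df$RiverMile, constituent = df$constituent),
          mean_sd)
res
```

The result is a 3 x 2 array of class "by" with one mean/sd pair per RiverMile and constituent combination.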

-- 
Seb

Gabor Grothendieck

2008-Sep-15 16:41 UTC


Try this:
> library(doBy)
> # make RiverMile a factor
> df.1f <- transform(df.1, RiverMile = as.factor(RiverMile))
> summaryBy(value ~ ., df.1f, FUN = c(mean, sd))
  RiverMile constituent  value.mean  value.sd
1       198           1 -0.06015032 0.8690358
2       198           2 -0.38923255 0.5147604
3       202           1  0.35731576 0.8280943
4       202           2  1.00463813 0.9272342
5       215           1  0.18249485 1.1861883
6       215           2 -0.10863353 0.7564736
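
For comparison, a similar table can probably be produced without an extra package, using the formula method of aggregate() available in current versions of R. This is a sketch on freshly simulated data, so the numbers will not match the output above; the object name agg is just an illustration.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
df <- data.frame(RiverMile = factor(c(rep(215, 10), rep(202, 10), rep(198, 10))),
                 constituent = gl(2, 5, 30, labels = letters[1:2]),
                 value = rnorm(30))

# One row per RiverMile/constituent combination.
# The formula method drops rows containing NA by default (na.action = na.omit).
agg <- aggregate(value ~ RiverMile + constituent, data = df,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# The two statistics come back as a single matrix column;
# flatten it into ordinary columns value.mean and value.sd.
agg <- do.call(data.frame, agg)
agg
```

Because constituent is kept as a factor here, the groups are labelled a and b rather than by integer codes.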



John Kane

2008-Sep-15 16:48 UTC


I think your problem is coming from the cbind.  You are forcing the data into a
matrix, not a data.frame. Neither aggregate nor cast will work on that matrix.

Do a str(df) or class(df) and you will see what is happening.
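
A small sketch of what the cbind step does, next to a direct data.frame() call. The object names m and d are only for illustration.

```r
RiverMile   <- c(rep(215, 10), rep(202, 10), rep(198, 10))
constituent <- rep(rep(c("a", "b"), each = 5), times = 3)
value       <- rnorm(30)

# cbind() on plain vectors builds a matrix, and a matrix holds one type only.
m <- cbind(as.integer(RiverMile), as.factor(constituent), as.numeric(value))
class(m)         # a matrix, not a data frame
unique(m[, 2])   # the factor has been reduced to its integer codes

# data.frame() keeps a separate class for each column.
d <- data.frame(RiverMile, constituent = factor(constituent), value)
str(d)
levels(d$constituent)
```

This is why the constituent labels show up as 1 and 2 rather than a and b once the matrix is turned back into a data frame.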

Try this using the reshape package.  Note that the code runs, but I have not
verified the results. The function approach comes from Hadley's vignette at
had.co.nz/reshape/introduction.pdf.
===================================================================== 

library(reshape)
# build the data frame directly from the vectors, skipping cbind
df1 <- data.frame(RiverMile, constituent, value)
cast(df1, RiverMile + constituent ~ ., function(x) c(means = mean(x), SD = sd(x)))
====================================================================


Petr PIKAL

2008-Sep-16 07:25 UTC


Hi

Another possibility is to use a split/sapply construction:

sapply(split(df.1[,3],  list(df.1$RiverMile, df.1$constituent)), summary)
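
The same construction can be sketched with a custom function in place of summary, which gives something close to the layout asked for in the original post (one column per RiverMile/constituent pair, one row per statistic). The sep = "" argument and na.rm = TRUE are choices made here for illustration.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
df <- data.frame(RiverMile = c(rep(215, 10), rep(202, 10), rep(198, 10)),
                 constituent = gl(2, 5, 30, labels = letters[1:2]),
                 value = rnorm(30))

# sep = "" makes the group names read 198a, 198b, ... instead of 198.a, 198.b
groups <- split(df$value, list(df$RiverMile, df$constituent), sep = "")

# Each group returns a named vector, so sapply() binds them into a matrix
# with rows "mean" and "sd" and one column per group.
out <- sapply(groups, function(x) c(mean = mean(x, na.rm = TRUE),
                                    sd   = sd(x, na.rm = TRUE)))
out
```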

Regards

Petr Pikal
petr.pikal at precheza.cz
724008364, 581252140, 581252257


