stephen sefick
2008-Sep-15 16:14 UTC
[R] getting data into correct format for summarizing ... reshape, aggregate, or...
I would like to reformat this data frame into something from which I can produce some descriptive statistics. I have been playing around with the reshape package, and maybe this is not the best way to proceed. I would like to use RiverMile and constituent as the grouping variables to get the summary statistics:

198a    198b
mean    mean
sd      sd
...     ...

etc. for all of these. I have tried reshape and aggregate, and I am sure that I am missing something...

Below is a naive attempt at making a data frame with the columns in the correct class. This can be improved also. There are NAs in the real data set, but I didn't know how to randomly intersperse NA in a created matrix. I hope this makes sense. If it doesn't, I will go back to the drawing board and try to clarify this.

value <- rnorm(30)
RiverMile <- c(rep(215, length.out=10), rep(202, length.out=10), rep(198, length.out=10))
constituent <- c(rep("a", length.out=5), rep("b", length.out=5), rep("a", length.out=5), rep("b", length.out=5), rep("a", length.out=5), rep("b", length.out=5))
df <- cbind(as.integer(RiverMile), as.factor(constituent), as.numeric(value))
df.1 <- as.data.frame(df)
df.1[,"V1"] <- as.integer(df.1[,"V1"])
df.1[,"V2"] <- as.factor(df.1[,"V2"])
df.1[,"V3"] <- as.numeric(df.1[,"V3"])
colnames(df.1) <- c("RiverMile", "constituent", "value")

--
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals.

-K. Mullis
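One way to build the example data with the intended column classes, and to scatter NA values through it at random, might look like the sketch below. The seed and the number of NA values (4) are arbitrary assumptions for illustration, not part of the original post.

```r
set.seed(42)  # assumed seed, only so the example is reproducible

# data.frame() keeps each column's class, so no later coercion is needed.
df.1 <- data.frame(
  RiverMile   = rep(c(215L, 202L, 198L), each = 10),
  constituent = factor(rep(rep(c("a", "b"), each = 5), times = 3)),
  value       = rnorm(30)
)

# Pick 4 distinct row positions at random and blank out their values.
na_rows <- sample(nrow(df.1), 4)
df.1$value[na_rows] <- NA

str(df.1)
```

Because sample() draws without replacement by default, exactly four values end up missing.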
Sebastian P. Luque
2008-Sep-15 16:39 UTC
[R] getting data into correct format for summarizing ... reshape, aggregate, or...
On Mon, 15 Sep 2008 12:14:40 -0400, "stephen sefick" <ssefick at gmail.com> wrote:

> I would like to reformat this data frame into something that I can
> produce some descriptive statistics. I have been playing around with
> the reshape package and maybe this is not the best way to proceed. I
> would like to use RiverMile and constituent as the grouping variables
> to get the summary statistics:

> 198a 198b
> mean mean
> sd   sd
> ...  ...

> etc. for all of these. I have tried reshape and aggregate and I am
> sure that I am missing something...

df <- data.frame(RiverMile=c(rep(215, 10), rep(202, 10), rep(198, 10)),
                 constituent=gl(2, 5, 30, labels=letters[1:2]),
                 value=rnorm(30))
by(df, list(df[[1]], df[[2]]), summary)  # or build your summary function

--
Seb
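The "build your summary function" suggestion could be sketched roughly as follows. The helper name mean_sd, the seed, and the two NA positions are hypothetical and only there to show NA handling.

```r
set.seed(1)  # assumed seed for reproducibility
df <- data.frame(RiverMile = rep(c(215, 202, 198), each = 10),
                 constituent = gl(2, 5, 30, labels = letters[1:2]),
                 value = rnorm(30))
df$value[c(3, 17)] <- NA  # hypothetical missing values

# Hypothetical helper: mean, sd and non-missing count for one group.
mean_sd <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    n    = sum(!is.na(x)))
}

# by() hands each group's rows to the function as a data frame,
# so the value column is pulled out before summarizing.
res <- by(df,
          list(RiverMile = df$RiverMile, constituent = df$constituent),
          function(d) mean_sd(d$value))
res
```

Naming the elements of the grouping list makes the printed output label each group by variable rather than by position.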
Gabor Grothendieck
2008-Sep-15 16:41 UTC
[R] getting data into correct format for summarizing ... reshape, aggregate, or...
Try this:

> library(doBy)
> # make RiverMile a factor
> df.1f <- transform(df.1, RiverMile = as.factor(RiverMile))
> summaryBy(value ~ ., df.1f, FUN = c(mean, sd))
  RiverMile constituent  value.mean  value.sd
1       198           1 -0.06015032 0.8690358
2       198           2 -0.38923255 0.5147604
3       202           1  0.35731576 0.8280943
4       202           2  1.00463813 0.9272342
5       215           1  0.18249485 1.1861883
6       215           2 -0.10863353 0.7564736
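For comparison, a similar long-format table can probably be produced with base R's aggregate() alone. This is a sketch on freshly simulated data (the seed is an assumption), not a reproduction of the numbers above.

```r
set.seed(2)  # assumed seed for reproducibility
df.1 <- data.frame(RiverMile = rep(c(215, 202, 198), each = 10),
                   constituent = factor(rep(rep(c("a", "b"), each = 5), 3)),
                   value = rnorm(30))

# One row per RiverMile/constituent pair; FUN returns two statistics.
agg <- aggregate(value ~ RiverMile + constituent, data = df.1,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() stores both statistics in a single matrix column;
# flatten it into ordinary columns value.mean and value.sd.
agg <- do.call(data.frame, agg)
agg
```

Note that the formula interface of aggregate() drops rows with NA by default (na.action = na.omit), which matters for the real data set.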
John Kane
2008-Sep-15 16:48 UTC
[R] getting data into correct format for summarizing ... reshape, aggregate, or...
I think your problem is coming from the cbind. You are forcing the data into a matrix, not a data.frame. Neither aggregate nor cast will work on that matrix. Do a str(df) or class(df) and you will see what is happening.

Try this using the reshape package. Note the code runs but I have not verified the results. The function approach comes from Hadley's vignette at had.co.nz/reshape/introduction.pdf.

library(reshape)
df1 <- data.frame(RiverMile, constituent, value)
cast(df1, RiverMile + constituent ~ ., function(x) c(means = mean(x), SD = sd(x)))
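The effect of cbind can be checked directly. The sketch below rebuilds the three vectors from the original post (in a shorter form) and compares the cbind result with a data frame built by data.frame().

```r
RiverMile   <- rep(c(215, 202, 198), each = 10)
constituent <- rep(rep(c("a", "b"), each = 5), 3)
value       <- rnorm(30)

# cbind() on plain vectors returns a matrix, which holds a single type.
m <- cbind(as.integer(RiverMile), as.factor(constituent), as.numeric(value))
is.matrix(m)    # TRUE
typeof(m)       # "double"
unique(m[, 2])  # the factor survives only as its integer codes

# data.frame() keeps one class per column instead.
df1 <- data.frame(RiverMile, constituent = factor(constituent), value)
sapply(df1, class)
```

This is also why the constituent column shows up as 1 and 2 rather than "a" and "b" once the matrix is converted back with as.data.frame().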
Petr PIKAL
2008-Sep-16 07:25 UTC
[R] Re: getting data into correct format for summarizing ... reshape, aggregate, or...
Hi

Another possibility is to use a split/sapply construction:

sapply(split(df.1[,3], list(df.1$RiverMile, df.1$constituent)), summary)

Regards
Petr Pikal
petr.pikal at precheza.cz
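With a small custom function in place of summary, the same split/sapply construction gives something close to the layout asked for in the original post: one column per RiverMile/constituent pair, with rows for mean and sd. This is a sketch on simulated data; the seed is an assumption.

```r
set.seed(3)  # assumed seed for reproducibility
df.1 <- data.frame(RiverMile = rep(c(215, 202, 198), each = 10),
                   constituent = factor(rep(rep(c("a", "b"), each = 5), 3)),
                   value = rnorm(30))

# split() makes one vector of values per group; sapply() binds the
# per-group results column-wise into a matrix.
stats <- sapply(split(df.1$value, list(df.1$RiverMile, df.1$constituent)),
                function(x) c(mean = mean(x, na.rm = TRUE),
                              sd   = sd(x, na.rm = TRUE)))
stats
```

The columns are named after the group combinations (for example 198.a and 198.b), and na.rm = TRUE keeps missing values in the real data from turning a whole column into NA.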