Gerit Offermann
2008-Nov-20 14:16 UTC
[R] summary statistics into table/data base, many factors to analyse
Dear list,

I reduced my data to the following:

  x <- c(1,4,2,6,8,3,4,2,4,5,1,3)
  y <- as.factor(c(2,2,1,1,1,2,2,1,1,2,1,2))
  z <- as.factor(c(1,2,2,1,1,2,2,3,3,3,3,3))

I can produce the statistical summary just fine.

  s1 <- tapply(x, y, summary)
  d1 <- tapply(x, y, sd)
  s2 <- tapply(x, z, summary)
  d2 <- tapply(x, z, sd)

First thing:
I have 100 plus factors to analyse. Their names are f1001 to f1381 (about).
Is there a way to avoid having to write these lines 100 plus times?

Second thing:
How can I put the standard deviation and the summary statistics into one output?

Third thing:
In the end I want to write the summary statistics into a database (Access). It would be fantastic if I could achieve a table such as:

  factor  level  Min.   1st Qu.  Median  Mean   3rd Qu.  Max.   SDev.
  y       1      1.000  2.000    3.000   3.833  5.500    8.000  2.714160
  y       2      1.000  3.000    3.500   3.333  4.000    5.000  1.366260
  z       1      1.0    3.5      6.0     5.0    7.0      8.0    3.6055513
  .
  .
  .

I tried to unlist the matrices, but it did not help much.

  it <- NULL # "it" - iterations
  for (i in 1:nlevels(z)){
    it[[i]] <- unlist(s1[[i]])}

Help with any of the three points is greatly appreciated.

Cheers,
Gerit
Gabor Grothendieck
2008-Nov-20 14:32 UTC
[R] summary statistics into table/data base, many factors to analyse
Look at summaryBy in the doBy package.
Jorge Ivan Velez
2008-Nov-20 15:06 UTC
[R] summary statistics into table/data base, many factors to analyse
Dear Gerit,

Here is a start using a data set whose first column is numeric and the rest are factors 'f1', 'f2', ..., 'f1381' (I'm using only 3):

  # Data set
  x <- c(1,4,2,6,8,3,4,2,4,5,1,3)
  y <- as.factor(c(2,2,1,1,1,2,2,1,1,2,1,2))
  z <- as.factor(c(1,2,2,1,1,2,2,3,3,3,3,3))
  mydata <- data.frame(x, y, z)
  mydata

  # Function: summary statistics plus SD of x for each level of FACTOR
  foo <- function(FACTOR) do.call(rbind, tapply(x, FACTOR, function(w) c(summary(w), SD = sd(w))))

  # Calculations (lapply always returns a list, one matrix per factor)
  res <- lapply(mydata[-1], foo)
  res2 <- do.call(rbind, res)
  rnames <- rownames(res2)
  rownames(res2) <- NULL

  # Output
  final <- data.frame(Factor = rep(names(res), sapply(res, nrow)), Levels = rnames, res2)
  colnames(final) <- c('Factor', 'Level', 'Min.', '1st.Qu.', 'Median', 'Mean', '3rd.Qu.', 'Max.', 'SD')
  final

See ?tapply and ?do.call for details.

HTH,
Jorge
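A self-contained variant of the same idea, as a sketch only: it passes the response and factor names as arguments instead of relying on a global x, and it uses split() and lapply() in place of tapply(). The function name summarise_by_factors and its arguments are hypothetical, chosen for illustration.

```r
# Sketch: one row per (factor, level) with summary() statistics plus SD.
# `summarise_by_factors`, `data`, `response` and `factors` are made-up names.
summarise_by_factors <- function(data, response, factors) {
  one_factor <- function(f) {
    # Split the response by the levels of factor f, then summarise each piece.
    pieces <- split(data[[response]], data[[f]])
    stats  <- lapply(pieces, function(w) c(summary(w), SD = sd(w)))
    m      <- do.call(rbind, stats)  # one row per level
    data.frame(Factor = f, Level = rownames(m), m,
               row.names = NULL, check.names = FALSE)
  }
  # Stack the per-factor tables on top of each other.
  do.call(rbind, lapply(factors, one_factor))
}

mydata <- data.frame(
  x = c(1, 4, 2, 6, 8, 3, 4, 2, 4, 5, 1, 3),
  y = factor(c(2, 2, 1, 1, 1, 2, 2, 1, 1, 2, 1, 2)),
  z = factor(c(1, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3, 3))
)

final <- summarise_by_factors(mydata, "x", c("y", "z"))
print(final)
```

Because the result is an ordinary data frame, it can be handed to whatever export route is available, for example write.csv() to a file that the database then imports.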
Gerit Offermann
2008-Nov-21 10:50 UTC
[R] summary statistics into table/data base, many factors to analyse
Dear list,

thanks to your help I managed to find ways of analysing my data.

However, the whole data set contains 264 variables, of which some are factors and others are not. The factors tend to be grouped, e.g. data$f1304 to data$f1484 and data$f3204 to data$f5408.

But there are other types of variables in the data set as well, e.g. data$f1504.

Not every spot is taken, i.e. data$f1345 to data$f1399 might not exist in the data set.

The solution "summaryBy" works for cross analysis, of which there is a handful. So I am not worried here.

The solution from Jorge is fine. However, I am trying to get my head around how to efficiently reduce my data set to the dependent variable and the factors such that the solution is applicable.

Having to type each variable into

  my.reduced.data <- cbind(my.data$f1001, my.data$f1002, my.data$f1003...

is an obvious option, but does not seem to be the most efficient one.

Are there better ways to go about it?

Thanks,
Gerit
Petr PIKAL
2008-Nov-21 13:44 UTC
[R] summary statistics into table/data base, many factors to analyse
Hi,

r-help-bounces at r-project.org wrote on 21.11.2008 11:50:52:

> Having to type each variable into
> my.reduced.data <- cbind(my.data$f1001, my.data$f1002, my.data$f1003...
> is an obvious option, but does not seem to be the most efficient one.

Maybe not so obvious. How did you get your data into R? By some read.* command? Then it should be a data frame with appropriate column types. See

  str(mydata)

and you can choose only the columns you really want with

  mydata[, select.some.columns]

If your data is a list (see the Intro manual for data types and their properties), then the transformation to a data frame depends partly on what it looks like and on whether all components have the same number of values.

  do.call("cbind", mydata)

will combine all vectors in mydata; however, it will convert them to a single type, because cbind produces a matrix, which can hold only one type of data.

If all variables have the same length,

  do.call("data.frame", mydata)

will produce a data frame, and each variable will keep its own type.

Regards
Petr
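The difference between the two do.call() forms, and the selection of columns by type, can be sketched as follows. The list mylist and its component names are invented for illustration; stringsAsFactors = FALSE is passed explicitly so that the character component stays character regardless of R version.

```r
# Toy list standing in for the real data; all names here are made up.
mylist <- list(
  x     = c(1, 4, 2, 6),
  f1001 = factor(c("a", "b", "a", "b")),
  f1002 = factor(c("u", "u", "v", "v")),
  note  = c("p", "q", "r", "s")
)

# cbind() returns a matrix, so every column is coerced to one common type.
as_matrix <- do.call("cbind", mylist)
print(typeof(as_matrix))

# data.frame() keeps each column's own type (components must have equal length).
as_df <- do.call("data.frame", c(mylist, stringsAsFactors = FALSE))
str(as_df)

# Find the factor columns by type, then keep the response plus those factors.
is_fac  <- sapply(as_df, is.factor)
reduced <- as_df[, c("x", names(as_df)[is_fac])]
print(names(reduced))
```

Selecting by type avoids typing the column names at all, provided the columns were read in as factors in the first place; str() is the quickest way to check that.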
Gabor Grothendieck
2008-Nov-22 08:57 UTC
[R] summary statistics into table/data base, many factors to analyse
On Fri, Nov 21, 2008 at 5:50 AM, Gerit Offermann <gerit.offermann at gmx.de> wrote:
> However, the whole data set contains 264 variables. Of which some are
> factors, others are not. The factors tend to be grouped, e.g.
> data$f1304 to data$f1484 and data$f3204 to data$f5408.
>
> Not every spot is taken, i.e. data$f1345 to data$f1399 might not exist
> in the data set.

We can compute on the names like this (using the built-in anscombe data set to get just columns y1, x1, x2, x3, x4). Try this:

  # display anscombe data set
  anscombe

  # names.x are names that start with x
  names.x <- grep("^x", names(anscombe), value = TRUE)
  anscombe[, c("y1", names.x)]
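The same idea can be extended to numbered ranges such as f1304 to f1484 and f3204 to f5408. The following is a sketch under assumptions: the data frame dat and its columns are invented to imitate the naming scheme in the question, and the response is assumed to be a column called x.

```r
# Toy data frame; the column names imitate the f-numbered scheme from the thread.
dat <- data.frame(x = 1:3, f1304 = 1:3, f1350 = 1:3, f1504 = 1:3,
                  f3204 = 1:3, other = 1:3)

# Names of the form "f<digits>", and the number each one carries.
f_names <- grep("^f[0-9]+$", names(dat), value = TRUE)
f_num   <- as.integer(sub("^f", "", f_names))

# Keep only names whose number falls in one of the wanted ranges.
# Gaps in the numbering do no harm: only existing columns are considered.
wanted <- f_names[(f_num >= 1304 & f_num <= 1484) |
                  (f_num >= 3204 & f_num <= 5408)]

reduced <- dat[, c("x", wanted)]
print(names(reduced))
```

Working from names(dat) rather than building the names with paste() means that missing numbers never produce a reference to a column that does not exist.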