thr3ads.net - R help - [R] summary statistics into table/data base, many factors to analyse [Nov 2008]

Gerit Offermann

2008-Nov-20 14:16 UTC

[R] summary statistics into table/data base, many factors to analyse

Dear list,

I reduced my data to the following:

x <- c(1,4,2,6,8,3,4,2,4,5,1,3)
y <- as.factor(c(2,2,1,1,1,2,2,1,1,2,1,2))
z <- as.factor(c(1,2,2,1,1,2,2,3,3,3,3,3))

I can produce the statistical summary just fine.
s1 <- tapply(x, y, summary)
d1 <- tapply(x, y, sd)
s2 <- tapply(x, z, summary)
d2 <- tapply(x, z, sd)

First thing:
I have 100 plus factors to analyse. Their names run from f1001 to about f1381.
Is there a way to avoid having to write these lines 100 plus times?

Second thing:
How can I put the standard deviation and the summary statistics into one output?

Third thing:
In the end I want to write the summary statistics into a database (Access). It
would be fantastic if I could achieve a table such as:

factor  level   Min.  1st Qu.  Median   Mean  3rd Qu.   Max.      SDev.
y       1      1.000    2.000   3.000  3.833    5.500  8.000   2.714160
y       2      1.000    3.000   3.500  3.333    4.000  5.000   1.366260
z       1      1.0      3.5     6.0    5.0      7.0    8.0     3.6055513
.
.
.

I tried to unlist the matrices, but it did not help much.
it <- list() # "it" - iterations

# s1 was computed by y, so loop over the levels of y
for (i in seq_len(nlevels(y))) {
  it[[i]] <- unlist(s1[[i]])
}

Help with any of the three points is greatly appreciated.

Cheers,
Gerit
--

Gabor Grothendieck

2008-Nov-20 14:32 UTC

Look at summaryBy in the doBy package.
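
As a rough illustration of that suggestion, here is a minimal sketch using the data from the original post. It assumes the doBy package is installed (the call is skipped otherwise); the names mydata and by_y are made up for this example.

```r
# Data from the original post, gathered in a data frame (name is illustrative)
mydata <- data.frame(
  x = c(1,4,2,6,8,3,4,2,4,5,1,3),
  y = factor(c(2,2,1,1,1,2,2,1,1,2,1,2)),
  z = factor(c(1,2,2,1,1,2,2,3,3,3,3,3))
)

# summaryBy() takes a formula: response on the left, grouping factor(s) on the right
by_y <- NULL
if (requireNamespace("doBy", quietly = TRUE)) {
  by_y <- doBy::summaryBy(x ~ y, data = mydata, FUN = c(mean, sd))
  print(by_y)   # one row per level of y
}
```

Putting several factors on the right-hand side (for example x ~ y + z) should give one row per combination of levels, which is the cross analysis case.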

Jorge Ivan Velez

2008-Nov-20 15:06 UTC

Dear Gerit,
Here is a start, using a data set whose first column is numeric and whose
remaining columns are factors 'f1', 'f2', ..., 'f1381' (I'm using only 3):

# Data set
x <- c(1,4,2,6,8,3,4,2,4,5,1,3)
y <- as.factor(c(2,2,1,1,1,2,2,1,1,2,1,2))
z <- as.factor(c(1,2,2,1,1,2,2,3,3,3,3,3))
mydata=data.frame(x,y,z)
mydata

# Function: one row per level of FACTOR, with summary(x) plus the SD
foo <- function(FACTOR) do.call(rbind, tapply(x, FACTOR, function(w)
  c(summary(w), SD = sd(w))))

# Calculations (lapply keeps the columns as factors and always returns a list)
res <- lapply(mydata[-1], foo)
res2 <- do.call(rbind, res)
rnames <- rownames(res2)
rownames(res2) <- NULL

# Output
final <- data.frame(Factor = rep(names(res), sapply(res, nrow)),
                    Level = rnames, res2)
colnames(final) <- c('Factor', 'Level', 'Min.', '1st.Qu.', 'Median', 'Mean',
                     '3rd.Qu.', 'Max.', 'SD')
final

See ?tapply and ?do.call for details.
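
For the third point (getting the table into Access), one low-tech route is to write the table to a CSV file and import that file from Access. A minimal sketch, assuming a table shaped like the one requested; the rows are copied from the example table in the question (reduced to two statistics for brevity), and the temporary file is chosen only for illustration. A direct ODBC connection would be an alternative that is not shown here.

```r
# A small table in the requested layout (values from the question's example)
final <- data.frame(
  Factor = c("y", "y", "z"),
  Level  = c("1", "2", "1"),
  Mean   = c(3.833, 3.333, 5.0),
  SD     = c(2.714160, 1.366260, 3.6055513)
)

# Write it out without row names so that only real columns end up in the file
out_file <- tempfile(fileext = ".csv")
write.csv(final, out_file, row.names = FALSE)

# Read it back to check that the round trip preserves the table
check <- read.csv(out_file,
                  colClasses = c("character", "character", "numeric", "numeric"))
```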

HTH,

Jorge



Gerit Offermann

2008-Nov-21 10:50 UTC

Dear list,

thanks to your help I managed to find means of analysing my data.

However, the whole data set contains 264 variables, of which some are
factors and others are not. The factors tend to be grouped, e.g.
data$f1304 to data$f1484 and data$f3204 to data$f5408.

But there are other types of variables in the data set as well,
e.g. data$f1504.

Not every spot is taken, i.e. data$f1345 to data$f1399 might not exist
in the data set.

The solution "summaryBy" works for cross analysis, of which there is
a handful. So I am not worried here.

The solution from Jorge is fine. 
However, I am trying to get my head around how to efficiently
reduce my data set to the dependent variable and the factors, so that
the solution is applicable.

Having to type each variable into
my.reduced.data <- cbind(my.data$f1001, my.data$f1002, my.data$f1003...
is an obvious option, but does not seem to be the most efficient one.

Are there better ways to go about this?

Thanks,
Gerit

Petr PIKAL

2008-Nov-21 13:44 UTC

Hi

r-help-bounces at r-project.org wrote on 21.11.2008 11:50:52:
> Having to type each variable into
> my.reduced.data <- cbind(my.data$f1001, my.data$f1002, my.data$f1003...
> is an obvious option, but does not seem to be the most efficient one.

Maybe not so obvious.
How did you get your data into R? By some read.* command? Then it should
already be a data frame with appropriate column types.

see str(mydata)

and you can choose only columns you really want by

mydata[, select.some.columns]
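
One way to build such a column selection without typing names is to test each column's type. A minimal sketch on the toy data from earlier in the thread; the extra numeric column w and the names is_fac and reduced are invented for illustration.

```r
mydata <- data.frame(
  x = c(1,4,2,6,8,3,4,2,4,5,1,3),
  y = factor(c(2,2,1,1,1,2,2,1,1,2,1,2)),
  w = seq(0.5, 6, by = 0.5),                 # hypothetical non-factor column
  z = factor(c(1,2,2,1,1,2,2,3,3,3,3,3))
)

str(mydata)                                  # shows the type of every column

# TRUE for factor columns, FALSE otherwise
is_fac <- sapply(mydata, is.factor)

# Keep the dependent variable plus all factor columns
reduced <- mydata[, c("x", names(mydata)[is_fac])]
```

This only helps if the factors were actually read in as factors; columns stored as numbers or character strings would first have to be converted.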

If your data is a list (see the Intro manual for data types and their
properties), then the transformation to a data frame depends partly on what
it looks like and whether all components have the same number of values.

do.call("cbind", mydata) will combine all vectors in mydata; however, it
will convert them to a single type, because cbind produces a matrix, which
can hold only one type of data.

If all variables have the same length,

do.call("data.frame", mydata)

will produce a data frame, and all variables will be preserved in their
respective types.
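
The difference between the two calls can be seen on a small list. This is only a sketch; the list lst and its components are made up for illustration.

```r
# A list holding a numeric vector and a factor of the same length
lst <- list(x = c(1.5, 2.5, 3.5), g = factor(c("a", "b", "a")))

# cbind() returns a matrix, so the factor is reduced to its integer codes
m <- do.call("cbind", lst)
is.matrix(m)
m[, "g"]          # the codes, not the labels "a" and "b"

# data.frame() keeps each component's own type
d <- do.call("data.frame", lst)
sapply(d, class)  # x stays numeric, g stays a factor
```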

Regards
Petr

Gabor Grothendieck

2008-Nov-22 08:57 UTC

On Fri, Nov 21, 2008 at 5:50 AM, Gerit Offermann <gerit.offermann at gmx.de> wrote:
> [...]
> The factors tend to be grouped, e.g.
> data$f1304 to data$f1484 and data$f3204 to data$f5408.
> [...]
> Not every spot is taken, i.e. data$f1345 to data$f1399 might not exist
> in the data set.

We can compute on the names like this (using the built-in anscombe
data set and selecting just the columns y1, x1, x2, x3, x4).  Try this:

# display anscombe data set
anscombe

# names.x are names that start with x
names.x <- grep("^x", names(anscombe), value = TRUE)
anscombe[, c("y1", names.x)]
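
The same idea carries over to names such as f1304 to f1484 with gaps: extract the numeric part of each name and keep the names that fall in the wanted ranges. A sketch on a made-up data frame; the column names, the ranges, and the variable dep are all assumptions for illustration.

```r
# Hypothetical data: a dependent variable plus a few f-columns, with gaps
d <- data.frame(dep = 1:3, f1304 = 1:3, f1350 = 1:3, f1504 = 1:3,
                f3204 = 1:3, other = 1:3)

# Names of the form "f" followed by digits only
f.names <- grep("^f[0-9]+$", names(d), value = TRUE)

# Numeric part of each name
f.num <- as.integer(sub("^f", "", f.names))

# Keep names inside either range; columns that do not exist are simply absent
keep <- f.names[(f.num >= 1304 & f.num <= 1484) |
                (f.num >= 3204 & f.num <= 5408)]
reduced <- d[, c("dep", keep)]
```

Because the selection is computed from the names that are present, missing numbers in a range need no special handling.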
