thr3ads.net - R help - [R] Need help on dataframe [Jan 2013]

If this information is useful, please help other people find it:
Share via:

Simonas Kecorius

2013-Jan-05 13:33 UTC

[R] Need help on dataframe

Dear R users, I came up to a problem by taking means (or other summary
statistics) of a big dataframe.

Suppose we do have a dataframe:

ID  V1  V2  V3  V4 ........................ V71
 1    6     5    3     2  ........................  3
 2    3     2    2     1  ........................  1
 3    6     5    3     2  ........................  3
 4    12   15  3     2  ........................  100
........................................................
........................................................
288 10  20  30   30 .......................... 499

I need to find out the way, how to calculate a mean of every 12 lines to
get:

V1                              V2                V3                 V4
........................... V71
mean from 1 to 7       same as V1    same as V1
mean from 8 to 14     same as V1    same as V1
etc.

I can do it column by column using:

y.ts <- ts(y$V1, frequency=12)
aggregate(y.ts, FUN=mean)

Bu this is a hardcore... Can anyone suggest a better way to compute all the
dataframe at once and get a result as matrix?

Thank you in advance!

-- 
Simonas Kecorius
**

	[[alternative HTML version deleted]]

arun

2013-Jan-05 16:43 UTC

head link

[R] Need help on dataframe

HI,

May be this helps:
dat1<-read.table(text="
ID? V1? V2? V3? V4
1??? 6??? 5??? 3??? 2 
2??? 3??? 2??? 2??? 1? 
3??? 6??? 5??? 3??? 2 
4??? 12? 15? 3??? 2? 
5??? 6??? 8??? 3??? 2 
6??? 3??? 2??? 4??? 1? 
7??? 6??? 5??? 3??? 3 
8??? 12? 15? 3??? 1? 
9??? 6??? 5??? 3??? 3 
10??? 3??? 2??? 7??? 5? 
11??? 6??? 5??? 8??? 2 
12??? 12? 19? 3??? 2? 
13??? 6??? 5??? 3??? 2 
14??? 3??? 4??? 2??? 1? 
15??? 6??? 5??? 6??? 2 
16??? 12? 15? 5??? 2? 
17??? 6??? 5??? 5??? 2 
18??? 3??? 2??? 8??? 1? 
19??? 6??? 5??? 3??? 9 
20??? 12? 15? 3??? 10? 
21??? 6??? 5??? 3??? 2 
22??? 3??? 2??? 2??? 11? 
23??? 6??? 5??? 3??? 4 
24??? 12? 15? 9??? 2 
",sep="",header=TRUE,stringsAsFactors=FALSE) 

dat1$newID<-rep(1:(nrow(dat1)/12),each=12) #if nrow(dat1)/12 is integer

?with(dat1,aggregate(cbind(V1,V2,V3,V4),by=list(newID),mean))
#? Group.1?? V1?????? V2?????? V3?????? V4
#1?????? 1 6.75 7.333333 3.750000 2.166667
#2?????? 2 6.75 6.916667 4.333333 4.000000

#or
aggregate(.~newID,data=dat1[,-1],mean)
#? newID?? V1?????? V2?????? V3?????? V4
#1???? 1 6.75 7.333333 3.750000 2.166667
#2???? 2 6.75 6.916667 4.333333 4.000000


A.K.



----- Original Message -----
From: Simonas Kecorius <simolas2008 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Saturday, January 5, 2013 8:33 AM
Subject: [R] Need help on dataframe

Dear R users, I came up to a problem by taking means (or other summary
statistics) of a big dataframe.

Suppose we do have a dataframe:

ID? V1? V2? V3? V4 ........................ V71
1? ? 6? ?  5? ? 3? ?  2? ........................? 3
2? ? 3? ?  2? ? 2? ?  1? ........................? 1
3? ? 6? ?  5? ? 3? ?  2? ........................? 3
4? ? 12?  15? 3? ?  2? ........................? 100
........................................................
........................................................
288 10? 20? 30?  30 .......................... 499

I need to find out the way, how to calculate a mean of every 12 lines to
get:

V1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? V2? ? ? ? ? ? ? ? V3? ? ? ? ? ? ? ?  V4
........................... V71
mean from 1 to 7? ? ?  same as V1? ? same as V1
mean from 8 to 14? ?  same as V1? ? same as V1
etc.

I can do it column by column using:

y.ts <- ts(y$V1, frequency=12)
aggregate(y.ts, FUN=mean)

Bu this is a hardcore... Can anyone suggest a better way to compute all the
dataframe at once and get a result as matrix?

Thank you in advance!

-- 
Simonas Kecorius
**

??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

2013-Jan-05 16:49 UTC

head link

[R] Need help on dataframe

Hi,

One more way:
dat1<-read.table(text="
ID? V1? V2? V3? V4
1??? 6??? 5??? 3??? 2
2??? 3??? 2??? 2??? 1 
3??? 6??? 5??? 3??? 2
4??? 12? 15? 3??? 2 
5??? 6??? 8??? 3??? 2
6??? 3??? 2??? 4??? 1 
7??? 6??? 5??? 3??? 3
8??? 12? 15? 3??? 1 
9??? 6??? 5??? 3??? 3
10??? 3??? 2??? 7??? 5 
11??? 6??? 5??? 8??? 2
12??? 12? 19? 3??? 2 
13??? 6??? 5??? 3??? 2
14??? 3??? 4??? 2??? 1 
15??? 6??? 5??? 6??? 2
16??? 12? 15? 5??? 2 
17??? 6??? 5??? 5??? 2
18??? 3??? 2??? 8??? 1 
19??? 6??? 5??? 3??? 9
20??? 12? 15? 3??? 10 
21??? 6??? 5??? 3??? 2
22??? 3??? 2??? 2??? 11 
23??? 6??? 5??? 3??? 4
24??? 12? 15? 9??? 2
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-aggregate(.~1:nrow(dat1)%/%13,data=dat1[,-1],mean)
?names(res)[1]<-"group"
?res
#? group?? V1?????? V2?????? V3?????? V4
#1???? 0 6.75 7.333333 3.750000 2.166667
#2???? 1 6.75 6.916667 4.333333 4.000000
A.K.



----- Original Message -----
From: Simonas Kecorius <simolas2008 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Saturday, January 5, 2013 8:33 AM
Subject: [R] Need help on dataframe

Dear R users, I came up to a problem by taking means (or other summary
statistics) of a big dataframe.

Suppose we do have a dataframe:

ID? V1? V2? V3? V4 ........................ V71
1? ? 6? ?  5? ? 3? ?  2? ........................? 3
2? ? 3? ?  2? ? 2? ?  1? ........................? 1
3? ? 6? ?  5? ? 3? ?  2? ........................? 3
4? ? 12?  15? 3? ?  2? ........................? 100
........................................................
........................................................
288 10? 20? 30?  30 .......................... 499

I need to find out the way, how to calculate a mean of every 12 lines to
get:

V1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? V2? ? ? ? ? ? ? ? V3? ? ? ? ? ? ? ?  V4
........................... V71
mean from 1 to 7? ? ?  same as V1? ? same as V1
mean from 8 to 14? ?  same as V1? ? same as V1
etc.

I can do it column by column using:

y.ts <- ts(y$V1, frequency=12)
aggregate(y.ts, FUN=mean)

Bu this is a hardcore... Can anyone suggest a better way to compute all the
dataframe at once and get a result as matrix?

Thank you in advance!

-- 
Simonas Kecorius
**

??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

John Kane

2013-Jan-05 17:23 UTC

head link

[R] Need help on dataframe

Well, a rather simple-minded, brute force approach would be to add a factor
variable to the data frame and use aggregate on it.

I am sure there are better ways but this will work.

EXAMPLE
###
xx  <-  data.frame(aa =1:24, 
                     b = matrix(sample(c(1,2,3,4,5,6), 72,  replace = TRUE),
nrow= 24))
  dd  <-rep(c("a","b"), each= 12)
  
  xx  <-  cbind(dd, xx)
  
  aggregate(xx[,3:5], list(xx$dd), mean)

################

By the way, when supplying data samples a good way is to use the dput command.
Try ?dput for information
John Kane
Kingston ON Canada

> -----Original Message-----
> From: simolas2008 at gmail.com
> Sent: Sat, 5 Jan 2013 15:33:03 +0200
> To: r-help at r-project.org
> Subject: [R] Need help on dataframe
> 
> Dear R users, I came up to a problem by taking means (or other summary
> statistics) of a big dataframe.
> 
> Suppose we do have a dataframe:
> 
> ID  V1  V2  V3  V4 ........................ V71
>  1    6     5    3     2  ........................  3
>  2    3     2    2     1  ........................  1
>  3    6     5    3     2  ........................  3
>  4    12   15  3     2  ........................  100
> ........................................................
> ........................................................
> 288 10  20  30   30 .......................... 499
> 
> I need to find out the way, how to calculate a mean of every 12 lines to
> get:
> 
> V1                              V2                V3                 V4
> ........................... V71
> mean from 1 to 7       same as V1    same as V1
> mean from 8 to 14     same as V1    same as V1
> etc.
> 
> I can do it column by column using:
> 
> y.ts <- ts(y$V1, frequency=12)
> aggregate(y.ts, FUN=mean)
> 
> Bu this is a hardcore... Can anyone suggest a better way to compute all
> the
> dataframe at once and get a result as matrix?
> 
> Thank you in advance!
> 
> --
> Simonas Kecorius
> **
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
____________________________________________________________
FREE ONLINE PHOTOSHARING - Share your photos online with your friends and
family!
Visit http://www.inbox.com/photosharing to find out more!

arun

2013-Jan-05 19:32 UTC

head link

[R] Need help on dataframe

HI,
Sorry, there was a mistake, which I noticed after seeing David's post.
dat1<-read.table(text="
ID? V1? V2? V3? V4
1??? 6??? 5??? 3??? 2
2??? 3??? 2??? 2??? 1
3??? 6??? 5??? 3??? 2
4??? 12? 15? 3??? 2
5??? 6??? 8??? 3??? 2
6??? 3??? 2??? 4??? 1
7??? 6??? 5??? 3??? 3
8??? 12? 15? 3??? 1
9??? 6??? 5??? 3??? 3
10??? 3??? 2??? 7??? 5
11??? 6??? 5??? 8??? 2
12??? 12? 19? 3??? 2
13??? 6??? 5??? 3??? 2
14??? 3??? 4??? 2??? 1
15??? 6??? 5??? 6??? 2
16??? 12? 15? 5??? 2
17??? 6??? 5??? 5??? 2
18??? 3??? 2??? 8??? 1
19??? 6??? 5??? 3??? 9
20??? 12? 15? 3??? 10
21??? 6??? 5??? 3??? 2
22??? 3??? 2??? 2??? 11
23??? 6??? 5??? 3??? 4
24??? 12? 15? 9??? 2
25??? 6??? 5??? 3??? 2
26??? 3??? 2??? 2??? 1
27?? 6??? 5??? 3??? 2
28??? 12? 15? 3??? 2
29??? 6??? 8??? 3??? 2
30??? 3??? 2??? 4??? 1
31??? 6??? 5??? 3??? 3
32??? 12? 15? 3??? 1
33??? 6??? 5??? 3??? 3
34??? 3??? 2??? 7??? 5
35??? 6??? 5??? 8??? 2
36??? 12? 19? 3??? 2
37??? 6??? 5??? 3??? 2
38??? 3??? 4??? 2??? 1
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-aggregate(.~(1:nrow(dat1)-1)%/%12,data=dat1[,-1],mean)
? names(res)[1]<-"group"
A.K.





----- Original Message -----
From: Simonas Kecorius <simolas2008 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Saturday, January 5, 2013 8:33 AM
Subject: [R] Need help on dataframe

Dear R users, I came up to a problem by taking means (or other summary
statistics) of a big dataframe.

Suppose we do have a dataframe:

ID? V1? V2? V3? V4 ........................ V71
1? ? 6? ?  5? ? 3? ?  2? ........................? 3
2? ? 3? ?  2? ? 2? ?  1? ........................? 1
3? ? 6? ?  5? ? 3? ?  2? ........................? 3
4? ? 12?  15? 3? ?  2? ........................? 100
........................................................
........................................................
288 10? 20? 30?  30 .......................... 499

I need to find out the way, how to calculate a mean of every 12 lines to
get:

V1? ? ? ? ? ? ? ? ? ? ? ? ? ? ? V2? ? ? ? ? ? ? ? V3? ? ? ? ? ? ? ?  V4
........................... V71
mean from 1 to 7? ? ?  same as V1? ? same as V1
mean from 8 to 14? ?  same as V1? ? same as V1
etc.

I can do it column by column using:

y.ts <- ts(y$V1, frequency=12)
aggregate(y.ts, FUN=mean)

Bu this is a hardcore... Can anyone suggest a better way to compute all the
dataframe at once and get a result as matrix?

Thank you in advance!

-- 
Simonas Kecorius
**

??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Mario Bourgoin

2013-Jan-05 21:40 UTC

head link

[R] Need help on dataframe

Given the data:
dat1<-read.table(header=TRUE,text="
ID  V1  V2  V3  V4
1    6    5    3    2
2    3    2    2    1
3    6    5    3    2
4    12  15  3    2
5    6    8    3    2
6    3    2    4    1
7    6    5    3    3
8    12  15  3    1
9    6    5    3    3
10    3    2    7    5
11    6    5    8    2
12    12  19  3    2
13    6    5    3    2
14    3    4    2    1
15    6    5    6    2
16    12  15  5    2
17    6    5    5    2
18    3    2    8    1
19    6    5    3    9
20    12  15  3    10
21    6    5    3    2
22    3    2    2    11
23    6    5    3    4
24    12  15  9    2
25    6    5    3    2
26    3    2    2    1
27   6    5    3    2
28    12  15  3    2
29    6    8    3    2
30    3    2    4    1
31    6    5    3    3
32    12  15  3    1
33    6    5    3    3
34    3    2    7    5
35    6    5    8    2
36    12  19  3    2
37    6    5    3    2
38    3    4    2    1"))

The following seems to be rather quick (0.394 ms per group of up to 12 rows
in 1201 rows):
do.call(rbind,by(dat1,floor((dat1$ID-1)/12),colMeans))
    ID       V1       V2       V3       V4
0  6.0 6.272727 6.272727 3.818182 2.181818
1 17.5 6.750000 7.250000 3.833333 4.000000
2 29.5 6.750000 7.000000 4.250000 2.166667
3 37.0 7.000000 9.333333 2.666667 1.666667

If you know the number of rows is a multiple of 12:
dat2<-read.table(header=TRUE,text="
ID  V1  V2  V3  V4
1    6    5    3    2
2    3    2    2    1
3    6    5    3    2
4    12  15  3    2
5    6    8    3    2
6    3    2    4    1
7    6    5    3    3
8    12  15  3    1
9    6    5    3    3
10    3    2    7    5
11    6    5    8    2
12    12  19  3    2
13    6    5    3    2
14    3    4    2    1
15    6    5    6    2
16    12  15  5    2
17    6    5    5    2
18    3    2    8    1
19    6    5    3    9
20    12  15  3    10
21    6    5    3    2
22    3    2    2    11
23    6    5    3    4
24    12  15  9    2
25    6    5    3    2
26    3    2    2    1
27   6    5    3    2
28    12  15  3    2
29    6    8    3    2
30    3    2    4    1
31    6    5    3    3
32    12  15  3    1
33    6    5    3    3
34    3    2    7    5
35    6    5    8    2
36    12  19  3    2"))

This is marginally better (0.378 ms per group of 12 rows in 1212 rows, or
4% less time):
do.call(rbind,by(dat2,rep(1:(NROW(dat2)/12),each=12),colMeans))
    ID   V1       V2       V3       V4
1  6.5 6.75 7.333333 3.750000 2.166667
2 18.5 6.75 6.916667 4.333333 4.000000
3 30.5 6.75 7.333333 3.750000 2.166667

Best,
Mario


On Sat, Jan 5, 2013 at 8:33 AM, Simonas Kecorius
<simolas2008@gmail.com>wrote:
> Dear R users, I came up to a problem by taking means (or other summary
> statistics) of a big dataframe.
>
> Suppose we do have a dataframe:
>
> ID  V1  V2  V3  V4 ........................ V71
>  1    6     5    3     2  ........................  3
>  2    3     2    2     1  ........................  1
>  3    6     5    3     2  ........................  3
>  4    12   15  3     2  ........................  100
> ........................................................
> ........................................................
> 288 10  20  30   30 .......................... 499
>
> I need to find out the way, how to calculate a mean of every 12 lines to
> get:
>
> V1                              V2                V3                 V4
> ........................... V71
> mean from 1 to 7       same as V1    same as V1
> mean from 8 to 14     same as V1    same as V1
> etc.
>
> I can do it column by column using:
>
> y.ts <- ts(y$V1, frequency=12)
> aggregate(y.ts, FUN=mean)
>
> Bu this is a hardcore... Can anyone suggest a better way to compute all the
> dataframe at once and get a result as matrix?
>
> Thank you in advance!
>
> --
> Simonas Kecorius
> **
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Reasonably Related Threads

Search for more seemingly similar threads

R help - Jan 2013 - Need help on dataframe

[R] Need help on dataframe

[R] Need help on dataframe

[R] Need help on dataframe

[R] Need help on dataframe

[R] Need help on dataframe

[R] Need help on dataframe

Reasonably Related Threads