Hi,
Regarding your first comment: you didn't provide a reproducible example.
So I created one with SCHOOLIDs built from letters. According to your original
post, your real dataset has 36000 SCHOOLIDs. Suppose I created the
SCHOOLIDs using:
length(outer(LETTERS,1:2000,paste,sep=""))
#[1] 52000
#Please note that I am creating only 6 columns as an example
set.seed(42)
#26 letters x 2000 numbers = 52000 possible IDs, as counted above
rev1 <- data.frame(SCHOOLID =
  sample(outer(LETTERS,1:2000,paste,sep=""),36e3, replace=TRUE),
  matrix(sample(180, 36e3*5,replace=TRUE), ncol=5, dimnames=list(NULL,
  c("MATH", "AGE", "STO2Q01", "BFMJ",
  "BMMJ"))),stringsAsFactors=FALSE)
dim(rev1)
#[1] 36000     6
res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean,na.rm=TRUE)
dim(res1)
#[1] 26010     6
head(res1,2)
#  SCHOOLID  MATH AGE STO2Q01 BFMJ BMMJ
#1       A1 107.5  30    41.5   75  149
#2     A100 159.5 132   107.0   66   15
colMeans(rev1[rev1$SCHOOLID=="A1",-1])
#   MATH     AGE STO2Q01    BFMJ    BMMJ
#  107.5    30.0    41.5    75.0   149.0
I am not following the second statement. Please provide a reproducible example
using dput().
Maybe you want the results in this form (one row per student, with each
variable replaced by its school mean):
#na.rm must go inside FUN: ave() treats extra arguments as grouping variables
rev2 <- data.frame(SCHOOLID=rev1[,1], sapply(rev1[-1],function(x) ave(x,
 rev1[,1], FUN=function(v) mean(v, na.rm=TRUE))))
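A note on the ave() call: ave() has the signature ave(x, ..., FUN = mean), and everything passed through `...` is used as a grouping factor, so a bare na.rm=TRUE never reaches mean(). The sketch below wraps the NA handling inside FUN and cross-checks against aggregate(). The data frame `toy`, its values, and the helper `group_mean` are hypothetical, made up for illustration.

```r
# Hypothetical toy data: two schools, one missing MATH score
toy <- data.frame(SCHOOLID = c("A1", "A1", "B7", "B7", "B7"),
                  MATH     = c(500, NA, 420, 480, 450),
                  AGE      = c(15.2, 15.8, 15.3, 15.1, 15.8),
                  stringsAsFactors = FALSE)

# ave() returns a vector as long as its input, so every row gets its
# school's mean; na.rm is set inside FUN, not passed to ave() itself
group_mean <- function(x, g) ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))

# Apply to every column except SCHOOLID; sapply() forwards g to group_mean()
toy_means <- data.frame(SCHOOLID = toy$SCHOOLID,
                        sapply(toy[-1], group_mean, g = toy$SCHOOLID))
print(toy_means)

# Cross-check: aggregate() gives the same means, one row per school
agg <- aggregate(toy[-1], list(SCHOOLID = toy$SCHOOLID), mean, na.rm = TRUE)
print(agg)
```

With this version, school A1 gets a MATH mean of 500 (the NA is skipped) instead of NA.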
A.K.
I'm sorry, but it does not help :(
It gives results for at most the first 26 schools (the number of letters in
the alphabet). And judging by the result, it does not compute the average
values of the factors.
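The number of rows that aggregate() returns is set by the number of distinct SCHOOLID values in the data, not by the 26 letters used to build the toy IDs. A quick sketch on simulated data; the ID pool "A1" to "Z2000" and the `score` column are assumptions for illustration only.

```r
set.seed(1)
# Hypothetical ID pool: "A1".."A2000", "B1".."B2000", ..., 26 * 2000 = 52000 IDs
id_pool <- paste0(rep(LETTERS, each = 2000), 1:2000)
ids <- sample(id_pool, 36000, replace = TRUE)
dat <- data.frame(SCHOOLID = ids, score = runif(36000), stringsAsFactors = FALSE)

# aggregate() returns one row per distinct SCHOOLID present in the data
res <- aggregate(dat["score"], list(SCHOOLID = dat$SCHOOLID), mean)
n_schools <- length(unique(dat$SCHOOLID))
cat("distinct schools:", n_schools, "- rows in result:", nrow(res), "\n")

# Spot-check one school against a direct mean over its rows
one <- res$SCHOOLID[1]
stopifnot(isTRUE(all.equal(res$score[1], mean(dat$score[dat$SCHOOLID == one]))))
```

If the result looks limited to 26 groups on real data, it is worth checking length(unique(...)) on the ID column as it was actually read in.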
On Sunday, June 1, 2014 8:37 PM, arun <smartpink111 at yahoo.com> wrote:
Hi,
Maybe this helps:
set.seed(42)
rev1 <- data.frame(SCHOOLID=sample(LETTERS[1:4],20,replace=TRUE),
matrix(sample(25, 20*5,replace=TRUE), ncol=5, dimnames=list(NULL,
c("MATH", "AGE", "STO2Q01", "BFMJ",
"BMMJ"))),stringsAsFactors=FALSE)
res1 <- aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]), mean,na.rm=TRUE)
res1
#if you need to change the names
res2 <- setNames(aggregate(rev1[,-1], list(SCHOOLID=rev1[,1]),
mean,na.rm=TRUE), c("SCHOOLID", paste(colnames(rev1)[-1],
"MEAN",sep="_")))
res2
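If the school means are needed next to each individual row (for example as new regressors), one option is to merge() the renamed aggregate back onto the original data by SCHOOLID. This sketch rebuilds the toy rev1/res2 objects from the snippet above so it runs on its own; the name `rev_with_means` is hypothetical.

```r
set.seed(42)
rev1 <- data.frame(SCHOOLID = sample(LETTERS[1:4], 20, replace = TRUE),
                   matrix(sample(25, 20 * 5, replace = TRUE), ncol = 5,
                          dimnames = list(NULL, c("MATH", "AGE", "STO2Q01",
                                                  "BFMJ", "BMMJ"))),
                   stringsAsFactors = FALSE)
res2 <- setNames(aggregate(rev1[, -1], list(SCHOOLID = rev1[, 1]), mean, na.rm = TRUE),
                 c("SCHOOLID", paste(colnames(rev1)[-1], "MEAN", sep = "_")))

# Many-to-one join: every row of a school receives that school's *_MEAN columns
rev_with_means <- merge(rev1, res2, by = "SCHOOLID", all.x = TRUE)
head(rev_with_means)
```

Note that merge() sorts the result by SCHOOLID, so the row order differs from rev1; the ave() approach keeps the original order.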
A.K.
Hello! I have a problem: I want to calculate conditional means for my dataset.
First, I attach it:
rev <- read.csv("MATH1.csv", header=TRUE, sep=";", dec=",")
attach(rev)
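Since the file uses ";" as separator and "," as decimal mark, read.csv2() (whose defaults are header = TRUE, sep = ";", dec = ",") reads it the same way as the read.csv() call above. The real MATH1.csv is not available here, so this sketch uses a few made-up lines of inline text in the same format.

```r
# Made-up sample in the same format as MATH1.csv (";" separator, "," decimals)
txt <- "SCHOOLID;MATH;AGE
A1;512,3;15,25
A1;498,7;15,75
B7;430,1;15,50"

a <- read.csv(text = txt, header = TRUE, sep = ";", dec = ",")
b <- read.csv2(text = txt)

# Both calls produce the same data frame with numeric MATH and AGE columns
stopifnot(identical(a, b))
str(b)
```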
I have 650000 observations (test scores) and 36000 groups (SCHOOLID).
I need to calculate the mean for every group (SCHOOLID) for all my variables
(MATH, AGE, ST02Q01, BFMJ, BMMJ; actually I have 34 variables, I just don't
want to list them all here) and then create new variables from the resulting
columns, because I want to estimate a new regression on the obtained
average values.
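The post does not say whether the new regression should be run at the school level or at the student level. This sketch assumes the school level: aggregate every variable to one row per school, then fit lm() on those means. The data frame `stud`, its columns, the coefficients used to simulate MATH, and the names `school` and `fit` are all hypothetical.

```r
set.seed(7)
# Hypothetical student-level data: 200 students spread over 20 schools
n <- 200
stud <- data.frame(SCHOOLID = sample(sprintf("S%02d", 1:20), n, replace = TRUE),
                   AGE  = runif(n, 15, 16),
                   BFMJ = runif(n, 20, 80),
                   stringsAsFactors = FALSE)
stud$MATH <- 300 + 10 * stud$AGE + 2 * stud$BFMJ + rnorm(n, sd = 20)

# One row per school; every listed variable replaced by its school mean
school <- aggregate(stud[c("MATH", "AGE", "BFMJ")],
                    list(SCHOOLID = stud$SCHOOLID), mean, na.rm = TRUE)

# Regression on the school averages
fit <- lm(MATH ~ AGE + BFMJ, data = school)
summary(fit)
```

With 34 variables, listing them by name in stud[c(...)] or using stud[-1] (all columns except the ID) keeps the aggregate() call to one line.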
The following method does not suit me, because it returns a table with
SCHOOLID and the average for only one variable, and I don't know how to
extract the MATH column with the average values from the result table into the
workspace (environment) as a separate variable.
aggregate( MATH~SCHOOLID, rev, mean)
How can I solve this problem? Thanks for your help!
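aggregate() returns a data frame, so a single column of means can be taken out with $ (or [[ ]]) and stored as its own object. The formula interface also accepts cbind() on the left-hand side, or "." for all remaining columns, to average several variables in one call. Be aware that the formula method's default na.action = na.omit drops any row with an NA in the variables used. The data frame `scores` and its values are hypothetical.

```r
# Hypothetical data standing in for the real MATH1.csv
scores <- data.frame(SCHOOLID = c("A1", "A1", "B7", "B7"),
                     MATH = c(500, 520, 430, 450),
                     AGE  = c(15.2, 15.6, 15.4, 15.8),
                     BFMJ = c(40, 60, 35, 45),
                     stringsAsFactors = FALSE)

# Several variables at once: cbind() on the left-hand side
res_some <- aggregate(cbind(MATH, AGE) ~ SCHOOLID, data = scores, FUN = mean)

# "." stands for every column other than SCHOOLID
res_all <- aggregate(. ~ SCHOOLID, data = scores, FUN = mean)

# The result is a data frame, so one column can be pulled out as a plain vector
math_mean <- res_all$MATH
names(math_mean) <- res_all$SCHOOLID
math_mean
```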