Dear all,
suppose I have data in a data.frame which indicate the number of people
at a specific age in a specific year:
n <- 10
mydf <- data.frame(yr=sample(1:10, size=n, replace=FALSE),
                   age=sample(1:12, size=n, replace=FALSE),
                   no=sample(1:10, size=n, replace=FALSE))
Now I would like to make a matrix with (in this simple example)
10 columns (for the years) and 12 rows (for the ages). In each cell,
I would like to put the correct number of individuals.
So far I was doing this as follows:
mymatrix <- matrix(0, ncol=10, nrow=12)
for (year in unique(mydf$yr)) {
    for (age in unique(mydf$age)) {
        if (length(mydf$no[mydf$yr==year & mydf$age==age]) > 0) {
            mymatrix[age,year] <- mydf$no[mydf$yr==year & mydf$age==age]
        } else {
            mymatrix[age,year] <- 0
        }
    }
}
This is fairly fast in such a simple setting.
But with more years and ages (and for roughly 300 datasets) it becomes
pretty slow. In addition, it is not really elegant R code.
Can somebody point me in the right direction on how to do this in a more
elegant way, possibly avoiding the loops?
Thanks,
Roland
+++++
This mail has been sent through the MPI for Demographic Rese...{{dropped}}
just try
mymatrix <- matrix(0, 12, 10)
mymatrix[cbind(mydf$age, mydf$yr)] <- mydf$no
mymatrix
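
In case the indexing looks opaque: when a matrix is indexed by a
two-column numeric matrix, each row of the index is read as one
(row, column) pair, so all cells are filled in one vectorised step.
A minimal sketch on a small hand-made data frame (the names toy, m and
idx are made up for illustration and are not part of your data):

```r
# Hypothetical toy data: 3 observations of (year, age, count)
toy <- data.frame(yr  = c(1, 3, 2),
                  age = c(2, 1, 4),
                  no  = c(7, 5, 9))

m <- matrix(0, nrow = 4, ncol = 3)   # 4 ages (rows) x 3 years (columns)

# cbind() builds a 3 x 2 index matrix; each row is one (age, yr) pair
idx <- cbind(toy$age, toy$yr)
m[idx] <- toy$no                     # fills m[2,1], m[1,3] and m[4,2]

m
```

One caveat, as far as I can tell: if the same (age, yr) pair occurs more
than once, the last value simply overwrites the earlier ones, so this
assumes one row per cell.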
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
----- Original Message -----
From: "Rau, Roland" <Rau at demogr.mpg.de>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, December 08, 2005 9:50 AM
Subject: [R] Reshaping data
Hi,

thank you very much for your fast reply. It worked fine.
In the meantime, I also had an idea of my own, using a function from the
apply family (see below for the code).

The more I use R, the more I get the impression that either the apply
family or outer() can solve most of my data-transformation problems.
Is this a typical learning experience?

Best,
Roland

> -----Original Message-----
> From: Dimitris Rizopoulos
> just try
>
> mymatrix <- matrix(0, 12, 10)
> mymatrix[cbind(mydf$age, mydf$yr)] <- mydf$no
> mymatrix

### generating the data
n <- 10
mydf <- data.frame(yr=sample(1:10, size=n, replace=FALSE),
                   age=sample(1:12, size=n, replace=FALSE),
                   no=sample(1:10, size=n, replace=FALSE))
###
newmatrix <- tapply(X=mydf$no, INDEX=list(age=mydf$age, year=mydf$yr), FUN=sum)
newmatrix[is.na(newmatrix)] <- 0
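
One thing to watch with the tapply() route: it only creates rows and
columns for values that actually occur, so with 10 sampled ages out of 12
the result is 10 x 10 rather than 12 x 10. A possible way around this,
sketched here under the assumption that the full ranges are 1:12 for age
and 1:10 for year, is to pass factors with explicit levels (set.seed()
and the names full and ref are only there for the illustration):

```r
set.seed(1)  # only to make the sketch reproducible
n <- 10
mydf <- data.frame(yr  = sample(1:10, size = n, replace = FALSE),
                   age = sample(1:12, size = n, replace = FALSE),
                   no  = sample(1:10, size = n, replace = FALSE))

# Explicit levels keep the ages/years that do not occur in the data
full <- tapply(X = mydf$no,
               INDEX = list(age  = factor(mydf$age, levels = 1:12),
                            year = factor(mydf$yr,  levels = 1:10)),
               FUN = sum)
full[is.na(full)] <- 0   # empty cells come back as NA

# Reference result from the matrix-indexing approach, for comparison
ref <- matrix(0, 12, 10)
ref[cbind(mydf$age, mydf$yr)] <- mydf$no

dim(full)
all(full == ref)
```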
"Rau, Roland" <Rau at demogr.mpg.de> writes:

> Can somebody point me into the direction how I can do that in a more
> elegant way, possibly avoiding the loops?

This almost gets you there:

with(mydf, tapply(no,list(age,yr), sum))

except that it puts NA where you want 0, which you could fix with

m <- with(mydf, tapply(no,list(age,yr), sum))
m[is.na(m)] <- 0
m

Other options include matrix indexing:

with(mydf, {
    M <- matrix(0,12,10)
    M[cbind(age,yr)] <- no
    M    # return the filled matrix; the assignment alone would return only 'no'
})

or (tada...) the reshape() function, esp. if you want a data frame as output.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
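
For the reshape() route mentioned above, a rough sketch of what the call
might look like. The toy data frame long and the result name wide are
made up for illustration; only the column names yr, age and no follow the
original post:

```r
# Hypothetical toy data in long format: one row per (year, age) pair
long <- data.frame(yr  = c(1, 2, 2, 3),
                   age = c(1, 1, 2, 3),
                   no  = c(4, 6, 8, 2))

# One row per age and one "no.<yr>" column per year;
# (age, yr) combinations absent from the data become NA
wide <- reshape(long, idvar = "age", timevar = "yr",
                v.names = "no", direction = "wide")
wide[is.na(wide)] <- 0

wide
```

Unlike the matrix versions, the result is a data frame with the age kept
as an ordinary column, and it only contains the ages and years present
in the data.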