thr3ads.net - R help - [R] Reshaping data [Dec 2005]


Rau, Roland

2005-Dec-08 08:50 UTC

[R] Reshaping data

Dear all,

suppose I have data in a data.frame which give the number of people at a
specific age in a specific year:

n <- 10
mydf <- data.frame(yr=sample(1:10, size=n, replace=FALSE),
                   age=sample(1:12, size=n, replace=FALSE),
                   no=sample(1:10, size=n, replace=FALSE))

Now I would like to make a matrix with (in this simple example)
10 columns (for the years) and 12 rows (for the ages). In each cell,
I would like to put the correct number of individuals.

So far I was doing this as follows:

mymatrix <- matrix(0, ncol=10, nrow=12)
for (year in unique(mydf$yr)) {
  for (age in unique(mydf$age)) {
    if (length(mydf$no[mydf$yr==year & mydf$age==age]) > 0) {
      mymatrix[age,year] <- mydf$no[mydf$yr==year & mydf$age==age]
    } else {
      mymatrix[age,year] <- 0
    }
  }
}

This is fairly fast in such a simple setting.
But with more years and ages (and for roughly 300 datasets) this becomes
pretty slow. And in addition, this is not really elegant R-code.

Can somebody point me in the right direction on how to do this in a more
elegant way, possibly avoiding the loops?

Thanks,
Roland


Dimitris Rizopoulos

2005-Dec-08 09:07 UTC


[R] Reshaping data

just try

mymatrix <- matrix(0, 12, 10)
mymatrix[cbind(mydf$age, mydf$yr)] <- mydf$no
mymatrix
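
For readers unfamiliar with the idiom: when a matrix is indexed by a two-column matrix, each row of that index is treated as one (row, column) pair, so all cells are filled in a single vectorised assignment. A minimal sketch on made-up data (the data frame `toy` and its values are invented for illustration and are not from the original post):

```r
# Hypothetical toy data: three (age, year) observations with counts.
toy <- data.frame(yr  = c(1, 3, 2),
                  age = c(2, 1, 4),
                  no  = c(5, 7, 9))

# Start from an all-zero matrix: 4 ages (rows) by 3 years (columns).
m <- matrix(0, nrow = 4, ncol = 3)

# Each row of the two-column index matrix is one (row, column) pair.
idx <- cbind(toy$age, toy$yr)
m[idx] <- toy$no

m[2, 1]  # 5
m[1, 3]  # 7
m[4, 2]  # 9
sum(m)   # 21, every other cell is still 0
```

Note that this assigns rather than aggregates: if the same (age, year) pair occurred more than once, a later value would overwrite an earlier one, whereas an approach that sums per cell, such as tapply() with sum, would add them up.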


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm



Rau, Roland

2005-Dec-08 09:29 UTC


[R] Reshaping data

Hi, 

thank you very much for your fast reply. It worked fine.
In the meantime, I have also had an idea using a function from the
apply-family (see below for the code).

The more I use R, the more I get the impression that either "the
apply-family" or outer() can solve most of my data-transformation
questions/problems. Is this a typical learning experience?

Best,
Roland
> -----Original Message-----
> From: Dimitris Rizopoulos 
> just try
> 
> mymatrix <- matrix(0, 12, 10)
> mymatrix[cbind(mydf$age, mydf$yr)] <- mydf$no
> mymatrix
### generating the data
n <- 10
mydf <- data.frame(yr=sample(1:10, size=n, replace=FALSE),
                   age=sample(1:12, size=n, replace=FALSE),
                   no=sample(1:10, size=n, replace=FALSE))
### 
newmatrix <- tapply(X=mydf$no, INDEX=list(age=mydf$age, year=mydf$yr),
                    FUN=sum)
newmatrix[is.na(newmatrix)] <- 0
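
A possible alternative that is not part of the original exchange: base R's xtabs() sums a count column over a cross-classification and fills empty cells with 0 by itself. The sketch below assumes that summing duplicated (age, year) pairs is acceptable; `toy` is invented illustration data, and converting the columns to factors with explicit levels is an added step to force the full 12 x 10 layout:

```r
# Hypothetical toy data, not from the original post.
toy <- data.frame(yr  = c(1, 3, 10),
                  age = c(2, 12, 4),
                  no  = c(5, 7, 9))

# Declare the full ranges so unobserved ages and years still get a cell.
toy$age <- factor(toy$age, levels = 1:12)
toy$yr  <- factor(toy$yr,  levels = 1:10)

# Sum 'no' within each age/year cell; empty cells come out as 0.
tab <- xtabs(no ~ age + yr, data = toy)

dim(tab)        # 12 10
tab["2", "1"]   # 5
```

The result is a table object with ages as rows and years as columns, indexed here by the level labels.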


Peter Dalgaard

2005-Dec-08 09:34 UTC


[R] Reshaping data

"Rau, Roland" <Rau at demogr.mpg.de> writes:
> Can somebody point me into the direction how I can do that in a more
> elegant
> way, possibly avoiding the loops?
This almost gets you there:

with(mydf, tapply(no,list(age,yr), sum))

except that it puts NA where you want 0, which you could fix with

 m <- with(mydf, tapply(no,list(age,yr), sum))
 m[is.na(m)] <- 0
 m
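
One caveat with the tapply() approach: it only creates rows and columns for ages and years that actually occur in the data, so the result can be smaller than the intended 12 x 10 matrix. A sketch of one way around this, assuming the full ranges 1:12 and 1:10 are known in advance (the factor() wrapping and the data frame `toy` are illustrative additions, not from the thread):

```r
# Hypothetical toy data, not from the original post.
toy <- data.frame(yr  = c(1, 3, 10),
                  age = c(2, 12, 4),
                  no  = c(5, 7, 9))

# Factors with explicit levels make tapply() keep unobserved ages/years.
m <- with(toy, tapply(no,
                      list(age = factor(age, levels = 1:12),
                           yr  = factor(yr,  levels = 1:10)),
                      sum))

# Empty cells are NA; replace them with 0 as before.
m[is.na(m)] <- 0

dim(m)  # 12 10
```

Because the dimnames are the factor levels, cells can be looked up by label, for example m["12", "3"].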

Other options include matrix indexing:

M <- with(mydf, {
  M <- matrix(0,12,10)
  M[cbind(age,yr)]<-no
  M   # return the filled matrix; otherwise it is lost when with() exits
})

or (tada...) the reshape() function, esp. if you want a data frame as
output.
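
A minimal sketch of what the reshape() route might look like, since no call is given in the thread. The data frame `toy` is invented for illustration; the argument choices (idvar, timevar, v.names) are one plausible mapping of the columns, not necessarily what was intended above:

```r
# Hypothetical toy data, not from the original post.
toy <- data.frame(yr  = c(1, 3, 1),
                  age = c(2, 2, 4),
                  no  = c(5, 7, 9))

# One row per age, one 'no.<year>' column per year found in the data.
wide <- reshape(toy, idvar = "age", timevar = "yr",
                v.names = "no", direction = "wide")

# Combinations that do not occur are NA; set them to 0 if desired.
wide[is.na(wide)] <- 0

wide$no.1  # 5 9
wide$no.3  # 7 0
```

As with tapply(), only the ages and years present in the data appear in the result, and the output is a data frame rather than a matrix.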
-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
