I created a small example to show something that I do a lot of: "scale"
data by month and return a data.frame with the output. "id" represents
repeated observations over "time" and I want to scale the "slope"
variable. The "out" variable shows the output I want. My for loop
does the job but is probably very slow versus other methods. ddply
seems ideal, but despite playing with the baseball examples quite a bit
I can't figure out how to get it to work with my sample dataset.

TIA for any help, Roger

Here is the sample code:

dat <- data.frame(id=rep(letters[1:5],3),
                  time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
dat

for (i in 1:3) {
    mat <- dat[dat$time==i, ]
    outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
    if (i==1) {
        out <- outi
    } else {
        out <- rbind(out, outi)
    }
}
out

Here is the sample output:

> dat <- data.frame(id=rep(letters[1:5],3),
+                   time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
> dat
   id time slope
1   a    1     1
2   b    1     2
3   c    1     3
4   d    1     4
5   e    1     5
6   a    2     6
7   b    2     7
8   c    2     8
9   d    2     9
10  e    2    10
11  a    3    11
12  b    3    12
13  c    3    13
14  d    3    14
15  e    3    15
> for (i in 1:3) {
+     mat <- dat[dat$time==i, ]
+     outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
+     if (i==1) {
+         out .... [TRUNCATED]
> out
   mat.time mat.id      slope
1         1      a -1.2649111
2         1      b -0.6324555
3         1      c  0.0000000
4         1      d  0.6324555
5         1      e  1.2649111
6         2      a -1.2649111
7         2      b -0.6324555
8         2      c  0.0000000
9         2      d  0.6324555
10        2      e  1.2649111
11        3      a -1.2649111
12        3      b -0.6324555
13        3      c  0.0000000
14        3      d  0.6324555
15        3      e  1.2649111

On Aug 26, 2010, at 3:33 PM, Bos, Roger wrote:

> My for..loop does the job but is probably very slow versus other
> methods.

Roger, it seems like you might want ave(). See ?ave.

> cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
   id time slope      slope
1   a    1     1 -1.2649111
2   b    1     2 -0.6324555
3   c    1     3  0.0000000
4   d    1     4  0.6324555
5   e    1     5  1.2649111
6   a    2     6 -1.2649111
7   b    2     7 -0.6324555
8   c    2     8  0.0000000
9   d    2     9  0.6324555
10  e    2    10  1.2649111
11  a    3    11 -1.2649111
12  b    3    12 -0.6324555
13  c    3    13  0.0000000
14  d    3    14  0.6324555
15  e    3    15  1.2649111

HTH,

Marc Schwartz
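
A note on the ave() call above: ave() splits its first argument by the grouping variable, applies FUN to each piece, and writes the results back in the original row order, so no rbind() is needed. The sketch below checks this against the original loop. The column name slope_scaled is made up for illustration; it avoids ending up with two columns that are both called "slope", as happens in the cbind() output above.

```r
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(1:3, each = 5),
                  slope = 1:15)

# Reference result: the original loop, one time period at a time.
for (i in 1:3) {
  mat <- dat[dat$time == i, ]
  outi <- data.frame(mat$time, mat$id, slope = scale(mat$slope))
  if (i == 1) out <- outi else out <- rbind(out, outi)
}

# ave() scales within each time period and keeps the original row order.
# "slope_scaled" is an illustrative name, not something from the thread.
dat$slope_scaled <- ave(dat$slope, dat$time, FUN = scale)

# Compare the two results up to numerical tolerance.
all.equal(as.numeric(out$slope), as.numeric(dat$slope_scaled))
```

Because the rows of dat are already sorted by time here, the loop output and the ave() output line up row for row. On unsorted data the loop would reorder rows by period, while ave() would leave them where they are.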

On Thu, Aug 26, 2010 at 4:33 PM, Bos, Roger <roger.bos at rothschild.com> wrote:

> "id" represents repeated observations over "time" and I want to
> scale the "slope" variable.

Try ave:

transform(dat, slope = ave(slope, time, FUN = scale))

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
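
For readers new to this idiom: transform() evaluates its arguments inside dat, so slope and time can be used without the dat$ prefix, and assigning to slope overwrites that column instead of adding a second one. If the raw values should be kept, one option is to write to a new column. The sketch below does that with an explicit z-score function in place of scale(); the names zscore and slope_z are hypothetical, and the na.rm = TRUE handling is an assumption chosen to mirror the defaults of scale().

```r
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = rep(1:3, each = 5),
                  slope = 1:15)

# Hypothetical helper: center by the mean and divide by the standard
# deviation, returning a plain numeric vector rather than the one-column
# matrix that scale() returns.
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Keep the raw slope and add the per-period z-scores next to it.
res <- transform(dat, slope_z = ave(slope, time, FUN = zscore))
head(res, 5)
```

Any function that returns a vector of the same length as its input can be passed as FUN in the same way, for example one that only centers or that ranks within each period.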

A ddply solution (ddply is in the plyr package) is

library(plyr)
dat.out <- ddply(dat, .(time), transform, slope = scale(slope))

but this is not faster than the loop, and it is slower than the ave() solution:

> system.time(
+ for (i in 1:3) {
+     mat <- dat[dat$time==i, ]
+     outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
+     if (i==1) {
+         out <- outi
+     } else {
+         out <- rbind(out, outi)
+     }
+ }
+ )
   user  system elapsed
  0.024   0.000   0.025
>
> system.time(
+ dat.out <- ddply(dat, .(time), transform, slope = scale(slope))
+ )
   user  system elapsed
  0.032   0.000   0.031
>
> system.time(
+ cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
+ )
   user  system elapsed
  0.008   0.000   0.007

On Thu, Aug 26, 2010 at 4:33 PM, Bos, Roger <roger.bos@rothschild.com> wrote:

> ddply seems ideal, but despite playing with the baseball examples
> quite a bit I can't figure out how to get it to work with my sample
> dataset.
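
The timings above come from a 15-row data frame, where all three approaches finish within a few hundredths of a second, so they say little about how each one behaves as the data grows. A rough way to check is to repeat the comparison on a larger synthetic data set, as sketched below. The sizes n_groups and n_per and the data frame big are arbitrary choices for illustration, and no timings are quoted because they depend on the machine and the R version. The ddply part only runs if plyr happens to be installed.

```r
set.seed(1)
n_groups <- 200   # arbitrary number of time periods
n_per    <- 50    # arbitrary number of ids per period
big <- data.frame(id    = rep(seq_len(n_per), times = n_groups),
                  time  = rep(seq_len(n_groups), each = n_per),
                  slope = rnorm(n_groups * n_per))

# The original loop, growing the result with rbind() on each pass.
loop_time <- system.time(
  for (i in seq_len(n_groups)) {
    mat <- big[big$time == i, ]
    outi <- data.frame(time = mat$time, id = mat$id,
                       slope = as.numeric(scale(mat$slope)))
    if (i == 1) out <- outi else out <- rbind(out, outi)
  }
)

# The ave() approach from earlier in the thread.
ave_time <- system.time(
  res <- transform(big, slope = ave(slope, time, FUN = scale))
)

print(loop_time)
print(ave_time)

# Optional: time ddply as well, but only when plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  ddply_time <- system.time(
    res_ddply <- plyr::ddply(big, "time", transform,
                             slope = as.numeric(scale(slope)))
  )
  print(ddply_time)
}
```

A single system.time() call is noisy, so repeating each measurement a few times and comparing the elapsed values gives a more reliable picture than one run.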