Hi All,

I am interested in aggregating a data frame based on 2 categories: the mean effect size (r) for each level of 'mod1' within each 'id'. Calling 'aggregate' inside 'with' works well when aggregating on one category (e.g., based on 'id' below) but doesn't work if I try 2 categories. How can this be accomplished?

# sample data

id <- c(1, 1, 1, rep(4:12))
n <- c(10, 20, 13, 22, 28, 12, 12, 36, 19, 12, 15, 8)
r <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1 <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
mod2 <- c(1, 2, 15, rep(3, 9))
datas <- data.frame(id, n, r, mod1, mod2)

# one category works perfectly:

with(datas, aggregate(list(r = r), by = list(id = id), mean))

   id          r
1   1  0.5233333
2   4  0.6400000
3   5  0.4900000
4   6 -0.0400000
5   7  0.4900000
6   8  0.3300000
7   9  0.5800000
8  10  0.1800000
9  11  0.6000000
10 12  0.2100000

# trying with 2 categories:

with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 = mod1)), mean))

Error in FUN(X[[1L]], ...) : arguments must have same length

Thank you,

AC
This seems to work fine (notice the missing 'c(...)'; why did you think you needed it?):

> with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
   id mod1      r
1   1    1  0.980
2   4    1  0.640
3   7    1  0.490
4  10    1  0.180
5   1    2  0.295
6   5    2  0.490
7   8    2  0.330
8  11    2  0.600
9   6    3 -0.040
10  9    3  0.580
11 12    3  0.210

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
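For anyone wondering why the 'c(...)' version fails, here is a minimal sketch using the sample data from the original post (the 'n' and 'mod2' columns are left out because they are not needed here). The names 'res_by' and 'res_formula' are only illustrative. The formula call at the end is an alternative that was not part of the thread, and it assumes an R version whose 'aggregate' has a formula method.

```r
# Sample data from the original post (only the columns used here)
id   <- c(1, 1, 1, rep(4:12))
r    <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1 <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
datas <- data.frame(id, r, mod1)

# c() concatenates the two columns into ONE vector of 24 values, so 'by'
# would hold a single grouping variable twice as long as 'r'
length(c(datas$id, datas$mod1))   # 24
length(datas$r)                   # 12

# list() keeps the columns as two separate grouping variables of length 12
res_by <- with(datas,
               aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))

# Alternative (assumption: aggregate() has a formula method in this R version)
res_formula <- aggregate(r ~ id + mod1, data = datas, FUN = mean)
```

Both calls should give one row per observed id/mod1 combination, which is 11 rows for this data.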
OK, this is great, Jim. Last question: what if I want the one copy of each id to be selected randomly rather than taking the first one?

Thank you,

AC

On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholtman at gmail.com> wrote:
> I am not sure what you mean by eliminating a row. Now if you want only one
> copy of each 'id', and it is the first one, then you can use 'duplicated':
>
> > x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
> > x
>    id mod1      r
> 1   1    1  0.980
> 2   4    1  0.640
> 3   7    1  0.490
> 4  10    1  0.180
> 5   1    2  0.295
> 6   5    2  0.490
> 7   8    2  0.330
> 8  11    2  0.600
> 9   6    3 -0.040
> 10  9    3  0.580
> 11 12    3  0.210
> > subset(x, !duplicated(id))
>    id mod1     r
> 1   1    1  0.98
> 2   4    1  0.64
> 3   7    1  0.49
> 4  10    1  0.18
> 6   5    2  0.49
> 7   8    2  0.33
> 8  11    2  0.60
> 9   6    3 -0.04
> 10  9    3  0.58
> 11 12    3  0.21
>
> On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <delre at wisc.edu> wrote:
>> Perfect! Thanks Jim.
>>
>> Do you know how I could then reduce the data even further?
>> Specifically, reducing it to 1 row per id? In this dataset, id 1 would
>> have one row eliminated.
>> Assume the data is much larger and cannot be reduced by visual
>> inspection and elimination one row at a time.
>>
>> Thank you,
>>
>> AC
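As a small illustration of what 'duplicated' does in the quoted reply, the sketch below applies it to the 'id' column of the aggregated result (typed in by hand here so the snippet runs on its own; 'ids' and 'dup' are just illustrative names).

```r
# 'id' column of the aggregated result x, in the order shown in the thread
ids <- c(1, 4, 7, 10, 1, 5, 8, 11, 6, 9, 12)

# duplicated() flags every element that has already appeared earlier
dup <- duplicated(ids)

# Only the fifth element (the second occurrence of id 1) is flagged,
# so !dup keeps the first row seen for each id
which(dup)   # 5
ids[!dup]
```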
This will do it. You can see two different values for id=1:

> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
> x
   id mod1      r
1   1    1  0.980
2   4    1  0.640
3   7    1  0.490
4  10    1  0.180
5   1    2  0.295
6   5    2  0.490
7   8    2  0.330
8  11    2  0.600
9   6    3 -0.040
10  9    3  0.580
11 12    3  0.210
> # choose random duplicate to use
> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1), ]))
   id mod1     r
1   1    1  0.98
4   4    1  0.64
5   5    2  0.49
6   6    3 -0.04
7   7    1  0.49
8   8    2  0.33
9   9    3  0.58
10 10    1  0.18
11 11    2  0.60
12 12    3  0.21
>
> # choose random duplicate to use - try to see if a different one comes up
> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1), ]))
   id mod1      r
1   1    2  0.295
4   4    1  0.640
5   5    2  0.490
6   6    3 -0.040
7   7    1  0.490
8   8    2  0.330
9   9    3  0.580
10 10    1  0.180
11 11    2  0.600
12 12    3  0.210

--
Jim Holtman
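If the random choice needs to be repeatable, something like the following sketch might help. It rebuilds the aggregated result 'x' by hand so that it runs on its own. The 'set.seed' call and the shuffle-then-'duplicated' variant are additions that were not part of the thread, and the seed value and the names 'picked', 'shuffled' and 'picked2' are arbitrary.

```r
# Aggregated result x from the thread, typed in so the snippet is self-contained
x <- data.frame(
  id   = c(1, 4, 7, 10, 1, 5, 8, 11, 6, 9, 12),
  mod1 = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3)),
  r    = c(0.980, 0.640, 0.490, 0.180, 0.295, 0.490, 0.330, 0.600,
           -0.040, 0.580, 0.210)
)

set.seed(42)  # arbitrary seed, only so the same rows are drawn on every run

# Approach from the reply: split by id, draw one row per piece, re-combine
picked <- do.call(rbind,
                  lapply(split(x, x$id),
                         function(d) d[sample(nrow(d), 1), ]))

# Variant (not from the thread): shuffle all rows, then keep the first row
# seen for each id, which is a random one because of the shuffle
shuffled <- x[sample(nrow(x)), ]
picked2  <- shuffled[!duplicated(shuffled$id), ]
picked2  <- picked2[order(picked2$id), ]   # restore ordering by id
```

Either way the result should have exactly one row per id; which of the two rows for id 1 is kept depends on the seed.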