Hello!

I have a data frame like this one:

    mydf <- data.frame(
      city  = c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b"),
      brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
      value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116))
    (mydf)

What I need to get is a data frame like the one below - cities as rows, brands as columns, and the sums of "value" within each city/brand combination in the body of the data frame:

    city   x   y    z
    a      3  23  336
    b      7  42  231

I have written code that involves multiple loops and subindexing, but it's taking too long. I am sure there must be a more efficient way of doing it.

Thanks a lot for your hints!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
Hadley's reshape package (search for it) can do this. There's a nice intro on the site.

    > library(reshape)
    > cast(melt(mydf, measure.vars = "value"), city ~ brand, fun.aggregate = sum)
      city  x  y   z
    1    a  3 23 450
    2    b 12 42 231

The numbers differ slightly from your expected table, though: a/z comes out as 450 rather than 336, and b/x as 12 rather than 7. Those are what the posted values sum to (111+112+113+114 = 450 and 3+4+5 = 12).

I've heard of the reshape2 package, but have no idea if that's replaced the reshape package yet.

--Erik

Dimitri Liakhovitski wrote:
> [snip]
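To double-check which set of numbers is right without any add-on package, here is a minimal base R sketch using tapply() on the data from the original post. The optional reshape2 call at the end is an assumption: it runs only if reshape2 happens to be installed, and it is offered as one plausible equivalent of the cast() call above, not as a statement about which package has superseded which.

```r
# Rebuild the example data exactly as posted.
mydf <- data.frame(
  city  = rep(c("a", "b"), each = 8),
  brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
  value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116)
)

# Sum 'value' within each city/brand cell; the result is a 2 x 3 matrix
# with cities as rows and brands as columns.
sums <- tapply(mydf$value, list(city = mydf$city, brand = mydf$brand), sum)
print(sums)

# Optional: the same table via reshape2, only if that package is available.
if (requireNamespace("reshape2", quietly = TRUE)) {
  wide2 <- reshape2::dcast(mydf, city ~ brand,
                           value.var = "value", fun.aggregate = sum)
  print(wide2)
}
```

The tapply() result agrees with the cast() output above (450 for a/z and 12 for b/x), so the discrepancy appears to be in the hand-computed table in the question.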
Try this:

    xtabs(value ~ city + brand, mydf)

On Wed, Nov 3, 2010 at 6:23 PM, Dimitri Liakhovitski <dimitri.liakhovitski@gmail.com> wrote:
> [snip]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
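xtabs() returns a contingency table rather than the data frame the question asked for. The following is a small sketch, under the assumption that a plain data frame with a "city" column is wanted, of one way to convert the result using base R only. The variable names tab and wide are illustrative.

```r
# Rebuild the example data exactly as posted.
mydf <- data.frame(
  city  = rep(c("a", "b"), each = 8),
  brand = c("x","x","y","y","z","z","z","z","x","x","x","y","y","y","z","z"),
  value = c(1,2,11,12,111,112,113,114,3,4,5,13,14,15,115,116)
)

# Cross-tabulate: sums of 'value' by city (rows) and brand (columns).
tab <- xtabs(value ~ city + brand, data = mydf)

# Keep the 2-D layout when converting; plain as.data.frame() on a table
# would return the long format (one row per city/brand pair) instead.
wide <- as.data.frame.matrix(tab)

# Move the row names into an explicit 'city' column.
wide <- data.frame(city = rownames(wide), wide, row.names = NULL)
print(wide)
```

Because xtabs() is vectorised over the whole data frame, it avoids the explicit loops mentioned in the question.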