Yi
2010-Jun-29 19:05 UTC
[R] How to delete the replicate rows by summing up the numeric columns
Hi, folks, I am sorry that I did not state the problem correctly yesterday. Please let me address the problem by the following codes: first=c('u','b','e','k','j','c','u','f','c','e') second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England') third=1:10 data=data.frame(first,second,third) ## You may understand values in the first column are the unique codes for those in the second column. ####So 'u' is only for usa. Replicate values appear the same rows for the first and second columns. ### Now I want to delete replicate rows with the same values in first (sceond) rows ####and sum up values in the third column for the same values. mm=melt(data,id='first') sum=cast(mm,first~variable,sum) ### This does not work. ###I tried another way to do this mm= melt(data, id='first',measure='third') sum=cast(mm,first~variable,sum) ## But then the problem is how to 'merge' the result with the second column in the dataset. The expected dataframe is like this: (I showed a wrong expected dataframe yesterday.) first second third 1 u usa 8 2 b Brazil 2 3 e England 13 4 k Korea 4 5 j Japan 5 6 c China 15 8 f France 8 Thanks in advance. [[alternative HTML version deleted]]
David Winsemius
2010-Jun-29 21:58 UTC
[R] How to delete the replicate rows by summing up the numeric columns
On Jun 29, 2010, at 3:05 PM, Yi wrote:> Hi, folks, > > I am sorry that I did not state the problem correctly yesterday. > > Please let me address the problem by the following codes: > > first=c('u','b','e','k','j','c','u','f','c','e') > second > > c > ('usa > ','Brazil > ','England','Korea','Japan','China','usa','France','China','England') > third=1:10 > data=data.frame(first,second,third) > > ## You may understand values in the first column are the unique > codes for > those in the second column. > ####So 'u' is only for usa. Replicate values appear the same rows > for the > first and second columns. > ### Now I want to delete replicate rows with the same values in first > (sceond) rows > ####and sum up values in the third column for the same values. > > mm=melt(data,id='first') > sum=cast(mm,first~variable,sum) ### This does not work. > > ###I tried another way to do this > mm= melt(data, id='first',measure='third') > sum=cast(mm,first~variable,sum) > > ## But then the problem is how to 'merge' the result with the second > column > in the dataset.> data$summed <- ave(data$third, data$first, FUN=sum) #computed sums within groups defined by "first" > data[!duplicated(data$first), c("first", "second", "summed")] #remove duplicates and leave out "third" first second summed 1 u usa 8 2 b Brazil 2 3 e England 13 4 k Korea 4 5 j Japan 5 6 c China 15 8 f France 8> > > The expected dataframe is like this: > > (I showed a wrong expected dataframe yesterday.) > > first second third > 1 u usa 8 > 2 b Brazil 2 > 3 e England 13 > 4 k Korea 4 > 5 j Japan 5 > 6 c China 15 > 8 f France 8 > > Thanks in advance. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
Nikhil Kaza
2010-Jun-29 22:19 UTC
[R] How to delete the replicate rows by summing up the numeric columns
require(reshape) cast(data, first+second~ ., sum) Nikhil Kaza Asst. Professor, City and Regional Planning University of North Carolina nikhil.list at gmail.com On Jun 29, 2010, at 3:05 PM, Yi wrote:> first=c('u','b','e','k','j','c','u','f','c','e') > second > > c > ('usa > ','Brazil > ','England','Korea','Japan','China','usa','France','China','England') > third=1:10 > data=data.frame(first,second,third)
Dennis Murphy
2010-Jun-30 01:04 UTC
[R] How to delete the replicate rows by summing up the numeric columns
Hi: If you can deal with alphabetic order, the following seems to work: v <- aggregate(third ~ first, data = data, FUN = sum) v$second <- levels(data$second) v[, c(1, 3, 2)] first second third 1 b Brazil 2 2 c China 15 3 e England 13 4 f France 8 5 j Japan 5 6 k Korea 4 7 u usa 8 v$second works in this case because the levels are ordered and all are used when inserted in v. That's not a guarantee in more complicated problems and frankly, this one is a kludge. A plyr version would be v <- ddply(data, .(first), summarise, third = sum(third), second = second) v[!duplicated(v$first), c(1, 3, 2)] first second third 1 b Brazil 2 2 c China 15 4 e England 13 6 f France 8 7 j Japan 5 8 k Korea 4 9 u usa 8 The advantage of ddply over aggregate in this case is that ddply allows one to insert second as an 'identity' of sorts; however, the result contains duplicate rows, so we need to remove them in the second statement. Using melt and cast from the reshape package, mm <- melt(data, id = c('first', 'second')) (ms <- cast(mm, first + second ~ . , sum)) first second (all) 1 b Brazil 2 2 c China 15 3 e England 13 4 f France 8 5 j Japan 5 6 k Korea 4 7 u usa 8 names(ms)[3] <- 'third' This seems to be the cleanest version of the three in terms of getting both ID variables into the final result. HTH, Dennis On Tue, Jun 29, 2010 at 12:05 PM, Yi <liuyi.feier@gmail.com> wrote:> Hi, folks, > > I am sorry that I did not state the problem correctly yesterday. > > Please let me address the problem by the following codes: > > first=c('u','b','e','k','j','c','u','f','c','e') > > second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England') > third=1:10 > data=data.frame(first,second,third) > > ## You may understand values in the first column are the unique codes for > those in the second column. > ####So 'u' is only for usa. Replicate values appear the same rows for the > first and second columns. > ### Now I want to delete replicate rows with the same values in first > (sceond) rows > ####and sum up values in the third column for the same values. > > mm=melt(data,id='first') > sum=cast(mm,first~variable,sum) ### This does not work. > > ###I tried another way to do this > mm= melt(data, id='first',measure='third') > sum=cast(mm,first~variable,sum) > > ## But then the problem is how to 'merge' the result with the second column > in the dataset. > > > The expected dataframe is like this: > > (I showed a wrong expected dataframe yesterday.) > > first second third > 1 u usa 8 > 2 b Brazil 2 > 3 e England 13 > 4 k Korea 4 > 5 j Japan 5 > 6 c China 15 > 8 f France 8 > > Thanks in advance. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]