Yi
2010-Jun-29 19:05 UTC
[R] How to delete the replicate rows by summing up the numeric columns
Hi, folks,
I am sorry that I did not state the problem correctly yesterday.
Please let me address the problem by the following codes:
first=c('u','b','e','k','j','c','u','f','c','e')
second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
third=1:10
data=data.frame(first,second,third)
## You may understand values in the first column are the unique codes for
those in the second column.
####So 'u' is only for usa. Replicate values appear the same rows for
the
first and second columns.
### Now I want to delete replicate rows with the same values in first
(sceond) rows
####and sum up values in the third column for the same values.
mm=melt(data,id='first')
sum=cast(mm,first~variable,sum) ### This does not work.
###I tried another way to do this
mm= melt(data, id='first',measure='third')
sum=cast(mm,first~variable,sum)
## But then the problem is how to 'merge' the result with the second
column
in the dataset.
The expected dataframe is like this:
(I showed a wrong expected dataframe yesterday.)
first second third
1 u usa 8
2 b Brazil 2
3 e England 13
4 k Korea 4
5 j Japan 5
6 c China 15
8 f France 8
Thanks in advance.
[[alternative HTML version deleted]]
David Winsemius
2010-Jun-29 21:58 UTC
[R] How to delete the replicate rows by summing up the numeric columns
On Jun 29, 2010, at 3:05 PM, Yi wrote:> Hi, folks, > > I am sorry that I did not state the problem correctly yesterday. > > Please let me address the problem by the following codes: > > first=c('u','b','e','k','j','c','u','f','c','e') > second > > c > ('usa > ','Brazil > ','England','Korea','Japan','China','usa','France','China','England') > third=1:10 > data=data.frame(first,second,third) > > ## You may understand values in the first column are the unique > codes for > those in the second column. > ####So 'u' is only for usa. Replicate values appear the same rows > for the > first and second columns. > ### Now I want to delete replicate rows with the same values in first > (sceond) rows > ####and sum up values in the third column for the same values. > > mm=melt(data,id='first') > sum=cast(mm,first~variable,sum) ### This does not work. > > ###I tried another way to do this > mm= melt(data, id='first',measure='third') > sum=cast(mm,first~variable,sum) > > ## But then the problem is how to 'merge' the result with the second > column > in the dataset.> data$summed <- ave(data$third, data$first, FUN=sum) #computed sums within groups defined by "first" > data[!duplicated(data$first), c("first", "second", "summed")] #remove duplicates and leave out "third" first second summed 1 u usa 8 2 b Brazil 2 3 e England 13 4 k Korea 4 5 j Japan 5 6 c China 15 8 f France 8> > > The expected dataframe is like this: > > (I showed a wrong expected dataframe yesterday.) > > first second third > 1 u usa 8 > 2 b Brazil 2 > 3 e England 13 > 4 k Korea 4 > 5 j Japan 5 > 6 c China 15 > 8 f France 8 > > Thanks in advance. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
Nikhil Kaza
2010-Jun-29 22:19 UTC
[R] How to delete the replicate rows by summing up the numeric columns
require(reshape) cast(data, first+second~ ., sum) Nikhil Kaza Asst. Professor, City and Regional Planning University of North Carolina nikhil.list at gmail.com On Jun 29, 2010, at 3:05 PM, Yi wrote:> first=c('u','b','e','k','j','c','u','f','c','e') > second > > c > ('usa > ','Brazil > ','England','Korea','Japan','China','usa','France','China','England') > third=1:10 > data=data.frame(first,second,third)
Dennis Murphy
2010-Jun-30 01:04 UTC
[R] How to delete the replicate rows by summing up the numeric columns
Hi:
If you can deal with alphabetic order, the following seems to work:
v <- aggregate(third ~ first, data = data, FUN = sum)
v$second <- levels(data$second)
v[, c(1, 3, 2)]
first second third
1 b Brazil 2
2 c China 15
3 e England 13
4 f France 8
5 j Japan 5
6 k Korea 4
7 u usa 8
v$second works in this case because the levels are ordered and all are used
when inserted in v. That's not a guarantee in more complicated problems and
frankly, this one is a kludge.
A plyr version would be
v <- ddply(data, .(first), summarise, third = sum(third), second = second)
v[!duplicated(v$first), c(1, 3, 2)]
first second third
1 b Brazil 2
2 c China 15
4 e England 13
6 f France 8
7 j Japan 5
8 k Korea 4
9 u usa 8
The advantage of ddply over aggregate in this case is that ddply allows one
to insert second as an 'identity' of sorts; however, the result contains
duplicate rows, so we need to remove them in the second statement.
Using melt and cast from the reshape package,
mm <- melt(data, id = c('first', 'second'))
(ms <- cast(mm, first + second ~ . , sum))
first second (all)
1 b Brazil 2
2 c China 15
3 e England 13
4 f France 8
5 j Japan 5
6 k Korea 4
7 u usa 8
names(ms)[3] <- 'third'
This seems to be the cleanest version of the three in terms of getting both
ID variables into the final result.
HTH,
Dennis
On Tue, Jun 29, 2010 at 12:05 PM, Yi <liuyi.feier@gmail.com> wrote:
> Hi, folks,
>
> I am sorry that I did not state the problem correctly yesterday.
>
> Please let me address the problem by the following codes:
>
>
first=c('u','b','e','k','j','c','u','f','c','e')
>
>
second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
> third=1:10
> data=data.frame(first,second,third)
>
> ## You may understand values in the first column are the unique codes for
> those in the second column.
> ####So 'u' is only for usa. Replicate values appear the same rows
for the
> first and second columns.
> ### Now I want to delete replicate rows with the same values in first
> (sceond) rows
> ####and sum up values in the third column for the same values.
>
> mm=melt(data,id='first')
> sum=cast(mm,first~variable,sum) ### This does not work.
>
> ###I tried another way to do this
> mm= melt(data, id='first',measure='third')
> sum=cast(mm,first~variable,sum)
>
> ## But then the problem is how to 'merge' the result with the
second column
> in the dataset.
>
>
> The expected dataframe is like this:
>
> (I showed a wrong expected dataframe yesterday.)
>
> first second third
> 1 u usa 8
> 2 b Brazil 2
> 3 e England 13
> 4 k Korea 4
> 5 j Japan 5
> 6 c China 15
> 8 f France 8
>
> Thanks in advance.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]