thr3ads.net - R help - [R] How to delete the replicate rows by summing up the numeric columns [Jun 2010]

If this information is useful, please help other people find it:
Share via:

2010-Jun-29 19:05 UTC

[R] How to delete the replicate rows by summing up the numeric columns

Hi, folks,

I am sorry that I did not state the problem correctly yesterday.

Please let me address the problem by the following codes:

first=c('u','b','e','k','j','c','u','f','c','e')
second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
third=1:10
data=data.frame(first,second,third)

## You may understand values in the first column are the unique codes for
those in the second column.
####So 'u' is only for usa. Replicate values appear the same rows for
the
first and second columns.
### Now I want to delete replicate rows with the same values in first
(sceond) rows
####and sum up values in the third column for the same values.

mm=melt(data,id='first')
sum=cast(mm,first~variable,sum) ### This does not work.

###I tried another way to do this
mm= melt(data, id='first',measure='third')
sum=cast(mm,first~variable,sum)

## But then the problem is how to 'merge' the result with the second
column
in the dataset.


The expected dataframe is like this:

(I showed a wrong expected dataframe yesterday.)

     first   second  third
1      u     usa      8
2      b   Brazil     2
3      e  England   13
4      k   Korea     4
5      j   Japan      5
6      c   China     15
8      f  France     8

Thanks in advance.

	[[alternative HTML version deleted]]

David Winsemius

2010-Jun-29 21:58 UTC

head link

[R] How to delete the replicate rows by summing up the numeric columns

On Jun 29, 2010, at 3:05 PM, Yi wrote:
> Hi, folks,
>
> I am sorry that I did not state the problem correctly yesterday.
>
> Please let me address the problem by the following codes:
>
>
first=c('u','b','e','k','j','c','u','f','c','e')
> second
> > c
> ('usa
> ','Brazil
>
','England','Korea','Japan','China','usa','France','China','England')
> third=1:10
> data=data.frame(first,second,third)
>
> ## You may understand values in the first column are the unique  
> codes for
> those in the second column.
> ####So 'u' is only for usa. Replicate values appear the same rows  
> for the
> first and second columns.
> ### Now I want to delete replicate rows with the same values in first
> (sceond) rows
> ####and sum up values in the third column for the same values.
>
> mm=melt(data,id='first')
> sum=cast(mm,first~variable,sum) ### This does not work.
>
> ###I tried another way to do this
> mm= melt(data, id='first',measure='third')
> sum=cast(mm,first~variable,sum)
>
> ## But then the problem is how to 'merge' the result with the
second
> column
> in the dataset.
 > data$summed <- ave(data$third, data$first, FUN=sum)
#computed sums within groups defined by "first"
 > data[!duplicated(data$first), c("first", "second",
"summed")]
#remove duplicates and leave out "third"

   first  second summed
1     u     usa      8
2     b  Brazil      2
3     e England     13
4     k   Korea      4
5     j   Japan      5
6     c   China     15
8     f  France      8
>
>
> The expected dataframe is like this:
>
> (I showed a wrong expected dataframe yesterday.)
>
>     first   second  third
> 1      u     usa      8
> 2      b   Brazil     2
> 3      e  England   13
> 4      k   Korea     4
> 5      j   Japan      5
> 6      c   China     15
> 8      f  France     8
>
> Thanks in advance.
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT

Nikhil Kaza

2010-Jun-29 22:19 UTC

head link

[R] How to delete the replicate rows by summing up the numeric columns

require(reshape)
cast(data, first+second~ ., sum)


Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.list at gmail.com

On Jun 29, 2010, at 3:05 PM, Yi wrote:
>
first=c('u','b','e','k','j','c','u','f','c','e')
> second
> > c
> ('usa
> ','Brazil
>
','England','Korea','Japan','China','usa','France','China','England')
> third=1:10
> data=data.frame(first,second,third)

Dennis Murphy

2010-Jun-30 01:04 UTC

head link

[R] How to delete the replicate rows by summing up the numeric columns

Hi:

If you can deal with alphabetic order, the following seems to work:

v <- aggregate(third ~ first, data = data, FUN = sum)
v$second <- levels(data$second)
v[, c(1, 3, 2)]
  first  second third
1     b  Brazil     2
2     c   China    15
3     e England    13
4     f  France     8
5     j   Japan     5
6     k   Korea     4
7     u     usa     8

v$second works in this case because the levels are ordered and all are used
when inserted in v. That's not a guarantee in more complicated problems and
frankly, this one is a kludge.

A plyr version would be

v <- ddply(data, .(first), summarise, third = sum(third), second = second)
v[!duplicated(v$first), c(1, 3, 2)]
  first   second third
1     b  Brazil     2
2     c   China    15
4     e England    13
6     f  France     8
7     j   Japan     5
8     k   Korea     4
9     u     usa     8

The advantage of ddply over aggregate in this case is that ddply allows one
to insert second as an 'identity' of sorts; however, the result contains
duplicate rows, so we need to remove them in the second statement.

Using melt and cast from the reshape package,
mm <- melt(data, id = c('first', 'second'))
(ms <- cast(mm, first + second ~ . , sum))
  first  second (all)
1     b  Brazil     2
2     c   China    15
3     e England    13
4     f  France     8
5     j   Japan     5
6     k   Korea     4
7     u     usa     8

names(ms)[3] <- 'third'

This seems to be the cleanest version of the three in terms of getting both
ID variables into the final result.

HTH,
Dennis

On Tue, Jun 29, 2010 at 12:05 PM, Yi <liuyi.feier@gmail.com> wrote:
> Hi, folks,
>
> I am sorry that I did not state the problem correctly yesterday.
>
> Please let me address the problem by the following codes:
>
>
first=c('u','b','e','k','j','c','u','f','c','e')
>
>
second=c('usa','Brazil','England','Korea','Japan','China','usa','France','China','England')
> third=1:10
> data=data.frame(first,second,third)
>
> ## You may understand values in the first column are the unique codes for
> those in the second column.
> ####So 'u' is only for usa. Replicate values appear the same rows
for the
> first and second columns.
> ### Now I want to delete replicate rows with the same values in first
> (sceond) rows
> ####and sum up values in the third column for the same values.
>
> mm=melt(data,id='first')
> sum=cast(mm,first~variable,sum) ### This does not work.
>
> ###I tried another way to do this
> mm= melt(data, id='first',measure='third')
> sum=cast(mm,first~variable,sum)
>
> ## But then the problem is how to 'merge' the result with the
second column
> in the dataset.
>
>
> The expected dataframe is like this:
>
> (I showed a wrong expected dataframe yesterday.)
>
>     first   second  third
> 1      u     usa      8
> 2      b   Brazil     2
> 3      e  England   13
> 4      k   Korea     4
> 5      j   Japan      5
> 6      c   China     15
> 8      f  France     8
>
> Thanks in advance.
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Seemingly Similar Threads

Search for more seemingly similar threads

R help - Jun 2010 - How to delete the replicate rows by summing up the numeric columns

[R] How to delete the replicate rows by summing up the numeric columns

[R] How to delete the replicate rows by summing up the numeric columns

[R] How to delete the replicate rows by summing up the numeric columns

[R] How to delete the replicate rows by summing up the numeric columns

Seemingly Similar Threads