Hi, I am new to R, so I would appreciate any help. I have passenger flight data between city pairs. The way I got the data, there are multiple rows for each city pair; the number of passengers needs to be summed to get a TOTAL annual passenger count for each city pair.

So my question is: how do I create a new table (or data frame) that selectively sums

My initial thought would be to iterate through each row with the following logic:

1. If the ORIGIN_WAC and DEST_WAC pair is not in the new table, then add it to the table
2. If the ORIGIN_WAC and DEST_WAC pair already exists, then sum the passengers (and do not add a new row)

Is this logical? If so, I think I just need some help on syntax (or do I use a script?). Thanks.

The first few rows of data look like this:

--
View this message in context: http://r.789695.n4.nabble.com/How-to-selectively-sum-rows-Beginner-question-tp3933512p3933512.html
Sent from the R help mailing list archive at Nabble.com.
It would be good to follow the posting guide and at least supply a sample of the data. Most likely 'tapply' is one way of doing it:

    tapply(df$passenger, list(df$orig, df$dest), sum)

On Mon, Oct 24, 2011 at 11:27 AM, asindc <siirilaa at eastwestcenter.org> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
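For anyone reading along, here is a minimal sketch of how the tapply() suggestion might behave. The sample data is made up for illustration, and the column names (passenger, orig, dest) simply follow the call in the reply; they are assumptions, not the original poster's actual columns.

```r
# Hypothetical sample data; column names follow the tapply() call in the reply.
df <- data.frame(
  orig = c("A", "A", "B", "A", "B"),
  dest = c("B", "B", "A", "C", "A"),
  passenger = c(100, 50, 30, 20, 70),
  stringsAsFactors = FALSE
)

# Sum passengers for every origin/destination combination.
# The result is a matrix: rows are origins, columns are destinations.
totals <- tapply(df$passenger, list(df$orig, df$dest), sum)

# Pairs that never occur in the data come back as NA.
print(totals)

# Optional: reshape to long format with one row per observed pair.
long <- as.data.frame(as.table(totals), stringsAsFactors = FALSE)
names(long) <- c("orig", "dest", "passenger")
long <- long[!is.na(long$passenger), ]
print(long)
```

Note that tapply() returns a cross-tabulated matrix rather than a data frame, so with many cities most cells will be NA. The reshaping step at the end is one possible way to get back to one row per city pair.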
See the count() function in the plyr package; it does fast summation. Something like

    library('plyr')
    count(passengerData, c('ORIGIN_WAC', 'DEST_WAC'), 'npassengers')

HTH,
Dennis
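If plyr is not installed, a similar one-row-per-pair summary can probably be had from base R with aggregate(). This is a swapped-in alternative, not the count() call itself. The sample data below is invented, and 'npassengers' is just the column name assumed in the reply above.

```r
# Hypothetical sample data; ORIGIN_WAC and DEST_WAC come from the question,
# 'npassengers' is the column name assumed in the count() reply.
passengerData <- data.frame(
  ORIGIN_WAC = c(1, 1, 2, 1, 2),
  DEST_WAC = c(2, 2, 1, 3, 1),
  npassengers = c(100, 50, 30, 20, 70)
)

# Sum npassengers within each (ORIGIN_WAC, DEST_WAC) pair.
# The result is a data frame with one row per pair that occurs in the data.
pairTotals <- aggregate(npassengers ~ ORIGIN_WAC + DEST_WAC,
                        data = passengerData, FUN = sum)
print(pairTotals)
```

This covers both steps of the row-by-row logic in the question at once: each pair appears exactly once, and its passengers are summed, so no explicit loop is needed.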