I am attempting to clean up some land use building data and need to join some buildings together making sure not to double count GIS slivers. The first data.frame is the original, the 2nd adds all the acres for each identical bldgid. I now want to a) throw out all but one of the the cases where the Years and ImpValue are Identical, b) Sum the impvalues based on: 1) The Years are identical and the ImpValue are not 2)The ImpValues are identical and the Years are not Resulting in the 3rd data frame. Please consider the following DF=cbind(Acres,Bldgid,Year,ImpValue) DF<-as.data.frame(DF) DF Acres Bldgid Year ImpValue 1 100.00 1 1946 1000 2 101.00 2 1952 1400 3 100.00 3 1922 1300 4 130.00 4 1910 900 5 156.00 5 1955 5000 6 0.50 5 1955 1200 7 293.00 6 1999 500 8 300.00 7 1990 9000 9 0.09 7 1991 9000 10 100.00 8 2000 1000 11 12.50 8 2000 1000 #Aggregate acres where identical ids dupbuild<-aggregate(DF$Acres,DF["Bldgid"],sum) colnames(dupbuild)[2]<-"Acres" #Add aggregated Acres to DF DF$Acres<-dupbuild$Acres[match(DF$Bldgid,dupbuild$Bldgid)] DF Acres dgid Year ImpValue 1 100.00 1 1946 1000 2 101.00 2 1952 1400 3 100.00 3 1922 1300 4 130.00 4 1910 900 5 156.50 5 1955 5000 6 156.50 5 1955 1200 7 293.00 6 1999 500 8 300.09 7 1990 9000 9 300.09 7 1991 9000 10 112.50 8 2000 1000 11 112.50 8 2000 1000 #desired outcome data frame Acres dgid Year ImpValue 1 100.00 1 1946 1000 2 101.00 2 1952 1400 3 100.00 3 1922 1300 4 130.00 4 1910 900 5 156.50 5 1955 7200 #combined 5 & 6 7 293.00 6 1999 500 8 300.09 7 1990 18000 #combined 8 & 9 10 112.50 8 2000 1000 #one case thrown out So in this case the Impvalue are added together for rows 5 & 6 (from dataframe example 2) b/c the years are identical and the Impvalue is not, rows 8 and 9 have their Impvalue summed because there Years are identical but the improvement value is not, and one of the cases is thrown out of rows 10 & 11 because they have identical years and ImpValue. When rows are joined the Year value is no longer important but to remain consistent i would like to keep the earliest (lowest) year. There will be instances in the actual data where there are more than 2 cases to consider if that makes any coding difference as i didnt include any in my example data. It would also be useful to include a new column keeping track of how many joined bldgids . I think i can figure that one out though. Hope this is all clear. Thanks for the guidance and insights JR -- View this message in context: http://old.nabble.com/Summing-rows-based-on-criteria-tp26157755p26157755.html Sent from the R help mailing list archive at Nabble.com.