Hello,

I have a sample data set that looks like this:

YEAR MONTH DAY CONTINUE SPL       TIMEFISH TIMEUNIT AREA COUNTY DEPTH DEPUNIT GEAR    TRIPID      CONVUNIT
1992 1     26  1        SP0073928 8        H        7    25     4     NA      1000000 02163399054 161
1992 1     26  1        SP0073928 8        H        7    25     4     NA      1000000 02163399054 8
1992 1     26  2        SP0004228 8        H        7    25     4     NA      1000000 02163399054 161
1992 1     26  2        SP0004228 8        H        7    25     4     NA      1000000 02163399054 8
1992 1     25  NA       SP0052652 8        H        7    25     4     NA      1000000 02163399057 85
1992 1     26  NA       SP0037940 8        H        7    25     4     NA      1000000 02163399058 70
1992 1     27  NA       SP0072357 8        H        7    25     4     NA      1000000 02163399059 15
1992 1     27  NA       SP0072357 8        H        7    25     4     NA      1000000 02163399059 20
1992 1     27  NA       SP0026324 8        H        7    25     4     NA      1000000 02163399060 8
1992 1     28  1        SP0072357 8        H        7    25     4     NA      1000000 02163399062 200

How can I use unique to extract only the rows that have repeated TRIPIDs, that is, with uniqueness judged on TRIPID alone and not on every variable? I then want to condense those rows by summing CONVUNIT for each unique value of TRIPID.

I posted a similar question last week and received a workable answer for doing this without unique. The solution below worked just fine on this sample data set, but the full data set has 446,000 rows and my computer and R simply cannot handle the following code on data this large.

    conds <- by(Step4, Step4$TRIPID, function(x)
                replace(x[1, ], "CONVUNIT", sum(x$CONVUNIT)))
    Step5 <- do.call(rbind, conds)

Thank you,

Cameron Guenther, Ph.D.
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
cameron.guenther at myfwc.com

On May 10, 2006, at 4:02 PM, Guenther, Cameron wrote:

> How can I use unique to extract the rows that have repeated tripid's
> only, not a unique value for each variable but only for TRIPID. I then
> want to condense the unique values by summing the CONVUNIT for each
> unique value of TRIPID.

Thanks, Cameron, for this question. This type of manipulation would be relatively simple to do in an RDBMS (e.g. MySQL, PostgreSQL, Oracle, etc.), but I'm curious to see how one would do the same in R. So, if folks send you solutions off-list, please do post them back to the list.

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS for Windows, Linux, *BSD, and MacOS X with BitTorrent

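For comparison with the RDBMS approach mentioned above, a GROUP BY style sum can be sketched in base R with aggregate(). This is only an illustration: the data frame `trips` is a hypothetical cut-down copy of the sample data holding just TRIPID and CONVUNIT, and storing TRIPID as character (to keep the leading zero) is an assumption, not something stated in the thread.

```r
# Hypothetical subset of the sample data: only the two columns involved.
# TRIPID is kept as character (an assumption) so the leading zero survives.
trips <- data.frame(
  TRIPID   = c("02163399054", "02163399054", "02163399054", "02163399054",
               "02163399057", "02163399058", "02163399059", "02163399059",
               "02163399060", "02163399062"),
  CONVUNIT = c(161, 8, 161, 8, 85, 70, 15, 20, 8, 200),
  stringsAsFactors = FALSE
)

# Roughly: SELECT TRIPID, SUM(CONVUNIT) FROM trips GROUP BY TRIPID
totals <- aggregate(CONVUNIT ~ TRIPID, data = trips, FUN = sum)
print(totals)
```

The result is a data frame with one row per TRIPID, which makes it easy to merge back onto other columns if needed.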
If you only care about the sum of CONVUNIT for each TRIPID, then you can use tapply, for example:

    step4 <- data.frame(TRIPID = rep(c(111, 222, 333), 3), CONVUNIT = rpois(9, 40))
    result <- tapply(step4$CONVUNIT, INDEX = step4$TRIPID, FUN = sum)
    result
    111 222 333
    115 107 123

Is this what you wanted to do? I can't think of anything faster than tapply for your problem.

I hope this helps

Francisco

>From: "Guenther, Cameron" <Cameron.Guenther at MyFWC.com>
>To: <r-help at stat.math.ethz.ch>
>Subject: [R] Unique?
>Date: Wed, 10 May 2006 17:02:33 -0400
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide!
>http://www.R-project.org/posting-guide.html

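The first half of the question, picking out only the rows whose TRIPID occurs more than once, is usually handled with duplicated() instead of unique(). A minimal sketch, assuming a toy data frame with made-up TRIPID values ("A" to "D" are placeholders, not real trip IDs):

```r
# Toy data: the TRIPID values are hypothetical placeholders.
step4 <- data.frame(
  TRIPID   = c("A", "A", "B", "C", "C", "D"),
  CONVUNIT = c(161, 8, 85, 15, 20, 200),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences; combining it with
# fromLast = TRUE also flags the first occurrence of each repeated TRIPID.
is_repeated <- duplicated(step4$TRIPID) |
               duplicated(step4$TRIPID, fromLast = TRUE)
repeated_rows <- step4[is_repeated, ]

# Sum CONVUNIT within each repeated TRIPID.
repeated_sums <- tapply(repeated_rows$CONVUNIT, repeated_rows$TRIPID, sum)
print(repeated_rows)
print(repeated_sums)
```

Here rows with TRIPID "B" and "D" are dropped because each appears only once.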
Hi Cameron

You need to be more specific when you ask a question so you can get a better answer. Anyhow, when you say that you want to retain all the other variables, do you mean that you want to create a new column in the data set that contains the calculated sum? If that is the case, you can use a construction like:

    set.seed(1)
    step4 <- data.frame(TRIPID = rep(c(111, 222, 333), 3), CONVUNIT = rpois(9, 40))
    result <- tapply(step4$CONVUNIT, INDEX = step4$TRIPID, FUN = sum)
    step4[, "SUM"] <- result[match(step4[, "TRIPID"], names(result))]
    step4
      TRIPID CONVUNIT SUM
    1    111       36 122
    2    222       48 121
    3    333       48 129
    4    111       42 122
    5    222       30 121
    6    333       43 129
    7    111       44 122
    8    222       43 121
    9    333       38 129

Cheers

Francisco

>From: "Guenther, Cameron" <Cameron.Guenther at MyFWC.com>
>To: "Francisco J. Zagmutt" <gerifalte28 at hotmail.com>
>Subject: RE: [R] Unique?
>Date: Thu, 11 May 2006 12:08:31 -0400
>
>It is close but not quite what I want. I need to retain all of the
>other variables as well.
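If the goal is one row per TRIPID that keeps the other variables from the first matching row and carries the summed CONVUNIT, the same match-on-TRIPID idea can be sketched as below. This is an untested-at-scale illustration, not a confirmed fix for the 446,000-row case: the small data frame is a hypothetical subset of the sample data, and TRIPID is assumed to be stored as character.

```r
# Hypothetical subset of the sample data (only a few columns and rows).
step4 <- data.frame(
  YEAR     = 1992,
  DAY      = c(26, 26, 25, 27, 27, 28),
  TRIPID   = c("02163399054", "02163399054", "02163399057",
               "02163399059", "02163399059", "02163399062"),
  CONVUNIT = c(161, 8, 85, 15, 20, 200),
  stringsAsFactors = FALSE
)

# Group totals in one vectorised call; the result is a one-column matrix
# whose row names are the TRIPID values.
totals <- rowsum(step4$CONVUNIT, group = step4$TRIPID)

# Keep the first row seen for each TRIPID, then replace its CONVUNIT
# with the group total looked up by row name.
step5 <- step4[!duplicated(step4$TRIPID), ]
step5$CONVUNIT <- unname(totals[step5$TRIPID, 1])
rownames(step5) <- NULL
print(step5)
```

Unlike by() followed by do.call(rbind, ...), this builds no per-group data frames, which is the part that tends to become slow as the number of groups grows.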