Michael_Nielsen/Syd/Synergy.SYNERGY@synergy.com.au
2002-Jan-16 02:10 UTC
[R] Subsetting data frames without a loop
I KNOW this should be easy, but I'm stuck.

My data frame consists of multiple observations from each of a number of stations, and what I would like to do is create another data frame that contains all the variables of the first, but only rows where a certain variable is at its maximum for the station.

So, for example:

> my.df
   stn obs           v
1    1   1  0.26400396
2    2   1 -0.79194397
3    3   1  0.11924528
4    4   1  0.42596859
5    5   1 -0.50528235
6    1   2 -1.57524853
7    2   2  0.17762482
8    3   2 -0.83013770
9    4   2 -0.53203400
10   5   2 -2.71397275
11   1   3  0.26902053
12   2   3  2.01147908
13   3   3  0.73301643
14   4   3 -0.67333384
15   5   3 -1.36219773
16   1   4 -2.20342109
17   2   4  0.18941702
18   3   4  0.51492032
19   4   4  0.03597370
20   5   4 -1.43502366
21   1   5 -1.34589392
22   2   5  1.00389195
23   3   5 -0.21233041
24   4   5 -1.35141044
25   5   5 -0.02052348

> tapply(v,factor(stn),max)
          1           2           3           4           5
 0.26902053  2.01147908  0.73301643  0.42596859 -0.02052348

so my new data frame should contain (possibly multiple rows per station)

  stn obs           v
1   1   3  0.26902053
2   2   3  2.01147908
3   3   3  0.73301643
4   4   1  0.42596859
5   5   5 -0.02052348

Thanks in advance.

Regards,
Mike

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Michael_Nielsen/Syd/Synergy.SYNERGY at synergy.com.au wrote:
>
> I KNOW this should be easy, but I'm stuck.
>
> My data frame consists of multiple observations from each of a number of
> stations, and what I would like to do is create another data frame that
> contains all the variables of the first, but only rows where a certain
> variable is at its maximum for the station.
>
> [...]

As a first idea:

  my.df[tapply(v,factor(stn), function(x) which(v==max(x))),]

Uwe
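The one-liner above compares each station's maximum against the whole of v, so it relies on every maximum occurring exactly once in the full column. With ties, tapply() returns a list rather than a vector of row numbers, and the indexing fails. A minimal sketch of a tie-tolerant variant of the same idea follows, assuming the example data from the original post is rebuilt as below; the names idx.by.stn, keep and max.rows are illustrative and not from the thread.

```r
# Example data: the 25 rows from the original post
# (stn cycles 1..5 within each obs block).
my.df <- data.frame(
  stn = rep(1:5, times = 5),
  obs = rep(1:5, each = 5),
  v = c( 0.26400396, -0.79194397,  0.11924528,  0.42596859, -0.50528235,
        -1.57524853,  0.17762482, -0.83013770, -0.53203400, -2.71397275,
         0.26902053,  2.01147908,  0.73301643, -0.67333384, -1.36219773,
        -2.20342109,  0.18941702,  0.51492032,  0.03597370, -1.43502366,
        -1.34589392,  1.00389195, -0.21233041, -1.35141044, -0.02052348)
)

# Row numbers of my.df, grouped by station.
idx.by.stn <- split(seq_len(nrow(my.df)), my.df$stn)

# Within each station, keep every row number whose v equals that
# station's maximum; ties simply contribute more than one row number.
keep <- unlist(
  lapply(idx.by.stn, function(i) i[my.df$v[i] == max(my.df$v[i])]),
  use.names = FALSE
)

max.rows <- my.df[keep, ]
print(max.rows)
```

Because the comparison is done inside each station's own rows, a value that happens to equal another station's maximum is not picked up by mistake. The result is ordered by station, not by original row order.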
Hi,

I've been working on data with a similar issue and ended up with the following code. Given the test data in your post it seems to work.

  maxvals <- tapply(v,factor(stn),max)
  subset(my.df, v >= maxvals[stn])

However, I've only been using R for a couple of weeks, so comments on the appropriateness of this solution, or better/faster code, are welcome!

Regards
Jackson

-----Original Message-----
From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de]
Sent: 16 January 2002 08:44
To: Michael_Nielsen/Syd/Synergy.SYNERGY at synergy.com.au
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Subsetting data frames without a loop

[...]
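The lookup maxvals[stn] in the reply above indexes the tapply() result by position, which works in the example because the stations happen to be numbered 1 to 5. Two possible variations are sketched below under that caveat: indexing by station name, and base R's ave(), which returns the per-station maximum already aligned with the rows. The data is rebuilt from the original post, and the names by.name and by.ave are illustrative only.

```r
# Example data: the 25 rows from the original post.
my.df <- data.frame(
  stn = rep(1:5, times = 5),
  obs = rep(1:5, each = 5),
  v = c( 0.26400396, -0.79194397,  0.11924528,  0.42596859, -0.50528235,
        -1.57524853,  0.17762482, -0.83013770, -0.53203400, -2.71397275,
         0.26902053,  2.01147908,  0.73301643, -0.67333384, -1.36219773,
        -2.20342109,  0.18941702,  0.51492032,  0.03597370, -1.43502366,
        -1.34589392,  1.00389195, -0.21233041, -1.35141044, -0.02052348)
)

# Variation 1: look up each row's station maximum by name, so station
# codes need not be the consecutive integers 1..n.
maxvals <- tapply(my.df$v, my.df$stn, max)
row.max <- as.vector(maxvals[as.character(my.df$stn)])
by.name <- my.df[my.df$v == row.max, ]

# Variation 2: ave() applies FUN within each station and returns a
# vector the same length as v, so no separate lookup is needed.
by.ave <- my.df[my.df$v == ave(my.df$v, my.df$stn, FUN = max), ]

print(by.ave)
```

Both variations keep the rows in their original order and return every tied row for a station. They also refer to the columns through my.df$, so they do not depend on v and stn being visible outside the data frame.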