I am in the process of switching from SAS over to R. I am working on very large CSV datasets that contain vehicle information. As I process the data, I need to select the first (or sometimes the second) record, by date, within each group of records that share the same license plate number. In SAS, the 'first.' variable (FIRST.variable in a BY group) can be used on sorted datasets to pull out the first entry for each value of a particular variable (in this case, the license plate) found in the data. I have spent some time looking around and cannot seem to find an equivalent function in R. Can anyone recommend an efficient technique that would pull this off? I assume the dataset must first be sorted by plate and date before the filter or function is applied. Any help would be greatly appreciated. Thanks, Joe -- View this message in context: http://n4.nabble.com/First-Last-Data-row-selection-tp1566260p1566260.html Sent from the R help mailing list archive at Nabble.com.
I've attached some functions I've written based on previous questions that have been posted here. Unfortunately, I was too lazy to give credit to previous commenters in my Rd file, and for that I hope they'll forgive me. In any case, please be assured that the functions I've attached are in no way my original work. Benjamin -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of wookie1976 Sent: Tuesday, February 23, 2010 12:40 PM To: r-help at r-project.org Subject: [R] First. Last. Data row selection ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
For selecting the first and last elements of a vector, list, data frame, or matrix, look at the head() and tail() functions. The split() function can be used to divide a data.frame into smaller pieces based on factors such as the year or the license plate. The effect of split() can be combined with another function such as head() using the base function by(), or with a function like ddply() from Hadley's plyr package. To give a worked example, I would need some example data (preferably pasted as the output of dput(); tabularized data tends to get mangled in email and has to be reprocessed and reformatted before it can be loaded as an R object). -Charlie
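Charlie's split()/head() and by() suggestions can be sketched in base R as below. This is only an illustration under assumptions: the data frame `vehicles` and its columns `plate` and `date` are hypothetical stand-ins for the real CSV data, and the dates are assumed to be already parsed as Date objects.

```r
# Hypothetical example data; 'vehicles', 'plate' and 'date' are assumed names.
vehicles <- data.frame(
  plate = c("ABC123", "XYZ789", "ABC123", "XYZ789", "ABC123"),
  date  = as.Date(c("2010-01-31", "2010-02-01", "2010-01-01",
                    "2009-12-15", "2009-01-01")),
  stringsAsFactors = FALSE
)

# Sort by plate, then by date, so each plate's rows are in date order.
vehicles <- vehicles[order(vehicles$plate, vehicles$date), ]

# split() gives one data frame per plate; head(x, n = 1) keeps its earliest row,
# and do.call(rbind, ...) stacks the pieces back into one data frame.
first_rows <- do.call(rbind, lapply(split(vehicles, vehicles$plate), head, n = 1))

# by() performs the split-and-apply in one call; extra arguments go to head().
first_rows_by <- do.call(rbind, by(vehicles, vehicles$plate, head, n = 1))

# tail(x, n = 1) gives the latest row per plate instead.
last_rows <- do.call(rbind, lapply(split(vehicles, vehicles$plate), tail, n = 1))
```

Using head(x, n = 2) instead would keep the first two records per plate. For very large files, the split-based approach creates one small data frame per plate, so a vectorized approach may be faster.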
These logical vectors flag the first and the last row for each plate in the data's current row order (so sort by plate and date beforehand if "first" should mean "earliest"):

!duplicated(df$plate)                  # TRUE on the first row of each plate
!duplicated(df$plate, fromLast = TRUE) # TRUE on the last row of each plate

Hope that helps. Steve
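Steve's !duplicated() idea can be extended into rough equivalents of SAS FIRST.plate and LAST.plate, plus a within-plate counter for picking the second record. This is a sketch under assumptions: `df`, `plate`, `date`, `first_plate`, `last_plate` and `visit` are hypothetical names, and `date` is assumed to be a Date column.

```r
# Hypothetical example data; column names are assumptions.
df <- data.frame(
  plate = c("P2", "P1", "P1", "P2", "P1"),
  date  = as.Date(c("2010-02-01", "2010-01-31", "2010-01-01",
                    "2009-06-30", "2009-01-01")),
  stringsAsFactors = FALSE
)

# Sort by plate, then date, mirroring a SAS "BY plate date;" step.
df <- df[order(df$plate, df$date), ]

# Rough equivalents of SAS FIRST.plate and LAST.plate on the sorted data.
df$first_plate <- !duplicated(df$plate)
df$last_plate  <- !duplicated(df$plate, fromLast = TRUE)

# Running count within each plate: 1 = earliest date, 2 = second, and so on.
df$visit <- ave(seq_along(df$plate), df$plate, FUN = seq_along)

first_per_plate  <- df[df$first_plate, ]
second_per_plate <- df[df$visit == 2, ]
```

Because duplicated() only looks at row order, the sort is what makes "first" mean "earliest". To treat the most recent record as first, sort with order(df$plate, -as.numeric(df$date)) instead.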
Steve, your example seems to work quite well, except that it prints the whole vector of TRUE and FALSE values to the console:

> !duplicated(ladata2$Vin)
[1] TRUE TRUE TRUE TRUE

What I would like is to have the TRUE/FALSE values appended as a new column at the end of my dataset, so that when done I can selectively keep or drop rows based on whether they are TRUE or FALSE. I have tried to do this with several variable-creation statements, with no luck. Any ideas on how I can modify your code to get the following?

Plate,   Date,   True.False
Plate 1, 013110, true
Plate 1, 010110, false
Plate 1, 010109, false
Plate 2, 020110, true

To everyone else, thanks greatly for your examples! I have learned from each of your suggestions.
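Assigning the result of !duplicated() to a new column, rather than just evaluating it, stores the flags instead of printing them. The sketch below reproduces the layout in the message under assumptions: the column names `Plate`, `Date` and `True.False` are taken from the example table (with the real data, ladata2$Vin would take the place of ladata2$Plate), the MMDDYY strings are assumed to parse with the format "%m%d%y", and the most recent date is treated as "first", as in the desired output.

```r
# Hypothetical data mirroring the example table in the message.
ladata2 <- data.frame(
  Plate = c("Plate 1", "Plate 1", "Plate 1", "Plate 2"),
  Date  = c("013110", "010110", "010109", "020110"),
  stringsAsFactors = FALSE
)

# Parse MMDDYY strings into Date objects (assumed format).
ladata2$Date <- as.Date(ladata2$Date, format = "%m%d%y")

# Within each plate, put the most recent date first.
ladata2 <- ladata2[order(ladata2$Plate, -as.numeric(ladata2$Date)), ]

# Store the flag as a new column instead of printing it.
ladata2$True.False <- !duplicated(ladata2$Plate)

# Keep or drop rows based on the flag.
kept    <- ladata2[ladata2$True.False, ]
dropped <- subset(ladata2, !True.False)
```

Dropping the minus sign in the order() call would flag the earliest record per plate instead.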