I am trying to drop rows of a data frame based on values of the column PID, but my strategy is not working. I hope someone can tell me what I am doing incorrectly.

# Values of PID column
> jdata[,"PID"]
 [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
[18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
[35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
[52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772

# Prepare to drop the last two rows, the rows that have 14744 and 14772 in the PID column
> delete<-c(14772,14744)

# Try to delete the last two rows; as you will see, I am not able to drop them.
> jdata[jdata$PID!=delete,"PID"]
 [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
[18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
[35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
[52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772

Thanks,
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsorkin at grecc.umaryland.edu

Confidentiality Statement:
This email message, including any attachments, is for the so...{{dropped}}
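The behaviour in the post can be reproduced without the full data frame. The sketch below is an illustration only: it types the printed PID values into a plain vector named PID (an assumption, since the real jdata is not available) and uses rep_len() to spell out what != actually compares against, namely the shorter vector 'delete' recycled to the length of PID.

```r
# PID values as printed in the original post (rows 1 to 62)
PID <- c(16608, 16613, 16355, 16378, 16371, 16280, 16211, 16169, 16025, 11595,
         15883, 15682, 15617, 15615, 15212, 14862, 16539, 12063, 16755, 16720,
         16400, 16257, 16209, 16200, 16144, 11598, 13594, 15419, 15589, 15982,
         15825, 15834, 15491, 15822, 15803, 15795, 10202, 15680, 15587, 15552,
         15588, 15375, 15492, 15568, 15196, 10217, 15396, 15477, 15446, 15374,
         14092, 14033, 15141, 14953, 15473, 10424, 13445, 14854, 10481, 14793,
         14744, 14772)
delete <- c(14772, 14744)

# != works element by element; 'delete' is silently recycled to length 62
# (62 is a multiple of 2, so there is no warning): odd rows are compared
# with 14772, even rows with 14744.
recycled <- rep_len(delete, length(PID))
stopifnot(identical(PID != delete, PID != recycled))

# 14744 is row 61 (odd) and 14772 is row 62 (even), so neither meets its own value
print(which(PID == 14744))     # 61
print(which(PID == 14772))     # 62
print(sum(!(PID != delete)))   # 0 rows dropped

# Reversing 'delete' happens to line both values up with their rows
print(sum(!(PID != rev(delete))))   # 2 rows dropped
```

The reversed order only works because the two values happen to sit at the right parity of positions; it is not a general fix.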
On Sun, 2007-03-25 at 22:19 -0400, John Sorkin wrote:
> > delete<-c(14772,14744)
> > jdata[jdata$PID!=delete,"PID"]

John,

If you had:

delete <- c(14744, 14772)

it would likely work, but only in this particular setting, where the two values you want to drop sit next to each other. That is because != compares element by element, recycling the shorter vector 'delete' along jdata$PID: with your version, 14772 is compared against the odd-numbered rows and 14744 against the even-numbered rows. The order you have them in above is the reverse of the order in which the values actually appear (14744 is row 61, 14772 is row 62), so neither row ever matches and nothing is dropped.
For example:

Vec <- 1:10
delete <- 10:9
> Vec[Vec != delete]
[1] 1 2 3 4 5 6 7 8 9 10

However:

delete <- 9:10
> Vec[Vec != delete]
[1] 1 2 3 4 5 6 7 8

Note what happens when the values in the source vector are not sequential:

Vec <- sample(10)
> Vec
[1] 5 1 7 3 10 8 2 6 9 4

delete <- 9:10
> Vec[Vec != delete]
[1] 5 1 7 3 10 8 2 6 4

delete <- 10:9
> Vec[Vec != delete]
[1] 5 1 7 3 8 2 6 9 4

In both cases only one of the two values in 'delete' is removed (here the first), depending on which one happens to line up with the recycled 'delete'.

When performing a logical comparison to see whether a value is (or is not) in a set of values, you want to use '%in%':

Vec <- 1:10
delete <- 10:9
> Vec[!Vec %in% delete]
[1] 1 2 3 4 5 6 7 8

delete <- 9:10
> Vec[!Vec %in% delete]
[1] 1 2 3 4 5 6 7 8

It also works on the permuted vector from above:

> Vec[!Vec %in% delete]
[1] 5 1 7 3 8 2 6 4

See ?"%in%" for more information.

HTH,

Marc Schwartz
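Applied to the rows of a data frame rather than a plain vector, the %in% approach might look like the sketch below. The jdata here is a hypothetical stand-in built from the last five PID values in the post plus an invented column x. The sketch also shows one more difference from !=: when PID contains NA, != produces an NA index and the bracket returns a row of NAs, whereas NA %in% delete is FALSE, so that row is kept with its data intact.

```r
# Hypothetical stand-in for jdata: last five PID values from the post, invented column x
jdata  <- data.frame(PID = c(14854, 10481, 14793, 14744, 14772), x = 1:5)
delete <- c(14772, 14744)

# For every row, %in% asks whether PID appears anywhere in 'delete',
# so neither the order of 'delete' nor the row order matters
kept <- jdata[!jdata$PID %in% delete, ]
print(kept$PID)   # 14854 10481 14793
stopifnot(identical(kept, jdata[!jdata$PID %in% rev(delete), ]))

# Add a row with a missing PID
jdata_na <- rbind(jdata, data.frame(PID = NA, x = 6L))

# != gives NA for that row, and an NA index returns an all-NA row
print(jdata_na[jdata_na$PID != 14744, "x"])     # 1 2 3 5 NA

# NA %in% 14744 is FALSE, so the row is kept with its data intact
print(jdata_na[!jdata_na$PID %in% 14744, "x"])  # 1 2 3 5 6
```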
> jdata
    PID
1 14854
2 10481
3 14793
4 14744
5 14772
> jdata[jdata[1] != delete, 1]
[1] 14854 10481 14793

-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)
Sorry, John, Marc's method is correct.

-- 
WenSui Liu
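Why the five-row example appeared to work can be sketched with plain vectors. The PID values are the ones shown in that example; PID2, with rows 4 and 5 swapped, is made up for illustration. With five rows, 'delete' is recycled to 14772, 14744, 14772, 14744, 14772 and R warns that 5 is not a multiple of 2; rows 4 and 5 happen to line up, and swapping them breaks the result.

```r
# PID values from the five-row example; PID2 (rows 4 and 5 swapped) is made up
PID    <- c(14854, 10481, 14793, 14744, 14772)
PID2   <- c(14854, 10481, 14793, 14772, 14744)
delete <- c(14772, 14744)

# Recycling 'delete' to length 5 triggers a warning; report it and carry on
res <- withCallingHandlers(
  PID[PID != delete],
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res)    # 14854 10481 14793: rows 4 and 5 happen to line up

# Same code, rows 4 and 5 swapped: nothing is dropped
print(suppressWarnings(PID2[PID2 != delete]))

# %in% drops both values regardless of row order
print(PID2[!PID2 %in% delete])   # 14854 10481 14793
```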
Bill.Venables at csiro.au
2007-Mar-26 04:00 UTC
[R] Problem dropping rows based on values in a column
I think you want

delete <- c(14772,14744)
jdata <- subset(jdata, !(PID %in% delete))

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: (I don't have one!)
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/
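A short sketch of the subset() suggestion next to the equivalent bracket indexing, again on a hypothetical five-row jdata with an invented column x. The result is stored in jdata2 (rather than overwriting jdata) only so the two forms can be compared. subset() evaluates the condition with the data frame's columns in scope, so PID can be named directly; ?subset itself notes that it is intended for interactive use and recommends the standard [ indexing for programming.

```r
# Hypothetical stand-in for jdata: PID values from the post, invented column x
jdata <- data.frame(PID = c(14854, 10481, 14793, 14744, 14772),
                    x   = 1:5)
delete <- c(14772, 14744)

# subset() looks PID up inside jdata, so no jdata$ prefix is needed
jdata2 <- subset(jdata, !(PID %in% delete))

# The same rows via standard bracket indexing (preferred inside functions)
jdata3 <- jdata[!(jdata$PID %in% delete), ]

stopifnot(isTRUE(all.equal(jdata2, jdata3)))
print(jdata2)   # rows 1 to 3: PID 14854, 10481, 14793
```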