GradStudentDD
2012-Sep-27 19:46 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
Hi,

I have a data set of observations by either one person or a pair of people. I want to only keep the pair observations, and was using the code below until it gave me the error "$ operator is invalid for atomic vectors". I am just beginning to learn R, so I apologize if the code is really rough.

Basically I want to keep all the rows in the data set for which the value of "Pairiddups" is TRUE. How do I do it? And how do I get past the error?

Thank you so much,
Diana

PairID<-c(Health2$pairid)

duplicated(PairID, incomparables=TRUE, fromLast=TRUE)

PairIDdup=duplicated(PairID)
cbind(PairID, PairIDdup)
PairID[which(PairIDdup)]

PairIDDuplicates<-PairID%in%PairID[which(PairIDdup)]
PairIDs<-cbind(PairID, PairIDDuplicates)

colnames(PairIDs)<-c("Pairid","Pairiddups")

Health2PairsOnly<-PairIDs[ which(PairIDs$Pairiddups=='TRUE'), ]

--
View this message in context: http://r.789695.n4.nabble.com/Keep-rows-in-a-dataset-if-one-value-in-a-column-is-duplicated-tp4644420.html
Sent from the R help mailing list archive at Nabble.com.
Rui Barradas
2012-Sep-27 20:27 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
Hello,

That way of referring to variables can be troublesome. Try

PairIDs[, "Pairiddups"]

Hope this helps,

Rui Barradas

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
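To make the difference between the two ways of referring to a column concrete, here is a minimal sketch. The ID values are made up for illustration and are not the poster's data; only the object and column names come from the thread.

```r
# Hypothetical pair IDs: 101 and 103 occur twice, 102 and 104 once.
PairID <- c(101, 102, 101, 103, 103, 104)
PairIDDuplicates <- PairID %in% PairID[duplicated(PairID)]

# cbind() on two vectors returns a matrix, not a data frame.
PairIDs <- cbind(PairID, PairIDDuplicates)
colnames(PairIDs) <- c("Pairid", "Pairiddups")
print(is.matrix(PairIDs))

# `$` on a matrix raises an error; capture its message instead of stopping.
dollar_result <- tryCatch(PairIDs$Pairiddups,
                          error = function(e) conditionMessage(e))
print(dollar_result)

# Indexing the column by name works on a matrix.
dups_column <- PairIDs[, "Pairiddups"]
print(dups_column)
```

The bracket form works on both matrices and data frames, which is presumably why it is the safer habit when the class of the object is not certain.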
Rui Barradas
2012-Sep-27 20:33 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
Hello, again.

There was another error in the line in question. TRUE does not need quotes. In fact, with quotes you're comparing to a character string, not to a logical value. The other tip still holds; use it as in the complete and corrected line below.

Health2PairsOnly <- PairIDs[ which(PairIDs[, "Pairiddups"] == TRUE), ]

Hope this helps,

Rui Barradas
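The effect of the quotes can be checked on a small vector. This is a sketch that assumes the indicator column is numeric (1/0), as it would be after cbind() with numeric IDs; the values are invented.

```r
# Hypothetical indicator column as it looks inside a numeric matrix:
# cbind() has already turned TRUE/FALSE into 1/0.
dups <- c(1, 0, 1, 1, 1, 0)

# Comparing with the string "TRUE" coerces the numbers to character,
# so "1" is compared with "TRUE" and nothing matches.
quoted_match <- dups == "TRUE"

# Comparing with the logical TRUE coerces TRUE to 1, so the 1s match.
logical_match <- dups == TRUE

print(which(quoted_match))   # no positions selected
print(which(logical_match))  # positions 1, 3, 4, 5
```

With the quoted comparison, which() returns an empty index, so the subset silently comes back with zero rows rather than raising an error.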
Simon Knapp
2012-Sep-28 00:35 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
#By using cbind in:

PairIDs<-cbind(PairID, PairIDDuplicates)

#you create a numeric matrix (the logical
#vector PairIDDuplicates gets converted
#to numeric - note that your second column
#contains 1s and 0s, not TRUEs and FALSEs).
#Matrices are not subsettable using $
#(they are basically a vector with
#a dimension attribute), hence your error.

#Two ways you could have avoided your error are:

# 1) changing the cbind to data.frame
PairIDs <- data.frame(PairID, PairIDDuplicates)
names(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs$Pairiddups,]

# 2) using the dimension names like:
PairIDs<-cbind(PairID, PairIDDuplicates)
colnames(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs[,'Pairiddups']==1,]

#In the former you can save a line of code with
PairIDs <- data.frame(Pairid=PairID, Pairiddups=PairIDDuplicates)

#Note that there is a fair bit of redundancy throughout
#your code. A neater way of subsetting your original
#data, for instance, would be:
PairIDdup <- unique(PairID[duplicated(PairID)])
Health2[PairID %in% PairIDdup,]

Have Fun!
Simon Knapp
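The points above can be tried end to end on a toy data set. This is only a sketch: the stand-in Health2 below is hypothetical, its `score` column and all values are made up, and it assumes `pairid` is numeric.

```r
# Hypothetical stand-in for Health2; only the column name `pairid`
# comes from the thread, everything else is invented for illustration.
Health2 <- data.frame(
  pairid = c(101, 102, 101, 103, 103, 104),
  score  = c(5, 7, 6, 4, 8, 9)
)

# cbind() coerces the logical flag to numeric; data.frame() keeps it logical.
flag <- Health2$pairid %in% Health2$pairid[duplicated(Health2$pairid)]
as_matrix <- cbind(Health2$pairid, flag)
as_frame  <- data.frame(Pairid = Health2$pairid, Pairiddups = flag)
print(typeof(as_matrix))           # "double"
print(class(as_frame$Pairiddups))  # "logical"

# Keep every row of the original data whose pairid occurs more than once.
dup_ids <- unique(Health2$pairid[duplicated(Health2$pairid)])
Health2PairsOnly <- Health2[Health2$pairid %in% dup_ids, ]
print(Health2PairsOnly)
```

Subsetting Health2 directly keeps all of its other columns, whereas the intermediate PairIDs object only ever holds the ID and the flag.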