GradStudentDD
2012-Sep-27  19:46 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
Hi,
I have a data set of observations by either one person or a pair of people.
I want to only keep the pair observations, and was using the code below
until it gave me the error "$ operator is invalid for atomic vectors". I am
just beginning to learn R, so I apologize if the code is really rough.
Basically I want to keep all the rows in the data set for which the value of
"Pairiddups" is TRUE. How do I do it? And how do I get past the error?
Thank you so much,
Diana
PairID<-c(Health2$pairid)
duplicated(PairID, incomparables=TRUE, fromLast=TRUE)
PairIDdup=duplicated(PairID)
cbind(PairID, PairIDdup)
PairID[which(PairIDdup)]
PairIDDuplicates<-PairID%in%PairID[which(PairIDdup)]
PairIDs<-cbind(PairID, PairIDDuplicates)
colnames(PairIDs)<-c("Pairid","Pairiddups")
Health2PairsOnly<-PairIDs[ which(PairIDs$Pairiddups=='TRUE'), ]
--
View this message in context:
http://r.789695.n4.nabble.com/Keep-rows-in-a-dataset-if-one-value-in-a-column-is-duplicated-tp4644420.html
Sent from the R help mailing list archive at Nabble.com.
Rui Barradas
2012-Sep-27  20:27 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
Hello,
That way of referring to variables can be troublesome. Try
PairIDs[, "Pairiddups"]
Hope this helps,
Rui Barradas
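A small sketch of the suggestion above, using made-up ID values in place of the poster's Health2 data (the numbers are assumptions, only the object names come from the post). It shows that the object built with cbind() is a matrix, that `$` on it reproduces the reported error, and that indexing the column by name works.

```r
# Toy stand-ins for the poster's objects (hypothetical values)
PairID <- c(101, 102, 102, 103, 104, 104)
PairIDDuplicates <- PairID %in% PairID[duplicated(PairID)]

# cbind() on two vectors returns a matrix, not a data frame
PairIDs <- cbind(PairID, PairIDDuplicates)
colnames(PairIDs) <- c("Pairid", "Pairiddups")

print(is.matrix(PairIDs))
print(is.data.frame(PairIDs))

# `$` on a matrix raises an error; tryCatch() captures the
# message instead of stopping the script
dollar_msg <- tryCatch(PairIDs$Pairiddups,
                       error = function(e) conditionMessage(e))
print(dollar_msg)

# Indexing the column by name works on a matrix
flag_col <- PairIDs[, "Pairiddups"]
print(flag_col)
```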
Rui Barradas
2012-Sep-27  20:33 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
Hello again,
There was another error in the line in question: TRUE does not need quotes. With quotes you are comparing to a character string, not to a logical value. The other tip still holds; the complete, corrected line is:
Health2PairsOnly <- PairIDs[ which(PairIDs[, "Pairiddups"] == TRUE), ]
Hope this helps,
Rui Barradas
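To illustrate the point about quotes, here is a minimal sketch on a hypothetical numeric matrix shaped like the one in the post (the values and the short name `m` are assumptions). On such a matrix the flag column holds 0s and 1s, so the quoted comparison matches nothing while the unquoted one matches the flagged rows.

```r
# Hypothetical numeric matrix like the one cbind() produced in the post
m <- cbind(Pairid     = c(101, 102, 102, 103),
           Pairiddups = c(FALSE, TRUE, TRUE, FALSE))

# Comparing with the string 'TRUE': the numbers are converted to "0"/"1"
quoted <- m[, "Pairiddups"] == 'TRUE'

# Comparing with the logical TRUE: TRUE is converted to 1
unquoted <- m[, "Pairiddups"] == TRUE

print(sum(quoted))    # number of rows matched by the quoted comparison
print(sum(unquoted))  # number of rows matched by the unquoted comparison

# Keep only the flagged rows
pairs_only <- m[which(unquoted), ]
print(pairs_only)
```

On a data frame with a genuinely logical column the quoted comparison would happen to work, because TRUE is converted to the string "TRUE", but relying on that conversion is fragile.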
Simon Knapp
2012-Sep-28  00:35 UTC
[R] Keep rows in a dataset if one value in a column is duplicated
#By using cbind in:
PairIDs<-cbind(PairID, PairIDDuplicates)
#you create a numeric matrix (the logical
#vector PairIDDuplicates gets converted
#to numeric - note that your second column
#contains 1s and 0s, not TRUEs and FALSEs).
#Matrices are not subsettable using $
#(they are basically a vector with
#a dimension attribute) - hence your error.
#Two ways you could have avoided your error are:
# 1) changing the cbind to data.frame
PairIDs <- data.frame(PairID, PairIDDuplicates)
names(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs$Pairiddups,]
# 2) indexing the matrix by column name, like:
PairIDs<-cbind(PairID, PairIDDuplicates)
colnames(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs[,'Pairiddups']==1,]
#In the first option you can save a line of code with
PairIDs <- data.frame(Pairid=PairID, Pairiddups=PairIDDuplicates)
#Note that there is a fair bit of redundancy throughout
#your code. A neater way of subsetting your original
#data, for instance, would be:
PairIDdup <- unique(PairID[duplicated(PairID)])
Health2[PairID %in% PairIDdup,]
Have Fun!
Simon Knapp
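For readers who want to try the shorter approach end to end, here is a sketch on a made-up stand-in for Health2. Only the `pairid` column name comes from the original post; the `score` column and all values are assumptions for illustration.

```r
# Hypothetical stand-in for Health2; only the pairid column is from the post
Health2 <- data.frame(
  pairid = c(1, 2, 2, 3, 4, 4, 5),
  score  = c(10, 20, 21, 30, 40, 41, 50)  # made-up measurement column
)

# IDs that occur more than once, i.e. observations made by a pair
pair_ids <- unique(Health2$pairid[duplicated(Health2$pairid)])

# Keep every row whose pairid is one of those IDs
Health2PairsOnly <- Health2[Health2$pairid %in% pair_ids, ]
print(Health2PairsOnly)

# An equivalent filter that scans from both ends instead of using %in%
both_ends <- duplicated(Health2$pairid) |
  duplicated(Health2$pairid, fromLast = TRUE)
same_result <- identical(Health2PairsOnly, Health2[both_ends, ])
print(same_result)
```

Because the filter is applied to the data frame itself, all of its columns are kept, not just the ID and the flag.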