Hi there

I have a data frame with about 65,000 rows and 8 variables. I am trying to get rid of the double entries of a factor variable "ID" so I can get a unique observation for each ID.

I tried:

>dupl_unique.data.frame(data[ID,]) #I obtain a data frame with 21,547
>observations..so far so good, but then when I check for duplicates

>d_duplicated(dupl2$ID)
>summary(as.factor(d))
FALSE  TRUE
 6836 14711

Meaning that I am still getting 14,711 duplicates!

I tried changing the ID type to integer and repeated the process but I got identical results... what am I missing?

Thanks!
> From: F Z
>
> I tried:
>
> >dupl_unique.data.frame(data[ID,]) #I obtain a data frame with 21,547
> >observations..so far so good, but then when I check for duplicates
>
> >d_duplicated(dupl2$ID)
> >summary(as.factor(d))
> FALSE  TRUE
>  6836 14711
>
> Meaning that I am still getting 14,711 duplicates!

1. Upgrade your version of R. (That will teach you about using `_' for assignment!)

2. Call generics, not the methods; i.e., unique() instead of unique.data.frame().

3. You want a data frame where the IDs are unique, not the combination of columns. Use:

   dupl <- data[unique(ID),]

BTW, where did `dupl2' come from?

Andy

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
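As a small illustration of point 2, here is a sketch on made-up data (the data frame `dat` and its values are hypothetical, not the poster's data). It assumes a current version of R, where `_` is no longer an assignment operator, so `<-` is used throughout. The generic unique() dispatches on the class of its argument, so the same call covers a data frame and a single column:

```r
# Hypothetical toy data, not the poster's data frame
dat <- data.frame(ID = c("a", "a", "b"), x = c(1, 1, 2))

# On a data frame the generic dispatches to unique.data.frame(),
# so both calls return the same result
by_generic <- unique(dat)
by_method  <- unique.data.frame(dat)
identical(by_generic, by_method)

# On a vector the same generic returns the distinct values
ids <- unique(dat$ID)
length(ids)
```

Calling the generic means there is no need to know which method applies to the object at hand.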
data[!duplicated(data$ID),]
will do. Your unique(data[ID,]) removes duplicated rows in data[ID,], assuming the object ID exists.

Alec Stephenson
Department of Statistics
Macquarie University
NSW 2109, Australia

>>> "F Z" <gerifalte28 at hotmail.com> 06/25/04 12:12pm >>>
I tried:

>dupl_unique.data.frame(data[ID,]) #I obtain a data frame with 21,547
>observations..so far so good, but then when I check for duplicates

>d_duplicated(dupl2$ID)
>summary(as.factor(d))
FALSE  TRUE
 6836 14711

Meaning that I am still getting 14,711 duplicates!
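The difference between the two approaches can be sketched on a tiny made-up data frame (the name `dat`, the column `x` and all values are assumptions for illustration only). unique() on a data frame drops rows that are identical in every column, whereas !duplicated() on the ID column keeps the first row seen for each ID:

```r
# Hypothetical toy data: ID 1 appears twice with different x values,
# ID 3 appears twice with identical x values
dat <- data.frame(ID = factor(c(1, 1, 2, 3, 3)),
                  x  = c(10, 11, 20, 30, 30))

# unique() compares whole rows, so only the repeated (3, 30) row is dropped
u <- unique(dat)
nrow(u)
any(duplicated(u$ID))      # ID 1 is still present twice

# Keep the first row for each ID
dedup <- dat[!duplicated(dat$ID), ]
nrow(dedup)
any(duplicated(dedup$ID))

# For comparison: using the distinct ID values as a row index selects
# rows by position (a factor index is treated as its integer codes),
# which is not the same as taking the first row per ID
by_value <- dat[unique(dat$ID), ]
any(duplicated(by_value$ID))
```

If a row other than the first should be kept for each ID, the data frame would need to be sorted accordingly before applying !duplicated().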
Your code cannot possibly work in a recent version of R, so please try the current version (1.9.1).

data[ID, ] is what? Why not just call unique() on ID?

BTW, if you call methods such as unique.data.frame you are adding a possible source of error -- here I suspect data[ID, ] is not what you intend. Please call the generic.

On Fri, 25 Jun 2004, F Z wrote:

> I tried:
>
> >dupl_unique.data.frame(data[ID,]) #I obtain a data frame with 21,547
> >observations..so far so good, but then when I check for duplicates
>
> >d_duplicated(dupl2$ID)
> >summary(as.factor(d))
> FALSE  TRUE
>  6836 14711
>
> Meaning that I am still getting 14,711 duplicates!

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Thanks to Alec Stephenson, Andy Liaw and Prof. Brian Ripley.

I tried Alec's suggestion:

>data[!duplicated(data$ID),]
>d_duplicated(dupl$ID)
>summary(as.factor(d))
FALSE
21547
#it worked!

Thanks again!

>From: "Alec Stephenson" <astephen at efs.mq.edu.au>
>To: <gerifalte28 at hotmail.com>, <r-help at stat.math.ethz.ch>
>Subject: Re: [R] Unique.data.frame...still getting duplicates
>Date: Fri, 25 Jun 2004 12:45:26 +1000
>
>data[!duplicated(data$ID),]
>will do. Your unique(data[ID,]) removes duplicated rows in data[ID,],
>assuming the object ID exists.
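The check used above can be sketched as follows on made-up data (the data frame `dat` is hypothetical, and the code assumes a current version of R with `<-` for assignment). Besides tabulating duplicated(), anyDuplicated() returns 0 when no value is repeated, and the number of rows kept should equal the number of distinct IDs:

```r
# Hypothetical toy data with repeated IDs
dat <- data.frame(ID = c(7, 7, 8, 9, 9, 9), x = 1:6)

# Keep the first row for each ID
dedup <- dat[!duplicated(dat$ID), ]

# Tabulate duplicate flags before and after
table(duplicated(dat$ID))
table(duplicated(dedup$ID))

# 0 means no ID occurs more than once
anyDuplicated(dedup$ID)

# One row per distinct ID
nrow(dedup) == length(unique(dat$ID))
```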