thr3ads.net - R help - [R] Unique.data.frame...still getting duplicates [Jun 2004]

F Z

2004-Jun-25 02:12 UTC

[R] Unique.data.frame...still getting duplicates

Hi there

I have a data frame with about 65,000 rows and 8 variables.  I am trying to
get rid of the duplicate entries of a factor variable "ID" so that I get a
unique observation for each ID.

I tried:
>dupl_unique.data.frame(data[ID,])  #I obtain a data frame with 21,547 observations

So far so good, but then when I check for duplicates:

>d_duplicated(dupl2$ID)
>summary(as.factor(d))
FALSE  TRUE
  6836 14711

Meaning that I am still getting 14,711 duplicates!

I tried changing the ID type to integer and repeated the process, but I got
identical results... what am I missing?

Thanks!

Liaw, Andy

2004-Jun-25 02:31 UTC

[R] Unique.data.frame...still getting duplicates

> From: F Z
> 
> Hi there
> 
> I have a data frame with about 65,000 rows and 8 variables.  I am trying to
> get rid of the double entries of a factor variable "ID" so I can get a
> unique observation for each ID
> 
> I tried:
> 
> >dupl_unique.data.frame(data[ID,]) #I obtain a data frame with 21,547 
> >observations..so far so good, but then when I check for duplicates
> 
> >d_duplicated(dupl2$ID)
> >summary(as.factor(d))
> FALSE  TRUE
>   6836 14711
> 
> Meaning that I am still getting 14,711 duplicates!
> 
> I tried changing the ID type to integer and repeated the process but I got
> identical results....what am I missing?
1.  Upgrade your version of R.  (That will teach you about using `_' for
assignment!)

2.  Call generics, not the methods; i.e., unique() instead of
unique.data.frame().

3.  You want a data frame where the IDs are unique, not the combination of
columns.  Use:

    dupl <- data[unique(ID),]

BTW, where did `dupl2' come from?

Andy
> Thanks!
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>
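Andy's second and third points can be illustrated with a minimal sketch on made-up data (the column `x` and all values are hypothetical). Calling the generic unique() on a data frame dispatches to unique.data.frame(), which drops only rows that are identical in every column, so an ID that appears with different values elsewhere in the row survives more than once. Current R no longer accepts `_` for assignment, so `<-` is used.

```r
# Hypothetical toy data: ID "a" appears twice with different x values,
# ID "b" appears twice in two fully identical rows.
data <- data.frame(ID = factor(c("a", "a", "b", "b")),
                   x  = c(1, 2, 3, 3))

# The generic dispatches to the data frame method, so both calls agree.
rows_generic <- unique(data)
rows_method  <- unique.data.frame(data)
stopifnot(identical(rows_generic, rows_method))

# unique() compared whole rows: only the second "b" row was dropped.
nrow(rows_generic)                  # 3
d <- duplicated(rows_generic$ID)
summary(as.factor(d))               # FALSE: 2, TRUE: 1 ("a" still repeats)
```

This is the same pattern as in the original post: fewer rows than before, yet duplicated() on the ID column still reports TRUE entries.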

Alec Stephenson

2004-Jun-25 02:45 UTC

[R] Unique.data.frame...still getting duplicates

data[!duplicated(data$ID),] 
will do. Your unique(data[ID,]) removes duplicated rows in data[ID,],
assuming the object ID exists.



Alec Stephenson                                               
Department of Statistics
Macquarie University
NSW 2109, Australia 
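Alec's one-liner keeps the first row for each ID: duplicated() is FALSE for the first occurrence of a value and TRUE for every later one, so negating it selects exactly one row per ID. A minimal sketch, assuming a small hypothetical data frame:

```r
# Hypothetical toy data with repeated IDs.
data <- data.frame(ID = factor(c("a", "a", "b", "c", "b")),
                   x  = c(1, 2, 3, 4, 5))

# FALSE, TRUE, FALSE, FALSE, TRUE -> negated, it keeps rows 1, 3 and 4.
keep <- !duplicated(data$ID)
dupl <- data[keep, ]

dupl$x                              # 1 3 4 (first row for a, b, c)
anyDuplicated(dupl$ID) == 0         # TRUE: every ID now appears once
```

Passing `fromLast = TRUE` to duplicated() would keep the last row for each ID instead of the first.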

Prof Brian Ripley

2004-Jun-25 06:13 UTC

[R] Unique.data.frame...still getting duplicates

Your code cannot possibly work in a recent version of R, so please try the
current version (1.9.1).

data[ID, ] is what?  Why not just call unique() on ID?

BTW, if you call methods such as unique.data.frame you are adding a possible
source of error -- here I suspect data[ID, ] is not what you intend.
Please call the generic.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
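Ripley's question about what data[ID, ] is points at a likely culprit. In R, indexing with a factor uses its integer codes rather than its labels, and indexing with integer IDs uses the ID values as row positions, so data[ID, ] selects (and repeats) rows by position, not by ID. A minimal sketch on hypothetical data, ending with unique() called on the ID column itself as suggested:

```r
# Hypothetical data: three rows, ID stored as a factor with levels "a", "b".
data <- data.frame(ID = factor(c("b", "a", "b")),
                   x  = c(10, 20, 30))

# The integer codes of ID are 2, 1, 2, so data[data$ID, ] picks rows 2, 1, 2.
as.integer(data$ID)                 # 2 1 2
picked <- data[data$ID, ]
picked$x                            # 20 10 20 (row 3 is never selected)

# With integer IDs, the values themselves are taken as row positions.
int_id <- c(3L, 1L, 3L)
data[int_id, ]$x                    # 30 10 30

# Calling unique() on the ID column gives the distinct IDs directly.
unique(data$ID)                     # b a, with levels a b
length(unique(data$ID))             # 2
```

This would also explain why switching ID to integer gave no improvement in the original post: both types end up being used as row positions.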

F Z

2004-Jun-25 14:50 UTC

[R] Unique.data.frame...still getting duplicates

Thanks to Alec Stephenson, Andy Liaw and Prof. Brian Ripley.  I tried Alec's
suggestion:
>data[!duplicated(data$ID),]
>d_duplicated(dupl$ID)
>summary(as.factor(d))
FALSE
21547 #it worked!

Thanks again!
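The check run above can be packaged as a small reusable test. This is a minimal sketch: the helper name `check_unique_ids` and the toy data are hypothetical. It confirms both that no ID repeats and that exactly one row per distinct ID was kept.

```r
# Hypothetical helper: TRUE when `deduped` has exactly one row per distinct
# value of column `id` found in `original`.
check_unique_ids <- function(original, deduped, id = "ID") {
  no_repeats <- !any(duplicated(deduped[[id]]))
  same_count <- nrow(deduped) == length(unique(original[[id]]))
  no_repeats && same_count
}

# Hypothetical data with integer IDs.
data <- data.frame(ID = c(5L, 5L, 7L, 9L, 7L), x = 1:5)
dupl <- data[!duplicated(data$ID), ]

summary(as.factor(duplicated(dupl$ID)))  # FALSE: 3, no TRUE entries
check_unique_ids(data, dupl)             # TRUE
check_unique_ids(data, unique(data))     # FALSE: whole-row unique() keeps repeats
```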
