thr3ads.net - R help - [R] subset [May 2006]


Guenther, Cameron

2006-May-16 18:37 UTC

[R] subset

Hello everyone,

I have a large dataset (x) with some rows that have duplicate variables
that I would like to remove.  I find which rows are the duplicates with
X1<-which(duplicated(x)).  That gives me the rows with duplicated
variables.  Now, how can I remove just those rows from the original data
frame?  I think I can create a new data frame without the duplicates
using subset.  I have tried:
subset(x,!X1) and subset(x,!x[X1,])
I can't seem to find the correct syntax.  Any advice?
Thanks in advance

Cameron Guenther, Ph.D. 
Associate Research Scientist
FWC/FWRI, Marine Fisheries Research
100 8th Avenue S.E.
St. Petersburg, FL 33701
(727)896-8626 Ext. 4305
cameron.guenther at myfwc.com

Marc Schwartz (via MN)

2006-May-16 18:49 UTC


[R] subset

On Tue, 2006-05-16 at 14:37 -0400, Guenther, Cameron wrote:
> Hello everyone,
> 
> I have a large dataset (x) with some rows that have duplicate variables
> that I would like to remove.  I find which rows are the duplicates with
> X1<-which(duplicated(x)).  That gives me the rows with duplicated
> variables.  Now, how can I remove just those rows from the original data
> frame?  I think I can create a new data frame without the duplicates
> using subset.  I have tried:
> subset(x,!X1) and subset(x,!x[X1,])
> I can't seem to find the correct syntax.  Any advice?
> Thanks in advance

Even easier would be to use unique():

  NewDF <- unique(x)

NewDF will contain rows from 'x' with duplicates removed.

See ?unique for more information.

unique(), which has a data.frame method, is basically:

  x[!duplicated(x), , drop = FALSE]

which keeps the result a data frame even in cases where subsetting would
otherwise simplify it to a vector (for example, when 'x' has a single
column).

Note that the above presumes that you want to test all columns in 'x'
for dups.
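
A minimal sketch of the equivalence described above, using a small made-up
data frame (the column names 'id' and 'value' are illustrative and not from
the original data):

```r
# Toy data frame; rows 1 and 3 are identical across all columns.
x <- data.frame(id    = c(1, 2, 1, 3),
                value = c("a", "b", "a", "c"))

# unique() drops rows that repeat an earlier row in every column.
NewDF <- unique(x)

# The same selection written out with duplicated().
NewDF2 <- x[!duplicated(x), , drop = FALSE]

# Both keep rows 1, 2 and 4 of 'x'.
print(NewDF)
print(identical(NewDF, NewDF2))
```

Because both forms compare whole rows, two rows that agree in 'id' but
differ in 'value' would both be kept.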

HTH,

Marc Schwartz

Guenther, Cameron

2006-May-16 18:54 UTC


[R] subset

Marc, 
I have tried unique, but unique looks at the entire row.  I have a data
set with a variable TRIPID.  The dataset has 469,000 rows.  In most
cases TRIPID is a unique value.  However, in some cases I have the same
TRIPID value but different values for other variables.  What this
amounts to is a data entry error.  I need to get rid of the repeated
rows that have the same TRIPID but different co-variables.  
Thanks for your help.
Cam 
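
For the situation described here, where only TRIPID should decide whether
a row counts as a repeat, duplicated() can be applied to that one column
instead of to the whole data frame.  A minimal sketch, assuming a toy data
frame in which the 'catch' column is made up for illustration; which of the
conflicting rows is the correct one is a data question the code cannot
answer:

```r
# Toy stand-in for the real data; 'catch' is a hypothetical covariate.
x <- data.frame(TRIPID = c(101, 102, 102, 103),
                catch  = c(5, 7, 9, 2))

# Option 1: keep only the first row seen for each TRIPID.
first_only <- x[!duplicated(x$TRIPID), , drop = FALSE]

# Option 2: drop every row whose TRIPID occurs more than once,
# on the view that neither copy can be trusted without checking.
repeated_ids <- x$TRIPID[duplicated(x$TRIPID)]
no_repeats   <- x[!(x$TRIPID %in% repeated_ids), , drop = FALSE]

print(first_only)
print(no_repeats)
```

Option 1 silently prefers whichever row happens to come first, so it is
only appropriate if row order carries some meaning in the real data.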


Guenther, Cameron

2006-May-16 19:04 UTC


[R] subset

Thanks, Phil.
That worked perfectly.

-----Original Message-----
From: Phil Spector [mailto:spector at stat.Berkeley.EDU] 
Sent: Tuesday, May 16, 2006 3:01 PM
To: Guenther, Cameron
Subject: Re: [R] subset

Cameron -
    Is

      X1 = which(duplicated(x))
      x[-X1,]

or
      x[!duplicated(x),]

or
      subset(x,!duplicated(x))

what you're looking for?  Remember that which() will always return
indices, so negating them (with regard to subscripts) means making them
negative, not applying the not operator (!).  The not operator only makes
sense for logical values, like those returned by duplicated().
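
The difference between the two kinds of index can be seen on a small
made-up data frame.  One caveat, which is an addition to the message above
rather than part of it: if nothing is duplicated, which() returns an empty
vector, and x[-X1,] then selects zero rows, so the logical form is the
safer default.

```r
# Toy data frame; row 3 repeats row 1.
x <- data.frame(a = c(1, 2, 1), b = c("u", "v", "u"))

dup <- duplicated(x)   # logical vector, one entry per row
X1  <- which(dup)      # integer positions of the TRUE entries

by_index   <- x[-X1, ]                  # negative index drops those rows
by_logical <- x[!dup, ]                 # logical negation keeps the rest
by_subset  <- subset(x, !duplicated(x)) # same selection via subset()

# Caveat: with no duplicates, which() gives integer(0),
# and a negated empty index selects no rows at all.
y  <- data.frame(a = c(1, 2), b = c("u", "v"))
Y1 <- which(duplicated(y))
n_by_index   <- nrow(y[-Y1, ])             # 0
n_by_logical <- nrow(y[!duplicated(y), ])  # 2
```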

                                       - Phil Spector
					 Statistical Computing Facility
					 Department of Statistics
					 UC Berkeley
					 spector at stat.berkeley.edu

