thr3ads.net - R help - [R] Extracting unique entries by a column [Apr 2015]

Vikram Chhatre

2015-Apr-14 19:39 UTC

[R] Extracting unique entries by a column

I have a data frame of dim 3x600.  There are pairs of rows which have the
exact same value in column 3.

head(df)
                POP1         POP2   ABSDIFF
L0005.01 0.98484848 0.688118812 0.2967297
L0005.03 0.01515152 0.311881188 0.2967297
L0008.02 0.97727273 0.004424779 0.9728479
L0008.04 0.02272727 0.995575221 0.9728479
L0012.03 0.98684211 0.004385965 0.9824561
L0012.01 0.01315789 0.995614035 0.9824561

I want to unique sort on df$ABSDIFF so that only one row per pair remains
in the subset.
> df_subset <- df[!duplicated(df$ABSDIFF), ]
This does not work. So I literally checked:
> identical(df[1,3], df[2,3])
FALSE

How is 0.2967297 different from 0.2967297?  I am puzzled.

Thanks for any insight.

Vikram

David L Carlson

2015-Apr-14 19:53 UTC

Try all.equal(df[1,3], df[2,3])
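
The effect is easy to reproduce without the original data. A minimal sketch, using the made-up values x and y as stand-ins for df[1,3] and df[2,3] (they are not taken from the poster's data frame):

```r
# x and y are illustrative stand-ins, not values from the original df.
x <- 0.3
y <- 0.1 + 0.2

print(x)                  # both print as 0.3 at the default 7 significant digits
print(y)

identical(x, y)           # FALSE: the stored doubles differ in the last bits
isTRUE(all.equal(x, y))   # TRUE: equal within all.equal()'s tolerance

# Asking for more digits reveals the difference hidden by default printing.
format(c(x, y), digits = 17)
```

Wrapping the call in isTRUE() is the usual idiom, because all.equal() returns a character description rather than FALSE when the values differ.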

This relates to how decimal numbers are stored in computers. It is not an R only
issue, but it is described in the R-FAQ:
From the R-FAQ - http://cran.r-project.org/doc/FAQ/R-FAQ.html
7.31 Why doesn't R think these numbers are equal?

The only numbers that can be represented exactly in R's numeric type are
integers and fractions whose denominator is a power of 2. Other numbers have to
be rounded to (typically) 53 binary digits accuracy. As a result, two floating
point numbers will not reliably be equal unless they have been computed by the
same algorithm, and not always even then. For example

R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16

The function all.equal() compares two objects using a numeric tolerance of
.Machine$double.eps ^ 0.5. If you want much greater accuracy than this you will
need to consider error propagation carefully.
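
For the original task of keeping one row per pair, one possible workaround (not proposed in the thread) is to round the key column before calling duplicated(). The sketch below rebuilds the first four rows from the printed head(df); it assumes that ABSDIFF was computed as abs(POP1 - POP2) and that 6 decimal places is coarse enough to merge the pairs without merging distinct values. Both are assumptions, not facts from the post.

```r
# Toy data rebuilt from the printed head(df); the real df is not available.
df <- data.frame(
  POP1 = c(0.98484848, 0.01515152, 0.97727273, 0.02272727),
  POP2 = c(0.688118812, 0.311881188, 0.004424779, 0.995575221),
  row.names = c("L0005.01", "L0005.03", "L0008.02", "L0008.04")
)
# Assumption: ABSDIFF is the absolute difference of the two columns.
df$ABSDIFF <- abs(df$POP1 - df$POP2)

# Round the key so that values differing only by floating-point noise
# collapse to the same number, then keep the first row of each group.
key <- round(df$ABSDIFF, 6)
df_subset <- df[!duplicated(key), ]

df_subset
```

Rounding is a blunt tool: two values that sit on either side of a rounding boundary can still end up with different keys, so the number of digits should be chosen with the scale and precision of the data in mind.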

For more information, see e.g. David Goldberg (1991), "What Every Computer
Scientist Should Know About Floating-Point Arithmetic", ACM Computing
Surveys, 23/1, 5-48, also available via
http://www.validlab.com/goldberg/paper.pdf.

To quote from "The Elements of Programming Style" by Kernighan and
Plauger:

    10.0 times 0.1 is hardly ever 1.0.
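
A small illustration in the same spirit, assuming a typical platform with IEEE 754 doubles: adding 0.1 ten times, one double-precision addition at a time, does not give exactly 1.

```r
# Accumulate 0.1 ten times using ordinary double-precision additions.
total <- Reduce(`+`, rep(0.1, 10))

total == 1                   # FALSE: rounding error has accumulated
isTRUE(all.equal(total, 1))  # TRUE: equal within the default tolerance
total - 1                    # a tiny non-zero difference
```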


-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Vikram Chhatre

2015-Apr-14 20:32 UTC

Hi David,

Thanks.  That was enlightening.

Whoop.

V

Jeff Newmiller

2015-Apr-14 21:26 UTC

In the same way they would be different in any programming language. See R FAQ
7.31.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
