I have a data frame of dim 3x600. There are pairs of rows which have the
exact same value in column 3.

> head(df)
               POP1        POP2   ABSDIFF
L0005.01 0.98484848 0.688118812 0.2967297
L0005.03 0.01515152 0.311881188 0.2967297
L0008.02 0.97727273 0.004424779 0.9728479
L0008.04 0.02272727 0.995575221 0.9728479
L0012.03 0.98684211 0.004385965 0.9824561
L0012.01 0.01315789 0.995614035 0.9824561

I want to unique sort on df$ABSDIFF so that only one row per pair remains
in the subset.

> df_subset <- df[!duplicated(df$ABSDIFF), ]

This does not work. So I literally checked:

> identical(df[1,3], df[2,3])
FALSE

How is 0.2967297 different from 0.2967297? I am puzzled.

Thanks for any insight.

Vikram
Try

all.equal(df[1,3], df[2,3])

This relates to how decimal numbers are stored in computers. It is not an
R only issue, but it is described in the R-FAQ:

From the R-FAQ - http://cran.r-project.org/doc/FAQ/R-FAQ.html

7.31 Why doesn't R think these numbers are equal?

The only numbers that can be represented exactly in R's numeric type are
integers and fractions whose denominator is a power of 2. Other numbers
have to be rounded to (typically) 53 binary digits accuracy. As a result,
two floating point numbers will not reliably be equal unless they have been
computed by the same algorithm, and not always even then. For example

R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16

The function all.equal() compares two objects using a numeric tolerance of
.Machine$double.eps ^ 0.5. If you want much greater accuracy than this you
will need to consider error propagation carefully.

For more information, see e.g. David Goldberg (1991), "What Every Computer
Scientist Should Know About Floating-Point Arithmetic", ACM Computing
Surveys, 23/1, 5-48, also available via
http://www.validlab.com/goldberg/paper.pdf.

To quote from "The Elements of Programming Style" by Kernighan and Plauger:

10.0 times 0.1 is hardly ever 1.0.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Vikram Chhatre
Sent: Tuesday, April 14, 2015 2:40 PM
To: r-help
Subject: [R] Extracting unique entries by a column

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
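For the original goal of keeping one row per pair, one possible approach is to deduplicate on a rounded copy of the column rather than on the raw doubles. The sketch below is only an illustration under stated assumptions: the data frame is a hypothetical toy (not the poster's data), the near-duplicate is manufactured with 0.1 + 0.2 versus 0.3, and the choice of 7 decimal places is an assumed tolerance that would need to suit the real data.

```r
# Hypothetical toy data: rows 1-2 and rows 3-4 are meant to be pairs, but
# 0.1 + 0.2 is not bit-for-bit equal to 0.3.
df <- data.frame(
  POP1    = c(0.98, 0.02, 0.97, 0.03),
  POP2    = c(0.68, 0.32, 0.07, 0.93),
  ABSDIFF = c(0.3, 0.1 + 0.2, 0.9, 0.9),
  row.names = c("L0005.01", "L0005.03", "L0008.02", "L0008.04")
)

# Exact comparison treats rows 1 and 2 as different values, so only the
# second 0.9 is flagged as a duplicate.
exact_dup <- duplicated(df$ABSDIFF)

# Round to an assumed tolerance (7 decimal places), then keep the first row
# for each rounded key.
key <- round(df$ABSDIFF, digits = 7)
df_subset <- df[!duplicated(key), ]

print(exact_dup)
print(df_subset)
```

Rounding is a blunt tool: two values that sit just either side of a rounding boundary can still end up with different keys, so the number of digits should be well above the noise level and well below the spacing between genuinely different values.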
Hi David,

Thanks. That was enlightening. Whoop.

V

On Tue, Apr 14, 2015 at 3:53 PM, David L Carlson <dcarlson at tamu.edu> wrote:
> Try all.equal(df[1,3], df[2,3])
In the same way they would be different in any programming language. See R
FAQ 7.31.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

On April 14, 2015 12:39:50 PM PDT, Vikram Chhatre <crypticlineage at gmail.com> wrote:
>How is 0.2967297 different from 0.2967297? I am puzzled.
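A small illustration of why two values can look identical on screen and still compare as different: R prints 7 significant digits by default, which can hide a difference in the last bits. This is only a sketch using the textbook pair 0.1 + 0.2 and 0.3 as stand-ins (assumed values, not the poster's data).

```r
x <- 0.1 + 0.2
y <- 0.3

# With the default of 7 significant digits both values print the same.
same_when_printed <- format(x) == format(y)

# Bit-for-bit comparison sees two different doubles.
same_exact <- identical(x, y)

# Asking for 17 significant digits exposes the difference.
full_digits <- sprintf("%.17g", c(x, y))

# all.equal() compares with a numeric tolerance; wrap it in isTRUE() because
# it returns a character description, not FALSE, when values differ.
same_within_tol <- isTRUE(all.equal(x, y))

print(c(printed = same_when_printed, exact = same_exact, tol = same_within_tol))
print(full_digits)
```

Running print(df[1:2, 3], digits = 17) on the real data frame is one way to check whether the same thing is happening there.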