Paul Lemmens
2003-May-28 06:33 UTC
[R] Numbers that look equal, should be equal, but if() doesn't see as equal (repost with code included)
Hi!

Apologies for sending the mail without any code. Apparently the .R attachments got filtered out somewhere along the way. I have included the code below, as clean as possible. My original mail is below the code.

Thank you again for your time.
regards,
Paul

vincentize <- function(data, bins)
{
  if ( length(data) < 2 )
  {
    stop("The data is really short. Is that ok?");
  }

  if ( bins < 2 )
  {
    stop("A number of bins smaller than 2 just really isn't useful");
  }

  if ( bins > length(data) )
  {
    stop("This is really unusual, although perhaps possible. If you really know what you're doing, maybe you should disable this check!?.");
  }

  ret <- c();
  for ( i in 1:length(data))
  {
    rt <- data[i];
    b <- 0;
    while ( b < bins )
    {
      ret <- c(ret, rt);
      b <- b+1;
    }
  }

  ret;
}


binify <- function(data, bins, n)
{
  if ( bins < 2 )
  {
    stop("Number of bins is smaller than 2. Nothing to split, exiting.");
  }

  if ( length(data) < 2 )
  {
    stop("The length of the data is really short. Is that ok?");
  }

  if ( bins * n != length(data) )
  {
    stop("Cannot construct bins of equal length.");
  }

  t(array(data, c(n,bins)));
}

mean.bins <- function(data)
{
  # For the vincentizing procedures in vincentize() and binify(),
  # it made sense to check the data array/vector/matrix. Here,
  # we now just need to check that data is a matrix.
  if ( !is.matrix(data) )
  {
    stop("The data is not in matrix form.");
  }

  means <- c();
  bins <- dim(data)[1];
  for (i in 1:bins)
  {
    means <- c(means, mean(data[i,]));
  }

  # return a vector of means.
  means;
}

bins.factor <- function(data, bins)
{
  if ( !is.data.frame(data) )
  {
    stop("data is not a data frame.");
  }

  source('Ratcliff.r', local=TRUE);
  subject.bin.means <- c();

  attach(data);
  l <- levels(Cond);
  for ( i in 1:length(l) )
  {
    cat("Calculating bins for factor level ", l[i], ".\n", sep="");
    flush.console();

    data <- RT[Cond == l[i]];
    data <- sort(data);

    n <- length(data);
    data.vincent <- vincentize(data,bins);
    data.vincent.bins <- binify(data.vincent, bins, n);
    bin.means <- mean.bins(data.vincent.bins);

    # FAILING TEST.
    mean.orig <- mean(data);
    mean.b <- mean(bin.means);
    if ( mean.b != mean.orig )
    {
      #cat("mean.b\n", str(mean.b), "mean.orig\n", str(mean.orig));
      flush.console();
      detach(data);
      stop("Something went wrong calculating the bins: means do not equal.");
    }
    subject.bin.means <- c(subject.bin.means, bin.means);
  }
  detach(data);

  if ( !length(subject.bin.means) == bins*length(l) )
  {
    stop("Inappropriate number of means calculated.");
  }
  else
  {
    subject.bin.means
  }
}

---------- Forwarded Message ----------
Date: Tuesday 27 May 2003 14:53 +0200
From: Paul Lemmens <P.Lemmens at nici.kun.nl>
To: r-help at stat.math.ethz.ch
Subject: [R] Numbers that look equal, should be equal, but if() doesn't see as equal

Hi!

After a lot of testing and debugging I'm at a loss to figure out what goes wrong in the following.

I'm implementing the Vincentizing procedure that Ratcliff (1979) described. It's about calculating RT bins for any distribution of RT data. It boils down to rank ordering your data, replicating each data point as many times as you need bins, and then splitting up the resulting distribution into equal bins.

The code that I've written is attached (and not included because it is considerable in length due to many comments). Ratcliff.r contains some basic functions and distribution.bins.r contains the problematic function bins.factor() (problem area marked with 'FAILING TEST'). The final attached file is the mock-up distribution I made.
The failing test is the check whether the mean of the mean RTs for each bin equals the mean of the original distribution. These should be, and are, mathematically equivalent. Sometimes, however, the test fails. With the attached distribution this happens most notably for 4, 7, 8, 9, and 13 bins. Since the means are mathematically equivalent, IMHO it should not be an issue of this particular distribution. As a matter of fact, I have also tested some rnorm() distributions and my function also fails on those (albeit a little less often than with foobar.txt).

Problem description: if one calculates the bins or bin means by hand, the mean of the bin means is visually the same as the overall mean, even with options(digits=20), but *still* the test fails.

IMHO it's neither my code nor the distribution I use to test, but still, can you point out an obvious failure of my programming, or is it indeed something about R that I don't yet grasp?

thank you for your help,
Paul

-- 
Paul Lemmens
NICI, University of Nijmegen              ASCII Ribbon Campaign /"\
Montessorilaan 3 (B.01.03)                    Against HTML Mail \ /
NL-6525 HR Nijmegen                                              X
The Netherlands                                                 / \
Phonenumber    +31-24-3612648
Fax            +31-24-3616066

---------- End Forwarded Message ----------

-------------- next part --------------
"RT" "Cond"
"1" 1 "A"
"2" 1 "A"
"3" 1 "A"
"4" 2 "A"
"5" 2 "A"
"6" 3 "A"
"7" 3 "A"
"8" 3 "A"
"9" 3 "A"
"10" 3 "A"
"11" 4 "A"
"12" 4 "A"
"13" 4 "A"
"14" 4 "A"
"15" 5 "A"
"16" 5 "A"
"17" 5 "A"
"18" 5 "A"
"19" 5 "A"
"20" 5 "A"
"21" 5 "A"
"22" 6 "A"
"23" 6 "A"
"24" 6 "A"
"25" 6 "A"
"26" 6 "A"
"27" 6 "A"
"28" 6 "A"
"29" 6 "A"
"30" 6 "A"
"31" 7 "A"
"32" 7 "A"
"33" 7 "A"
"34" 7 "A"
"35" 8 "A"
"36" 8 "A"
"37" 8 "A"
"38" 9 "A"
"39" 9 "A"
"40" 10 "A"
"41" 2 "B"
"42" 2 "B"
"43" 2 "B"
"44" 4 "B"
"45" 4 "B"
"46" 6 "B"
"47" 6 "B"
"48" 6 "B"
"49" 6 "B"
"50" 6 "B"
"51" 8 "B"
"52" 8 "B"
"53" 8 "B"
"54" 8 "B"
"55" 10 "B"
"56" 10 "B"
"57" 10 "B"
"58" 10 "B"
"59" 10 "B"
"60" 10 "B"
"61" 10 "B"
"62" 12 "B"
"63" 12 "B"
"64" 12 "B"
"65" 12 "B"
"66" 12 "B"
"67" 12 "B"
"68" 12 "B"
"69" 12 "B"
"70" 12 "B"
"71" 14 "B"
"72" 14 "B"
"73" 14 "B"
"74" 14 "B"
"75" 16 "B"
"76" 16 "B"
"77" 16 "B"
"78" 18 "B"
"79" 18 "B"
"80" 20 "B"
"81" 3 "C"
"82" 3 "C"
"83" 3 "C"
"84" 6 "C"
"85" 6 "C"
"86" 9 "C"
"87" 9 "C"
"88" 9 "C"
"89" 9 "C"
"90" 9 "C"
"91" 12 "C"
"92" 12 "C"
"93" 12 "C"
"94" 12 "C"
"95" 15 "C"
"96" 15 "C"
"97" 15 "C"
"98" 15 "C"
"99" 15 "C"
"100" 15 "C"
"101" 15 "C"
"102" 18 "C"
"103" 18 "C"
"104" 18 "C"
"105" 18 "C"
"106" 18 "C"
"107" 18 "C"
"108" 18 "C"
"109" 18 "C"
"110" 18 "C"
"111" 21 "C"
"112" 21 "C"
"113" 21 "C"
"114" 21 "C"
"115" 24 "C"
"116" 24 "C"
"117" 24 "C"
"118" 27 "C"
"119" 27 "C"
"120" 30 "C"

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
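The procedure described in the message above (sort the data, replicate each point once per bin, split the result into equal consecutive runs, average each run) can also be sketched in vectorized form. This is only an illustrative sketch and not the contents of Ratcliff.r; the function name vincent.bin.means and the small example vector are hypothetical, and only base R functions (rep, matrix, rowMeans, all.equal) are used.

```r
# Hypothetical helper (not from the original post): a vectorized take on
# the vincentize() -> binify() -> mean.bins() pipeline.
vincent.bin.means <- function(rt, bins) {
  stopifnot(is.numeric(rt), length(rt) >= 2, bins >= 2, bins <= length(rt))
  n <- length(rt)
  # Replicate each sorted observation `bins` times: x1, x1, ..., x2, x2, ...
  replicated <- rep(sort(rt), each = bins)
  # Fill row by row so that row i holds the i-th consecutive run of n values,
  # which is the same layout as t(array(data, c(n, bins))).
  binned <- matrix(replicated, nrow = bins, byrow = TRUE)
  rowMeans(binned)
}

rt <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)    # made-up example data
m <- vincent.bin.means(rt, bins = 4)

# Mathematically mean(m) equals mean(rt), because every bin has n values.
# In floating point the two are computed along different paths, so an exact
# comparison may come out either way; a tolerance-based comparison is safer.
mean(m) == mean(rt)
isTRUE(all.equal(mean(m), mean(rt)))
```

Because the bins all have the same size, the mean of the bin means is the mean of the replicated vector, which is the mean of the original data; any mismatch seen with == comes from rounding, not from the binning logic.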
Thomas Lumley
2003-May-28 14:16 UTC
[R] Numbers that look equal, should be equal, but if() doesn't see as equal (repost with code included)
On Wed, 28 May 2003, Paul Lemmens wrote:

> Hi!
>
> Apologies for sending the mail without any code. Apparently somewhere along
> the way the .R attachments got filtered out. I have included the code below
> as clean as possible. My original mail is below the code.

I still think you should not be using ==. You want something like

   if ( abs(mean.b-mean.orig)/(epsilon+abs(mean.orig)) < epsilon ) {

You are effectively using epsilon=0, but epsilon=10e-10 should be adequate.

	-thomas

> Thank you again for your time.
> regards,
> Paul

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
^^^^^^^^^^^^^^^^^^^^^^^^
- NOTE NEW EMAIL ADDRESS
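To make the suggestion in the reply concrete, here is a minimal sketch of a tolerance-based comparison. The helper name nearly.equal is hypothetical; its body follows the relative-difference expression from the reply, with the suggested epsilon of 10e-10 as the default. Base R's all.equal() does a similar job with its own default tolerance; it returns a character description rather than FALSE when the values differ, so it is wrapped in isTRUE() here.

```r
# Hypothetical helper (name is an assumption): relative-difference test
# with the epsilon suggested in the reply as the default.
nearly.equal <- function(x, y, epsilon = 10e-10) {
  abs(x - y) / (epsilon + abs(y)) < epsilon
}

# Classic illustration: two values that print the same but differ in the
# last bits of their binary representation.
a <- 0.1 + 0.2
b <- 0.3

a == b                     # FALSE on IEEE 754 doubles
nearly.equal(a, b)         # TRUE
isTRUE(all.equal(a, b))    # TRUE, base R's tolerance-based comparison
```

In bins.factor(), the failing check would then read something like if ( !nearly.equal(mean.b, mean.orig) ) instead of if ( mean.b != mean.orig ).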