Hi,

I am trying to calculate a distance matrix on a binary data frame using
dist.binary() {ade4}. This is the code I run and the error I get:

> sjlc.dist <- dist.binary(as.data.frame(data), method=2) #D = (a+d) / (a+b+c+d)
Error in if (any(df < 0)) stop("non negative value expected in df") :
  missing value where TRUE/FALSE needed

I don't know if the problem is the missing values in my data. If so, how
can I handle them?

Thank you,
Marc.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

SCRI, Invergowrie, Dundee, DD2 5DA.
The Scottish Crop Research Institute is a charitable company limited by guarantee.
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.

DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views
expressed by the sender are not necessarily the views of SCRI and its
subsidiaries. This email and any files transmitted with it are confidential
to the intended recipient at the e-mail address to which it has been
addressed. It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this
confidentiality and you must not use, disclose, copy, print or rely on this
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the
name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are
present in this email, neither the Institute nor the sender accepts any
responsibility for any viruses, and it is your responsibility to scan the email
and the attachments (if any).

Marc Moragues <Marc.Moragues <at> scri.ac.uk> writes:

> Hi,
>
> I am trying to calculate a distance matrix on a binary data frame using
> dist.binary() {ade4}. This is the code I run and the error I get:
>
> > sjlc.dist <- dist.binary(as.data.frame(data), method=2) #D = (a+d) / (a+b+c+d)
> Error in if (any(df < 0)) stop("non negative value expected in df") :
>   missing value where TRUE/FALSE needed
>
> I don't know if the problem is the missing values in my data. If so, how
> can I handle them?

Dear Marc Moragues,

At least adding NA to a data.frame gave the same error message as you
report above. Odds are good that NA is responsible (but we cannot know: we
can only guess). Further, it seems that ade4:::dist.binary does not have an
option to handle NA input.

The problem here is: what do you think should be done with NA? Should you
get an NA result? Should the whole observation be removed because of NA? Or
should the comparisons be based on pairwise omission of NA, meaning that
index entries are based on different data within the same matrix? Or should
you impute some values for the missing entries (which is fun but tricky)?

One solution is to use function designdist in vegan, where you can, with
some acrobatics, design your own dissimilarity indices. Function designdist
uses a different notation, because its author hates that misleading and
dangerous 2x2 contingency table notation. The following, however, seems to
define the same index as ade4:

designdist(data, "sqrt(1-(2*J+P-A-B)/P)")

See the documentation of vegan:::designdist to see how to define things
there (the sqrt(1-x) part comes from the way ade4 changes similarities to
dissimilarities).

BTW, don't call your data 'data'. R wisdom (see fortunes) tells you that you
do not call your dog dog, but I'm not quite sure of this. At least in
yesterday's horse races in national betting, one of the winning horses was
called 'Animal', so why not...

cheers, jari oksanen
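
To make the NA question above concrete, here is a minimal base-R sketch. It is not ade4's or vegan's code: the toy data frame and the helper simple_matching_dist() are hypothetical names made up for illustration. It shows why the comparison any(df < 0) yields NA when a value is missing, and two of the options listed above (dropping incomplete observations, or pairwise omission of NA), using the sqrt(1 - similarity) transformation mentioned in the reply.

```r
# Toy binary data with one missing value (hypothetical, not Marc's data).
toy <- data.frame(
  v1 = c(1, 0, 1, 1),
  v2 = c(0, 0, 1, NA),
  v3 = c(1, 1, 0, 0),
  v4 = c(0, 1, 1, 1)
)
rownames(toy) <- paste0("s", 1:4)

# Why the error appears: with no negative values and at least one NA,
# any(toy < 0) is NA, and if (NA) stops with
# "missing value where TRUE/FALSE needed".
na_check <- any(toy < 0)
has_na <- anyNA(toy)

# Option 1: remove every observation (row) that contains an NA.
toy_complete <- toy[complete.cases(toy), ]

# Option 2: pairwise omission. Each pair of rows is compared only on the
# variables observed in both, so entries may rest on different variables.
# Similarity is the simple matching coefficient; distance is sqrt(1 - s).
simple_matching_dist <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      ok <- !is.na(x[i, ]) & !is.na(x[j, ])
      # Share of jointly observed variables on which the two rows agree.
      s <- if (any(ok)) mean(x[i, ok] == x[j, ok]) else NA_real_
      d[i, j] <- d[j, i] <- sqrt(1 - s)
    }
  }
  as.dist(d)
}

d_pairwise <- simple_matching_dist(toy)
```

Which option is appropriate depends on the data. With pairwise omission the entries of one matrix are no longer based on the same set of variables, which is exactly the caveat raised above.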

On Thu, 2008-01-10 at 10:48 +0000, Marc Moragues wrote:

> Hi,
>
> I am trying to calculate a distance matrix on a binary data frame using
> dist.binary() {ade4}. This is the code I run and the error I get:
>
> > sjlc.dist <- dist.binary(as.data.frame(data), method=2) #D = (a+d) / (a+b+c+d)
> Error in if (any(df < 0)) stop("non negative value expected in df") :
>   missing value where TRUE/FALSE needed
>
> I don't know if the problem is the missing values in my data. If so, how
> can I handle them?

Marc,

Take a look at distance in package analogue and method = "mixed", which
implements Gower's general dissimilarity coefficient for mixed data. It can
deal quite happily with binary data and with missing-ness. Binary data are
handled through a simple matching coefficient, 1 if variable i is present in
both samples, 0 otherwise, and then summed over all variables i.

You should probably read up on how the missing-ness is handled with this
method and what properties the resulting dissimilarity has.

Note that distance() outputs full dissimilarity matrices. To get something
to plug into functions that require a dist object, just use as.dist() on the
output from distance().

HTH

G

> Thank you,
> Marc.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
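
As a small illustration of the as.dist() step mentioned in the reply above, the sketch below converts a full symmetric dissimilarity matrix into a dist object and passes it to hclust(). It uses base R only. The matrix `full` is a hypothetical stand-in for the output of distance(), with made-up values; the commented-out call follows the reply, and its exact arguments should be checked against ?distance in analogue.

```r
# In practice the full matrix would come from something like
#   full <- analogue::distance(x, method = "mixed")   # per the reply; not run here
# Here a small hand-made symmetric matrix stands in for that output.
full <- matrix(
  c(0.0, 0.5, 0.8,
    0.5, 0.0, 0.3,
    0.8, 0.3, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("s1", "s2", "s3"))
)

# as.dist() keeps only the lower triangle and returns a "dist" object.
d <- as.dist(full)

# A dist object can be given to functions that require one, e.g. hclust().
hc <- hclust(d, method = "average")
```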