Luca.Toldo@merck.de
2002-Jan-21 10:27 UTC
[R] [R-1.4.0] minimum spanning tree of large ontology
Dear all, I have an ontology which I would like to load in R to then compute the minimum distance between elements. Since this ontology is not only DAG, the problem is non trivial and therefore decided to use R with the powerful e1071 library (allShortestPaths) to do that. That method requires to provide a distance matrix between the various objects. My ontology contains 32768 objects and therefore I have to load a matrix of 32768x32768. Values in the matrix are set to 1 for those pair of terms that have a link (semantic or hyper/hypo-nimic). I am trying to reproduce at least part of the infrastructure described in the following work |--------------------------------------------------------------------------------| | 1. Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti and Joydeep Ghosh | | Evaluating the Novelty of Text-Mined Rules using Lexical Knowledge | | Submitted to the Seventh ACM SIGKDD International Conference on Knowledge | | Discovery and Data Mining (KDD-2001), 2001 | |--------------------------------------------------------------------------------| Unfortunately, when I try to do that I get into the reported problem (cannot allocate vector of length 1073741824). I am operating on an Irix Origin 2000 with 24 processors and 16 GB of shared memory. My process has no limits set to the amount of memory it can use. I have noticed in the mailing as well as in the FAQ that since version >1.2 the memory management has been changed and I got the information that somebody already successfully managed in loading a 2GB dataset in previous versions. I would be delighted to learn your opinion and instructions on how to proceed. Regards and compliments Dr. Luca Toldo Bioinformatician at Merck KGaA, Darmstadt -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
MCCANN, JACKSON
2002-Jan-21 14:27 UTC
[R] [R-1.4.0] minimum spanning tree of large ontology
Dr. Luca Toldo wrote >>Dear all, >I have an ontology which I would like to load in R to then compute the >minimum distance between elements. >Since this ontology is not only DAG, the problem is non trivial and >therefore decided to use R with the powerful e1071 >library (allShortestPaths) to do that. That method requires to provide a >distance matrix between the >various objects. My ontology contains 32768 objects and therefore I have >to load a matrix of 32768x32768.>Values in the matrix are set to 1 for those pair of terms that have a link >(semantic or hyper/hypo-nimic). >....I'm not sure that this is the most efficient approach to the problem, the Floyd-Warshall algorithm computes all paths between all nodes on a weighted graph, this is going to run in time O(n^3) and your n is fairly large here! Another consideration is the size of the result set, I'm not sure how the e1071 package works internally but looking at the return values they will be at least twice as large as the input matrix - another 8Gbyte. As described your graph appears to be unweighted, so the minimum path between any two nodes can be found with a breadth-first search, so finding the minimum distance between any two points would take O(n) time and building an all paths solutions O(n^2). I'd suggest you look for an adjacency list based graph package with a breadth first search as this allows a more efficient implementation. Regards Jackson McCann ************ This email and any accompanying documents are intended only for the named recipient, are confidential and may be privileged. If you are not the intended recipient please notify us immediately by mailto:admin at britannic.co.uk and you must not copy, disclose or otherwise use this message. Unauthorised use is strictly prohibited and may be unlawful. The content of this email represents the view of the individual and not the company. The company reserves the right to monitor the content of all emails in accordance with lawful business practice. Whilst attachments are virus checked before transmission, Britannic Assurance plc does not accept any liability in respect of any virus which is not detected. Britannic Assurance plc, No.3002 is registered in England and maintains its registered office at 1 Wythall Green Way, Wythall, Birmingham B47 6WG. Telephone: 0870 887 0001 Fax: 0870 887 0002 Website: www.britannicassurance.com Britannic Assurance plc, Britannic Unit Linked Assurance Limited, Britannic ISA Managers Limited and Britannic Unit Trust Managers Limited are regulated by the Financial Services Authority. Each of these companies is a member of the Britannic marketing group which only advises on and sells its own life assurance, pension, unit trust and ISA products. ************ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._