Have you considered the dbscan function in package fpc, or did you use
another one?
dbscan in fpc doesn't have a "distance" parameter but several options,
one of which may resolve your memory problem (look up the documentation
of the "method" parameter).
Using a distance matrix for hundreds of thousands of points is a recipe
for disaster (memory-wise). I'm not sure whether the function that
you used did that, but dbscan in fpc can avoid it.
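Some rough arithmetic shows why: R's dist() stores the lower triangle of the distance matrix, n(n-1)/2 doubles of 8 bytes each. For n = 300000 that is about 4.5 x 10^10 values, or roughly 360 GB. R before version 3.0.0 also could not hold a vector with more than 2^31 - 1 elements, which is far below that. The way around it is to compute each point's eps-neighbourhood only when it is needed. The following is a minimal Python sketch of that idea, not fpc's R code; the names dbscan_raw and region_query are hypothetical, and plain Euclidean distance is assumed. Memory stays linear in the number of points, at the price of quadratic running time.

```python
import math
from collections import deque

NOISE = 0  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i).

    Distances are computed on demand, so no n-by-n matrix is ever stored.
    """
    xi = points[i]
    return [j for j, xj in enumerate(points) if math.dist(xi, xj) <= eps]


def dbscan_raw(points, eps, min_pts):
    """Return a cluster label (1, 2, ...) or NOISE (0) for every point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point
                queue.extend(j_neighbours)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan_raw(pts, eps=1.5, min_pts=3))
```

A spatial index (for example a grid of cells of side eps) would let region_query scan only nearby points instead of all of them; that refinement is left out to keep the sketch short.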
It is true that dbscan requires tuning constants that the user has to
provide. Unfortunately there is no general rule for how to do this; one
would need to understand the method and the meaning of the constants, and
how these translate into the requirements of your application.
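To make this concrete for the data in the question: with longitude, latitude and temperature, a single eps only has a meaning once the variables are on comparable scales. fpc's dbscan can standardise the variables (its scale argument), but that weights each variable by its spread rather than by what counts as "close" in the application. One alternative is to transform the data so that ordinary Euclidean distance encodes the application's own tolerances; the transformed data can then be clustered with any Euclidean DBSCAN, without a custom distance or a distance matrix. The Python sketch below assumes, purely for illustration, that 50 km of separation and 0.5 degrees C of temperature difference should count as equally far; both values and the function name to_scaled_space are hypothetical placeholders, not recommendations. With this transformation, eps = 1 means "within one tolerance unit".

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def to_scaled_space(rows, km_scale=50.0, temp_scale=0.5):
    """Map (lat, lon, temp) rows to (x, y, t) so Euclidean distance is meaningful.

    Uses an equirectangular projection around the mean latitude, which is a
    reasonable approximation only for a regional domain. km_scale and
    temp_scale are hypothetical tolerances: a unit step in the output means
    km_scale km of separation or temp_scale degrees of temperature difference.
    """
    lat0 = math.radians(sum(r[0] for r in rows) / len(rows))
    out = []
    for lat, lon, temp in rows:
        x_km = EARTH_RADIUS_KM * math.radians(lon) * math.cos(lat0)
        y_km = EARTH_RADIUS_KM * math.radians(lat)
        out.append((x_km / km_scale, y_km / km_scale, temp / temp_scale))
    return out


if __name__ == "__main__":
    # Same location, temperatures 0.5 degrees apart: distance 1.0 in scaled space.
    a, b = to_scaled_space([(39.0, -0.5, 20.0), (39.0, -0.5, 20.5)])
    print(math.dist(a, b))
```

The choice of tolerances is exactly where the requirements of the application come in; changing km_scale or temp_scale changes which points DBSCAN treats as neighbours for a fixed eps.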
You may try several different choices and do some cluster validation
to see what works, but this is not easy to explain in general terms by
email.
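One heuristic often used as a starting point for eps, taken from the original DBSCAN paper (Ester et al., 1996), is the sorted k-distance plot: compute for every point the distance to its k-th nearest neighbour, with k tied to the intended MinPts (conventions differ on whether the point itself is counted), sort these values and look for a "knee" where they begin to rise steeply. It does not replace understanding the constants or validating the clustering; it only narrows the range of eps values worth trying. The Python sketch below is brute force and the function name k_distances is hypothetical; it is O(n^2) in time, so for hundreds of thousands of points a spatial index such as a k-d tree would be needed in practice.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending.

    Brute force: O(n^2) time. Requires 1 <= k < len(points).
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (0, 2), (0, 3), (10, 0)]
    # The jump at the end marks the isolated point (10, 0).
    print(k_distances(pts, 1))
```

Plotting these sorted values against their rank and choosing eps just below the steep rise is the usual way the heuristic is applied; different choices of k will move the knee, which is one more reason to try several settings.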
Hope this helps at least a bit.
Best regards,
Christian
On Fri, 3 Jun 2011, Paco Pastor wrote:
> Hello everyone,
>
> When looking for information about clustering of spatial data in R I was
> directed towards DBSCAN. I've read some documentation about it and new
> questions have arisen.
>
> DBSCAN requires some parameters; one of them is "distance". As my data
> are three-dimensional (longitude, latitude and temperature), which
> "distance" should I use? Which dimension is that distance related to? I
> suppose it should be temperature. How do I find such a minimum distance
> with R?
>
> Another parameter is the minimum number of points needed to form a
> cluster. Is there any method to find that number? Unfortunately I
> haven't found one.
>
> Searching through Google I could not find an R example of dbscan
> applied to a dataset similar to mine. Do you know any website with such
> examples, so I can read them and try to adapt them to my case?
>
> My last question: my first attempt with DBSCAN in R (without proper
> answers to the questions above) ran into a memory problem; R says it
> "cannot allocate vector". I start with a 4 km grid of 779191 points,
> which shrinks to approximately 300000 rows x 3 columns (latitude,
> longitude and temperature) after removing invalid SST points. Any hints
> on how to address this memory problem? Does it depend on my computer or
> on DBSCAN itself?
>
> Thanks for your patience in reading a long and probably boring message,
> and for your help.
>
> --
> -----------
> Francisco Pastor
> Meteorology department, Instituto Universitario CEAM-UMH
> http://www.ceam.es
> -----------
> mail: paco at ceam.es
> skype: paco.pastor.guzman
> Researcher ID: http://www.researcherid.com/rid/B-8331-2008
> Cosis profile: http://www.cosis.net/profile/francisco.pastor
> -----------
> Parque Tecnologico, C/ Charles R. Darwin, 14
> 46980 PATERNA (Valencia), Spain
> Tlf. 96 131 82 27 - Fax. 96 131 81 90
>
*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche