Dear R users,

I wondered whether any of you have tried to calculate a distance matrix for a very large data set, and whether anyone can confirm that the error message I got actually means that my data is too large for this task:

    negative length vectors are not allowed

My data size and the code used:

    > dim(mydata_nor)
    [1] 365000    144
    > d <- dist(mydata_nor, method = "euclidean")

My data has 1000 samples, each with a year of data observed at 10-minute intervals daily, so the size is (365 * 1000) * 144.

I checked the manual for the function 'dist' but cannot see an upper limit on the size allowed, and I bet there should be one, so any hints are appreciated. I would also be grateful for advice on any other method for calculating a distance matrix for a large dataset.

I appreciate that reproducible code should be provided, so try the code below if needed:

    > A <- matrix(1:365000*144, nrow = 365000, ncol = 144)
    > dim(A)
    [1] 365000    144
    > d1 <- dist(A, method = "euclidean")
    Error in dist(A, method = "euclidean") :
      negative length vectors are not allowed

Many thanks in advance!

HJ
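The size question can be checked with a little arithmetic. The sketch below assumes that dist() keeps only the lower triangle, n * (n - 1) / 2 values, as 8-byte doubles. Reading the "negative length" message as a 32-bit integer overflow of that count is a plausible interpretation rather than something confirmed in this thread.

```r
# Number of rows in the data set from the post
n <- 365000

# dist() keeps the lower triangle only: n * (n - 1) / 2 distances.
# n is a double here, so the product does not overflow.
n_pairs <- n * (n - 1) / 2

# Each distance is stored as an 8-byte double
bytes <- n_pairs * 8
gib <- bytes / 1024^3

# Largest value a 32-bit signed integer can hold
int_max <- .Machine$integer.max

cat("pairs:", format(n_pairs, big.mark = ",", scientific = FALSE), "\n")
cat("size in GiB:", round(gib, 1), "\n")
cat("pairs exceed 32-bit integer range:", n_pairs > int_max, "\n")
```

The number of columns (144) does not enter the calculation; only the number of rows drives the size of the result.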
Here's the result on R 3.0.0 64 bit under Windows 8:

    > A <- matrix(1:365000*144, nrow = 365000, ncol = 144)
    > dim(A)
    [1] 365000    144
    > d <- dist(mydata_nor, method = "euclidean")
    Error in as.matrix(x) : object 'mydata_nor' not found
    > d <- dist(A, method = "euclidean")
    Error: cannot allocate vector of size 496.3 Gb
    In addition: Warning messages:
    1: In dist(A, method = "euclidean") :
      Reached total allocation of 8078Mb: see help(memory.size)
    2: In dist(A, method = "euclidean") :
      Reached total allocation of 8078Mb: see help(memory.size)
    3: In dist(A, method = "euclidean") :
      Reached total allocation of 8078Mb: see help(memory.size)
    4: In dist(A, method = "euclidean") :
      Reached total allocation of 8078Mb: see help(memory.size)

Your message suggests that your system could not accurately compute the memory requirements. Unless you have access to a computer with 500 gigabytes of memory, you need to consider alternate approaches such as aggregating the data into longer time blocks or using kmeans.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of HJ YAN
Sent: Thursday, May 2, 2013 6:02 PM
To: r-help at r-project.org
Subject: [R] Calculating distance matrix for large dataset
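The two alternatives mentioned above could look roughly like this. It is only a sketch on a small synthetic stand-in: the row layout (all days of one sample stored together), the variable names, and the number of clusters are assumptions made for illustration, not details taken from the original data.

```r
set.seed(1)

# Small stand-in for the real data (assumed layout): 20 samples x 365 days,
# one row per sample-day, 144 ten-minute readings per row
n_samples <- 20
n_days <- 365
x <- matrix(rnorm(n_samples * n_days * 144), ncol = 144)
sample_id <- rep(seq_len(n_samples), each = n_days)

# Option 1: aggregate to one mean daily profile per sample, then use dist().
# rowsum() adds up the rows within each sample; dividing gives the mean.
profiles <- rowsum(x, group = sample_id) / n_days
d <- dist(profiles, method = "euclidean")

# Option 2: k-means on all rows; it never builds the n x n distance matrix.
# The number of clusters (4) is arbitrary here.
fit <- kmeans(x, centers = 4, iter.max = 50, nstart = 3)

dim(profiles)        # one row per sample
length(d)            # n_samples * (n_samples - 1) / 2 distances
table(fit$cluster)   # cluster sizes over all sample-days
```

Aggregating over rows is what shrinks the distance matrix; averaging the 144 columns into fewer time blocks would make each distance cheaper to compute but would leave the number of pairs unchanged.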
I have a version that uses bigmemory on my blog, but it looks at distance on a sphere for a 36k * 36k matrix, not hundreds of Gb, so I don't know if the approach will work for you.

http://stevemosher.wordpress.com/2012/04/12/nick-stokes-distance-code-now-with-big-memory/

Steve

However, I never tested it with
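For what it is worth, the general idea of working in blocks can be sketched in base R. This is not the code from the blog post and does not use bigmemory; the helper block_dist, the block size, and the choice to keep only each row's nearest neighbour are all assumptions made for illustration. The point is that each block of distances is reduced and discarded, so the full matrix is never held in memory.

```r
set.seed(1)

# Small synthetic stand-in for the data: 500 rows, 144 columns
x <- matrix(rnorm(500 * 144), nrow = 500, ncol = 144)

# Euclidean distances between the rows of a and the rows of b,
# using |a - b|^2 = |a|^2 + |b|^2 - 2 * a.b
block_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}

block_size <- 100
n <- nrow(x)
nearest <- integer(n)

for (start in seq(1, n, by = block_size)) {
  idx <- start:min(start + block_size - 1, n)
  # Distances from this block of rows to every row: block_size x n at most
  d <- block_dist(x[idx, , drop = FALSE], x)
  # Ignore each row's zero distance to itself
  d[cbind(seq_along(idx), idx)] <- Inf
  # Keep only the summary we need, then let the block be overwritten
  nearest[idx] <- apply(d, 1, which.min)
}

head(nearest)
```

Whether this helps depends on what the distances are needed for: it works when a per-row summary is enough, but anything that needs all pairs at once (hierarchical clustering, for example) still faces the full storage cost.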