similar to: Large data sets with high dimensional fixed effects

Displaying 20 results from an estimated 10000 matches similar to: "Large data sets with high dimensional fixed effects"

2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone, I'm working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual data sets together (I have about 20). I'm using the following command from sqldf: data1 <- sqldf("select A.*, B.* from A inner join B using(ID)") But it's taking A VERY VERY LONG TIME to merge just 2 of the datasets
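A keyed merge in data.table is usually much faster than an unindexed SQL join at this size; a minimal sketch of that alternative, assuming A and B are data.frames sharing an ID column:
###
library(data.table)
setDT(A); setDT(B)               # convert in place, no copy
setkey(A, ID); setkey(B, ID)     # sort once; joins then run on the key
data1 <- merge(A, B, by = "ID")  # inner join on ID
###
If staying with sqldf, note that each sqldf() call starts from a fresh database, so an index only helps when the "create index" statement and the join are passed together as one character vector.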
2017 Jun 17
0
Prediction with two fixed-effects - large number of IDs
I have no direct experience with such horrific models, but your formula is a mess, and Google suggests the biglm package with ffdf. Specifically, you should convert your discrete variables to factors before you build the model, particularly since you want to use predict after the fact, for which you will need a new data set with exactly the same factor levels. Also, your use of I() is
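A minimal sketch of that advice, using the column names from the original post (lny, id, year, x1); newdata stands in for whatever prediction set is supplied later:
###
library(biglm)
mydata$id   <- factor(mydata$id)    # convert discrete variables first
mydata$year <- factor(mydata$year)
fit <- biglm(lny ~ id + year + x1 + I(x1^2), data = mydata)
# newdata must carry exactly the same factor levels as mydata
newdata$id   <- factor(newdata$id,   levels = levels(mydata$id))
newdata$year <- factor(newdata$year, levels = levels(mydata$year))
pred <- predict(fit, newdata = newdata)
###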
2008 Jun 06
1
functions for high dimensional integral
I need to compute a high-dimensional integral. Currently I'm using the function adapt in the R package adapt, but this method is rather slow. I'm wondering if there are other solutions. Thanks. Zhongwen
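The cubature package is a commonly suggested alternative for adaptive multivariate integration; a minimal sketch over a 4-dimensional box (the integrand here is just an illustration):
###
library(cubature)
f <- function(x) exp(-sum(x^2))   # example integrand
res <- adaptIntegrate(f, lowerLimit = rep(-1, 4),
                         upperLimit = rep(1, 4), tol = 1e-5)
res$integral
###
Beyond roughly six dimensions, Monte Carlo or quasi-Monte Carlo methods usually scale better than adaptive quadrature.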
2010 May 24
1
high-dimensional contingency table
Dear Friends. I am just starting to use R. On this occasion I want to construct a high-dimensional contingency table, because I want to create a mosaic plot with the vcd package. My table is in this format:
    año ac.rep cat.gru conteos
1  2005      R parejas     253
2  2005      N parejas      23
3  2006      R parejas     347
4  2006      N parejas      39
5  2007      R
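A minimal sketch of the usual route: turn the long-format counts into a multi-way table with xtabs(), then hand it to vcd::mosaic() (column names taken from the post; mydata is the assumed name of the data frame):
###
library(vcd)
tab <- xtabs(conteos ~ año + ac.rep + cat.gru, data = mydata)
mosaic(tab)
###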
2008 Feb 28
0
New Package: geozoo. High-Dimensional Geometric Objects
Dear useRs, I'd like to announce a new package called geozoo, short for geometric zoo. It's a compilation of functions to produce high-dimensional geometric objects, including hypercubes and hyperspheres, Boy's surface, the hyper torus and a selection of polytopes. For a complete list, as well as images and movies, visit
2003 Nov 01
2
Question about the high dimensional density estimation
Hi, I found that the R package "KernSmooth" can deal with only 1D and 2D data. But now I have a collection of 4-dimensional data (x1,x2,x3,x4) and would like to estimate the "mode" of the underlying density. What can I do about it? Thanks a lot. -- Ying-Chao Hung, Assistant Professor, Graduate Institute of Statistics, National Central University, Chung-Li, Taiwan. TEL:
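The ks package handles kernel density estimation in up to six dimensions; a minimal sketch of estimating the mode by evaluating the fitted density at the data points (X is assumed to be an n x 4 matrix of the observations):
###
library(ks)
H <- Hpi(X)                               # plug-in bandwidth; can be slow in 4D
fit <- kde(X, H = H, eval.points = X)
mode_est <- X[which.max(fit$estimate), ]  # sample point with highest density
###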
2001 Dec 10
1
high dimensional convex hull
Does anyone know of an R package that will determine the convex hull of a high-dimensional dataset (say 4-10 dimensions)? I know chull works for 2D data. I'm a neophyte to R and convex hulls, so please keep it simple. Many thanks, Ben -- Ben Stapley. Biomolecular Sciences, UMIST, PO Box 88, Manchester M60 1QD. Tel 0161 200 5818 Fax 0161 236 0409
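The geometry package wraps Qhull and works in arbitrary dimension; a minimal sketch on random 4D data:
###
library(geometry)
X <- matrix(rnorm(100 * 4), ncol = 4)  # 100 points in 4 dimensions
facets <- convhulln(X)                 # each row indexes one hull facet
hull_pts <- unique(as.vector(facets))  # indices of points on the hull
###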
2012 Oct 16
1
How to create a high-dimensional matrix
Hi, everyone. I need to create a 429497 x 429497 matrix. When I use matrix(0,429497,429497) I get the error: Error in matrix(0, 429497, 429497) : too many elements specified Then I tried the "ff" package to store this matrix on disk: x <- ff(0, dim=c(429497,429497)) And I got the error: Error in if (length < 0 || length > .Machine$integer.max)
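Both errors come from the same limit: 429497^2 is about 1.8e11 elements, beyond the 2^31-1 length limit those checks enforce, and a dense double matrix of that size would need roughly 1.5 TB. If most entries are zero, a sparse representation sidesteps both problems; a minimal sketch with the Matrix package:
###
library(Matrix)
n <- 429497
M <- sparseMatrix(i = integer(0), j = integer(0),  # start with no nonzeros
                  x = numeric(0), dims = c(n, n))
M[1, 2] <- 3.5   # only nonzero cells consume memory
###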
2009 Mar 15
1
What is the best package for large data cleaning (not statistical analysis)?
Dear R helpers: I am a newbie to R and have a question about cleaning large data frames in R. So far I have been using SAS for data cleaning because my data sets are relatively large (handling multiple files, each as large as 5-10 GB). I am not a fan of SAS at all and am eager to move my data cleaning tasks into R completely. It seems to me there are 3 options: using SQL, ff or
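One common pattern along the SQL route: filter while reading, so only the cleaned subset ever enters R. A minimal sketch with sqldf (the file name and column are placeholders):
###
library(sqldf)
cleaned <- read.csv.sql("big_file.csv",
                        sql = "select * from file where amount > 0")
###
read.csv.sql() loads the file into a temporary SQLite database, runs the query there, and returns only the result to R.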
2010 Oct 31
1
biglm: how it handles large data set?
I am trying to figure out why 'biglm' can handle large data sets... According to the R documentation: "biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory." After reading the source code, I still could not figure out how 'update'
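The trick is that biglm never stores the data: it maintains an incremental QR decomposition of the p x p cross-product, so each chunk of rows only updates that small triangular factor and is then discarded. A minimal sketch of the update loop, with an in-memory data frame standing in for chunked reads (mydata, y, x1, x2 are assumed names):
###
library(biglm)
chunks <- split(mydata, rep(1:10, length.out = nrow(mydata)))
fit <- biglm(y ~ x1 + x2, data = chunks[[1]])
for (ch in chunks[-1]) fit <- update(fit, ch)  # folds each chunk in; O(p^2) state
coef(fit)
###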
2024 Jul 04
1
Large vector support in data.frames
Ivan, Simon, Thanks for the replies. I can work around the limitation. I currently either divide the data into shards or use a list of (long) vectors, depending on what I am trying to do. But I have to transform between the two representations, which takes time and memory, and it often needs more code than if I could have used data.frames. Being able to create large (> 2^31-1
2006 May 17
1
Re : Large database help
Thanks for doing this, Thomas. I have been thinking about what it would take to do this, but if it were left to me, it would have taken a lot longer. Back in the '80s there was a statistical package called RUMMAGE that did all computations based on sufficient statistics and did not keep the actual data in memory. Memory for computers became cheap before datasets turned huge, so there
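For linear models the sufficient statistics are just the cross-products X'X and X'y, which can be accumulated chunk by chunk without ever holding the full data; a minimal sketch with simulated chunks:
###
p <- 3
XtX <- matrix(0, p, p); Xty <- numeric(p)
for (i in 1:100) {                               # 100 chunks, none retained
  Xc <- cbind(1, matrix(rnorm(2000), ncol = 2))  # chunk of 1000 rows
  yc <- Xc %*% c(1, 2, -1) + rnorm(1000)
  XtX <- XtX + crossprod(Xc)                     # running X'X
  Xty <- Xty + crossprod(Xc, yc)                 # running X'y
}
beta <- solve(XtX, Xty)                          # OLS from the totals alone
###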
2017 Jun 17
3
Prediction with two fixed-effects - large number of IDs
Dear all, I am running a panel regression with time and location fixed effects: ### reg1 <- lm(lny ~ factor(id) + factor(year) + x1+ I(x1)^2 + x2+ I(x2)^2 , data=mydata, na.action="na.omit") ### My goal is to use the estimation for prediction. However, I have 8,500 IDs, which results in very slow computation. Ideally, I would like to do the following: ### reg2 <-
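Packages built for high-dimensional fixed effects sweep the factors out rather than building 8,500 dummy columns; a minimal sketch with the lfe package (note the squared terms are written I(x1^2): in a formula, I(x1)^2 reduces to just x1):
###
library(lfe)
reg2 <- felm(lny ~ x1 + I(x1^2) + x2 + I(x2^2) | id + year, data = mydata)
fe <- getfe(reg2)  # recover the estimated fixed effects for use in prediction
###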
2012 Mar 23
2
Help with R package forecast
When I type library() to see what is installed, the following list in RED comes up. Packages in library '/home/jason/R/i686-pc-linux-gnu-library/2.13':
abind    Combine multi-dimensional arrays
aplpack  Another Plot PACKage: stem.leaf, bagplot, faces, spin3R, and some slider functions
biglm    bounded memory linear and
2006 May 18
3
Two-Dimensional Hashes through links?
I'm really hitting a wall here. My program has a search engine that returns a list of results. I am using a two-dimensional hash to pass the form data back to my controller (i.e. params[:job] => {:description => "xxx", :location => "xxx", :company => "xxx"}). I use that Job object to search my database, and then wait for user input. Now when the
2007 Mar 09
4
Using large datasets: can I overload the subscript operator?
Hello, I do some computations on datasets that come from climate models. These data are huge arrays, significantly larger than typically available RAM, so they have to be accessed row-by-row, or rather slice-by-slice, depending on the task. I would like to make an R package to easily access such datasets within R. The C++ backend is ready and being used under Windows/.Net/Visual Basic, but I have
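Yes: `[` is an ordinary generic in R, so a class can supply its own method that reads only the requested slice from disk. A minimal self-contained sketch for a row-major binary file of doubles (the file layout and class name are illustrative, not the poster's actual backend):
###
diskarray <- function(path, nrow, ncol)
  structure(list(path = path, nrow = nrow, ncol = ncol), class = "diskarray")

"[.diskarray" <- function(x, i, j) {
  con <- file(x$path, "rb"); on.exit(close(con))
  out <- matrix(NA_real_, length(i), length(j))
  for (k in seq_along(i)) {
    seek(con, where = (i[k] - 1) * x$ncol * 8)         # 8 bytes per double
    out[k, ] <- readBin(con, "double", n = x$ncol)[j]  # one row, kept columns
  }
  out
}

writeBin(as.double(1:12), "test.bin")  # 3 x 4 test array, row-major
a <- diskarray("test.bin", nrow = 3, ncol = 4)
a[2:3, c(1, 4)]                        # only the requested rows are read
###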
2007 Mar 12
1
Analysis of 3-dimensional spatial point patterns
I am trying to determine how to evaluate the homogeneity of points in three-dimensional space. For two-dimensional data I have used functions available in the spatial package, and I've looked into the spatstat package, but as far as I can tell, neither appears to handle 3-dimensional data. Is there another version, package, or software that does the same type (G-function,
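More recent versions of spatstat added three-dimensional point patterns via the pp3 class, with G3est/K3est/F3est as 3D analogues of the usual summary functions; a minimal sketch on simulated points in the unit cube:
###
library(spatstat)
X <- pp3(runif(100), runif(100), runif(100),
         box3(c(0, 1), c(0, 1), c(0, 1)))
plot(G3est(X))  # 3D nearest-neighbour distance function
plot(K3est(X))  # 3D K-function
###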
1997 Dec 05
1
R-alpha: is.vector of one-dimensional array
maybe we've already discussed this before, but Kurt and I can't remember ... is.vector() of a one-dimensional array returns FALSE. this is also the behavior of Splus, but totally counter-intuitive for me ... IMO an array of dimension 1 is exactly the definition of a vector ... it also breaks our current plot.factor, which is simply a barplot(table(x)): table() returns an
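The behavior follows from is.vector() requiring no attributes other than names, while a 1-d array carries a dim attribute; a short demonstration:
###
x <- array(1:3)           # one-dimensional array
is.vector(x)              # FALSE: x has a dim attribute
dim(x)                    # 3
is.vector(as.vector(x))   # TRUE once the attributes are stripped
###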
2006 Jul 19
1
How would you export a 3-dimensional array to an SQL database?
Hello, How would you export a 3-dimensional array to an SQL database? a <- array(1:24, 2:4) Is there an open source DB that would be more adequate for this type of operation? Is there a way to reshape/flatten a 3-dimensional array? Regards, Pierre Lapointe
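SQL tables are flat, so the usual approach is one row per cell, with the array indices as key columns; base R can do the flattening directly, and the result can go into any database through DBI (the connection object con is assumed to exist):
###
a <- array(1:24, 2:4)
flat <- as.data.frame(as.table(a), responseName = "value")
head(flat)  # columns Var1, Var2, Var3, value: one row per cell
# library(DBI); dbWriteTable(con, "a_flat", flat)  # con: an open DBI connection
###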
2007 Jan 19
1
Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays
Hi Everyone, Given the scenario I have, I was wondering if anyone would be able to give me a hint on how to get the results from hubers() in a more efficient way. I have an outcome in an array [N x S x D]. I also have a factor (levels 1,2,3) stored in a matrix N x S. My objective is to get "mu" and "sigma" for each of the N rows (outcome) stratified by the factor
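A minimal sketch of one way to do the stratified sweep, pooling the D replicate values per cell (dimensions and level count taken from the post; the data here are simulated):
###
library(MASS)
N <- 5; S <- 4; D <- 10
y <- array(rnorm(N * S * D), c(N, S, D))
f <- matrix(sample(1:3, N * S, replace = TRUE), N, S)
res <- lapply(seq_len(N), function(n)            # one entry per row n
  lapply(split(seq_len(S), f[n, ]), function(s)  # columns sharing a level
    hubers(as.vector(y[n, s, , drop = FALSE])))) # list(mu, s) per stratum
###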