similar to: Large data sets with high dimensional fixed effects

Displaying 20 results from an estimated 10000 matches similar to: "Large data sets with high dimensional fixed effects"

2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone, I'm working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual data sets together (I have about 20). I'm using the following command from sqldf: data1 <- sqldf("select A.*, B.* from A inner join B using(ID)") But it's taking A VERY VERY LONG TIME to merge just 2 of the datasets
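A keyed merge in data.table is usually much faster than an unindexed SQL join at this size; a minimal sketch of that alternative, assuming A and B are data.frames sharing an ID column:
###
library(data.table)
setDT(A); setDT(B)               # convert in place, no copy
setkey(A, ID); setkey(B, ID)     # sort once; joins then run on the key
data1 <- merge(A, B, by = "ID")  # inner join on ID
###
If staying with sqldf, note that each sqldf() call starts from a fresh database, so an index only helps when the "create index" statement and the join are passed together as one character vector.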
2017 Jun 17
0
Prediction with two fixed-effects - large number of IDs
I have no direct experience with such horrific models, but your formula is a mess, and Google suggests the biglm package with ffdf. Specifically, you should convert your discrete variables to factors before you build the model, particularly since you want to use predict after the fact, for which you will need a new data set with exactly the same factor levels. Also, your use of I() is
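A minimal sketch of that advice, using the column names from the original post (lny, id, year, x1); newdata stands in for whatever prediction set is supplied later:
###
library(biglm)
mydata$id   <- factor(mydata$id)    # convert discrete variables first
mydata$year <- factor(mydata$year)
fit <- biglm(lny ~ id + year + x1 + I(x1^2), data = mydata)
# newdata must carry exactly the same factor levels as mydata
newdata$id   <- factor(newdata$id,   levels = levels(mydata$id))
newdata$year <- factor(newdata$year, levels = levels(mydata$year))
pred <- predict(fit, newdata = newdata)
###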
2008 Jun 06
1
functions for high dimensional integral
I need to compute a high-dimensional integral. Currently I'm using the function adapt in the R package adapt, but this method is rather slow. I'm wondering if there are other solutions. Thanks. Zhongwen
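The cubature package is a commonly suggested alternative for adaptive multivariate integration; a minimal sketch over a 4-dimensional box (the integrand here is just an illustration):
###
library(cubature)
f <- function(x) exp(-sum(x^2))   # example integrand
res <- adaptIntegrate(f, lowerLimit = rep(-1, 4),
                         upperLimit = rep(1, 4), tol = 1e-5)
res$integral
###
Beyond roughly six dimensions, Monte Carlo or quasi-Monte Carlo methods usually scale better than adaptive quadrature.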
2010 May 24
1
high-dimensional contingency table
Dear Friends. I am just starting to use R. On this occasion I want to construct a high-dimensional contingency table, because I want to create a mosaic plot with the vcd package. My table is in this format:
    año ac.rep cat.gru conteos
1  2005      R parejas     253
2  2005      N parejas      23
3  2006      R parejas     347
4  2006      N parejas      39
5  2007      R
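A minimal sketch of the usual route: turn the long-format counts into a multi-way table with xtabs(), then hand it to vcd::mosaic() (column names taken from the post; mydata is the assumed name of the data frame):
###
library(vcd)
tab <- xtabs(conteos ~ año + ac.rep + cat.gru, data = mydata)
mosaic(tab)
###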
2008 Feb 28
0
New Package: geozoo. High-Dimensional Geometric Objects
Dear useRs, I'd like to announce a new package called geozoo, short for geometric zoo. It's a compilation of functions to produce high-dimensional geometric objects, including hypercubes and hyperspheres, Boy's surface, the hyper torus and a selection of polytopes. For a complete list, as well as images and movies, visit
2003 Nov 01
2
Question about the high dimensional density estimation
Hi, I found that the R package "KernSmooth" can deal with only 1D and 2D data. But now I have a collection of 4-dimensional data (x1,x2,x3,x4) and would like to estimate the "mode" of the underlying density. What can I do about it? Thanks a lot. -- Ying-Chao Hung, Assistant Professor, Graduate Institute of Statistics, National Central University, Chung-Li, Taiwan. TEL:
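The ks package handles kernel density estimation in up to six dimensions; a minimal sketch of estimating the mode by evaluating the fitted density at the data points (X is assumed to be an n x 4 matrix of the observations):
###
library(ks)
H <- Hpi(X)                               # plug-in bandwidth; can be slow in 4D
fit <- kde(X, H = H, eval.points = X)
mode_est <- X[which.max(fit$estimate), ]  # sample point with highest density
###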
2001 Dec 10
1
high dimensional convex hull
Does anyone know of an R package that will determine the convex hull of a high-dimensional dataset (say 4-10 dimensions)? I know chull works for 2D data. I'm a neophyte to R and convex hulls, so please keep it simple. Many thanks, Ben -- Ben Stapley. Biomolecular Sciences, UMIST, PO Box 88, Manchester M60 1QD. Tel 0161 200 5818 Fax 0161 236 0409
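The geometry package wraps Qhull and works in arbitrary dimension; a minimal sketch on random 4D data:
###
library(geometry)
X <- matrix(rnorm(100 * 4), ncol = 4)  # 100 points in 4 dimensions
facets <- convhulln(X)                 # each row indexes one hull facet
hull_pts <- unique(as.vector(facets))  # indices of points on the hull
###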
2012 Oct 16
1
How to create a high-dimensional matrix
Hi, everyone. I need to create a 429497 x 429497 matrix. When I use matrix(0,429497,429497) I get the error: Error in matrix(0, 429497, 429497) : too many elements specified Then I tried the "ff" package to store this matrix on disk: x <- ff(0, dim=c(429497,429497)) And I got the error: Error in if (length < 0 || length > .Machine$integer.max)
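Both errors come from the same limit: 429497^2 is about 1.8e11 elements, beyond the 2^31-1 length limit those checks enforce, and a dense double matrix of that size would need roughly 1.5 TB. If most entries are zero, a sparse representation sidesteps both problems; a minimal sketch with the Matrix package:
###
library(Matrix)
n <- 429497
M <- sparseMatrix(i = integer(0), j = integer(0),  # start with no nonzeros
                  x = numeric(0), dims = c(n, n))
M[1, 2] <- 3.5   # only nonzero cells consume memory
###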
2009 Mar 15
1
What is the best package for large data cleaning (not statistical analysis)?
Dear R helpers: I am a newbie to R and have a question about cleaning large data frames in R. So far I have been using SAS for data cleaning because my data sets are relatively large (handling multiple files, each as large as 5-10 GB). I am not a fan of SAS at all and am eager to move my data cleaning tasks into R completely. It seems to me there are 3 options: using SQL, ff or
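One common pattern along the SQL route: filter while reading, so only the cleaned subset ever enters R. A minimal sketch with sqldf (the file name and column are placeholders):
###
library(sqldf)
cleaned <- read.csv.sql("big_file.csv",
                        sql = "select * from file where amount > 0")
###
read.csv.sql() loads the file into a temporary SQLite database, runs the query there, and returns only the result to R.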
2010 Oct 31
1
biglm: how it handles large data set?
I am trying to figure out why 'biglm' can handle large data sets... According to the R documentation: "biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory." After reading the source code, I still could not figure out how 'update'
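The trick is that biglm never stores the data: it maintains an incremental QR decomposition of the p x p cross-product, so each chunk of rows only updates that small triangular factor and is then discarded. A minimal sketch of the update loop, with an in-memory data frame standing in for chunked reads (mydata, y, x1, x2 are assumed names):
###
library(biglm)
chunks <- split(mydata, rep(1:10, length.out = nrow(mydata)))
fit <- biglm(y ~ x1 + x2, data = chunks[[1]])
for (ch in chunks[-1]) fit <- update(fit, ch)  # folds each chunk in; O(p^2) state
coef(fit)
###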
2024 Jul 04
1
Large vector support in data.frames
Ivan, Simon, Thanks for the replies. I can work around the limitation. I currently either divide the data into shards or use a list of (long) vectors, depending on what I am trying to do. But I have to transform between the two representations, which takes time and memory, and it often needs more code than if I could have used data.frames. Being able to create large (> 2^31-1
2006 May 17
1
Re : Large database help
Thanks for doing this, Thomas. I have been thinking about what it would take to do this, but if it were left to me, it would have taken a lot longer. Back in the '80s there was a statistical package called RUMMAGE that did all computations based on sufficient statistics and did not keep the actual data in memory. Memory for computers became cheap before datasets turned huge, so there
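For linear models the sufficient statistics are just the cross-products X'X and X'y, which can be accumulated chunk by chunk without ever holding the full data; a minimal sketch with simulated chunks:
###
p <- 3
XtX <- matrix(0, p, p); Xty <- numeric(p)
for (i in 1:100) {                               # 100 chunks, none retained
  Xc <- cbind(1, matrix(rnorm(2000), ncol = 2))  # chunk of 1000 rows
  yc <- Xc %*% c(1, 2, -1) + rnorm(1000)
  XtX <- XtX + crossprod(Xc)                     # running X'X
  Xty <- Xty + crossprod(Xc, yc)                 # running X'y
}
beta <- solve(XtX, Xty)                          # OLS from the totals alone
###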
2017 Jun 17
3
Prediction with two fixed-effects - large number of IDs
Dear all, I am running a panel regression with time and location fixed effects: ### reg1 <- lm(lny ~ factor(id) + factor(year) + x1+ I(x1)^2 + x2+ I(x2)^2 , data=mydata, na.action="na.omit") ### My goal is to use the estimation for prediction. However, I have 8,500 IDs, which results in very slow computation. Ideally, I would like to do the following: ### reg2 <-
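Packages built for high-dimensional fixed effects sweep the factors out rather than building 8,500 dummy columns; a minimal sketch with the lfe package (note the squared terms are written I(x1^2): in a formula, I(x1)^2 reduces to just x1):
###
library(lfe)
reg2 <- felm(lny ~ x1 + I(x1^2) + x2 + I(x2^2) | id + year, data = mydata)
fe <- getfe(reg2)  # recover the estimated fixed effects for use in prediction
###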
2012 Mar 23
2
Help with R package forecast
When I type library() to see what is installed, the following list in RED comes up. Packages in library '/home/jason/R/i686-pc-linux-gnu-library/2.13':
abind    Combine multi-dimensional arrays
aplpack  Another Plot PACKage: stem.leaf, bagplot, faces, spin3R, and some slider functions
biglm    bounded memory linear and
2006 May 18
3
Two-Dimensional Hashes through links?
I'm really hitting a wall here. My program has a search engine that returns a list of results. I am using a two-dimensional hash to pass the form data back to my controller (i.e. params[:job] => {:description => "xxx", :location => "xxx", :company => "xxx"}). I use that Job object to search my database, and then wait for user input. Now when the
2007 Mar 09
4
Using large datasets: can I overload the subscript operator?
Hello, I do some computations on datasets that come from climate models. These data are huge arrays, significantly larger than typically available RAM, so they have to be accessed row-by-row, or rather slice-by-slice, depending on the task. I would like to make an R package to easily access such datasets within R. The C++ backend is ready and being used under Windows/.Net/Visual Basic, but I have
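Yes: `[` is an ordinary generic in R, so a class can supply its own method that reads only the requested slice from disk. A minimal self-contained sketch for a row-major binary file of doubles (the file layout and class name are illustrative, not the poster's actual backend):
###
diskarray <- function(path, nrow, ncol)
  structure(list(path = path, nrow = nrow, ncol = ncol), class = "diskarray")

"[.diskarray" <- function(x, i, j) {
  con <- file(x$path, "rb"); on.exit(close(con))
  out <- matrix(NA_real_, length(i), length(j))
  for (k in seq_along(i)) {
    seek(con, where = (i[k] - 1) * x$ncol * 8)         # 8 bytes per double
    out[k, ] <- readBin(con, "double", n = x$ncol)[j]  # one row, kept columns
  }
  out
}

writeBin(as.double(1:12), "test.bin")  # 3 x 4 test array, row-major
a <- diskarray("test.bin", nrow = 3, ncol = 4)
a[2:3, c(1, 4)]                        # only the requested rows are read
###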
2007 Mar 12
1
Analysis of 3-dimensional spatial point patterns
I am trying to determine how to evaluate the homogeneity of points in three-dimensional space. For two-dimensional data I have used functions available in the spatial package, and I've looked into the spatstat package, but as far as I can tell, neither appears to handle 3-dimensional data. Is there another version, package, or software that does the same type (G-function,
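More recent versions of spatstat added three-dimensional point patterns via the pp3 class, with G3est/K3est/F3est as 3D analogues of the usual summary functions; a minimal sketch on simulated points in the unit cube:
###
library(spatstat)
X <- pp3(runif(100), runif(100), runif(100),
         box3(c(0, 1), c(0, 1), c(0, 1)))
plot(G3est(X))  # 3D nearest-neighbour distance function
plot(K3est(X))  # 3D K-function
###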
1997 Dec 05
1
R-alpha: is.vector of one-dimensional array
maybe we've already discussed this before, but Kurt and I can't remember ... is.vector() of a one-dimensional array returns FALSE. this is also the behavior of Splus, but totally counter-intuitive for me ... IMO an array of dimension 1 is exactly the definition of a vector ... it also breaks our current plot.factor, which is simply a barplot(table(x)): table() returns an
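The behavior follows from is.vector() requiring no attributes other than names, while a 1-d array carries a dim attribute; a short demonstration:
###
x <- array(1:3)           # one-dimensional array
is.vector(x)              # FALSE: x has a dim attribute
dim(x)                    # 3
is.vector(as.vector(x))   # TRUE once the attributes are stripped
###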
2006 Jul 19
1
How would you export a 3-dimensional array to an SQL database?
Hello, How would you export a 3-dimensional array to an SQL database? a <- array(1:24, 2:4) Is there an open source DB that would be more adequate for this type of operation? Is there a way to reshape/flatten a 3-dimensional array? Regards, Pierre Lapointe
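SQL tables are flat, so the usual approach is one row per cell, with the array indices as key columns; base R can do the flattening directly, and the result can go into any database through DBI (the connection object con is assumed to exist):
###
a <- array(1:24, 2:4)
flat <- as.data.frame(as.table(a), responseName = "value")
head(flat)  # columns Var1, Var2, Var3, value: one row per cell
# library(DBI); dbWriteTable(con, "a_flat", flat)  # con: an open DBI connection
###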
2007 Jan 19
1
Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays
Hi Everyone, Given the scenario I have, I was wondering if anyone would be able to give me a hint on how to get the results from hubers() in a more efficient way. I have an outcome in an array [N x S x D]. I also have a factor (levels 1,2,3) stored in a matrix N x S. My objective is to get "mu" and "sigma" for each of the N rows (outcome) stratified by the factor
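A minimal sketch of one way to do the stratified sweep, pooling the D replicate values per cell (dimensions and level count taken from the post; the data here are simulated):
###
library(MASS)
N <- 5; S <- 4; D <- 10
y <- array(rnorm(N * S * D), c(N, S, D))
f <- matrix(sample(1:3, N * S, replace = TRUE), N, S)
res <- lapply(seq_len(N), function(n)            # one entry per row n
  lapply(split(seq_len(S), f[n, ]), function(s)  # columns sharing a level
    hubers(as.vector(y[n, s, , drop = FALSE])))) # list(mu, s) per stratum
###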