Displaying 20 results from an estimated 10000 matches similar to: "Large data sets with high dimensional fixed effects"
2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone,
I'm working with some very big data sets (each has 11 million rows and 2 columns). My first step is to merge all of my individual data sets together (I have about 20).
I'm using the following command from sqldf:
data1 <- sqldf("select A.*, B.* from A inner join B using(ID)")
But it's taking A VERY VERY LONG TIME to merge just two of the data sets.
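A hedged sketch of one common alternative (not from the thread itself): data.table's keyed joins handle merges of this size far faster than an unindexed SQL join. Assumes A and B are data.frames sharing an ID column.
library(data.table)
A <- as.data.table(A); setkey(A, ID)   # sort once, index by ID
B <- as.data.table(B); setkey(B, ID)
data1 <- merge(A, B, by = "ID")        # keyed inner join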
2017 Jun 17
0
Prediction with two fixed-effects - large number of IDs
I have no direct experience with such horrific models, but your formula is a mess and Google suggests the biglm package with ffdf.
Specifically, convert your discrete variables to factors before you build the model, particularly since you want to use predict afterwards, and predict will need a new data set with exactly the same factor levels.
Also, your use of I() is
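The snippet cuts off there. A minimal sketch of the factor advice, assuming the column names from the thread (lny, id, year, x1, x2); note the quadratic terms written as I(x1^2) rather than I(x1)^2, which is presumably what the truncated remark about I() is pointing at:
mydata$id   <- factor(mydata$id)     # convert before fitting
mydata$year <- factor(mydata$year)
fit  <- lm(lny ~ id + year + x1 + I(x1^2) + x2 + I(x2^2), data = mydata)
pred <- predict(fit, newdata = head(mydata))  # newdata must carry the same levels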
2008 Jun 06
1
functions for high dimensional integral
I need to compute a high-dimensional integral. Currently I'm using the function adapt from the R package adapt, but this method is rather slow for my problem. I'm wondering if there are other solutions. Thanks.
Zhongwen
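One alternative worth naming (an assumption, not something the thread settled on): the cubature package's hcubature() does adaptive multidimensional integration and is a common replacement for adapt.
library(cubature)
f <- function(x) exp(-sum(x^2))   # example 4-d integrand
hcubature(f, lowerLimit = rep(-2, 4), upperLimit = rep(2, 4), tol = 1e-4)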
2010 May 24
1
high-dimensional contingency table
Dear Friends.
I am just starting to use R, and on this occasion I want to construct a high-dimensional contingency table, because I want to create a mosaic plot with the vcd package.
My table is in this format:
   año ac.rep cat.gru conteos
1 2005      R parejas     253
2 2005      N parejas      23
3 2006      R parejas     347
4 2006      N parejas      39
5 2007      R
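A minimal sketch of the usual route from counts to a mosaic plot, assuming the rows above sit in a data frame called datos (hypothetical name):
library(vcd)
tab <- xtabs(conteos ~ año + ac.rep + cat.gru, data = datos)  # 3-way table from counts
mosaic(tab)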
2008 Feb 28
0
New Package: geozoo. High-Dimensional Geometric Objects
Dear useRs,
I'd like to announce a new package called geozoo, short for geometric
zoo. It's a compilation of functions to produce high-dimensional
geometric objects, including hypercubes and hyperspheres, Boy's
surface, the hyper torus and a selection of polytopes. For a complete
list, as well as images and movies, visit
2003 Nov 01
2
Question about the high dimensional density estimation
Hi,
I found that the R package "KernSmooth" can deal only with 1D and 2D data, but now I have a collection of 4-dimensional data (x1, x2, x3, x4) and would like to estimate the "mode" of the underlying density. What can I do about it?
Thanks a lot.
--
Ying-Chao Hung
Assistant Professor
Graduate Institute of Statistics
National Central University
Chung-Li, Taiwan
TEL:
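One possible route (an assumption, not from the thread): the ks package estimates kernel densities beyond two dimensions, so a crude mode estimate is the observation with the highest fitted density.
library(ks)
dat  <- matrix(rnorm(400), ncol = 4)     # stand-in for (x1, x2, x3, x4)
fhat <- kde(x = dat, eval.points = dat)  # density evaluated at each observation
dat[which.max(fhat$estimate), ]          # crude 4-d mode estimate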
2001 Dec 10
1
high dimensional convex hull
Does anyone know of an R package that will determine the convex hull of a high-dimensional dataset (say 4-10 dimensions)? I know chull works for 2D data.
I'm neophyte to R and convex hulls so please keep it simple.
Many thanks
Ben
--
Ben Stapley.
Biomolecular Sciences, UMIST, PO Box 88, Manchester M60 1QD.
Tel 0161 200 5818
Fax 0161 236 0409
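For the record, the geometry package wraps Qhull and handles higher dimensions; a minimal sketch:
library(geometry)
pts  <- matrix(rnorm(200 * 5), ncol = 5)  # 200 points in 5 dimensions
hull <- convhulln(pts)                    # each row lists the vertex indices of one facet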
2012 Oct 16
1
How to create a high-dimensional matrix
Hi, everyone
I need to create a 429497 x 429497 matrix.
When I use
matrix(0, 429497, 429497)
I get the error: Error in matrix(0, 429497, 429497) : too many elements specified
Then I use the "ff" package to try to store this matrix on disk:
x <- ff(0, dim = c(429497, 429497))
And I get the error:
Error in if (length < 0 || length > .Machine$integer.max)
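Both errors come down to the element count: 429497^2 is about 1.8e11 cells, beyond the 2^31-1 element limit and roughly 1.5 TB as dense doubles. If the matrix is mostly zeros (an assumption about the use case), sparse storage is the standard fallback:
library(Matrix)
n <- 429497
m <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0), dims = c(n, n))
m[1, 2] <- 3.5   # only nonzero entries consume memory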
2009 Mar 15
1
What is the best package for large data cleaning (not statistical analysis)?
Dear R helpers:
I am a newbie to R and have a question related to cleaning large data frames
in R.
So far, I have been using SAS for data cleaning because my data sets are relatively large (multiple files, each as large as 5-10 GB).
I am not a fan of SAS at all and am eager to move my data cleaning tasks into R completely.
It seems to me there are three options: using SQL, ff, or
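The snippet ends mid-sentence. For what it's worth, a base-R sketch of chunked cleaning (file name and cleaning rule are illustrative) that never loads the whole file:
con <- file("big_file.csv", open = "r")
hdr <- strsplit(readLines(con, n = 1), ",")[[1]]
repeat {
  chunk <- tryCatch(read.csv(con, header = FALSE, col.names = hdr, nrows = 1e6),
                    error = function(e) NULL)  # NULL once the file is exhausted
  if (is.null(chunk)) break
  clean <- chunk[!is.na(chunk[[1]]), ]         # stand-in cleaning rule
  write.table(clean, "clean_file.csv", sep = ",", append = TRUE,
              row.names = FALSE, col.names = FALSE)
  if (nrow(chunk) < 1e6) break
}
close(con)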
2010 Oct 31
1
biglm: how it handles large data set?
I am trying to figure out why 'biglm' can handle large data sets...
According to the R documentation: "biglm creates a linear model object that uses only p^2 memory for p variables. It can be updated with more data using update. This allows linear regression on data sets larger than memory."
After reading the source code below, I still could not figure out how 'update'
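Reading between the lines of the quoted documentation: biglm keeps an incremental QR decomposition of size ~p^2 rather than the data, and update() folds each new chunk into that decomposition. A sketch with illustrative chunked data frames (columns y, x1, x2):
library(biglm)
fit <- biglm(y ~ x1 + x2, data = chunk1)   # initial QR built from chunk1
fit <- update(fit, moredata = chunk2)      # chunk2 folded in; raw rows discarded
coef(fit)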
2024 Jul 04
1
Large vector support in data.frames
Ivan, Simon,
Thanks for the replies.
I can work around the limitation. I currently either divide the data into shards or use a list of (long) vectors, depending on what I am trying to do. But I have to transform between the two representations, which takes time and memory, and I often need more code than I would have if I could have used data.frames.
Being able to create large (> 2^31-1
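A sketch of the list-of-long-vectors workaround mentioned above (sizes kept small here; a genuine long vector of length > 2^31-1 needs tens of GB of RAM):
n   <- 1e6                                      # swap in > 2^31-1 given enough memory
dat <- list(id = seq_len(n), val = numeric(n))  # parallel columns in a plain list
dat$val[1:5] <- rnorm(5)                        # element access works as with a data.frame column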
2006 May 17
1
Re : Large database help
Thanks for doing this, Thomas. I have been thinking about what it would take to do this, but if it were left to me, it would have taken a lot longer.
Back in the '80s there was a statistical package called RUMMAGE that did all computations based on sufficient statistics and did not keep the actual data in memory. Memory for computers became cheap before data sets turned huge, so there
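The same idea in miniature: ordinary least squares needs only the accumulated crossproducts X'X and X'y, never the raw rows. A sketch, assuming data arrive as an illustrative list of data frames called chunks with columns y, x1, x2:
p   <- 3                              # intercept, x1, x2
XtX <- matrix(0, p, p)
Xty <- numeric(p)
for (chunk in chunks) {
  X   <- cbind(1, chunk$x1, chunk$x2)
  XtX <- XtX + crossprod(X)           # accumulate X'X
  Xty <- Xty + crossprod(X, chunk$y)  # accumulate X'y
}
beta <- solve(XtX, Xty)               # solve the normal equations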
2017 Jun 17
3
Prediction with two fixed-effects - large number of IDs
Dear all,
I am running a panel regression with time and location fixed effects:
###
reg1 <- lm(lny ~ factor(id) + factor(year) + x1+ I(x1)^2 + x2+ I(x2)^2 ,
data=mydata, na.action="na.omit")
###
My goal is to use the estimation for prediction. However, I have 8,500 IDs,
which is resulting in very slow computation. Ideally, I would like to do
the following:
###
reg2 <-
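The snippet ends mid-assignment. Beyond the biglm/ffdf suggestion made in the reply, one named alternative for thousands of fixed-effect levels is the lfe package, which projects the factors out rather than estimating 8,500 dummies; a hedged sketch:
library(lfe)
est <- felm(lny ~ x1 + I(x1^2) + x2 + I(x2^2) | id + year, data = mydata)
summary(est)
fe  <- getfe(est)   # recovers the id and year effects, e.g. for later prediction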
2012 Mar 23
2
Help with R package forecast
When I type library() to see what is installed, the following list comes up in red.
Packages in library '/home/jason/R/i686-pc-linux-gnu-library/2.13':
abind Combine multi-dimensional arrays
aplpack Another Plot PACKage: stem.leaf, bagplot,
faces, spin3R, and some slider functions
biglm bounded memory linear and
2006 May 18
3
Two-Dimensional Hashes through links?
I'm really hitting a wall here. My program has a search engine that returns a list of results. I am using a two-dimensional hash to pass the form data back to my controller (i.e. params[:job] => {:description => "xxx", :location => "xxx", :company => "xxx"}). I use that Job object to search my database, and then wait for user input.
Now when the
2007 Mar 09
4
Using large datasets: can I overload the subscript operator?
Hello,
I do some computations on datasets that come from climate models. These data are huge arrays, significantly larger than typically available RAM, so they have to be accessed row-by-row, or rather slice-by-slice, depending on the task. I would like to make an R package to easily access such datasets within R. The C++ backend is ready and being used under Windows/.NET/Visual Basic, but I have
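Yes, the subscript operator can be overloaded. A skeletal sketch, where readSlice() stands in for a hypothetical wrapper around the poster's C++ backend:
setClass("DiskArray", slots = c(path = "character", dims = "integer"))
setMethod("[", "DiskArray", function(x, i, j, ..., drop = TRUE) {
  readSlice(x@path, i)   # hypothetical: fetch slice i of the on-disk array
})
# usage: a <- new("DiskArray", path = "run1.bin", dims = c(720L, 360L, 1200L)); a[3]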
2007 Mar 12
1
Analysis of 3-dimensional spatial point patterns
I am trying to determine how to evaluate homogeneity of points in
three-dimensional space.
In two-dimensional data, I have used functions available in the spatial package, and I have looked into the spatstat package, but as far as I can tell, neither appears to handle 3-dimensional data.
Is there another version, package, or software that does the same type
(G-function,
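For what it's worth, later spatstat releases added three-dimensional point patterns; a sketch:
library(spatstat)
X <- pp3(runif(50), runif(50), runif(50), box3())  # 50 points in the unit cube
plot(G3est(X))  # 3D nearest-neighbour distance function
plot(K3est(X))  # 3D K-function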
1997 Dec 05
1
R-alpha: is.vector of one-dimensional array
Maybe we've already discussed this before, but Kurt and I can't remember ...
is.vector() of a one-dimensional array returns FALSE. This is also the behavior of S-PLUS, but totally counter-intuitive to me ... IMO an array of dimension 1 is exactly the definition of a vector ...
it also breaks our current plot.factor, which is simply a
barplot(table(x))
table() returns an
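The behaviour in question, for reference:
a <- array(1:3)          # one-dimensional array
is.vector(a)             # FALSE: the dim attribute disqualifies it
is.vector(as.vector(a))  # TRUE once dim is stripped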
2006 Jul 19
1
How would you export a 3-dimensional array to an SQL database?
Hello,
How would you export a 3-dimensional array to an SQL database?
a <- array(1:24, 2:4)
Is there an open source DB that would be more adequate for this type of
operation?
Is there a way to reshape/flatten a 3-dimensional array?
Regards,
Pierre Lapointe
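A sketch of the flattening step in base R; the database lines are illustrative (any backend with an R driver would do):
a    <- array(1:24, 2:4)
long <- cbind(expand.grid(i = 1:2, j = 1:3, k = 1:4), value = as.vector(a))
# e.g. with RSQLite:
# library(DBI)
# con <- dbConnect(RSQLite::SQLite(), "arrays.db")
# dbWriteTable(con, "a_long", long)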
2007 Jan 19
1
Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays
Hi Everyone,
Given the scenario I have, I was wondering if anyone would be able to give me a hint on how to get the results from hubers() in a more efficient way.
I have an outcome stored in an array [N x S x D].
I also have a factor (levels 1, 2, 3) stored in a matrix N x S.
My objective is to get "mu" and "sigma" for each of the N rows (outcome) stratified by the factor
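One way to sketch this (dimensions illustrative; assumes every stratum has enough points for hubers to converge): for each row, split the S x D outcomes by that row's factor values and run hubers() per stratum.
library(MASS)
N <- 10; S <- 5; D <- 3
out <- array(rnorm(N * S * D), c(N, S, D))       # outcome
f   <- matrix(sample(1:3, N * S, TRUE), N, S)    # factor level per (row, s)
res <- lapply(seq_len(N), function(n) {
  strata <- split(as.vector(out[n, , ]), rep(f[n, ], times = D))
  sapply(strata, function(v) unlist(hubers(v)))  # mu and s per stratum
})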