Displaying 20 results from an estimated 2000 matches similar to: "Handling large dataset & dataframe [Broadcast]"
2007 Aug 16
4
Linear models over large datasets
I'd like to fit linear models on very large datasets. My data frames
are about 2,000,000 rows x 200 columns of doubles, and I am using a
64-bit build of R. I've googled this extensively and gone over the
"R Data Import/Export" guide. My primary issue is that although my data
in ASCII form is 4 GB in size (and therefore much smaller
in binary), R consumes about
2009 Mar 31
1
error during DPpackage compilation
Dear All,
I've had trouble compiling DPpackage as a user in one system. It works fine
as root in other machines.
I can't see any clues in the error messages. My guess is that it is a permissions
matter.
Any help is appreciated.
OS: Linux
Kernel: 2.6.27 SMP
Arch: Intel 64 bits
gfortran not available
Thank you.
----------------------><8-------------------------------------
g77 -fpic -g
2006 Apr 24
6
Handling large dataset & dataframe
Hi,
I have a dataset consisting of 350,000 rows and 266 columns. Of the 266 columns, 250 are dummy-variable columns. I am trying to read this dataset into an R data frame but am unable to do so because of memory limitations (the object created is too large to handle in R). Is there a way to handle such a large dataset in R?
My PC has 1 GB of RAM and 55 GB of hard disk space, running
2012 Apr 23
0
linear model benchmarking
I cleaned up my old benchmarking code and added checks for missing
data to compare various ways of finding OLS regression coefficients.
I thought I would share this for others. The long and short of it is
that I would recommend
ols.crossprod = function (y, x) {
x <- as.matrix(x)
ok <- (!is.na(y))&(!is.na(rowSums(x)))
y <- y[ok]; x
2008 Jun 12
0
[LLVMdev] code generation order revisited.
On Jun 12, 2008, at 11:38, Hendrik Boom wrote:
> On Tue, 06 May 2008 16:06:35 -0400, Gordon Henriksen wrote:
>
>> On 2008-05-06, at 13:42, Hendrik Boom wrote:
>>
>>> One more question. I hope you're not getting tired of me already.
>>> Does generating LLVM code have to proceed in any particular order?
>>>
>>> Of course, if I am writing
2009 Mar 12
4
stats lm() function
Hi,
I'm using the lm() function where the formula is quite big (300 arguments)
and the data is a frame of 3000 values.
This runs in a loop where in each step the formula is reduced by one
argument and lm() is called again (to check which arguments are
useful).
This takes 1-2 minutes.
Is there a way to speed this up?
I checked the code of the lm function and it seems that its
2006 Nov 05
2
solution to a regression with multiple independent variable
Please forgive a statistics question.
I know that a simple bivariate linear regression, y=f(x) or in R
parlance lm(y~x) can be solved using the variance-covariance matrix:
beta(x)=covariance(x,y)/variance(x). I also know that a linear
regression with multiple independent variables, for example y=f(x,z)
can also be solved using the variance-covariance matrix, but I don't
know how to do this.
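The covariance-matrix route described above can be sketched in R as follows. This is a minimal illustration on simulated data, not code from the thread: the variables x, z, y and the coefficients used to generate them are made up. The slopes solve var(X) b = cov(X, y), and the intercept then follows from the column means.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
x <- rnorm(n)
z <- rnorm(n)
y <- 1 + 2 * x - 3 * z + rnorm(n)

# Bivariate case: slope = cov(x, y) / var(x)
b_simple <- cov(x, y) / var(x)

# Several regressors: the slopes solve var(X) %*% b = cov(X, y)
X <- cbind(x = x, z = z)
slopes <- solve(var(X), cov(X, y))

# The intercept is recovered from the means
intercept <- mean(y) - sum(colMeans(X) * slopes)

# Compare with lm(); the two should agree up to rounding error
coef_cov <- c(intercept, slopes)
coef_lm <- unname(coef(lm(y ~ x + z)))
print(cbind(coef_cov, coef_lm))
```

Using solve(A, b) rather than forming the inverse of var(X) explicitly is the usual choice for numerical stability.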
2008 Mar 24
6
vlookup in R
Hi,
Is there a function similar to Excel's vlookup in R? Please let me know.
Thanks,
Sachin
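One common way to get vlookup-like behaviour in base R is match(), with merge() as a join-style alternative. The sketch below is only an illustration; the lookup table, its column names (id, price), and the keys are hypothetical and do not come from the thread.

```r
# Hypothetical lookup table and keys, made up for illustration
lookup <- data.frame(id = c("a", "b", "c"), price = c(10, 20, 30))
keys <- c("b", "c", "a", "d")

# match() gives the position of each key in lookup$id (NA when absent),
# which is then used to index the value column
price <- lookup$price[match(keys, lookup$id)]
print(price)

# merge() is a join-style alternative; all.x = TRUE keeps unmatched keys
joined <- merge(data.frame(id = keys), lookup, by = "id", all.x = TRUE)
print(joined)
```

Note that merge() sorts the result by the key column by default, whereas the match() approach preserves the order of the keys.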
2006 May 23
5
conditional replacement
Hi
How can I do this in R?
>df
48
1
35
32
80
If df < 30 then replace it with 30, else if df > 60 replace it with 60. I have a large dataset, so I can't afford to identify indexes and then replace.
Desired o/p:
48
30
35
32
60
Thanks in advance.
Sachin
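A vectorised way to clamp values into a range, without locating indexes first, is to combine pmax() and pmin(). The sketch below uses the five values from the post; the name clamped is introduced here for illustration.

```r
# Values from the post
df <- c(48, 1, 35, 32, 80)

# pmax() raises anything below 30 to 30; pmin() lowers anything above 60 to 60
clamped <- pmin(pmax(df, 30), 60)
print(clamped)
# 48 30 35 32 60
```

Both functions work element-wise on the whole vector, so this scales to large data without an explicit loop.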
2008 May 13
2
Plotting Frequency Distribution in R
Hi,
How can I plot a frequency distribution curve for the following data?
V1 V2
1 1 160.54%
2 1 201.59%
3 1 18.45%
4 1 179.03%
5 1 274.37%
6 1 0.00%
7 1 24.52%
8 1 39.17%
9 3 43.72%
10 1 53.06%
11 1 64.97%
12 1 79.84%
13 1 98.08%
14 1 115.32%
15 1 127.96%
16 1 155.38%
17 1 157.25%
18 1 193.17%
19 1 51.53%
20 15 99.32%
21 1 106.86%
22 1 219.44%
2007 Oct 03
1
inverse of matrix made by low.tri function
Hi all,
I am using R to try to get the inverse of the matrix (X^T)X, but I keep getting
an error message like: no b argument and no default value for sprintf(gettext(fmt,
domain = domain), ...) .
--------------------------------------------------------------------------------------------
# my code
X<-Matrix(rep(1,500),100,5)
X[lower.tri(X)]<-1-10^-7
XtX<- t(X)%*% X
XtXu<-lu(XtX)
2011 Aug 16
2
generalized inverse using matinv (Design)
I am trying to use matinv from the Design package
to compute the generalized inverse of the normal equations
of a 3x3 design via the sweep operator.
That is, for the linear model
y = μ + x1 + x2 + x1*x2
where x1, x2 are 3-level factors and dummy coding is being used
the matrix to be inverted is
X'X =
9 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1
3 3 0 0 1 1 1 1 0 0 1 0 0 1 0 0
3 0 3 0 1 1 1 0 1 0 0 1
2006 Jul 14
2
Recreate new dataframe based on condition
Hi,
How can I achieve this in R. Dataset is as follows:
>df
x
1 2
2 4
3 1
4 3
5 3
6 2
structure(list(x = c(2, 4, 1, 3, 3, 2)), .Names = "x", row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
I want to recreate a new data frame whose rows are sum of (1&2, 3&4, 5&6)
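Summing consecutive pairs of rows can be sketched with a grouping index and tapply(). This is one possible approach under the assumption that the number of rows is even; the names grp and out are introduced here for illustration.

```r
# Data frame from the post
df <- data.frame(x = c(2, 4, 1, 3, 3, 2))

# Group index: rows 1-2 -> 0, rows 3-4 -> 1, rows 5-6 -> 2
grp <- (seq_len(nrow(df)) - 1) %/% 2

# Sum x within each pair and put the result in a new data frame
out <- data.frame(x = as.vector(tapply(df$x, grp, sum)))
print(out)
```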
2006 Apr 21
3
Creat new column based on condition
Hi,
How can I accomplish this task in R?
V1
10
20
30
10
10
20
Create a new column V2 such that:
If V1 = 10 then V2 = 4
If V1 = 20 then V2 = 6
If V1 = 30 then V2 = 10
So the O/P looks like this
V1 V2
10 4
20 6
30 10
10 4
10 4
20 6
Thanks in advance.
Sachin
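A mapping like this can be sketched with a named lookup vector indexed by the values of V1. The name map is introduced here for illustration; the values are those given in the post.

```r
# Values from the post
V1 <- c(10, 20, 30, 10, 10, 20)

# Named vector: names are the V1 values, elements are the V2 values
map <- c("10" = 4, "20" = 6, "30" = 10)

# Index the map by name; unname() drops the carried-over names
V2 <- unname(map[as.character(V1)])

out <- data.frame(V1 = V1, V2 = V2)
print(out)
```

Any value of V1 that is not in the map would come back as NA, which makes unmapped cases easy to spot.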
2005 Aug 24
1
lm.ridge
Hello, I posted this mail a few days ago but did it wrong; I hope it
is right now.
I have the following questions about lm.ridge, from the MASS package,
which I illustrate with the Longley example.
First: I think the coefficients from lm(Employed~.,data=longley) should be
equal to the coefficients from lm.ridge(Employed~.,data=longley, lambda=0). Why
does this not happen?
2012 Mar 22
3
Recommendations regarding textbooks
Hello
I was hoping to get some advice regarding teaching R in an academic environment.
What are the best choices with respect to textbooks?
When this question was asked a few years back, people were primarily
recommending "Modern Applied Statistics with S" and "Introductory
Statistics with R" as two good choices. I've also heard some good
things regarding "An R Companion to Applied
2010 Feb 22
9
Couldn't find Order with ID=pending_orders
I have a Controller named Orders which has a pending_orders method
which is expected to fetch some records from the database.
If I don't write a route for this method, I get the following error
when I call this method.
Couldn't find Order with ID=pending_orders
I am using Rails 2.3.5; in the previous versions I used to get this
I don't understand whether it's a requirement of the new version...
Help
2006 Apr 20
2
Conditional Row Sum
Hi,
How can I accomplish this in R? Example:
R1 R2
3 101
4 102
3 102
18 102
11 101
I want to find Sum(101) = 14, i.e. SUM(R1) where R2 = 101
Sum(102) = 25, i.e. SUM(R1) where R2 = 102
TIA
Sachin
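A grouped sum of this kind can be sketched with tapply(), which applies a function to one vector within the groups defined by another. The data frame name d and the result name sums are introduced here for illustration; the values are those from the post.

```r
# Data from the post
d <- data.frame(R1 = c(3, 4, 3, 18, 11),
                R2 = c(101, 102, 102, 102, 101))

# Sum R1 within each distinct value of R2
sums <- tapply(d$R1, d$R2, sum)
print(sums)
# 101 102
#  14  25
```

aggregate(R1 ~ R2, data = d, FUN = sum) is an alternative that returns a data frame instead of a named array.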
2006 Jun 26
2
write.table & csv help
Hi,
How can I produce the following output in .csv format using the write.table function?
for(i in seq(1:2))
{
df <- rnorm(4, mean=0, sd=1)
write.table(df,"C:/output.csv", append = TRUE, quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)
}
Current O/p:
x 0.287816 -0.81803 -0.15231 -0.25849 x 2.26831 0.863174
2009 Mar 25
3
very fast OLS regression?
Dear R experts:
I just ran a simple test that told me that hand computing the OLS
coefficients is about 3-10 times as fast as using the built-in lm()
function. (code included below.) Most of the time, I do not care,
because I like the convenience, and I presume some of the time goes
into saving a lot of stuff that I may or may not need. But when I do
want to learn the properties of an