Displaying 20 results from an estimated 10000 matches similar to: "R on Large Data Sets (again)"
2010 Aug 12
2
R 64-bit and Revolution
Dear users,
The company where I work is considering getting a license for Revolution
Enterprise (Windows, 64-bit). I'd appreciate it if those familiar with the
product could share their experiences with it. In particular, how does it
compare to the free 64-bit version of R?
Thanks in advance.
Regards,
Lars.
2009 Sep 11
3
Working with large matrix
Dear All,
I have a large matrix (46000 x 11250) and would like to fit a linear regression for each row. I wrote a simple function around lm() and used apply(mat, 1, func). The problem is that it takes ages both to load the file and to finish the fits. I am using 64-bit Linux with 32 GB of memory. Is there an elegant and fast way to complete this task?
Thanks in advance.
Kind regards,
Ezhil
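One well-known shortcut, sketched below: lm() accepts a matrix response, so all 46000 row-wise fits against a common predictor can be done in a single call instead of one apply() pass per row. The predictor x and the small stand-in matrix are assumptions, since the original post shows neither.
mat <- matrix(rnorm(100 * 11250), nrow = 100)  # small stand-in for the 46000-row matrix
x   <- seq_len(ncol(mat))                      # assumed common predictor
fit <- lm(t(mat) ~ x)        # matrix response: one regression per row of mat
coefs <- t(coef(fit))        # one (intercept, slope) pair per row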
2010 Jun 15
1
help biglm.big.matrix; problem with weights
Hello colleagues,
I have tried to use the biglm package: I want to fit a multivariate
regression with weights. I imported a large dataset with
library(bigmemory), loaded library(biglm), and specified a regression
with a weight. But every time I get an error message such as "object not
found", "`weights' must be a formula", or "error in eval(expr, envir,
enclos)". I
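For the record, biglm() expects its weights as a one-sided formula naming a column of the data, not as a numeric vector, which matches the "`weights' must be a formula" message above. A minimal sketch with made-up column names (the big.matrix wrapper biglm.big.matrix() from the biganalytics package, which the subject line refers to, is not reproduced here):
library(biglm)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100),
                  x2 = rnorm(100), w = runif(100))
fit <- biglm(y ~ x1 + x2, data = dat, weights = ~w)  # one-sided formula, not a vector
summary(fit)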
2010 Feb 24
4
R Graphics into Latex
Hi,
I'm new to LaTeX and I'm trying to include an R chart in a LaTeX document.
This is what I'm doing:
1) In R: save the chart as PostScript in a folder, C:/xxx/Density.eps
2) In LaTeX (using TeXworks on Windows XP):
In the preamble:
\documentclass[11pt]{article}
\usepackage{graphicx}
\begin{document}
blah... blah... blah
\begin{figure}
\centering
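On the R side, a minimal sketch of producing a proper EPS with the postscript() device (the plotting call is a placeholder; the post saves to C:/xxx/Density.eps). In the LaTeX body the file is then included with \includegraphics{Density} inside the figure environment, compiling with latex + dvips, since plain pdflatex of that era could not embed EPS directly.
postscript("Density.eps", width = 6, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")  # true EPS settings
plot(density(rnorm(1000)), main = "Density")  # placeholder for the real chart
dev.off()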
2010 Dec 31
4
Sweave for "big" data analysis
Hi,
Maybe I'm missing the point here... but suppose you are working with
"large" data sets and using functions that take a significant amount of time
to run in R. I wouldn't like to rerun these functions every time I call
Sweave("myfile.Rnw") within R. What is the common practice for using Sweave
in these situations? I would just run the function once,
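A common pattern is to cache the expensive result on disk and reload it on later Sweave runs; the cacheSweave package automates the same idea. A minimal sketch, with hypothetical file and object names, of what would sit inside a code chunk:
if (file.exists("slow_fit.RData")) {
  load("slow_fit.RData")                # restores the cached object `fit`
} else {
  fit <- some_slow_function(big_data)   # placeholder for the expensive step
  save(fit, file = "slow_fit.RData")
}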
2009 Mar 17
3
Non-Linear Optimization - Query
Dear All,
A couple of weeks ago I asked for a package recommendation for nonlinear
optimization. In my problem I have a fairly complicated nonlinear objective
function subject to one nonlinear equality constraint.
I was pointed to the Rdonlp2 package, but I did not get any
results after running the program for 5 hours. Is it normal for this type
of program to run for hours? Also,
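For what it's worth, an augmented-Lagrangian solver such as alabama::auglag() handles nonlinear equality constraints directly and can serve as a cross-check on Rdonlp2; the objective and constraint below are toy placeholders, not the poster's problem.
library(alabama)
fn  <- function(p) sum((p - c(1, 2))^2)   # placeholder objective
heq <- function(p) p[1]^2 + p[2]^2 - 2    # placeholder equality constraint, g(p) = 0
res <- auglag(par = c(0.5, 0.5), fn = fn, heq = heq)
res$par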
2009 Feb 25
3
Using very large matrix
Dear friends,
I have to use a very large matrix, something of the sort of
matrix(80000, 80000, n), where n is a numeric value of the form 0.xxxxxx.
I have not found a way of doing it. I keep getting the error
Error in matrix(nrow = 80000, ncol = 80000, 0.2) : too many elements specified
Any suggestions? I have searched the mailing list, but to no avail.
Best,
-- 
Corrado Topi
Global
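The arithmetic behind the error: a dense 80000 x 80000 double matrix takes 80000 * 80000 * 8 bytes, about 51 GB, and its 6.4e9 cells exceed the 2^31 - 1 element limit that applied to all R vectors before R 3.0.0. If most cells were zero, a sparse representation would fit easily; a sketch with the Matrix package (note this only helps when the fill value is 0, not a constant like 0.2):
library(Matrix)
m <- sparseMatrix(i = c(1, 500, 80000),   # row indices of the nonzero cells
                  j = c(2, 640, 79999),   # column indices
                  x = c(0.2, 0.7, 0.1),   # the stored values
                  dims = c(80000, 80000))
object.size(m)   # a few kilobytes instead of ~51 GB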
2011 May 21
3
Simple R Question...
Let's say I have the data frame 'dd' below. I'd like to select one
column from this data frame (say 'a') and keep its name in the
resulting data frame. That can be done as in #2. However, what if I
want to make my selection based on a vector of names (and again keep
those names in the resulting data frame)? My attempt is #4, but it
doesn't work.
dd <- data.frame(a =
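Since the excerpt cuts off the definition of dd, the sketch below assumes a small frame with columns a, b, and c; subsetting with drop = FALSE keeps the data-frame class and the column names, for a single name and for a vector of names alike.
dd <- data.frame(a = 1:3, b = 4:6, c = 7:9)   # assumed definition
dd[, "a", drop = FALSE]       # one column, still a named data frame
nm <- c("a", "b")
dd[, nm, drop = FALSE]        # selection driven by a vector of names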
2011 Jul 30
3
Problem with effects package
Dear List,
Several times when using this package I have gotten the error message
shown below. It works out fine on simple examples, but when working
with real, moderately sized data sets I always get the same error.
Do you know what could be the cause of the problem?
Error in apply(mod.matrix[, components], 1, prod) :
  subscript out of bounds
Error in
2008 Jan 31
3
Memory problem?
Hello R users,
I am trying to run a Cox model to predict relapse in 80 cancer
tumors, taking into account the expression of 17000 genes. The data are
large and I get the error
"Cannot allocate vector of 2.4 Mb". I increased memory.limit to 4000
(the largest my computer supports), but I still get the
error because of other big variables that I have in
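Worth noting: the quoted 2.4 Mb is the size of the single allocation that failed, not the total in use, so freeing intermediate objects can matter more than raising the limit. A hedged sketch, with hypothetical object names:
rm(raw_expression, tmp_matrix)   # drop large intermediates no longer needed
gc()                             # return the freed memory to the allocator
memory.limit()                   # on 32-bit Windows, query (or raise) the cap in MB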
2011 Apr 26
7
Second largest element from each matrix row
Hi,
I need to extract the second-largest element from each row of a
matrix. Below is my solution, but I suspect there is a more efficient
way to accomplish the same. Is there?
 set.seed(1)
 a <- matrix(rnorm(9), 3, 3)
 sec.large <- as.vector(apply(a, 1, order, decreasing = TRUE)[2, ])  # column index of each row's second-largest value
 ans <- sapply(seq_along(sec.large), function(i) a[i, sec.large[i]]) # pull those elements out
 ans
Thanks in advance for your
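Two tighter formulations that reproduce ans above: sorting each row once, or, cheaper for long rows, a partial sort that only settles position n - 1.
ans2 <- apply(a, 1, function(x) sort(x, decreasing = TRUE)[2])    # full sort per row
n <- ncol(a)
ans3 <- apply(a, 1, function(x) sort(x, partial = n - 1)[n - 1])  # partial sort per row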
2010 Dec 17
1
[Fwd: adding more columns in big.matrix object of bigmemory package]
Hi,
With reference to the mail below: I have large datasets, coming from various
different sources, which I can read into a filebacked big.matrix using
library(bigmemory). I want to merge them all into one big.matrix object.
(Later, I want to run a regression using library(biglm).)
I have been trying to do this, unsuccessfully, for quite some time now. Can you
please
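One approach, sketched under assumptions (the descriptor file names are hypothetical, and the exact filebacked.big.matrix() arguments should be checked against the bigmemory docs): allocate a new filebacked big.matrix with the combined column count, then copy the sources column by column so nothing is densified in RAM.
library(bigmemory)
A <- attach.big.matrix("A.desc")   # hypothetical descriptor files
B <- attach.big.matrix("B.desc")
M <- filebacked.big.matrix(nrow = nrow(A), ncol = ncol(A) + ncol(B),
                           type = "double", backingfile = "merged.bin",
                           descriptorfile = "merged.desc")
for (j in seq_len(ncol(A))) M[, j] <- A[, j]             # copy first source
for (j in seq_len(ncol(B))) M[, ncol(A) + j] <- B[, j]   # append second source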
2017 Sep 09
2
Avoid duplication in dplyr::summarise
Dear group,
Is there a way I could avoid the sort of duplication illustrated below?
I.e., I have the same dplyr::summarise call on different group_by
arguments, so I'd like to create a single summarise function that can be
applied to both. My attempt below fails.
df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 =
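One way to factor out the duplication is to pass the grouping variables through the dots; the sketch below uses dplyr idioms (across() and .groups) that postdate this 2017 thread, so treat it as one possibility rather than the canonical answer, and it assumes df is completed as defined above.
library(dplyr)
summarise_means <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarise(across(where(is.numeric), mean), .groups = "drop")
}
summarise_means(df, f1)       # the same summary ...
summarise_means(df, f1, f2)   # ... over different groupings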
2013 Oct 12
2
Order of factors with facets in ggplot2
Hello,
I'd like to produce a ggplot where the order of factors within facets is
based on the average of another variable.
Here's a reproducible example. My problem is that the factors are ordered
the same way in both facets. Within each facet of `f1', I would like
boxplots of `x' for each level of `f2', with the boxplots ordered
based on the average of x
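For one global ordering by the mean of x, base reorder() on the factor is enough, as sketched below with the assumed column names f1, f2, and x; ordering the levels differently inside each facet needs extra machinery such as tidytext::reorder_within() with scale_x_reordered().
library(ggplot2)
df$f2 <- reorder(df$f2, df$x, FUN = mean)   # order f2 levels by the overall mean of x
ggplot(df, aes(f2, x)) +
  geom_boxplot() +
  facet_wrap(~ f1)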
2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone,
I'm working with some very big datasets (each dataset has 11 million rows
and 2 columns). My first step is to merge all my individual data sets
together (I have about 20). I'm using the following command from sqldf:
               data1 <- sqldf("select A.*, B.* from A inner join B
using(ID)")
But it's taking A VERY, VERY LONG TIME to merge just 2 of the datasets
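For joins at this scale a keyed data.table merge is a commonly cited alternative to sqldf; a sketch assuming A and B are data frames sharing an ID column:
library(data.table)
setDT(A); setDT(B)              # convert the assumed data frames in place
setkey(A, ID); setkey(B, ID)    # pre-sorting by the join key speeds the merge
data1 <- merge(A, B, by = "ID") # inner join on ID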
2011 Jul 16
2
(unclassified?) Help Question
Dear List,
I'd appreciate your guidance in obtaining the desired result shown
below, by combining tapply(x, g, mean) and g in the example.
Basically, I'm trying to create a vector whose values are based on the
result of tapply(x, g, mean) but that follows the pattern and length
given by the factor g. Of course, I'm looking for a generic solution
(i.e., not something that just works
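The generic solution the poster seems to be after is ave(), which expands the per-group statistic back to the length and order of g; indexing the tapply() result by g gives the same thing. The x and g below are placeholders.
x <- c(1, 2, 3, 4, 5, 6)
g <- factor(c("a", "a", "b", "b", "c", "c"))
ave(x, g, FUN = mean)                  # group means repeated along g
tapply(x, g, mean)[as.character(g)]    # equivalent via indexing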
2010 Feb 24
1
Sparse KMeans/KDE/Nearest Neighbors?
hi,
I have a dataset (the Netflix dataset) which has roughly 18k columns and
a variable number of rows; let's assume 25 thousand for now. The
dataset is very sparse. I was wondering how to do k-means, nearest
neighbors, or kernel density estimation on it.
I tried using the spMatrix function in the "Matrix" package. I think I'm able to
create the matrix, but as soon as I pass
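A sketch of building such a matrix from (row, column, value) triplets with Matrix::sparseMatrix(), with placeholder index vectors; the usual catch is that downstream routines like kmeans() coerce to dense, so the real work is choosing algorithms that keep the input sparse.
library(Matrix)
i <- c(1, 1, 2, 5)        # user (row) indices -- placeholders
j <- c(10, 42, 10, 7)     # movie (column) indices
x <- c(4, 5, 3, 1)        # ratings
m <- sparseMatrix(i = i, j = j, x = x, dims = c(25000, 18000))
dim(m); object.size(m)    # full nominal shape, tiny memory footprint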
2009 Jul 03
2
Error using the Rdonlp2 Package
Dear experts,
I'm attempting to solve a constrained optimization problem using the Rdonlp2
package.
I created a Lagrange function, L = f(x) - lambda*(g(x) - c), where x is a vector
of 16 parameters. This is what I'm using as the objective function in the code
below. In addition, I set bounds on these parameters (par.u and par.l). When
I run the code, I get the error message shown below. Any idea
2009 Apr 21
4
My surprising experience in trying out REvolution's R
I care a lot about R's speed, so I decided to give REvolution's R
(http://revolution-computing.com/) a try; it bills itself as an
optimized R. Note that I used the free version.
My machine is an Intel Core 2 Duo running Windows XP Professional. The code
I ran is at the end of this post.
First, the regular R 1.9: it takes 2 minutes and 6 seconds, at 50% CPU
usage.
Next, REvolution's R.
2017 Sep 09
0
Avoid duplication in dplyr::summarise
Hi Lars
I am not quite sure what you want, but I suggest the following code,
which (1) obtains the full summary of your data and (2) retrieves
only the mean of the X values as a function of the factors f1 and f2.
library(tidyverse)
library(psych)
df <- data.frame(matrix(rnorm(40), 10, 4),
                 f1 = gl(3, 10, labels = letters[1:3]),
                 f2 = gl(3, 10, labels