similar to: R on Large Data Sets (again)

Displaying 20 results from an estimated 10000 matches similar to: "R on Large Data Sets (again)"

2010 Aug 12
2
R 64-bit and Revolution
Dear users, The company where I work is considering getting a license for Revolution Enterprise for 64-bit Windows. I'd appreciate it if those familiar with the product could share their experiences with it. In particular, how does it compare to the "free" 64-bit version of R? Thanks in advance. Regards, Lars.
2009 Sep 11
3
Working with large matrix
Dear All, I have a large matrix (46000 x 11250). I would like to run a linear regression for each row. I wrote a simple function that calls lm() and used apply(mat, 1, func). The issue is that it takes ages to load the file and also to finish the lm() fits. I am using 64-bit Linux with 32 GB of memory. Is there an elegant and fast way of completing this task? Thanks in advance. Kind regards, Ezhil
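For the row-wise regressions, one much faster route than apply() with lm() is to exploit the fact that lm() accepts a matrix response, fitting all the regressions in a single call. A minimal sketch, assuming every row is regressed on the same predictor vector x (a hypothetical name; toy dimensions used here):

    # toy dimensions; the real matrix is 46000 x 11250
    mat <- matrix(rnorm(100 * 50), nrow = 100)
    x   <- seq_len(ncol(mat))      # hypothetical common predictor

    # lm() with a matrix response fits one regression per column,
    # i.e. one per original row of mat; coef(fit) is 2 x nrow(mat)
    fit <- lm(t(mat) ~ x)
    coefs <- coef(fit)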
2010 Jun 15
1
help biglm.big.matrix; problem with weights
Hello colleagues, I have tried to use the package biglm. I want to specify a multivariate regression with a weight. I imported a large dataset with library(bigmemory). I load library(biglm) and specify a regression with a weight, but every time I get an error message such as "object not found", "'weights' must be a formula", or "error in eval(expr, envir, enclos)". I
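The second error message is the informative one: in biglm(), weights must be given as a one-sided formula naming a column of the data, not as a numeric vector. A minimal sketch under that assumption (mydata, y, x1, x2, and w are hypothetical names; w is a column of mydata):

    library(biglm)
    fit <- biglm(y ~ x1 + x2, data = mydata, weights = ~w)

For data held in a big.matrix, the wrapper biglm.big.matrix() from the biganalytics package should follow the same one-sided-formula convention for weights.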
2010 Feb 24
4
R Graphics into LaTeX
Hi, I'm new to LaTeX and I'm trying to include an R chart in a LaTeX document. This is what I'm doing: 1) In R: save the chart as a PostScript file in a folder, C:/xxx/Density.eps 2) In LaTeX (using TeXworks on Windows XP): In the preamble: \documentclass[11pt]{article} \usepackage{graphicx} \begin{document} blah..blah…blah \begin{figure} \centering
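On the R side, the usual recipe for an EPS file that \includegraphics can size correctly is the postscript() device with onefile = FALSE and paper = "special"; a minimal sketch (the path follows the poster's example, the chart is a placeholder):

    # write the chart as Encapsulated PostScript
    postscript("C:/xxx/Density.eps", onefile = FALSE, horizontal = FALSE,
               paper = "special", width = 6, height = 4)
    plot(density(rnorm(100)))   # placeholder chart
    dev.off()

In the LaTeX body, \includegraphics[width=0.8\textwidth]{C:/xxx/Density.eps} inside the figure environment then pulls the file in. Note that plain latex handles EPS natively while pdflatex does not; with pdflatex, save the chart as PDF from R (the pdf() device) instead.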
2010 Dec 31
4
Sweave for "big" data analysis
Hi, Maybe I'm missing the point here... but suppose you are working with "large" data sets and using functions that take a significant amount of time to run in R. I wouldn't like to run these functions every time I call Sweave("myfile.Rnw") within R. What is the "common" practice for using Sweave in these situations? I would just run the function once,
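One common low-tech pattern is to cache the slow result to disk and reload it on later Sweave runs; packages such as cacheSweave automate the same idea per chunk. A minimal sketch of the manual approach inside a code chunk (slow_model_function and the file name are hypothetical):

    # recompute only when no cached copy exists
    if (file.exists("fit.rds")) {
      fit <- readRDS("fit.rds")
    } else {
      fit <- slow_model_function(mydata)   # hypothetical long-running call
      saveRDS(fit, "fit.rds")
    }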
2009 Mar 17
3
Non-Linear Optimization - Query
Dear All, A couple of weeks ago, I asked for a package recommendation for nonlinear optimization. In my problem I have a fairly complicated nonlinear objective function subject to one nonlinear equality constraint. I was advised to use the *Rdonlp2* package, but I did not get any results after running the program for 5 hrs. Is it normal for this type of program to run for hours? Also,
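One way to check whether the hours-long runtime is specific to Rdonlp2 is to try the same problem shape (nonlinear objective, one nonlinear equality constraint) in another solver. A minimal sketch with Rsolnp::solnp, using placeholder functions rather than the poster's actual problem:

    library(Rsolnp)
    obj <- function(p) sum(p^2)      # placeholder objective f(x)
    eqc <- function(p) sum(p) - 1    # placeholder equality constraint g(x) = 0
    res <- solnp(pars = rep(0.5, 4), fun = obj, eqfun = eqc, eqB = 0)
    res$pars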
2009 Feb 25
3
Using very large matrix
Dear friends, I have to use a very large matrix, something of the sort of matrix(80000, 80000, n), where n is a numeric value of the sort 0.xxxxxx. I have not found a way of doing it. I keep getting the error: Error in matrix(nrow = 80000, ncol = 80000, 0.2) : too many elements specified Any suggestions? I have searched the mailing list, but to no avail. Best, -- Corrado Topi Global
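The error is expected: 80000^2 = 6.4e9 elements exceeds the 2^31 - 1 limit on vector length in R of that era, and as dense doubles the matrix would need about 51 GB anyway. A file-backed matrix from the bigmemory package keeps the data on disk; a minimal sketch (file names hypothetical):

    library(bigmemory)
    # the ~51 GB live in the backing file on disk, not in RAM
    m <- filebacked.big.matrix(nrow = 80000, ncol = 80000,
                               type = "double", init = 0.2,
                               backingfile = "m.bin",
                               descriptorfile = "m.desc")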
2011 May 21
3
Simple R Question...
Let's say I have the data frame 'dd' below. I'd like to select one column from this data frame (say 'a') and keep its name in the resulting data frame. That can be done as in #2. However, what if I want to make my selection based on a vector of names (and again keep those names in the resulting data frame)? My attempt is #4, but it doesn't work. dd <- data.frame(a =
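Indexing with a character vector while keeping drop = FALSE returns a data frame with the column names intact; a minimal sketch:

    dd <- data.frame(a = 1:3, b = 4:6, c = 7:9)
    vars <- c("a", "c")
    dd[, vars, drop = FALSE]   # still a data.frame, names preserved
    dd[vars]                   # list-style indexing, same effect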
2011 Jul 30
3
Problem with effects package
Dear List, Several times when using this package I have gotten the error message shown below. When I work through simple examples everything turns out fine, but when working with real, moderately sized data sets I always get the same error. Do you know what could be the cause of the problem? Error in apply(mod.matrix[, components], 1, prod) : subscript out of bounds Error in
2008 Jan 31
3
Memory problem?
Hello R users, I am trying to run a Cox model for the prediction of relapse of 80 cancer tumors, taking into account the expression of 17000 genes. The data are large and I get an error: "Cannot allocate vector of 2.4 Mb". I increased memory.limit to 4000 (the largest supported by my computer), but I still get the error because of other big variables that I have in
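With 80 samples and 17000 genes, the usual workaround is not one joint model but a univariate Cox fit per gene, which keeps the memory footprint flat (freeing unused objects with rm() followed by gc() before fitting also helps). A minimal sketch with the survival package; expr, time, and status are hypothetical names for the 17000 x 80 expression matrix and the survival outcome:

    library(survival)
    pvals <- apply(expr, 1, function(g) {
      fit <- coxph(Surv(time, status) ~ g)       # one gene at a time
      summary(fit)$coefficients[1, "Pr(>|z|)"]   # Wald p-value
    })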
2011 Apr 26
7
Second largest element from each matrix row
Hi, I need to extract the second largest element from each row of a matrix. Below is my solution, but I think there should be a more efficient way to accomplish the same thing, or not? set.seed(1) a <- matrix(rnorm(9), 3, 3) sec.large <- as.vector(apply(a, 1, order, decreasing = TRUE)[2, ]) ans <- sapply(1:length(sec.large), function(i) a[i, sec.large[i]]) ans Thanks in advance for your
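The order()-based solution works; a slightly more direct version sorts each row once and drops the second indexing pass. A minimal sketch on the same toy data:

    set.seed(1)
    a <- matrix(rnorm(9), 3, 3)
    # sort each row in decreasing order, take the second value
    apply(a, 1, function(r) sort(r, decreasing = TRUE)[2])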
2010 Dec 17
1
[Fwd: adding more columns in big.matrix object of bigmemory package]
Hi, With reference to the mail below, I have large datasets, coming from various different sources, which I can read into a filebacked big.matrix using the bigmemory library. I want to merge them all into one 'big.matrix' object. (Later, I want to run a regression using the 'biglm' library.) I have been trying to do this, unsuccessfully, for quite some time now. Can you please
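bigmemory has no cbind() or merge() for big.matrix objects, so one workable (if unglamorous) approach is to allocate a file-backed target of the combined width and copy columns across in a loop. A minimal sketch assuming two matrices m1 and m2 whose rows are already aligned (all names hypothetical):

    library(bigmemory)
    stopifnot(nrow(m1) == nrow(m2))
    out <- filebacked.big.matrix(nrow = nrow(m1),
                                 ncol = ncol(m1) + ncol(m2),
                                 type = "double",
                                 backingfile = "merged.bin",
                                 descriptorfile = "merged.desc")
    for (j in seq_len(ncol(m1))) out[, j] <- m1[, j]
    for (j in seq_len(ncol(m2))) out[, ncol(m1) + j] <- m2[, j]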
2017 Sep 09
2
Avoid duplication in dplyr::summarise
Dear group, Is there a way I could avoid the sort of duplication illustrated below? i.e., I have the same dplyr::summarise function on different group_by arguments. So I'd like to create a single summarise function that could be applied to both. My attempt below fails. df <- data.frame(matrix(rnorm(40), 10, 4), f1 = gl(3, 10, labels = letters[1:3]), f2 =
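One way to remove the duplication is to wrap group_by() plus summarise() in a function that forwards the grouping variables through the dots. A minimal sketch using current dplyr (the across() summary is a placeholder for whatever the real summarise() computes):

    library(dplyr)
    summarise_by <- function(data, ...) {
      data %>%
        group_by(...) %>%
        summarise(across(where(is.numeric), mean), .groups = "drop")
    }
    summarise_by(df, f1)       # grouped by f1
    summarise_by(df, f1, f2)   # grouped by f1 and f2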
2013 Oct 12
2
Order of factors with facets in ggplot2
Hello, I'd like to produce a ggplot where the order of factors within facets is based on the average of another variable. Here's a reproducible example. My problem is that the factors are ordered similarly in both facets. I would like to have, within each facet of `f1', boxplots for 'x' within each factor `f2', where the boxplots are ordered based on the average of x
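A factor carries a single global level order, so it cannot be ordered differently per facet on its own; a common workaround is tidytext's reorder_within() paired with scale_x_reordered(). A minimal sketch (dat is a hypothetical data frame; column names follow the poster's f1, f2, x):

    library(ggplot2)
    library(tidytext)   # reorder_within() / scale_x_reordered()
    ggplot(dat, aes(x = reorder_within(f2, x, f1), y = x)) +
      geom_boxplot() +
      scale_x_reordered() +
      facet_wrap(~ f1, scales = "free_x")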
2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone, I’m working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual data sets together (I have about 20). I’m using the following command from sqldf: data1 <- sqldf("select A.*, B.* from A inner join B using(ID)") But it’s taking A VERY VERY LONG TIME to merge just 2 of the datasets
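For a keyed equi-join on 11-million-row tables, data.table is usually far faster than routing the data through SQLite; a minimal sketch, assuming both tables share the ID column:

    library(data.table)
    setDT(A); setDT(B)               # convert in place, no copy
    setkey(A, ID); setkey(B, ID)
    data1 <- merge(A, B, by = "ID")  # keyed inner join

(Within SQLite itself, creating an index on ID before the join can also help considerably.)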
2011 Jul 16
2
(unclassified?) Help Question
Dear List, I'd appreciate your guidance for obtaining the desired result shown below, by combining tapply(x, g, mean) and g in the example. Basically, I'm trying to create a vector whose values are based on the result from tapply(x, g, mean) but that follows the pattern and length given by the factor g. Of course I'm looking for a generic solution (i.e., not something that just work
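Indexing the tapply() result by the factor itself expands the group means back to length(g); ave() does the same thing in one call. A minimal sketch:

    x <- c(1, 2, 3, 10, 20, 30)
    g <- factor(c("a", "a", "a", "b", "b", "b"))
    tapply(x, g, mean)[g]   # group means repeated to match g
    ave(x, g)               # same result; FUN defaults to mean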
2010 Feb 24
1
Sparse KMeans/KDE/Nearest Neighbors?
hi, I have a dataset (the Netflix dataset) which is basically ~18k columns and a variable number of rows; let's assume 25 thousand for now. The dataset is very sparse. I was wondering how to do kmeans/nearest neighbors or kernel density estimation on it. I tried using the spMatrix function in the "Matrix" package. I think I'm able to create the matrix, but as soon as I pass
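spMatrix() builds the matrix, but most clustering code then coerces it to dense and runs out of memory; one common route is to build a compressed sparse matrix with Matrix::sparseMatrix() and reduce dimensionality with a sparse-aware partial SVD (for example, the irlba package) before running kmeans() on the low-dimensional scores. A minimal sketch on made-up triplet data:

    library(Matrix)
    library(irlba)
    # toy triplets (i, j, value); the real data is ~25k x 18k, very sparse
    m <- sparseMatrix(i = sample(25000, 1e5, replace = TRUE),
                      j = sample(18000, 1e5, replace = TRUE),
                      x = runif(1e5), dims = c(25000, 18000))
    sv <- irlba(m, nv = 20)                  # partial SVD on sparse input
    km <- kmeans(sv$u %*% diag(sv$d), centers = 10)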
2009 Jul 03
2
Error using the Rdonlp2 Package
Dear experts, I'm attempting to solve a constrained optimization problem using the Rdonlp2 package. I created a Lagrange function (L = f(x) - lambda*(g(x) - c)), where x is a vector of 16 parameters. This is what I'm using as the objective function in the code below. In addition, I set bounds on these parameters (par.u and par.l). When I run the code, I get the error message shown below. Any idea
2009 Apr 21
4
My surprising experience in trying out REvolution's R
I care a lot about R's speed, so I decided to give REvolution's R (http://revolution-computing.com/) a try, which bills itself as an optimized R. Note that I used the free version. My machine is an Intel Core 2 Duo running Windows XP Professional. The code I ran is at the end of this post. First, the regular R 1.9. It takes 2 minutes and 6 seconds, CPU usage 50%. Next, REvolution's R.
2017 Sep 09
0
Avoid duplication in dplyr::summarise
Hi Lars, I am not quite sure what you really want. However, I suggest the following code, which makes it possible (1) to obtain the full summary of your data and (2) to retrieve only the mean of the X values as a function of the factors f1 and f2. library(tidyverse) library(psych) df <- data.frame(matrix(rnorm(40), 10, 4), f1 = gl(3, 10, labels = letters[1:3]), f2 = gl(3, 10, labels