similar to: R on Large Data Sets (again)

Displaying 20 results from an estimated 10000 matches similar to: "R on Large Data Sets (again)"

2010 Aug 12
2
R 64-bit and Revolution
Dear users, The company where I work is considering getting a license for Revolution Enterprise for 64-bit Windows. I'd appreciate it if those familiar with the product could share their experiences with it. In particular, how does it compare to the "free" 64-bit version of R? Thanks in advance. Regards, Lars.
2009 Sep 11
3
Working with large matrix
Dear All, I have a large matrix (46000 x 11250). I would like to run a linear regression for each row. I wrote a simple function that calls lm() and used apply(mat, 1, func). The issue is that it takes ages to load the file and also to finish the lm() fits. I am using 64-bit Linux with 32 GB of memory. Is there an elegant and fast way of completing this task? Thanks in advance. Kind regards, Ezhil
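For the row-wise regressions, one much faster route than apply() with lm() is to exploit the fact that lm() accepts a matrix response, fitting all the regressions in a single call. A minimal sketch, assuming every row is regressed on the same predictor vector x (a hypothetical name; toy dimensions used here):

    # toy dimensions; the real matrix is 46000 x 11250
    mat <- matrix(rnorm(100 * 50), nrow = 100)
    x   <- seq_len(ncol(mat))      # hypothetical common predictor

    # lm() with a matrix response fits one regression per column,
    # i.e. one per original row of mat; coef(fit) is 2 x nrow(mat)
    fit <- lm(t(mat) ~ x)
    coefs <- coef(fit)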
2010 Jun 15
1
help biglm.big.matrix; problem with weights
Hello colleagues, I have tried to use the package biglm. I want to specify a multivariate regression with a weight. I imported a large dataset with library(bigmemory). I load library(biglm) and specify a regression with a weight, but every time I get an error message such as "object not found", "'weights' must be a formula", or "error in eval(expr, envir, enclos)". I
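The second error message is the informative one: in biglm(), weights must be given as a one-sided formula naming a column of the data, not as a numeric vector. A minimal sketch under that assumption (mydata, y, x1, x2, and w are hypothetical names; w is a column of mydata):

    library(biglm)
    fit <- biglm(y ~ x1 + x2, data = mydata, weights = ~w)

For data held in a big.matrix, the wrapper biglm.big.matrix() from the biganalytics package should follow the same one-sided-formula convention for weights.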
2010 Feb 24
4
R Graphics into LaTeX
Hi, I'm new to LaTeX and I'm trying to include an R chart in a LaTeX document. This is what I'm doing: 1) In R: save the chart as a PostScript file in a folder, C:/xxx/Density.eps 2) In LaTeX (using TeXworks on Windows XP): In the preamble: \documentclass[11pt]{article} \usepackage{graphicx} \begin{document} blah..blah…blah \begin{figure} \centering
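On the R side, the usual recipe for an EPS file that \includegraphics can size correctly is the postscript() device with onefile = FALSE and paper = "special"; a minimal sketch (the path follows the poster's example, the chart is a placeholder):

    # write the chart as Encapsulated PostScript
    postscript("C:/xxx/Density.eps", onefile = FALSE, horizontal = FALSE,
               paper = "special", width = 6, height = 4)
    plot(density(rnorm(100)))   # placeholder chart
    dev.off()

In the LaTeX body, \includegraphics[width=0.8\textwidth]{C:/xxx/Density.eps} inside the figure environment then pulls the file in. Note that plain latex handles EPS natively while pdflatex does not; with pdflatex, save the chart as PDF from R (the pdf() device) instead.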
2010 Dec 31
4
Sweave for "big" data analysis
Hi, Maybe I'm missing the point here... but suppose you are working with "large" data sets and using functions that take a significant amount of time to run in R. I wouldn't like to run these functions every time I call Sweave("myfile.Rnw") within R. What is the "common" practice for using Sweave in these situations? I would just run the function once,
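One common low-tech pattern is to cache the slow result to disk and reload it on later Sweave runs; packages such as cacheSweave automate the same idea per chunk. A minimal sketch of the manual approach inside a code chunk (slow_model_function and the file name are hypothetical):

    # recompute only when no cached copy exists
    if (file.exists("fit.rds")) {
      fit <- readRDS("fit.rds")
    } else {
      fit <- slow_model_function(mydata)   # hypothetical long-running call
      saveRDS(fit, "fit.rds")
    }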
2009 Mar 17
3
Non-Linear Optimization - Query
Dear All, A couple of weeks ago, I asked for a package recommendation for nonlinear optimization. In my problem I have a fairly complicated nonlinear objective function subject to one nonlinear equality constraint. I was advised to use the *Rdonlp2* package, but I did not get any results after running the program for 5 hrs. Is it normal for this type of program to run for hours? Also,
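One way to check whether the hours-long runtime is specific to Rdonlp2 is to try the same problem shape (nonlinear objective, one nonlinear equality constraint) in another solver. A minimal sketch with Rsolnp::solnp, using placeholder functions rather than the poster's actual problem:

    library(Rsolnp)
    obj <- function(p) sum(p^2)      # placeholder objective f(x)
    eqc <- function(p) sum(p) - 1    # placeholder equality constraint g(x) = 0
    res <- solnp(pars = rep(0.5, 4), fun = obj, eqfun = eqc, eqB = 0)
    res$pars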
2009 Feb 25
3
Using very large matrix
Dear friends, I have to use a very large matrix, something of the sort of matrix(80000, 80000, n), where n is a numeric value of the sort 0.xxxxxx. I have not found a way of doing it. I keep getting the error: Error in matrix(nrow = 80000, ncol = 80000, 0.2) : too many elements specified Any suggestions? I have searched the mailing list, but to no avail. Best, -- Corrado Topi Global
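The error is expected: 80000^2 = 6.4e9 elements exceeds the 2^31 - 1 limit on vector length in R of that era, and as dense doubles the matrix would need about 51 GB anyway. A file-backed matrix from the bigmemory package keeps the data on disk; a minimal sketch (file names hypothetical):

    library(bigmemory)
    # the ~51 GB live in the backing file on disk, not in RAM
    m <- filebacked.big.matrix(nrow = 80000, ncol = 80000,
                               type = "double", init = 0.2,
                               backingfile = "m.bin",
                               descriptorfile = "m.desc")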
2011 May 21
3
Simple R Question...
Let's say I have the data frame 'dd' below. I'd like to select one column from this data frame (say 'a') and keep its name in the resulting data frame. That can be done as in #2. However, what if I want to make my selection based on a vector of names (and again keep those names in the resulting data frame)? My attempt is #4, but it doesn't work. dd <- data.frame(a =
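Indexing with a character vector while keeping drop = FALSE returns a data frame with the column names intact; a minimal sketch:

    dd <- data.frame(a = 1:3, b = 4:6, c = 7:9)
    vars <- c("a", "c")
    dd[, vars, drop = FALSE]   # still a data.frame, names preserved
    dd[vars]                   # list-style indexing, same effect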
2011 Jul 30
3
Problem with effects package
Dear List, Several times when using this package I have gotten the error message shown below. When I work through simple examples everything turns out fine, but when working with real, moderately sized data sets I always get the same error. Do you know what could be the cause of the problem? Error in apply(mod.matrix[, components], 1, prod) : subscript out of bounds Error in
2008 Jan 31
3
Memory problem?
Hello R users, I am trying to run a Cox model for the prediction of relapse of 80 cancer tumors, taking into account the expression of 17000 genes. The data are large and I get an error: "Cannot allocate vector of 2.4 Mb". I increased memory.limit to 4000 (the largest supported by my computer), but I still get the error because of other big variables that I have in
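With 80 samples and 17000 genes, the usual workaround is not one joint model but a univariate Cox fit per gene, which keeps the memory footprint flat (freeing unused objects with rm() followed by gc() before fitting also helps). A minimal sketch with the survival package; expr, time, and status are hypothetical names for the 17000 x 80 expression matrix and the survival outcome:

    library(survival)
    pvals <- apply(expr, 1, function(g) {
      fit <- coxph(Surv(time, status) ~ g)       # one gene at a time
      summary(fit)$coefficients[1, "Pr(>|z|)"]   # Wald p-value
    })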
2011 Apr 26
7
Second largest element from each matrix row
Hi, I need to extract the second largest element from each row of a matrix. Below is my solution, but I think there should be a more efficient way to accomplish the same thing, or not? set.seed(1) a <- matrix(rnorm(9), 3, 3) sec.large <- as.vector(apply(a, 1, order, decreasing = TRUE)[2, ]) ans <- sapply(1:length(sec.large), function(i) a[i, sec.large[i]]) ans Thanks in advance for your
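The order()-based solution works; a slightly more direct version sorts each row once and drops the second indexing pass. A minimal sketch on the same toy data:

    set.seed(1)
    a <- matrix(rnorm(9), 3, 3)
    # sort each row in decreasing order, take the second value
    apply(a, 1, function(r) sort(r, decreasing = TRUE)[2])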
2010 Dec 17
1
[Fwd: adding more columns in big.matrix object of bigmemory package]
Hi, With reference to the mail below, I have large datasets, coming from various different sources, which I can read into a filebacked big.matrix using the bigmemory library. I want to merge them all into one 'big.matrix' object. (Later, I want to run a regression using the 'biglm' library.) I have been trying to do this, unsuccessfully, for quite some time now. Can you please
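bigmemory has no cbind() or merge() for big.matrix objects, so one workable (if unglamorous) approach is to allocate a file-backed target of the combined width and copy columns across in a loop. A minimal sketch assuming two matrices m1 and m2 whose rows are already aligned (all names hypothetical):

    library(bigmemory)
    stopifnot(nrow(m1) == nrow(m2))
    out <- filebacked.big.matrix(nrow = nrow(m1),
                                 ncol = ncol(m1) + ncol(m2),
                                 type = "double",
                                 backingfile = "merged.bin",
                                 descriptorfile = "merged.desc")
    for (j in seq_len(ncol(m1))) out[, j] <- m1[, j]
    for (j in seq_len(ncol(m2))) out[, ncol(m1) + j] <- m2[, j]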
2017 Sep 09
2
Avoid duplication in dplyr::summarise
Dear group, Is there a way I could avoid the sort of duplication illustrated below? i.e., I have the same dplyr::summarise function on different group_by arguments. So I'd like to create a single summarise function that could be applied to both. My attempt below fails. df <- data.frame(matrix(rnorm(40), 10, 4), f1 = gl(3, 10, labels = letters[1:3]), f2 =
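One way to remove the duplication is to wrap group_by() plus summarise() in a function that forwards the grouping variables through the dots. A minimal sketch using current dplyr (the across() summary is a placeholder for whatever the real summarise() computes):

    library(dplyr)
    summarise_by <- function(data, ...) {
      data %>%
        group_by(...) %>%
        summarise(across(where(is.numeric), mean), .groups = "drop")
    }
    summarise_by(df, f1)       # grouped by f1
    summarise_by(df, f1, f2)   # grouped by f1 and f2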
2013 Oct 12
2
Order of factors with facets in ggplot2
Hello, I'd like to produce a ggplot where the order of factors within facets is based on the average of another variable. Here's a reproducible example. My problem is that the factors are ordered similarly in both facets. I would like to have, within each facet of `f1', boxplots for 'x' within each factor `f2', where the boxplots are ordered based on the average of x
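A factor carries a single global level order, so it cannot be ordered differently per facet on its own; a common workaround is tidytext's reorder_within() paired with scale_x_reordered(). A minimal sketch (dat is a hypothetical data frame; column names follow the poster's f1, f2, x):

    library(ggplot2)
    library(tidytext)   # reorder_within() / scale_x_reordered()
    ggplot(dat, aes(x = reorder_within(f2, x, f1), y = x)) +
      geom_boxplot() +
      scale_x_reordered() +
      facet_wrap(~ f1, scales = "free_x")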
2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone, I’m working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual data sets together (I have about 20). I’m using the following command from sqldf: data1 <- sqldf("select A.*, B.* from A inner join B using(ID)") But it’s taking A VERY VERY LONG TIME to merge just 2 of the datasets
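For a keyed equi-join on 11-million-row tables, data.table is usually far faster than routing the data through SQLite; a minimal sketch, assuming both tables share the ID column:

    library(data.table)
    setDT(A); setDT(B)               # convert in place, no copy
    setkey(A, ID); setkey(B, ID)
    data1 <- merge(A, B, by = "ID")  # keyed inner join

(Within SQLite itself, creating an index on ID before the join can also help considerably.)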
2011 Jul 16
2
(unclassified?) Help Question
Dear List, I'd appreciate your guidance for obtaining the desired result shown below, by combining tapply(x, g, mean) and g in the example. Basically, I'm trying to create a vector whose values are based on the result from tapply(x, g, mean) but that follows the pattern and length given by the factor g. Of course I'm looking for a generic solution (i.e., not something that just work
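Indexing the tapply() result by the factor itself expands the group means back to length(g); ave() does the same thing in one call. A minimal sketch:

    x <- c(1, 2, 3, 10, 20, 30)
    g <- factor(c("a", "a", "a", "b", "b", "b"))
    tapply(x, g, mean)[g]   # group means repeated to match g
    ave(x, g)               # same result; FUN defaults to mean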
2010 Feb 24
1
Sparse KMeans/KDE/Nearest Neighbors?
hi, I have a dataset (the Netflix dataset) which is basically ~18k columns and a variable number of rows; let's assume 25 thousand for now. The dataset is very sparse. I was wondering how to do kmeans/nearest neighbors or kernel density estimation on it. I tried using the spMatrix function in the "Matrix" package. I think I'm able to create the matrix, but as soon as I pass
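spMatrix() builds the matrix, but most clustering code then coerces it to dense and runs out of memory; one common route is to build a compressed sparse matrix with Matrix::sparseMatrix() and reduce dimensionality with a sparse-aware partial SVD (for example, the irlba package) before running kmeans() on the low-dimensional scores. A minimal sketch on made-up triplet data:

    library(Matrix)
    library(irlba)
    # toy triplets (i, j, value); the real data is ~25k x 18k, very sparse
    m <- sparseMatrix(i = sample(25000, 1e5, replace = TRUE),
                      j = sample(18000, 1e5, replace = TRUE),
                      x = runif(1e5), dims = c(25000, 18000))
    sv <- irlba(m, nv = 20)                  # partial SVD on sparse input
    km <- kmeans(sv$u %*% diag(sv$d), centers = 10)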
2009 Jul 03
2
Error using the Rdonlp2 Package
Dear experts, I'm attempting to solve a constrained optimization problem using the Rdonlp2 package. I created a Lagrange function (L = f(x) - lambda*(g(x) - c)), where x is a vector of 16 parameters. This is what I'm using as the objective function in the code below. In addition, I set bounds on these parameters (par.u and par.l). When I run the code, I get the error message shown below. Any idea
2009 Apr 21
4
My surprising experience in trying out REvolution's R
I care a lot about R's speed, so I decided to give REvolution's R (http://revolution-computing.com/) a try, which bills itself as an optimized R. Note that I used the free version. My machine is an Intel Core 2 Duo running Windows XP Professional. The code I ran is at the end of this post. First, the regular R 1.9. It takes 2 minutes and 6 seconds, CPU usage 50%. Next, REvolution's R.
2017 Sep 09
0
Avoid duplication in dplyr::summarise
Hi Lars, I am not quite sure what you really want. However, I suggest the following code, which makes it possible (1) to obtain the full summary of your data and (2) to retrieve only the mean of the X values as a function of the factors f1 and f2. library(tidyverse) library(psych) df <- data.frame(matrix(rnorm(40), 10, 4), f1 = gl(3, 10, labels = letters[1:3]), f2 = gl(3, 10, labels