thr3ads.net - similar to: "Too large a data set to be handled by R?"

Displaying 20 results from an estimated 10000 matches similar to: "Too large a data set to be handled by R?"

2009 May 21

index to select rows of a large matrix

Dear R Users, I have created a 1500 x 20000 data frame - DataSeq. Each of the 1500 rows represents a data sequence. I have another data frame iData that stores the information of these 1500 data sequences in the same order, for example, condition, gender, etc. If I use "subset" to select certain groups within iData according to some criteria that I have set, e.g. condition, gender Then

2009 Apr 22

read.table or read.csv without row index?

Hello all, Probably my concepts about the data.frame and matrix and array in R are not clear, I need some clarification to help me understand them better. >M <- read.table("test1.csv",sep=",",row.names=NULL,header=T) gives me: M as M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 1 9 11 14 15 18 20 20 20 20 20 2 3 4 8 9 11 12 14 15 15 15 3 4 5 8 8 9 9 9 9 9 9 4 4

2008 Dec 08

Transforming a string to a variable's name? help me newbie...

Dear all, I'm a newbie in R. I have a 45x2x2x8 design. A dataframe stores the metadata of trials. And each trial has its own data file: I used "read.table" to import every trial into R as a dataframe (variable). Now I dynamically ask R to retrieve trials that fit certain selection criteria, so I use "subset", e.g. tmptrialinfo <- subset(trialinfo, (Subject==24 &

2009 May 19

Replace / swap values of subset of a data.frame

Dear R users, I have 1 data.frame of 1500x80 - data1. I found out that there are a few cells of data that I have misplaced, and I need to fix the ordering of them. In an attempt to swap column 22 & 23 of the Subject with misplaced data, I did the following: > data2 <- data1 > subset(data1,(Subject==25 & Session==1))[,22] <- subset(data2,(Subject==25 &

2008 Dec 11

Resampling physiological data using R?

Dear all R users, I am going to use R to process some of my physiological eye data. The problem is that the recording machine does not sample at a reliably constant rate: the time intervals between samples can vary from 9msec to ~120msec, with most in the 15-30msec range. Below is a fraction of a single data file of a trial: Time CursorX CursorY Pupilsize 1811543 -1 -1 -1

2009 Jun 01

Within Subject ANOVA question

Dear R users, I have copied the following table from an article on "Using confidence intervals in within-subject designs":

Subject  1sec  2sec  5sec  Mean
1        10    13    13    12.00
2         6     8     8     7.33
3        11    14    14    13.00
4        22    23    25    23.33
5        16    18    20    18.00
6        15    17    17    16.33
7         1     1     4     2.00
8        12    15    17    14.67
9         9    12    12    11.00
10        8     9    12     9.67

I rearranged the data this way:

2008 Dec 15

cannot allocate vector of size... restructuring suggestion please...

Dear R Users, I was running some data analysis scripts and ran into this error: Error: cannot allocate vector of size 27.6 Mb Doing a "memory.size(max=TRUE)" will give me: [1] 1506.812 The current situation is: I'm working on a Windows Vista 32bit laptop with 4GB RAM (effectively 3GB I assume...) I have a data file of 450Mb loaded into R and have around 1500 data.frames floating

2008 Dec 10

Multiple (45x8) graphs of the same page / device: titles crammed

Dear R users, I'm trying to plot 45x8 graphs on the same pdf / device for the sake of visual comparison. par(mfcol=c(45,8)) par(mai=c(0,0,0,0)) In ?title, I can see there are cex and font settings: I set cex = 0.01 and font = 1: it is still very large, and then I tried setting font < 1, e.g. font = 0.5, Then I get an error: Error in title(paste(Ppercent, "% ",

2009 May 22

Paste Strings as logical for functions?

Dear R Users, I have some dynamic selection rules that I want to pass around for my functions: >rules <- paste(g$TrialList==1 & g$Session==2) >myfunction <- function(rules) { > index <- which(rules) > anotherFunction(index) > } However, I can't find a way to pass around these selection rules easily (for subset, for which, etc) Please let me know if you have
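
One possible way to pass such a selection rule around, sketched under assumptions: keep the rule as an unevaluated expression with quote() and evaluate it against the data frame inside the function, rather than pasting it into a string. The data frame g below is a made-up stand-in for the poster's data, and select_rows() is a hypothetical helper, not code from the thread.

```r
# Hypothetical stand-in for the poster's data frame `g`
g <- data.frame(TrialList = c(1, 1, 2, 1), Session = c(2, 1, 2, 2))

# Keep the rule unevaluated instead of pasting it into a string
rule <- quote(TrialList == 1 & Session == 2)

# Hypothetical helper: evaluate the rule with the columns of `data` in scope
# and return the indices of the matching rows
select_rows <- function(data, rule) {
  which(eval(rule, data))
}

index <- select_rows(g, rule)
selected <- g[index, ]
```

Because the rule is an ordinary R object, the same value can be handed to several functions and evaluated against different data frames that share the column names.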

2009 Jul 22

R help - howto suppress chm help and use text/latex help?

Dear R Users, I've installed R with the chm option, but eventually I found I am more used to the normal way of latex help. Is there any argument to suppress chm help by default when starting R? So that I don't have to type something like >help(rle,chmhelp=NULL) I know the last resort would be to reinstall R. Thanks in advance! - J

2009 Dec 02

Calling R (GNU R) functions from Common Lisp, how?

Hi Lisp users, I'm a user of both Common Lisp and R (GNU R). I found R has a rich collection of statistical and numerical computation functions, while it is not as extensible as Lisp (Common Lisp). I considered Lisp-Stat but its only implementation is not in the usual Common Lisp, and the available functions in CRAN are far richer than Lisp-Stat currently has. I want to know if there is

2016 Mar 16

match and unique

Is the phrase "index <- match(x, sort(unique(x)))" reliable, in the sense that it will never return NA? Context: Calculation of survival curves involves the concept of unique death times. I've had reported cases in the past where survfit failed, and it was due to the fact that two "differ by machine precision" values would sometimes match and sometimes not,
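
A small illustration of the "differ by machine precision" situation, offered as a sketch and not as an answer to the reliability question: two values that print the same can still be distinct to unique() and match(). The vector x and the 12-significant-digit rounding are assumptions chosen for the example, not taken from the thread or from survfit.

```r
# 0.1 + 0.2 and 0.3 differ in the last bits of their double representation
x <- c(0.1 + 0.2, 0.3, 0.3)

# The phrase from the post: exact matching against the sorted unique values
u <- sort(unique(x))
index <- match(x, u)

# One possible workaround (an assumption, not the author's method):
# round to a fixed number of significant digits before matching
x_rounded <- signif(x, 12)
index_rounded <- match(x_rounded, sort(unique(x_rounded)))
```

Here unique() keeps both near-equal values, so the first element gets its own index; after rounding, all three elements share one index.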

2018 Mar 30

sorting large msets

Hello, is there a way to optimize sorting by certain values for queries which return a huge amount of results? For example, I just want a simple query that gives me the 200 most recent emails out of millions. The elapsed time for get_mset increases as the number of documents ($n * 2000) increases. I suppose I could store a pre-sorted set using SQLite or similar. Thanks in advance for any

2010 Aug 20

Problem to compute a function with very large numbers

Dear R users, I have been trying to compute the following function and need it to work with n=15000, but it would only compute for smaller ns, such as n=1000 and not above. I was wondering if anyone would have a solution for this problem! Thank you very much for your kind support! Sincerely, Nan ------ Wi <- function(n) { fun <- function(w,i){

2011 Nov 18

tip: large plots

Hi all, I'm working with a bunch of large graphs, and stumbled across something useful. Probably many of you know this, but I didn't and so others might benefit. Using pch="." speeds up plotting considerably over using symbols. > x <- runif(1000000) > y <- runif(1000000) > system.time(plot(x, y, pch=".")) user system elapsed 1.042 0.030 1.077

2009 Nov 19

Performance of 'by' and 'ddply' on a large data frame

I've only recently started using R. One of the problems I come up against is after having extracted a large dataset (>5M rows) out of a database, I realize I need another variable. In this case I have a data frame with dates. I want to find the minimum date for each value of x1 and add that minimum date to my data.frame. > randomdf <- function(p) { data.frame(x1=sample(1:10^4, 10^p,

2004 Feb 11

large fonts on plots

Hi all, I need to enlarge the fonts used on R plots (plots, histograms, ...) in labels and titles etc. I seem to be unable to figure out how to do it. The problem is that the titles of the plots are simply unreadable when I insert them into my LaTeX text, since they are relatively small compared to the entire plot. I am sure it is pretty simple, can anybody give me a hint? Please reply

2012 Apr 19

combining large list of data.frames

It's normal for me to create a list of data.frames and then use do.call('rbind', list(...)) to create a single data.frame. However, I've noticed as the size of the list grows large, it is perhaps better to do this in chunks. As an example here's a list of 20,000 similar data.frames. # create list of data.frames dat <- vector("list", 20000) for(i in
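
The chunked approach described in this post might look roughly like the sketch below. The example list (2,000 one-row data frames) and the chunk size of 200 are assumptions made for illustration; the original post's own code is cut off above and is not reproduced here.

```r
# Hypothetical example data: a list of small data.frames with identical columns
dat <- lapply(seq_len(2000), function(i) data.frame(id = i, value = i * 2))

# Baseline: combine everything in a single call
all_at_once <- do.call(rbind, dat)

# Chunked version: assign each list element to a chunk, combine within
# each chunk, then combine the per-chunk results
chunk_size <- 200  # assumed value; tune for the data at hand
chunk_id <- ceiling(seq_along(dat) / chunk_size)
chunks <- lapply(split(dat, chunk_id), function(part) do.call(rbind, part))
in_chunks <- do.call(rbind, chunks)

# rbind on a named list prefixes the row names, so reset them
rownames(in_chunks) <- NULL
```

Both routes should give the same rows in the same order; whether chunking is actually faster depends on the size of the list and can be checked with system.time().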

2013 May 25

Assigning NULL to large variables is much faster than rm() - any reason why I should still use rm()?

Hi, in my packages/functions/code I tend to remove large temporary variables as soon as possible, e.g. large intermediate vectors used in iterations. I sometimes also have the habit of doing this to make it explicit in the source code when a temporary object is no longer needed. However, I did notice that this can add a noticeable overhead when the rest of the iteration step does not take that

2010 Feb 12

paired wilcox test on each row of a large dataframe

Hi, I have to calculate the V statistic for each row of a large dataframe (28000). I cannot use the multtest package for the paired wilcox test. I have been using a for loop, which is slow. Is there a way to speed up the computation with another method, like using apply or tapply? My data set looks like this: 11573_MB 11911_MB 11966_MB 12091_MB 12168_MB 12420_MB................ cg00000292
