similar to: Too large a data set to be handled by R?

Displaying 20 results from an estimated 10000 matches similar to: "Too large a data set to be handled by R?"

2009 May 21
3
index to select rows of a large matrix
Dear R Users, I have created a 1500 x 20000 data frame - DataSeq. Each of the 1500 rows represents a data sequence. I have another data frame iData that stores the information of these 1500 data sequences in the same order, for example, condition, gender, etc. If I use "subset" to select certain groups within iData according to some criteria that I have set, e.g. condition, gender Then
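A minimal sketch of the usual answer, assuming iData and DataSeq are row-aligned as described; the column names and the levels tested ("A", "F") are made up for illustration:

keep <- which(iData$condition == "A" & iData$gender == "F")  # rows passing the criteria
selSeq <- DataSeq[keep, ]   # the same row positions carry over to DataSeq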
2009 Apr 22
4
read.table or read.csv without row index?
Hello all, probably my concepts about data.frame, matrix, and array in R are not clear; I need some clarification to help me understand them better.

> M <- read.table("test1.csv", sep=",", row.names=NULL, header=T)

gives me M as:

  M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
1  9 11 14 15 18 20 20 20 20  20
2  3  4  8  9 11 12 14 15 15  15
3  4  5  8  8  9  9  9  9  9   9
4  4
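The leading 1, 2, 3 in that printout are row names that print() attaches to a data.frame, not a stored column. A sketch of getting a plain matrix instead, reusing the file name from the post:

M <- read.table("test1.csv", sep = ",", header = TRUE)
m <- as.matrix(M)   # drop the data.frame layer; m is now a numeric matrix
rownames(m)         # "1", "2", ... are only labels and can be set to NULL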
2008 Dec 08
3
Transforming a string to a variable's name? help me newbie...
Dear all, I'm a newbie in R. I have a 45x2x2x8 design. A dataframe stores the metadata of trials. And each trial has its own data file: I used "read.table" to import every trial into R as a dataframe (variable). Now I dynamically ask R to retrieve trials that fit certain selection criteria, so I use "subset", e.g. tmptrialinfo <- subset(trialinfo, (Subject==24 &
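A sketch of the usual answer: get() maps a string to the object of that name. The naming scheme trial_<subject>_<session> is hypothetical; a named list is generally the cleaner design:

nm  <- paste0("trial_", 24, "_", 1)  # build the name, e.g. "trial_24_1"
trl <- get(nm)                       # fetch the data.frame read in earlier
## cleaner alternative: read all trials into one list, then use trials[[nm]]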
2009 May 19
2
Replace / swap values of subset of a data.frame
Dear R users, I have a data.frame of 1500x80 - data1. I found out that there are a few cells of data that I have misplaced, and I need to fix their ordering. In an attempt to swap columns 22 & 23 for the Subject with misplaced data, I did the following:

> data2 <- data1
> subset(data1, (Subject==25 & Session==1))[,22] <- subset(data2, (Subject==25 &
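The assignment fails because subset() returns a copy, so writing into its result never touches data1. A sketch of the direct-indexing fix (column positions 22 and 23 are taken from the post):

idx <- data1$Subject == 25 & data1$Session == 1
data1[idx, c(22, 23)] <- data1[idx, c(23, 22)]  # swap the two columns in place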
2008 Dec 11
3
Resampling physiological data using R?
Dear all R users, I am going to use R to process some of my physiological data about the eye. The problem is the recording machine does not sample at a reliably constant rate: the time intervals between samples can vary from 9 msec to ~120 msec, though most are in the 15-30 msec range. Below is a fraction of a single trial's data file:

Time     CursorX  CursorY  Pupilsize
1811543  -1       -1       -1
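One common remedy is to interpolate onto a regular time grid with approx(); a sketch assuming the trial sits in a data.frame d with the Time and Pupilsize columns shown above (the 20 msec step is an arbitrary choice):

grid  <- seq(min(d$Time), max(d$Time), by = 20)       # regular 20 msec grid
pupil <- approx(d$Time, d$Pupilsize, xout = grid)$y   # linear interpolation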
2009 Jun 01
3
Within Subject ANOVA question
Dear R users, I have copied the following table from an article on "Using confidence intervals in within-subject designs" (the last column is the row mean):

Subject  1sec  2sec  5sec   Mean
1          10    13    13  12.00
2           6     8     8   7.33
3          11    14    14  13.00
4          22    23    25  23.33
5          16    18    20  18.00
6          15    17    17  16.33
7           1     1     4   2.00
8          12    15    17  14.67
9           9    12    12  11.00
10          8     9    12   9.67

I rearranged the data this way:
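A sketch of the usual rearrangement to long format for a within-subject ANOVA, assuming the table above is in a data.frame wide (the Mean column is not needed):

long <- data.frame(
  Subject = factor(rep(wide$Subject, times = 3)),
  Time    = factor(rep(c("1sec", "2sec", "5sec"), each = nrow(wide))),
  Score   = c(wide$`1sec`, wide$`2sec`, wide$`5sec`)
)
summary(aov(Score ~ Time + Error(Subject/Time), data = long))  # repeated-measures term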
2008 Dec 15
2
cannot allocate vector of size... restructuring suggestion please...
Dear R Users, I was running some data analysis scripts and ran into this error:

Error: cannot allocate vector of size 27.6 Mb

Doing a "memory.size(max=TRUE)" will give me:

[1] 1506.812

The current situation is: I'm working on a Windows Vista 32bit laptop with 4GB RAM (effectively 3GB I assume...) I have a data file of 450Mb loaded into R and have around 1500 data.frames floating
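A 32-bit R process tops out well below the installed 4 GB, so the usual first aid is to drop objects and collect before the big allocation; a sketch (the naming pattern is illustrative, and memory.limit() is Windows-only):

rm(list = ls(pattern = "^tmp"))  # drop temporaries no longer needed
gc()                             # reclaim the freed memory
memory.limit()                   # current ceiling in Mb for this R process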
2008 Dec 10
2
Multiple (45x8) graphs on the same page / device: titles crammed
Dear R users, I'm trying to plot 45x8 graphs on the same pdf / device for the sake of visual comparison.

par(mfcol=c(45,8))
par(mai=c(0,0,0,0))

In ?title, I can see there are cex and font settings. I set cex = 0.01 and font = 1: the titles are still very large. Then I tried setting font < 1, e.g. font = 0.5, and I get an error:

Error in title(paste(Ppercent, "% ",
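The error comes from font=, which selects a typeface by integer code (1 plain, 2 bold, ...) rather than a size; the main title's size is governed by cex.main, which is why cex = 0.01 had no effect. A sketch of one panel with illustrative values:

par(mfcol = c(45, 8), mai = c(0, 0, 0, 0))
plot(rnorm(10), axes = FALSE, ann = FALSE)
title(main = "S01 cond A", cex.main = 0.3, line = -1)  # hypothetical panel title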
2009 May 22
1
Paste Strings as logical for functions?
Dear R Users, I have some dynamic selection rules that I want to pass around for my functions:

> rules <- paste(g$TrialList==1 & g$Session==2)
> myfunction <- function(rules) {
>   index <- which(rules)
>   anotherFunction(index)
> }

However, I can't find a way to pass these selection rules around easily (for subset, for which, etc). Please let me know if you have
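paste() turns the comparison into a character vector, which which() cannot use. The simple fix is to pass the logical vector itself; a sketch (g and anotherFunction are names from the post):

rules <- g$TrialList == 1 & g$Session == 2   # a logical vector, not a string
myfunction <- function(rules) {
  index <- which(rules)
  anotherFunction(index)
}
## if the rule really must travel as text, eval(parse(text = rule_string))
## works, though it is usually discouraged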
2009 Jul 22
1
R help - how to suppress chm help and use text/latex help?
Dear R Users, I've installed R with the chm option, but I have since found that I am more used to the normal text/latex help. Is there any argument to suppress chm help by default when starting R, so that I don't have to type something like

> help(rle, chmhelp=NULL)

I know the last resort would be to reinstall R. Thanks in advance! - J
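In the Windows builds of that era the default help type was governed by options(); a sketch for Rprofile.site, assuming a pre-2.10 R where the chmhelp option is still honored:

## in Rprofile.site (or ~/.Rprofile):
options(chmhelp = FALSE)   # stop routing help() to the CHM viewer
options(htmlhelp = FALSE)  # so help() falls back to plain text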
2009 Dec 02
1
Calling R (GNU R) functions from Common Lisp, how?
Hi Lisp users, I'm a user of both Common Lisp and R (GNU R). I found R has a rich collection of statistical and numerical computation functions, while it is not as extensible as Lisp (Common Lisp). I considered Lisp-Stat, but its only implementation is not in the usual Common Lisp, and the packages available on CRAN are far richer than what Lisp-Stat currently has. I want to know if there is
2016 Mar 16
2
match and unique
Is the phrase "index <- match(x, sort(unique(x)))" reliable, in the sense that it will never return NA? Context: Calculation of survival curves involves the concept of unique death times. I've had reported cases in the past where survfit failed, and it was due to the fact that two "differ by machine precision" values would sometimes match and sometimes not,
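A small demonstration of why no NA can appear: unique() and match() apply the same exact equality, so every element of x finds itself in unique(x), and sort() only reorders the values:

x  <- c(1, 1 + 1e-15, 1)   # two values differing only in the last bits
ux <- sort(unique(x))      # both distinct values survive
match(x, ux)               # 1 2 1 -- each element finds itself, never NA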
2018 Mar 30
2
sorting large msets
Hello, is there a way to optimize sorting by certain values for queries which return a huge number of results? For example, I just want a simple query that gives me the 200 most recent emails out of millions. The elapsed time for get_mset increases as the number of documents ($n * 2000) increases. I suppose I could store a pre-sorted set using SQLite or similar. Thanks in advance for any
2010 Aug 20
1
Problem to compute a function with very large numbers
Dear R users, I have been trying to compute the following function and need it to work with n=15000, but it will only compute for smaller ns, such as n=1000, and not above. I was wondering if anyone has a solution to this problem! Thank you very much for your kind support! Sincerely, Nan

------

Wi <- function(n) {
  fun <- function(w,i){
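The body of fun() is cut off in the post, but the usual cause at n = 15000 is overflow in factorial/choose terms, and the usual cure is to do the arithmetic in log space; a sketch on a made-up binomial-style term:

n <- 15000; i <- 7000; w <- i / n
log_term <- lchoose(n, i) + i * log(w) + (n - i) * log1p(-w)
exp(log_term)   # small but representable, while choose(n, i) alone is Inf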
2011 Nov 18
3
tip: large plots
Hi all, I'm working with a bunch of large graphs, and stumbled across something useful. Probably many of you know this, but I didn't and so others might benefit. Using pch="." speeds up plotting considerably over using symbols.

> x <- runif(1000000)
> y <- runif(1000000)
> system.time(plot(x, y, pch="."))
   user  system elapsed
  1.042   0.030   1.077
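For comparison, the same scatter with the default symbol (the timings above are the author's; results will vary by machine):

system.time(plot(x, y))   # default pch = 1 draws a circle per point, much slower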
2009 Nov 19
1
Performance of 'by' and 'ddply' on a large data frame
I've only recently started using R. One of the problems I come up against is that, after extracting a large dataset (>5M rows) out of a database, I realize I need another variable. In this case I have a data frame with dates. I want to find the minimum date for each value of x1 and add that minimum date to my data.frame.

> randomdf <- function(p) { data.frame(x1=sample(1:10^4, 10^p,
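For a per-group minimum pasted back onto every row, base R's ave() is often the quickest route; a sketch assuming the data frame df has columns x1 and date (the post's code is cut off before the date column's name appears):

df$mindate <- ave(df$date, df$x1, FUN = min)  # group minimum, repeated onto each row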
2004 Feb 11
7
large fonts on plots
Hi all, I need to enlarge the fonts used on R plots (plots, histograms, ...) in labels and titles etc., but I seem unable to figure out how to do it. The problem is that the titles of the plots are simply unreadable when I insert them into my LaTeX text, since they are relatively small compared to the entire plot. I am sure it is pretty simple; can anybody give me a hint? Please reply
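The usual knobs are the cex.* graphical parameters; a sketch with illustrative magnifications:

par(cex.main = 2, cex.lab = 1.6, cex.axis = 1.4)  # title, axis labels, tick labels
plot(1:10, main = "Readable title", xlab = "x", ylab = "y")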
2012 Apr 19
1
combining large list of data.frames
It's normal for me to create a list of data.frames and then use do.call('rbind', list(...)) to create a single data.frame. However, I've noticed that as the size of the list grows large, it is perhaps better to do this in chunks. As an example, here's a list of 20,000 similar data.frames.

# create list of data.frames
dat <- vector("list", 20000)
for(i in
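A sketch of the chunked variant, assuming dat is the list built above; the chunk size of 500 is arbitrary:

chunk_id <- ceiling(seq_along(dat) / 500)    # 40 chunks of 500 data.frames
parts <- lapply(split(dat, chunk_id), function(ch) do.call(rbind, ch))
big   <- do.call(rbind, parts)               # final bind of the 40 pieces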
2013 May 25
2
Assigning NULL to large variables is much faster than rm() - any reason why I should still use rm()?
Hi, in my packages/functions/code I tend to remove large temporary variables as soon as possible, e.g. large intermediate vectors used in iterations. I sometimes also have the habit of doing this to make it explicit in the source code when a temporary object is no longer needed. However, I did notice that this can add a noticeable overhead when the rest of the iteration step does not take that
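The two idioms side by side, as a minimal sketch:

big <- numeric(1e7)
big <- NULL          # rebinding: the old vector becomes collectable, the name stays
big <- numeric(1e7)
rm(big)              # rm(): removes the binding itself, at extra bookkeeping cost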
2010 Feb 12
1
paired wilcox test on each row of a large dataframe
Hi, I have to calculate the V statistic for each row of a large dataframe (28000 rows). I cannot use the multtest package for the paired wilcox test. I have been using for loops, which are slow. Is there a way to speed up the computation with another method, like using apply or tapply? My data set looks like this:

11573_MB 11911_MB 11966_MB 12091_MB 12168_MB 12420_MB................
cg00000292
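A sketch of the apply() route, with a hypothetical column split since the post does not show which columns pair up (df stands in for the data frame above):

grp1 <- 1:6; grp2 <- 7:12               # hypothetical paired column positions
V <- apply(as.matrix(df[, c(grp1, grp2)]), 1, function(r)
  wilcox.test(r[1:6], r[7:12], paired = TRUE)$statistic)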