Displaying 20 results from an estimated 10000 matches similar to: "Too large a data set to be handled by R?"
2009 May 21
3
index to select rows of a large matrix
Dear R Users,
I have created a 1500 x 20000 data frame - DataSeq. Each of the 1500
rows represents a data sequence.
I have another data frame iData that stores the information of these
1500 data sequences in the same order, for example, condition, gender,
etc.
If I use "subset" to select certain groups within iData according to
some criteria that I have set, e.g. condition, gender
Then
2009 Apr 22
4
read.table or read.csv without row index?
Hello all,
Probably my concepts about the data.frame and matrix and array in R
are not clear, I need some clarification to help me understand them
better.
>M <- read.table("test1.csv",sep=",",row.names=NULL,header=T)
gives me: M as
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
1 9 11 14 15 18 20 20 20 20 20
2 3 4 8 9 11 12 14 15 15 15
3 4 5 8 8 9 9 9 9 9 9
4 4
2008 Dec 08
3
Transforming a string to a variable's name? help me newbie...
Dear all,
I'm a newbie in R.
I have a 45x2x2x8 design.
A dataframe stores the metadata of trials. And each trial has its own
data file: I used "read.table" to import every trial into R as a
dataframe (variable).
Now I dynamically ask R to retrieve trials that fit certain selection
criteria, so I use "subset", e.g.
tmptrialinfo <- subset(trialinfo, (Subject==24 &
2009 May 19
2
Replace / swap values of subset of a data.frame
Dear R users,
I have one data.frame of 1500x80 - data1. I found out that there are a
few cells of data that I have misplaced, and I need to fix their
ordering.
In an attempt to swap columns 22 & 23 of the Subject with
misplaced data, I did the following:
> data2 <- data1
> subset(data1,(Subject==25 & Session==1))[,22] <- subset(data2,(Subject==25 &
2008 Dec 11
3
Resampling physiological data using R?
Dear all R users,
I am going to use R to process some of my physiological data about eye.
The problem is that the recording machine does not sample at a reliably
constant rate: the time intervals between samples can vary from
9msec to ~120msec, with most in the 15-30msec range.
The below is a fraction of a single data file of a trial:
Time CursorX CursorY Pupilsize
1811543 -1 -1 -1
2009 Jun 01
3
Within Subject ANOVA question
Dear R users,
I have copied the following table from an article on "Using confidence
intervals in within-subject designs":
Subject 1sec 2sec 5sec Mean
1 10 13 13 12.00
2 6 8 8 7.33
3 11 14 14 13.00
4 22 23 25 23.33
5 16 18 20 18.00
6 15 17 17 16.33
7 1 1 4 2.00
8 12 15 17 14.67
9 9 12 12 11.00
10 8 9 12 9.67
I rearranged the data this way:
2008 Dec 15
2
cannot allocate vector of size... restructuring suggestion please...
Dear R Users,
I was running some data analysis scripts and ran into this error:
Error: cannot allocate vector of size 27.6 Mb
Doing a "memory.size(max=TRUE)" will give me:
[1] 1506.812
The current situation is:
I'm working on a Windows Vista 32bit laptop with 4GB RAM (effectively
3GB I assume...)
I have a data file of 450Mb loaded into R and have around 1500
data.frames floating
2008 Dec 10
2
Multiple (45x8) graphs of the same page / device: titles crammed
Dear R users,
I'm trying to plot 45x8 graphs on the same pdf / device for the sake
of visual comparison.
par(mfcol=c(45,8))
par(mai=c(0,0,0,0))
In ?title, I can see there are cex and font settings: I set cex = 0.01
and font = 1: it is still very large, and then I tried setting font <
1, e.g. font = 0.5, Then I get an error:
Error in title(paste(Ppercent, "% ",
2009 May 22
1
Paste Strings as logical for functions?
Dear R Users,
I have some dynamic selection rules that I want to pass around for my functions:
>rules <- paste(g$TrialList==1 & g$Session==2)
>myfunction <- function(rules) {
> index <- which(rules)
> anotherFunction(index)
> }
However, I can't find a way to pass around these selection rules
easily (for subset, for which, etc)
Please let me know if you have
2009 Jul 22
1
R help - howto suppress chm help and use text/latex help?
Dear R Users,
I've installed R with the chm option, but eventually I found I am more
used to the normal way of latex help.
Is there any argument to suppress chm help by default when starting R?
So that I don't have to type something like >help(rle,chmhelp=NULL)
I know the last resort would be to reinstall R.
Thanks in advance!
- J
2009 Dec 02
1
Calling R (GNU R) functions from Common Lisp, how?
Hi Lisp users,
I'm a user of both Common Lisp and R (GNU R).
I found R has a rich collection of statistical and numerical
computation functions, while it is not as extensible as Lisp (Common
Lisp).
I considered Lisp-Stat but its only implementation is not in the
usual Common Lisp, and the available functions in CRAN are far richer
than Lisp-Stat currently has.
I want to know if there is
2016 Mar 16
2
match and unique
Is the phrase "index <- match(x, sort(unique(x)))" reliable, in the sense that it will
never return NA?
Context: Calculation of survival curves involves the concept of unique death times. I've
had reported cases in the past where survfit failed, and it was due to the fact that two
"differ by machine precision" values would sometimes match and sometimes not,
2018 Mar 30
2
sorting large msets
Hello, is there a way to optimize sorting by certain values
for queries which return a huge amount of results?
For example, I just want a simple query that gives me the 200
most recent emails out of millions. The elapsed time for
get_mset increases as the number of documents ($n * 2000)
increases.
I suppose I could store a pre-sorted set using SQLite or
similar. Thanks in advance for any
2010 Aug 20
1
Problem to compute a function with very large numbers
Dear R users,
I have been trying to compute the following function and need it to work
with n=15000, but it would only compute for smaller ns, such as n=1000 and
not above. I was wondering if anyone would have a solution for this problem!
Thank you very much for your kind support!
Sincerely,
Nan
------
Wi <- function(n) {
fun <- function(w,i){
2011 Nov 18
3
tip: large plots
Hi all,
I'm working with a bunch of large graphs, and stumbled across
something useful. Probably many of you know this, but I didn't and so
others might benefit.
Using pch="." speeds up plotting considerably over using symbols.
> x <- runif(1000000)
> y <- runif(1000000)
> system.time(plot(x, y, pch="."))
user system elapsed
1.042 0.030 1.077
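For anyone who wants to check this on their own machine, here is a minimal sketch of a side-by-side timing. The helper name time_plot is hypothetical (not from the post), and drawing to a temporary PDF file is my assumption so that the test does not depend on a screen device. Timings will differ by machine and device, so no expected numbers are given.

```r
# Hypothetical helper: time one scatter plot drawn to a throwaway PDF file.
time_plot <- function(x, y, ...) {
  out <- tempfile(fileext = ".pdf")
  pdf(out)                                   # open a file-based device
  on.exit({ dev.off(); unlink(out) }, add = TRUE)  # close device, delete file
  unname(system.time(plot(x, y, ...))["elapsed"])
}

set.seed(1)                 # fixed seed only so the data are reproducible
x <- runif(100000)
y <- runif(100000)

t_dot     <- time_plot(x, y, pch = ".")  # single-pixel points
t_default <- time_plot(x, y)             # default open-circle symbol

cat("pch = '.':", t_dot, "sec; default pch:", t_default, "sec\n")
```

Passing extra arguments through ... lets the same helper time other settings (for example a different pch or cex) without changes.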
2009 Nov 19
1
Performance of 'by' and 'ddply' on a large data frame
I've only recently started using R. One of the problems I come up
against is that, after having extracted a large dataset (>5M rows) out of
a database, I realize I need another variable. In this case I have a data
frame with dates. I want to find the minimum date for each value of x1
and add that minimum date to my data.frame.
> randomdf <- function(p) {
data.frame(x1=sample(1:10^4, 10^p,
2004 Feb 11
7
large fonts on plots
Hi all,
I need to enlarge the fonts used on R plots (plots, histograms, ...) in
labels and titles etc.
I seem to be unable to figure out how to do it. The problem is that the
titles of the plots are simply unreadable when I insert them into my LaTeX
text, since they are relatively small compared to the entire plot.
I am sure it is pretty simple, can anybody give me a hint ?
Please reply
2012 Apr 19
1
combining large list of data.frames
It's normal for me to create a list of data.frames and then use
do.call('rbind', list(...)) to create a single data.frame. However,
I've noticed as the size of the list grows large, it is perhaps better
to do this in chunks. As an example here's a list of 20,000 similar
data.frames.
# create list of data.frames
dat <- vector("list", 20000)
for(i in
2013 May 25
2
Assigning NULL to large variables is much faster than rm() - any reason why I should still use rm()?
Hi,
in my packages/functions/code I tend to remove large temporary
variables as soon as possible, e.g. large intermediate vectors used in
iterations. I sometimes also have the habit of doing this to make it
explicit in the source code when a temporary object is no longer
needed. However, I did notice that this can add a noticeable overhead
when the rest of the iteration step does not take that
2010 Feb 12
1
paired wilcox test on each row of a large dataframe
Hi,
I have to calculate the V statistic for each row of a large dataframe (28000). I
cannot use the multtest package for a paired wilcox test. I have been using a for
loop, which is slow. Is there a way to speed up the computation with another method
like using apply or tapply?
My data set looks like this:
11573_MB 11911_MB 11966_MB 12091_MB 12168_MB
12420_MB................
cg00000292