similar to: How to separate huge dataset into chunks

2009 Apr 17
2
Manipulate single line in textfile
Hello all, Is it possible to modify a single line in a textfile? I know it is possible to load the whole text file, do the change, and save this as a new file. However, this is not practical in my case, because the document is huge and cannot be fully loaded in R. Any idea? Best, Guillaume
2008 Oct 16
2
Matrix starting at [0,0] instead of [1,1]?
Hello all, When I create a matrix, is there a way to make it start at [0,0], instead of [1,1]? That way, a 2x2 matrix would go from [0,0] to [1,1], instead of [1,1] to [2,2]. Best, Guillaume
2009 Mar 19
1
How do I add a variable to a text file?
Hello all, I have a 2.0 GB dataset that I can't load into R, due to memory issues. The dataset itself is in a tab-delimited .txt file with 25 variables. I have a variable I'd like to add to the dataset. How do I do this? Best, Guillaume
2010 Feb 28
1
Combining 2 columns into 1 column many times in a very large dataset
The clumsy solutions I am working on are not going to be very fast if I can get them to work, and the true dataset is ~1500 X 45000, so they need to be efficient. I've searched the R help files and the archives for this list and have some possible workable solutions for 2) and 3) but not my question 1). However, I include
2006 Sep 20
3
Splitting a huge vector
Dear R users, I have a huge vector that I would like to split into unequal slices. However, the only way I can do this is to create another huge vector to define the groups that are used to split the original vector, e.g. # my vector is this a.vector <- seq(2, by=5, length=100) # indices where I would like to slice my vector cut.values <- c(30, 50, 100, 109, 300, 601, 803) # so I have to
2011 Jan 28
3
read.table() versus scan()
I need to import a large number of simple, space-delimited text files with a few columns of data each. The one quirk is that some rows are missing data and some contain junk text at the end of each line. A typical file might look like: a b c d 1 2 3 x 4 5 6 7 8 9 x 1 2 3 x c c 4 5 6 x 7 8 9 x I'm trying to avoid having to pre-process the text files, as they all sit on an ftp site that I
2005 Jun 06
1
AW: Reading huge chunks of data from MySQL into Windows R
In my (limited) experience R is more powerful concerning data manipulation. An example: I have a vector holding a user id. Some user ids can appear more than once. Doing SELECT COUNT(DISTINCT userid) on MySQL will take approx. 15 min. Doing length(unique(userid)) will take (almost) no time... So I think the other way round will serve best: Do everything in R and avoid using SQL on the database...
2013 May 16
1
[LLVMdev] Undoing DAG Combiner patterns
A better way to handle this is to add a td pattern to match "add n, -c" to a subtraction. I believe several targets do something similar to this. Evan On May 16, 2013, at 7:12 AM, Tom Stellard <tom at stellard.net> wrote: > On Thu, May 16, 2013 at 02:03:14AM +0000, Martin Filteau wrote: >> Hi all, >> >> It's the first LLVM backend we do for our asynchronous
2007 Sep 29
1
templates with same name before extension are cached
Hi all, I was just wondering if this is the intended behavior. Here is my setup: controller def index respond_to do |f| f.xml { render :xml => true } f.html { render :layout => :none } end end In my views I have a file for each type index.herb index.xerb The first request I send is cached and interferes with the other one. For example, if I send an xml request
2007 Jul 10
1
Lattice: vertical barchart
barchart(Titanic, stack=F) produces a very nice horizontal barchart. Each panel has four groups of two bars. barchart(Titanic, stack=F, horizontal=F) doesn't produce the results I would have expected, as it produces this warning message: Warning message: y should be numeric in: bwplot.formula(x = as.formula(form), data = list(Class = c(1, And it results in each panel having 22 groups of
2013 May 16
2
[LLVMdev] Undoing DAG Combiner patterns
Hi all, It's the first LLVM backend we do for our asynchronous DSP. So, I apologize if this is a trivial question! The target-independent DAG combiner performs the following transformation: sub n, c -> add n, -c For our target, negative constants are more costly to encode. What is the best place to revert to a sub instruction? Kind regards, -- Martin
2005 Oct 19
3
diag() problem
Hi I have a matrix "u", for which diag() gives an error: u <- structure(c(5.42334674128216, -2.31319389204264, -5.83059042218476, -1.64112369640695, -2.31319389212801, 3.22737617646609, 1.85200668021569, -0.57102273078531, -5.83059042231881, 1.85200668008156, 11.9488923894962, -3.5525537165941, -1.64112369587405, -0.571022730886046, -3.55255371755604,
2005 Jun 06
3
Reading huge chunks of data from MySQL into Windows R
Dear List, I'm trying to use R under Windows on a huge database in MySQL via ODBC (technical reasons for this...). Now I want to read tables with some 160.000.000 entries into R. I would be lucky if anyone out there has some good hints on what to consider concerning memory management. I'm not sure about the best methods for reading such huge files into R. For the moment I split the whole
2007 Jun 01
2
Getting names of objects passed with "..."
Is there a tidy way to get the names of objects passed to a function via the "..." argument? rbind/cbind does what I want: test.func1 <- function(...) { nms <- rownames(rbind(..., deparse.level=1)) print(nms) } x <- "some stuff" second <- "more stuff" test.func1(first=x, second) [1] "first" "second" The usual
2015 May 12
2
Why is the diag function so slow (for extraction)?
>>>>> Steve Bronder <sbronder at stevebronder.com> >>>>> on Thu, 7 May 2015 11:49:49 -0400 writes: > Is it possible to replace c() with .subset()? It would be possible, but I think "entirely" wrong. .subset() is documented to be an internal function not to be used "lightly" and more to the point it is documented to *NOT*
1999 Aug 18
2
diag()
I would like to suggest a slight modification to diag(). In the case where x is a matrix with both row names and column names the same, it would be reasonable if the resulting vector also had those names. I often use diag() on variance matrices, where this modification is helpful. The modification requires replacing if (is.matrix(x) && nargs() == 1) return(c(x)[1 +
2011 Feb 19
1
Accessing Package NEWS (NEWS.Rd)
Okay. So, after having spent quite some time never really tracking down why my package NEWS files were unacceptable to readNEWS(), I noticed that there was recent (to me anyway) development that allowed the NEWS to be done as an Rd file. Sweet! A more standard format... I converted a NEWS file in one of my unreleased packages to Rd format. checkNEWS() gave it a thumbs up. But then it went south.
2015 May 05
3
Why is the diag function so slow (for extraction)?
Looks like the c(x)[...] bit used to be as.matrix(x)[...]. Not sure why the change was made many years ago, but this was before names were handled explicitly. It would definitely be better to not force the duplicate, at least in the case where we are sure c() and [ would not dispatch. Best, luke On Mon, 4 May 2015, peter dalgaard wrote: > >> On 04 May 2015, at 19:59 , franknarf
2008 May 08
3
Setting matrix dimnames in a list
Hey All, I was wondering if I could solicit a little input on what I'm trying to do here. I have a list of matrices, and I want to set their dimnames, but all I can come up with is this: x <- matrix(1:4,2) y <- matrix(5:8,2) z <- list(x,y) nm <- c("a","b") nms <- list(nm,nm) z <- lapply(z,function(x)dimnames(x)<-nms) As you can see, this
2015 May 13
1
Why is the diag function so slow (for extraction)?
As kindly pointed out to me (oh my decaying gray matter), is.object() is better suited for this test; $ svn diff src/library/base/R/diag.R Index: src/library/base/R/diag.R =================================================================== --- src/library/base/R/diag.R (revision 68345) +++ src/library/base/R/diag.R (working copy) @@ -23,9 +23,11 @@ stop("'nrow' or