similar to: Reading in large file in pieces

Displaying 20 results from an estimated 60000 matches similar to: "Reading in large file in pieces"

2009 May 09
5
Reading large files quickly
I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The Unix command wc, by contrast, processes the same file in three minutes. Is there a faster way to read files in R? Thanks!
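A minimal sketch of the usual chunked-reading pattern (not from this thread; the file name and the 100,000-line block size are made up): keeping a single connection open lets each readLines() call resume where the previous one stopped, instead of rescanning the file from the start.

    con <- file("bigfile.txt", open = "r")
    repeat {
      chunk <- readLines(con, n = 100000)
      if (length(chunk) == 0) break
      # ... process 'chunk' here, e.g. split the fixed-width fields ...
    }
    close(con)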
2005 Feb 25
4
read.table
I have a commonly recurring problem and wondered if folks would share tips. I routinely get tab-delimited text files that I need to read in. In very many cases, I get:
> a <- read.table('junk.txt.txt',header=T,skip=10,sep="\t")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 67 did not have 88 elements
I am typically able to go
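A hedged sketch of one common diagnosis (not necessarily this thread's resolution; 'junk.txt.txt' is the poster's file name): count.fields() reports how many fields each line has, which pinpoints the ragged rows, and fill = TRUE then lets read.table() pad them.

    nf <- count.fields("junk.txt.txt", sep = "\t", skip = 10)
    table(nf)         # distribution of field counts per line
    which(nf != 88)   # the offending lines (counted after the skipped header)
    # quote = "" guards against stray quote characters, a frequent cause of this error
    a <- read.table("junk.txt.txt", header = TRUE, skip = 10, sep = "\t",
                    quote = "", fill = TRUE)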
2012 Feb 02
9
sqldf for Very Large Tab Delimited Files
Hi All, I have a very (very) large tab-delimited text file without headers. There are only 8 columns and millions of rows. I want to make numerous pieces of this file by sub-setting it for individual stations. Station is given in the first column. I am trying to learn and use the sqldf package for this but am stuck in a couple of places. To simulate my requirement, I have taken the iris dataset as an
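A hedged sketch of the usual sqldf pattern (file name and station code are hypothetical): read.csv.sql() stages the file in a temporary SQLite database and returns only the rows matching the WHERE clause, so the full file never has to fit in R's memory. The SQL statement refers to the file as the table "file"; with header = FALSE the columns carry default names (V1..V8).

    library(sqldf)
    stn <- read.csv.sql("stations.txt",
                        sql    = "select * from file where V1 = 'STN01'",
                        header = FALSE, sep = "\t")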
2005 Jun 06
3
Reading huge chunks of data from MySQL into Windows R
Dear List, I'm trying to use R under Windows on a huge database in MySQL via ODBC (technical reasons for this...). Now I want to read tables with some 160,000,000 entries into R. I would be lucky if anyone out there has some good hints on what to consider concerning memory management. I'm not sure about the best methods for reading such huge files into R. For the moment I split the whole
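A minimal sketch of chunked fetching over ODBC with the RODBC package (DSN, column names, table name and block size are all hypothetical): submit the query once, then pull a bounded number of rows per call, so R never holds all 160,000,000 rows at once.

    library(RODBC)
    ch <- odbcConnect("mysql_dsn")
    odbcQuery(ch, "SELECT id, grp, value FROM bigtable")
    repeat {
      rows <- sqlGetResults(ch, max = 100000)
      if (!is.data.frame(rows) || nrow(rows) == 0) break
      # ... aggregate or write 'rows' out before fetching the next block ...
    }
    odbcClose(ch)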
2006 Sep 03
1
Unexpected source() behavior in R-devel
Why am I seeing the following in R-devel (Sept 2, 2006 build) on openSUSE 10.1? I'm sure it is something simple I am missing, but I just don't see it (output below). Thanks, Sean
> readLines(url("http://www.bioconductor.org/biocLite.R"))
[1] "source(\"http://bioconductor.org/getBioC.R\")"
[2] ""
2005 Jun 03
1
reading tables into R
Hi, The file I am reading is a text file, whose contents are a matrix that has 15 rows and 58 columns. The first row has column names, and the first column has row names, so the format is correct as far as using read.table is concerned. The other values in the table are all float values (numeric). So when I read in the file using data1 <- read.table("HAL001_HAL0015_Signals.txt"), it
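A hedged sketch (the thread's actual fix isn't shown in this excerpt): telling read.table() explicitly that the first row holds column names and the first column holds row names keeps the remaining 58 columns numeric.

    data1 <- read.table("HAL001_HAL0015_Signals.txt",
                        header = TRUE, row.names = 1)
    str(data1)   # every remaining column should be numeric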
2007 Feb 02
5
reading very large files
Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is:
trs <- scan("myfile", what = character(), sep = "\n")
trs <- trs[sample(length(trs), 3000)]
And this works OK; however my computer seems not able to handle
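A minimal two-pass sketch (not from this thread; the block size is arbitrary): since the line count is roughly known, sample the line numbers first, then stream the file in blocks and keep only the sampled lines, so the full 1.8 GB never sits in memory.

    keep   <- sort(sample(900000, 3000))
    con    <- file("myfile", open = "r")
    picked <- character(0)
    done   <- 0
    repeat {
      block <- readLines(con, n = 50000)
      if (length(block) == 0) break
      idx    <- keep[keep > done & keep <= done + length(block)] - done
      picked <- c(picked, block[idx])
      done   <- done + length(block)
    }
    close(con)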
2006 Apr 24
1
Handling large dataset & dataframe [Broadcast]
Here's a skeletal example. Embellish as needed:
p <- 5
n <- 300
set.seed(1)
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
write.table(dat, file="c:/temp/big.txt", row=FALSE, col=FALSE)
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open="r")
for (i in 1:3) {
  x <- matrix(scan(f, nlines=100), 100,
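A hedged completion of the truncated loop above (the intercept column, byrow = TRUE and the final solve() are my reconstruction, not necessarily the original post's code): each 100-row block contributes to X'X and X'y, so only one block is in memory at a time.

    for (i in 1:3) {
      x   <- matrix(scan(f, nlines = 100), 100, p + 1, byrow = TRUE)
      X   <- cbind(1, x[, -1])   # intercept plus the p predictors
      y   <- x[, 1]              # first column holds the response
      xtx <- xtx + crossprod(X)
      xty <- xty + crossprod(X, y)
    }
    close(f)
    beta <- solve(xtx, xty)      # least-squares coefficients from the accumulated pieces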
2017 Sep 02
5
readLines() segfaults on large file & question on how to work around
Hi: I have a 2.1 GB JSON file. Typically I use readLines() and jsonlite::fromJSON() to extract data from a JSON file. When I try to read in this file using readLines(), R segfaults. I believe the two salient issues with this file are: 1) its size, and 2) it is a single line (no line breaks). I can reproduce this issue as follows: #Generate a big file with no line breaks # In R >
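A hedged work-around sketch (not the fix eventually suggested in the thread; 'big.json' is a hypothetical path): jsonlite can be pointed at the file itself, so the 2.1 GB line never has to pass through readLines() as a single R character string.

    library(jsonlite)
    dat <- fromJSON("big.json")   # fromJSON() accepts a file path as well as a JSON string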
2005 Jul 18
1
read large amount of data
Hi, I have a dataset of 2,194,651 rows x 135 columns, in which all the numbers are 0, 1, or 2, and the file is bar-delimited. I used the following approach, which can handle 100,000 lines:
t <- scan('fv', sep='|', nlines=100000)
t1 <- matrix(t, nrow=135, ncol=100000)
t2 <- t(t1)
t3 <- as.data.frame(t2)
I changed my plan to using stratified sampling with replacement (col 2 is my class variable: 1 or 2).
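A minimal sketch extending the poster's approach to the whole file (the file name 'fv' and the 100,000-line block come from the post; the loop itself is my assumption): keeping the connection open between scan() calls walks through all 2,194,651 lines one block at a time.

    con    <- file("fv", open = "r")
    chunks <- list()
    repeat {
      v <- scan(con, sep = "|", nlines = 100000, quiet = TRUE)
      if (length(v) == 0) break
      # 135 values per line, filled column-wise, then transposed as in the post
      chunks[[length(chunks) + 1]] <- as.data.frame(t(matrix(v, nrow = 135)))
    }
    close(con)
    full <- do.call(rbind, chunks)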
2011 Sep 06
3
Memory fragmentation and PCI passthrough
Hello, I've hit a known problem with dynamic memory management - memory fragmentation... This dynamic memory management basically does xl mem-set to balance memory. After the system has been running for some time, Xen memory is so fragmented that it is impossible to start a new VM with a PCI device. Sometimes it crashes during boot (no 64MB contiguous memory for SWIOTLB), or later - e.g. iwlagn cannot
2017 Sep 02
1
readLines() segfaults on large file & question on how to work around
Thank you for your suggestion. Unfortunately, while R doesn't segfault calling readr::read_file() on the test file I described, I get the error message:
Error in read_file_(ds, locale) : negative length vectors are not allowed
Jen
On Sat, Sep 2, 2017 at 1:38 PM, Ista Zahn <istazahn at gmail.com> wrote:
> As a work-around I suggest readr::read_file.
> --Ista
2010 Feb 27
3
Best Hardware & OS For Large Data Sets
Greetings, I am acquiring a new computer in order to conduct data analysis. I currently have a 32-bit Vista OS with 3 GB of RAM and I consistently run into memory allocation problems. I will likely be required to run Windows 7 on the new system, but have flexibility as far as hardware goes. Can people recommend the best hardware to minimize memory allocation problems? I am leaning towards dual
2009 May 28
6
Users with large (4GB) inboxes crippling dovecot
Hi all, I'm new here and would very much appreciate any help you can give me. We are running a rather outdated mail server that until recently has been running beautifully. Under the pretense of "if it ain't broke, don't fix it" it hasn't been updated, so it is running Fedora Core 4 and dovecot v0.99.14. What is happening is that as users log in (via Thunderbird), they
2007 Jan 05
7
Hitting Files per Directory Limits with Ferret?
Hey all! We've been using Ferret to great success these past six months. But recently we've tried adding many new ContentItems (the only thing being indexed by Ferret at the moment), and things came crashing to a halt.
ferret gem: 0.10.9
acts_as_ferret plugin (not sure which version)
How we're using the plugin:
class ContentItem < ActiveRecord::Base
  acts_as_ferret
2010 Nov 15
5
How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which comes as two 1.3 GB CSV files. I have enough RAM to read one of them into memory,
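A hedged sketch of the chunked-load pattern (file, table and block size are hypothetical): read a block of rows, append it to the SQLite table, and repeat, so neither 1.3 GB file ever has to fit in RAM.

    library(DBI)
    library(RSQLite)
    db  <- dbConnect(SQLite(), "acs.sqlite")
    con <- file("acs_big.csv", open = "r")
    hdr <- read.csv(con, nrows = 1, header = TRUE)   # header line plus first data row
    dbWriteTable(db, "acs", hdr, overwrite = TRUE)
    repeat {
      block <- tryCatch(
        read.csv(con, nrows = 100000, header = FALSE, col.names = names(hdr)),
        error = function(e) NULL)                    # read.csv errors at end of file
      if (is.null(block) || nrow(block) == 0) break
      dbWriteTable(db, "acs", block, append = TRUE)
    }
    close(con)
    dbDisconnect(db)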
2013 Dec 09
2
How can I find nonstandard or control characters in a large file?
I have a humongous csv file containing census data, far too big to read into RAM. I have been trying to extract individual columns from this file using the colbycol package. This works for certain subsets of the columns, but not for others. I have not yet been able to precisely identify the problem columns, as there are 731 columns and running colbycol on the file on my old slow machine takes
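A minimal sketch (not the thread's solution; 'census.csv' and the block size are hypothetical) for locating lines with control or non-ASCII bytes without loading the whole file: stream it in blocks and record the offending line numbers.

    con  <- file("census.csv", open = "r")
    bad  <- integer(0)
    done <- 0
    repeat {
      block <- readLines(con, n = 50000, warn = FALSE)
      if (length(block) == 0) break
      hit  <- grepl("[^ -~\t]", block, useBytes = TRUE)   # anything outside printable ASCII plus tab
      bad  <- c(bad, done + which(hit))
      done <- done + length(block)
    }
    close(con)
    bad   # line numbers worth inspecting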
2004 Dec 06
6
how to get how many lines there are in a file.
Hi all, if I want to get the total number of lines in a big file without reading the file's contents into R as a matrix or data frame, are there any methods or functions? Thanks in advance. Regards
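A hedged sketch (the block size is arbitrary): stream the file and count the lines per block, so only one block is ever held in memory.

    count_lines <- function(path, block = 100000) {
      con <- file(path, open = "r")
      on.exit(close(con))
      n <- 0
      repeat {
        chunk <- readLines(con, n = block, warn = FALSE)
        if (length(chunk) == 0) break
        n <- n + length(chunk)
      }
      n
    }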
2012 Apr 19
2
Dependency-aware scripting tools for R
There are numerous tools like scons, make, ruffus, ant, rake, etc. that can be used to build complex pipelines based on task dependencies. These tools are written in a variety of languages, but I have not seen such a thing for R. Is anyone aware of such a package? The goal is to be able to develop robust bioinformatic pipelines driven by scripts written in R. Thanks, Sean
2005 May 02
2
RMySQL query: why result takes so much memory in R ?
Hi, I just started with RMySQL. I have a database with roughly 12 million rows/records and 8 columns/fields. From all 12 million records I want to import only 3 fields. The fields are specified as: id int(11), group char(15), measurement float(4,2). Why does this take > 1 GB of RAM? I run R on SUSE Linux with 1 GB of RAM, and with the code below it even fills the whole 1 GB of swap. I just
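A hedged sketch of chunked fetching with RMySQL (database, table name and block size are hypothetical; the field names come from the post): dbSendQuery() plus repeated fetch() calls pulls the 12 million rows in bounded pieces instead of materialising them all at once.

    library(RMySQL)
    con <- dbConnect(MySQL(), dbname = "mydb")
    res <- dbSendQuery(con, "SELECT id, `group`, measurement FROM mytable")
    while (!dbHasCompleted(res)) {
      rows <- fetch(res, n = 100000)
      # ... summarise or write 'rows' out here before fetching the next block ...
    }
    dbClearResult(res)
    dbDisconnect(con)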