similar to: Reading in large file in pieces

Displaying 20 results from an estimated 60000 matches similar to: "Reading in large file in pieces"

2009 May 09
5
Reading large files quickly
I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The Unix command wc, by contrast, processes the same file in three minutes. Is there a faster way to read files in R? Thanks!
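A minimal sketch of the usual chunked-reading pattern (not from this thread; the file name and the 100,000-line block size are made up): keeping a single connection open lets each readLines() call resume where the previous one stopped, instead of rescanning the file from the start.

    con <- file("bigfile.txt", open = "r")
    repeat {
      chunk <- readLines(con, n = 100000)
      if (length(chunk) == 0) break
      # ... process 'chunk' here, e.g. split the fixed-width fields ...
    }
    close(con)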
2005 Feb 25
4
read.table
I have a commonly recurring problem and wondered if folks would share tips. I routinely get tab-delimited text files that I need to read in. In very many cases, I get:
> a <- read.table('junk.txt.txt',header=T,skip=10,sep="\t")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 67 did not have 88 elements
I am typically able to go
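A hedged sketch of one common diagnosis (not necessarily this thread's resolution; 'junk.txt.txt' is the poster's file name): count.fields() reports how many fields each line has, which pinpoints the ragged rows, and fill = TRUE then lets read.table() pad them.

    nf <- count.fields("junk.txt.txt", sep = "\t", skip = 10)
    table(nf)         # distribution of field counts per line
    which(nf != 88)   # the offending lines (counted after the skipped header)
    # quote = "" guards against stray quote characters, a frequent cause of this error
    a <- read.table("junk.txt.txt", header = TRUE, skip = 10, sep = "\t",
                    quote = "", fill = TRUE)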
2012 Feb 02
9
sqldf for Very Large Tab Delimited Files
Hi All, I have a very (very) large tab-delimited text file without headers. There are only 8 columns and millions of rows. I want to make numerous pieces of this file by sub-setting it for individual stations. Station is given in the first column. I am trying to learn and use the sqldf package for this but am stuck in a couple of places. To simulate my requirement, I have taken the iris dataset as an
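A hedged sketch of the usual sqldf pattern (file name and station code are hypothetical): read.csv.sql() stages the file in a temporary SQLite database and returns only the rows matching the WHERE clause, so the full file never has to fit in R's memory. The SQL statement refers to the file as the table "file"; with header = FALSE the columns carry default names (V1..V8).

    library(sqldf)
    stn <- read.csv.sql("stations.txt",
                        sql    = "select * from file where V1 = 'STN01'",
                        header = FALSE, sep = "\t")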
2005 Jun 06
3
Reading huge chunks of data from MySQL into Windows R
Dear List, I'm trying to use R under Windows on a huge database in MySQL via ODBC (technical reasons for this...). Now I want to read tables with some 160,000,000 entries into R. I would be lucky if anyone out there has some good hints on what to consider concerning memory management. I'm not sure about the best methods for reading such huge files into R. For the moment I split the whole
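A minimal sketch of chunked fetching over ODBC with the RODBC package (DSN, column names, table name and block size are all hypothetical): submit the query once, then pull a bounded number of rows per call, so R never holds all 160,000,000 rows at once.

    library(RODBC)
    ch <- odbcConnect("mysql_dsn")
    odbcQuery(ch, "SELECT id, grp, value FROM bigtable")
    repeat {
      rows <- sqlGetResults(ch, max = 100000)
      if (!is.data.frame(rows) || nrow(rows) == 0) break
      # ... aggregate or write 'rows' out before fetching the next block ...
    }
    odbcClose(ch)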
2006 Sep 03
1
Unexpected source() behavior in R-devel
Why am I seeing the following in R-devel (Sept 2, 2006 build) on openSUSE 10.1? I'm sure it is something simple I am missing, but I just don't see it (output below). Thanks, Sean
> readLines(url("http://www.bioconductor.org/biocLite.R"))
[1] "source(\"http://bioconductor.org/getBioC.R\")"
[2] ""
2005 Jun 03
1
reading tables into R
Hi, The file I am reading is a text file, whose contents are a matrix that has 15 rows and 58 columns. The first row has column names, and the first column has row names, so the format is correct as far as using read.table is concerned. The other values in the table are all float values (numeric). So when I read in the file using data1 <- read.table("HAL001_HAL0015_Signals.txt"), it
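A hedged sketch (the thread's actual fix isn't shown in this excerpt): telling read.table() explicitly that the first row holds column names and the first column holds row names keeps the remaining 58 columns numeric.

    data1 <- read.table("HAL001_HAL0015_Signals.txt",
                        header = TRUE, row.names = 1)
    str(data1)   # every remaining column should be numeric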
2007 Feb 02
5
reading very large files
Hi all, I have a large file (1.8 GB) with 900,000 lines that I would like to read. Each line is a string of characters. Specifically, I would like to randomly select 3000 lines. For smaller files, what I'm doing is:
trs <- scan("myfile", what = character(), sep = "\n")
trs <- trs[sample(length(trs), 3000)]
And this works OK; however my computer seems not able to handle
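A minimal two-pass sketch (not from this thread; the block size is arbitrary): since the line count is roughly known, sample the line numbers first, then stream the file in blocks and keep only the sampled lines, so the full 1.8 GB never sits in memory.

    keep   <- sort(sample(900000, 3000))
    con    <- file("myfile", open = "r")
    picked <- character(0)
    done   <- 0
    repeat {
      block <- readLines(con, n = 50000)
      if (length(block) == 0) break
      idx    <- keep[keep > done & keep <= done + length(block)] - done
      picked <- c(picked, block[idx])
      done   <- done + length(block)
    }
    close(con)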
2006 Apr 24
1
Handling large dataset & dataframe [Broadcast]
Here's a skeletal example. Embellish as needed:
p <- 5
n <- 300
set.seed(1)
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
write.table(dat, file="c:/temp/big.txt", row=FALSE, col=FALSE)
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open="r")
for (i in 1:3) {
  x <- matrix(scan(f, nlines=100), 100,
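A hedged completion of the truncated loop above (the intercept column, byrow = TRUE and the final solve() are my reconstruction, not necessarily the original post's code): each 100-row block contributes to X'X and X'y, so only one block is in memory at a time.

    for (i in 1:3) {
      x   <- matrix(scan(f, nlines = 100), 100, p + 1, byrow = TRUE)
      X   <- cbind(1, x[, -1])   # intercept plus the p predictors
      y   <- x[, 1]              # first column holds the response
      xtx <- xtx + crossprod(X)
      xty <- xty + crossprod(X, y)
    }
    close(f)
    beta <- solve(xtx, xty)      # least-squares coefficients from the accumulated pieces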
2017 Sep 02
5
readLines() segfaults on large file & question on how to work around
Hi: I have a 2.1 GB JSON file. Typically I use readLines() and jsonlite::fromJSON() to extract data from a JSON file. When I try to read in this file using readLines(), R segfaults. I believe the two salient issues with this file are: 1) its size, and 2) it is a single line (no line breaks). I can reproduce this issue as follows: #Generate a big file with no line breaks # In R >
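A hedged work-around sketch (not the fix eventually suggested in the thread; 'big.json' is a hypothetical path): jsonlite can be pointed at the file itself, so the 2.1 GB line never has to pass through readLines() as a single R character string.

    library(jsonlite)
    dat <- fromJSON("big.json")   # fromJSON() accepts a file path as well as a JSON string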
2005 Jul 18
1
read large amount of data
Hi, I have a dataset of 2,194,651 rows x 135 columns, in which all the numbers are 0, 1, or 2, and the file is bar-delimited. I used the following approach, which can handle 100,000 lines:
t <- scan('fv', sep='|', nlines=100000)
t1 <- matrix(t, nrow=135, ncol=100000)
t2 <- t(t1)
t3 <- as.data.frame(t2)
I changed my plan to using stratified sampling with replacement (col 2 is my class variable: 1 or 2).
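A minimal sketch extending the poster's approach to the whole file (the file name 'fv' and the 100,000-line block come from the post; the loop itself is my assumption): keeping the connection open between scan() calls walks through all 2,194,651 lines one block at a time.

    con    <- file("fv", open = "r")
    chunks <- list()
    repeat {
      v <- scan(con, sep = "|", nlines = 100000, quiet = TRUE)
      if (length(v) == 0) break
      # 135 values per line, filled column-wise, then transposed as in the post
      chunks[[length(chunks) + 1]] <- as.data.frame(t(matrix(v, nrow = 135)))
    }
    close(con)
    full <- do.call(rbind, chunks)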
2011 Sep 06
3
Memory fragmentation and PCI passthrough
Hello, I've hit a known problem with dynamic memory management - memory fragmentation... This dynamic memory management basically does xl mem-set to balance memory. After the system has been running for some time, Xen memory is so fragmented that it is impossible to start a new VM with a PCI device. Sometimes it crashes during boot (no 64MB contiguous memory for SWIOTLB), or later - e.g. iwlagn cannot
2017 Sep 02
1
readLines() segfaults on large file & question on how to work around
Thank you for your suggestion. Unfortunately, while R doesn't segfault calling readr::read_file() on the test file I described, I get the error message:
Error in read_file_(ds, locale) : negative length vectors are not allowed
Jen
On Sat, Sep 2, 2017 at 1:38 PM, Ista Zahn <istazahn at gmail.com> wrote:
> As a work-around I suggest readr::read_file.
> --Ista
2010 Feb 27
3
Best Hardware & OS For Large Data Sets
Greetings, I am acquiring a new computer in order to conduct data analysis. I currently have a 32-bit Vista OS with 3 GB of RAM and I consistently run into memory allocation problems. I will likely be required to run Windows 7 on the new system, but have flexibility as far as hardware goes. Can people recommend the best hardware to minimize memory allocation problems? I am leaning towards dual
2009 May 28
6
Users with large (4GB) inboxes crippling dovecot
Hi all, I'm new here and would very much appreciate any help you can give me. We are running a rather outdated mail server that until recently has been running beautifully. Under the pretense of "if it ain't broke, don't fix it" it hasn't been updated, so it is running Fedora Core 4 and dovecot v0.99.14. What is happening is that as users log in (via Thunderbird), they
2007 Jan 05
7
Hitting Files per Directory Limits with Ferret?
Hey all! We've been using Ferret to great success these past six months. But recently we've tried adding many new ContentItems (the only thing being indexed by Ferret at the moment), and things came crashing to a halt.
ferret gem: 0.10.9
acts_as_ferret plugin (not sure which version)
How we're using the plugin:
class ContentItem < ActiveRecord::Base
  acts_as_ferret
2010 Nov 15
5
How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which comes as two 1.3 GB CSV files. I have enough RAM to read one of them into memory,
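A hedged sketch of the chunked-load pattern (file, table and block size are hypothetical): read a block of rows, append it to the SQLite table, and repeat, so neither 1.3 GB file ever has to fit in RAM.

    library(DBI)
    library(RSQLite)
    db  <- dbConnect(SQLite(), "acs.sqlite")
    con <- file("acs_big.csv", open = "r")
    hdr <- read.csv(con, nrows = 1, header = TRUE)   # header line plus first data row
    dbWriteTable(db, "acs", hdr, overwrite = TRUE)
    repeat {
      block <- tryCatch(
        read.csv(con, nrows = 100000, header = FALSE, col.names = names(hdr)),
        error = function(e) NULL)                    # read.csv errors at end of file
      if (is.null(block) || nrow(block) == 0) break
      dbWriteTable(db, "acs", block, append = TRUE)
    }
    close(con)
    dbDisconnect(db)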
2013 Dec 09
2
How can I find nonstandard or control characters in a large file?
I have a humongous csv file containing census data, far too big to read into RAM. I have been trying to extract individual columns from this file using the colbycol package. This works for certain subsets of the columns, but not for others. I have not yet been able to precisely identify the problem columns, as there are 731 columns and running colbycol on the file on my old slow machine takes
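A minimal sketch (not the thread's solution; 'census.csv' and the block size are hypothetical) for locating lines with control or non-ASCII bytes without loading the whole file: stream it in blocks and record the offending line numbers.

    con  <- file("census.csv", open = "r")
    bad  <- integer(0)
    done <- 0
    repeat {
      block <- readLines(con, n = 50000, warn = FALSE)
      if (length(block) == 0) break
      hit  <- grepl("[^ -~\t]", block, useBytes = TRUE)   # anything outside printable ASCII plus tab
      bad  <- c(bad, done + which(hit))
      done <- done + length(block)
    }
    close(con)
    bad   # line numbers worth inspecting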
2004 Dec 06
6
how to get how many lines there are in a file.
Hi all, if I want to get the total number of lines in a big file without reading the file's contents into R as a matrix or data frame, are there any methods or functions? Thanks in advance. Regards
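A hedged sketch (the block size is arbitrary): stream the file and count the lines per block, so only one block is ever held in memory.

    count_lines <- function(path, block = 100000) {
      con <- file(path, open = "r")
      on.exit(close(con))
      n <- 0
      repeat {
        chunk <- readLines(con, n = block, warn = FALSE)
        if (length(chunk) == 0) break
        n <- n + length(chunk)
      }
      n
    }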
2012 Apr 19
2
Dependency-aware scripting tools for R
There are numerous tools like scons, make, ruffus, ant, rake, etc. that can be used to build complex pipelines based on task dependencies. These tools are written in a variety of languages, but I have not seen such a thing for R. Is anyone aware of such a package? The goal is to be able to develop robust bioinformatic pipelines driven by scripts written in R. Thanks, Sean
2005 May 02
2
RMySQL query: why result takes so much memory in R ?
Hi, I just started with RMySQL. I have a database with roughly 12 million rows/records and 8 columns/fields. From all 12 million records I want to import only 3 fields. The fields are specified as: id int(11), group char(15), measurement float(4,2). Why does this take > 1 GB of RAM? I run R on SUSE Linux with 1 GB of RAM, and with the code below it even fills the whole 1 GB of swap. I just
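A hedged sketch of chunked fetching with RMySQL (database, table name and block size are hypothetical; the field names come from the post): dbSendQuery() plus repeated fetch() calls pulls the 12 million rows in bounded pieces instead of materialising them all at once.

    library(RMySQL)
    con <- dbConnect(MySQL(), dbname = "mydb")
    res <- dbSendQuery(con, "SELECT id, `group`, measurement FROM mytable")
    while (!dbHasCompleted(res)) {
      rows <- fetch(res, n = 100000)
      # ... summarise or write 'rows' out here before fetching the next block ...
    }
    dbClearResult(res)
    dbDisconnect(con)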