Displaying 20 results from an estimated 60000 matches similar to: "Reading in large file in pieces"
2009 May 09
5
Reading large files quickly
I'm finding that readLines() and read.fwf() take nearly two hours to
work through a 3.5 GB file, even when reading in large (100 MB) chunks.
The Unix command wc, by contrast, processes the same file in three
minutes. Is there a faster way to read files in R?
Thanks!
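A minimal sketch of one common fix (the file name and block size are placeholders, not from the thread): keep a single connection open and read fixed-size blocks from it, so R streams through the file once instead of re-reading from the top on every chunk, which is what happens when chunks are taken with skip=.
con <- file("bigfile.txt", open = "r")
repeat {
  block <- readLines(con, n = 100000)
  if (length(block) == 0) break          # end of file
  ## process the block here, e.g. substr() out the fixed-width fields
}
close(con)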
2005 Feb 25
4
read.table
I have a commonly recurring problem and wondered if folks would share
tips. I routinely get tab-delimited text files that I need to read in.
In very many cases, I get:
> a <- read.table('junk.txt.txt',header=T,skip=10,sep="\t")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec =
dec, :
line 67 did not have 88 elements
I am typically able to go
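A sketch of one way to track down the offending line (not from the original thread): count.fields() reports how many fields each line has, and unbalanced quote or comment characters are the usual cause, so disabling them is often the real fix. The file name is the poster's; the retry call is an assumption.
nf <- count.fields('junk.txt.txt', sep = "\t", skip = 10, quote = "")
table(nf)                   # how many lines have how many fields
which(nf != 88) + 10        # approximate file line numbers of the bad rows
a <- read.table('junk.txt.txt', header = TRUE, skip = 10, sep = "\t",
                quote = "", comment.char = "", fill = TRUE)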
2012 Feb 02
9
sqldf for Very Large Tab Delimited Files
Hi All,
I have a very (very) large tab-delimited text file without headers. There
are only 8 columns and millions of rows. I want to make numerous pieces of
this file by subsetting it for individual stations. The station is given in
the first column. I am trying to learn and use the sqldf package for this but am
stuck in a couple of places.
To simulate my requirement, I have taken iris dataset as an
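The excerpt ends here; a sketch of the kind of filtering being asked about, assuming the default column names V1..V8 when header = FALSE ("stations.txt" and station 'ST001' are made-up names): read.csv.sql() lets SQLite do the subsetting, so only one station's rows ever reach R.
library(sqldf)
one_station <- read.csv.sql("stations.txt",
                            sql    = "select * from file where V1 = 'ST001'",
                            header = FALSE, sep = "\t")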
2005 Jun 06
3
Reading huge chunks of data from MySQL into Windows R
Dear List,
I'm trying to use R under Windows on a huge MySQL database via ODBC
(there are technical reasons for this...). I want to read tables with some
160,000,000 entries into R. I would be glad if anyone out there has
some good hints on what to consider concerning memory management. I'm not
sure about the best method for reading such huge tables into R. For the
moment I split the whole
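One hedged possibility with RODBC (the DSN, table, and column names are placeholders): issue the query once and pull the result set down in fixed-size batches with sqlGetResults(max = ...), summarising or writing out each batch rather than holding 160 million rows in memory at once.
library(RODBC)
ch <- odbcConnect("myDSN")
odbcQuery(ch, "SELECT col1, col2 FROM bigtable")
repeat {
  batch <- sqlGetResults(ch, max = 100000)
  if (!is.data.frame(batch) || nrow(batch) == 0) break
  ## aggregate or write.table(..., append = TRUE) each batch here
}
odbcClose(ch)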
2006 Sep 03
1
Unexpected source() behavior in R-devel
Why am I seeing the following in R-devel (Sept 2, 2006 build) on openSUSE
10.1? I'm sure it is something simple I am missing, but I just don't see it
(output below).
Thanks,
Sean
> readLines(url("http://www.bioconductor.org/biocLite.R"))
[1] "source(\"http://bioconductor.org/getBioC.R\")"
[2] ""
2005 Jun 03
1
reading tables into R
Hi,
The file I am reading is a text file, whose contents are a matrix that has 15 rows and 58 columns. The first row has column names, and the first column has row names, so the format is correct as far as using read.table is concerned. The other values in the table are all float values (numeric). So when I read in the file using data1 <- read.table("HAL001_HAL0015_Signals.txt"), it
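A minimal sketch of the matching call, assuming the header row names all 58 data columns (so the row-name column has to be pointed out explicitly); if the header has one fewer entry than the data rows, read.table() already uses the first column as row names on its own.
data1 <- read.table("HAL001_HAL0015_Signals.txt",
                    header = TRUE, row.names = 1)
str(data1)   # columns should now all be numeric, with names kept as names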
2007 Feb 02
5
reading very large files
Hi all,
I have a large file (1.8 GB) with 900,000 lines that I would like to read.
Each line is a character string. Specifically, I would like to randomly
select 3000 lines. For smaller files, what I'm doing is:
trs <- scan("myfile", what= character(), sep = "\n")
trs<- trs[sample(length(trs), 3000)]
And this works OK; however my computer seems not able to handle
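A sketch of a lower-memory variant (the block size is arbitrary and the total line count is assumed known, e.g. from wc -l): choose the 3000 line numbers first, then stream the file in blocks and keep only those lines, so the full 1.8 GB is never held at once.
n_total <- 900000                        # total lines, e.g. from 'wc -l myfile'
keep    <- sort(sample(n_total, 3000))   # the line numbers to retain
con  <- file("myfile", open = "r")
trs  <- character(0)
done <- 0
repeat {
  block <- readLines(con, n = 100000)
  if (length(block) == 0) break
  idx  <- keep[keep > done & keep <= done + length(block)] - done
  trs  <- c(trs, block[idx])
  done <- done + length(block)
}
close(con)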
2006 Apr 24
1
Handling large dataset & dataframe [Broadcast]
Here's a skeletal example. Embellish as needed:
p <- 5
n <- 300
set.seed(1)
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
write.table(dat, file="c:/temp/big.txt", row.names=FALSE, col.names=FALSE)
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open="r")
for (i in 1:3) {
x <- matrix(scan(f, nlines=100), 100,
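The excerpt cuts the loop off; a hedged reconstruction of the idea it sketches (byrow = TRUE and the accumulation steps are guesses at the missing lines, not the original author's exact code): accumulate X'X and X'y a chunk at a time, then solve once at the end.
for (i in 1:3) {
  x <- matrix(scan(f, nlines = 100), 100, p + 1, byrow = TRUE)
  y <- x[, 1]                    # first column taken as the response
  X <- cbind(1, x[, -1])         # intercept plus the p predictors
  xtx <- xtx + crossprod(X)      # running X'X
  xty <- xty + crossprod(X, y)   # running X'y
}
close(f)
beta <- solve(xtx, xty)          # least-squares coefficients from the chunks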
2005 Jul 18
1
read large amount of data
Hi,
I have a dataset of 2,194,651 x 135, in which all the numbers are 0, 1, or 2,
and the file is bar-delimited.
I used the following approach which can handle 100,000 lines:
t<-scan('fv', sep='|', nlines=100000)
t1<-matrix(t, nrow=135, ncol=100000)
t2<-t(t1)
t3<-as.data.frame(t2)
I have changed my plan to using stratified sampling with replacement (column
2 is my class variable: 1 or 2).
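A sketch of how the chunks can advance through the file (the file name 'fv' is the poster's): scanning from an open connection continues where the previous chunk stopped, and byrow = TRUE removes the separate transpose step.
con <- file("fv", open = "r")
repeat {
  vals <- scan(con, sep = "|", nlines = 100000, quiet = TRUE)
  if (length(vals) == 0) break
  chunk <- as.data.frame(matrix(vals, ncol = 135, byrow = TRUE))
  ## e.g. tally chunk[[2]] here, or sample rows within each class
}
close(con)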
2010 Feb 27
3
Best Hardware & OS For Large Data Sets
Greetings,
I am acquiring a new computer in order to conduct data analysis. I
currently have a 32-bit Vista OS with 3 GB of RAM and I consistently run into
memory allocation problems. I will likely be required to run Windows 7 on
the new system, but have flexibility as far as hardware goes. Can people
recommend the best hardware to minimize memory allocation problems? I am
leaning towards dual
2009 May 28
6
Users with large (4GB) inboxes crippling dovecot
Hi all,
I'm new here and would very much appreciate any help you can give me.
We are running a rather outdated mail server that until recently has been
running beautifully. On the principle of "if it ain't broke, don't fix it"
it hasn't been updated, so it is running Fedora Core 4 and Dovecot v0.99.14.
What is happening is that as users log in (via Thunderbird), they
2007 Jan 05
7
Hitting Files per Directory Limits with Ferret?
Hey all!
We've been using Ferret with great success these past six months. But
recently we've tried adding many new ContentItems (the only thing being
indexed by Ferret at the moment), and things came crashing to a halt.
ferret gem: 0.10.9
acts_as_ferret plugin (not sure which version)
How we're using the plugin:
class ContentItem < ActiveRecord::Base
acts_as_ferret
2010 Nov 15
5
How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to
insert a very large CSV file into a SQLite database. I'm pretty new to
working with databases in R, so I apologize if I'm overlooking something
obvious here.
I'm trying to work with the American Community Survey data, which comes as
two 1.3 GB CSV files. I have enough RAM to read one of them into memory,
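A hedged sketch of the chunked route (the file, table, and database names are placeholders): stream the CSV through an open connection in blocks and append each block to a SQLite table with RSQLite, so memory use is bounded by the block size rather than the file size.
library(RSQLite)
db  <- dbConnect(SQLite(), dbname = "acs.sqlite")
con <- file("acs_part1.csv", open = "r")
hdr <- strsplit(readLines(con, n = 1), ",")[[1]]   # keep the header row
repeat {
  block <- tryCatch(read.csv(con, header = FALSE, nrows = 100000,
                             col.names = hdr),
                    error = function(e) NULL)      # hitting EOF raises an error
  if (is.null(block) || nrow(block) == 0) break
  dbWriteTable(db, "acs", block, append = TRUE)
}
close(con)
dbDisconnect(db)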
2013 Dec 09
2
How can I find nonstandard or control characters in a large file?
I have a humongous csv file containing census data, far too big to read into
RAM. I have been trying to extract individual columns from this file using
the colbycol package. This works for certain subsets of the columns, but not
for others. I have not yet been able to precisely identify the problem
columns, as there are 731 columns and running colbycol on the file on my old
slow machine takes
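A sketch of one way to locate the troublesome bytes first (the file name and block size are placeholders): read the file in blocks and report the lines containing anything outside printable ASCII plus tab, which is usually what derails column-by-column parsers.
con  <- file("census.csv", open = "r")
done <- 0
repeat {
  block <- readLines(con, n = 100000, warn = FALSE)
  if (length(block) == 0) break
  bad <- grep("[^\\x20-\\x7E\\t]", block, perl = TRUE, useBytes = TRUE)
  if (length(bad)) print(done + bad)   # line numbers holding odd bytes
  done <- done + length(block)
}
close(con)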
2004 Dec 06
6
how to get how many lines there are in a file.
hi all
If I want to get the total number of lines in a big file without reading
the file's contents into R as a matrix or data frame, are there any methods
or functions for this?
thanks in advance.
Regards
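A small sketch of one way to do it (the block size is arbitrary and "big.txt" is a placeholder name): count the lines in blocks through a connection, so only one block is ever in memory; calling out to wc -l via system() is another common answer.
count_lines <- function(path, block = 100000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0
  repeat {
    chunk <- readLines(con, n = block)
    if (length(chunk) == 0) break
    n <- n + length(chunk)
  }
  n
}
count_lines("big.txt")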
2012 Apr 19
2
Dependency-aware scripting tools for R
There are numerous tools like scons, make, ruffus, ant, rake, etc.
that can be used to build complex pipelines based on task
dependencies. These tools are written in a variety of languages, but
I have not seen such a thing for R. Is anyone aware of an available
package? The goal is to be able to develop robust bioinformatic
pipelines driven by scripts written in R.
Thanks,
Sean
2017 Sep 02
5
readLines() segfaults on large file & question on how to work around
Hi:
I have a 2.1 GB JSON file. Typically I use readLines() and
jsonlite::fromJSON() to extract data from a JSON file.
When I try to read in this file using readLines(), R segfaults.
I believe the two salient issues with this file are
1). Its size
2). It is a single line (no line breaks)
I can reproduce this issue as follows
#Generate a big file with no line breaks
# In R
>
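The reproduction code is cut off above; a hedged guess at the kind of thing described (the sizes and file name are invented): write a couple of gigabytes of text with no newline, then point readLines() at it.
con <- file("bigline.json", open = "w")
for (i in 1:250) cat(strrep("0123456789", 1e6), file = con)  # ~2.5 GB, one line
close(con)
x <- readLines("bigline.json")   # the call reported to segfault
One workaround worth trying is handing the file path straight to jsonlite::fromJSON(), which skips the readLines() step.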
2005 May 02
2
RMySQL query: why result takes so much memory in R ?
Hi
I just started with RMySQL. I have a database with roughly 12 million
rows/records and 8 columns/fields.
From all 12 million records I want to import only 3 fields.
The fields are specified as: id int(11), group char(15), measurement
float(4,2).
Why does this take > 1 GB of RAM? I run R on SuSE Linux with 1 GB of RAM, and with
the code below it even fills the whole 1 GB of swap. I just
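A hedged sketch using the modern DBI interface (connection details and the table name are placeholders; the original thread predates dbFetch()): send the query once and pull the rows down in batches, processing or thinning each batch instead of keeping the full 12 million rows resident. Part of the footprint is simply that each char(15) value becomes a full R character string, so batching at least bounds the working set.
library(RMySQL)
con <- dbConnect(MySQL(), dbname = "mydb")   # plus host/user/password as needed
res <- dbSendQuery(con, "SELECT id, `group`, measurement FROM mytable")
while (!dbHasCompleted(res)) {
  batch <- dbFetch(res, n = 100000)
  ## summarise or subset 'batch' here, then let it be garbage collected
}
dbClearResult(res)
dbDisconnect(con)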
2005 Apr 29
2
how to replace text...
If I have....
QQQQ<-priceIts("QQQQ",quote="Close")
QQQQ<-priceIts("QQQQ",quote="Close");plot(QQQQ)
and then I want to do the same thing but, say, with IBM instead of QQQQ.
Is there an easy way, like replacing QQQQ with IBM?
Thanks in advance. /Jonathan
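One way to avoid the search-and-replace entirely (a sketch, reusing the priceIts() call from the post): wrap the repeated lines in a function so the ticker is an argument.
plot_close <- function(sym) {
  x <- priceIts(sym, quote = "Close")
  plot(x)
  invisible(x)
}
IBM <- plot_close("IBM")    # same steps as above, just a different symbol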
2004 Oct 06
2
Repeated measures
I have a data set in which I have 5000 repeated measures on 6 subjects
over time (varying intervals, but measurements for all individuals are
at the same times). There are two states, a "resting" state (the
majority of the time), and a perturbed state. I have a continuous
measurement at each time point for each of the individuals. I would
like to determine the "state"