Displaying 20 results from an estimated 8000 matches similar to: "Dealing With Extremely Large Files"
2009 May 09
5
Reading large files quickly
I'm finding that readLines() and read.fwf() take nearly two hours to
work through a 3.5 GB file, even when reading in large (100 MB) chunks.
The Unix command wc, by contrast, processes the same file in three
minutes. Is there a faster way to read files in R?
Thanks!
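A common fix, sketched below: readLines() on a file path reopens and
rescans the file on every call, whereas readLines() on an open connection
resumes where the previous call stopped. The file name and chunk size here
are placeholders.
con <- file("bigfile.txt", open = "r")
nlines <- 0
repeat {
  chunk <- readLines(con, n = 100000)  # resumes at the previous position
  if (length(chunk) == 0) break        # end of file reached
  nlines <- nlines + length(chunk)     # process each chunk here
}
close(con)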
2012 Mar 08
4
Reading in 9.6GB .DAT File - OK with 64-bit R?
Hi there,
I wish to read a 9.6GB .DAT file into R (64-bit R on a 64-bit Windows machine),
then delete a substantial number of rows and convert the result to a .csv file.
Upon the first attempt the computer crashed (at some point last night).
I'm rerunning this now and am closely monitoring Processor/CPU/Memory.
Apart from this crash being a computer issue alone (possibly), is R equipped to
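One memory-safe approach is to stream the file in chunks, drop the unwanted
rows, and append the survivors to the .csv, so the full 9.6GB never sits in
R's memory at once. A sketch, with a made-up column layout and row filter:
inp <- file("big.dat", open = "r")
first <- TRUE
repeat {
  chunk <- tryCatch(read.table(inp, nrows = 100000, header = FALSE),
                    error = function(e) NULL)  # NULL once input is exhausted
  if (is.null(chunk)) break
  keep <- chunk[chunk$V1 > 0, ]                # placeholder filter condition
  write.table(keep, "big.csv", sep = ",", row.names = FALSE,
              col.names = first, append = !first)
  first <- FALSE
}
close(inp)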
2010 Nov 15
5
How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to
insert a very large CSV file into a SQLite database. I'm pretty new to
working with databases in R, so I apologize if I'm overlooking something
obvious here.
I'm trying to work with the American Community Survey data, which is two
1.3GB csv files. I have enough RAM to read one of them into memory,
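One way to sidestep R's memory entirely is RSQLite's file-import form of
dbWriteTable(), which has SQLite read the CSV itself. A sketch; the file,
database, and table names are placeholders:
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "acs.sqlite")
dbWriteTable(con, "acs", "acs_part1.csv", header = TRUE, sep = ",")
dbGetQuery(con, "select count(*) from acs")  # sanity-check the row count
dbDisconnect(con)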
2007 Sep 07
2
Automatic detachment of dependent packages
Dear All,
When one loads certain packages, some other dependent packages are
loaded as well. Is there some way of detaching them automatically when
one detaches the first package loaded? For instance,
> library(sqldf)
Loading required package: RSQLite
Loading required package: DBI
Loading required package: gsubfn
Loading required package: proto
but
> detach(package:sqldf)
>
>
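There is no built-in for this, but a small helper can parse the package's
Depends field and detach each entry that is on the search path. A
hypothetical sketch (it ignores Imports and dependencies of dependencies):
detach_with_depends <- function(pkg) {
  dep <- packageDescription(pkg)$Depends
  dep <- if (is.null(dep)) character(0)
         else trimws(sub("\\(.*\\)", "", strsplit(dep, ",")[[1]]))
  for (p in c(pkg, setdiff(dep, "R"))) {
    entry <- paste0("package:", p)
    if (entry %in% search()) detach(entry, character.only = TRUE)
  }
}
detach_with_depends("sqldf")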
2010 Jul 28
1
sqldf 0.3-5 package or tcltk problem
This is my first post. I am running Mac OS X version 10.6.3. I am running R 2.11.0 GUI 1.33 64 bit.
This may or may not be related to sqldf, but I experienced this problem while attempting to use an sqldf query. The same code runs with no problem on my Windows machine. Here is what happens:
> r=sqldf("select ... ")
Loading required package: tcltk
Loading Tcl/Tk interface ...
Then
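One workaround noted in the sqldf FAQ for exactly this symptom on Macs is to
tell gsubfn to use its pure-R engine before loading sqldf, so tcltk is never
loaded at all. The query below is illustrative:
options(gsubfn.engine = "R")  # avoid the tcltk dependency entirely
library(sqldf)
r <- sqldf("select * from iris limit 5")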
2012 Feb 02
9
sqldf for Very Large Tab Delimited Files
Hi All,
I have a very (very) large tab-delimited text file without headers. There
are only 8 columns and millions of rows. I want to make numerous pieces of
this file by sub-setting it for individual stations. The station is given in
the first column. I am trying to learn and use the sqldf package for this
but am stuck in a couple of places.
To simulate my requirement, I have taken the iris dataset as an
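A sketch of the sqldf route: read.csv.sql() imports the file into a
temporary SQLite database and returns only the rows the SQL keeps, so R
never holds the whole file. The file name and station value are
placeholders; with header = FALSE the columns are typically named V1, V2, ...:
library(sqldf)
one_station <- read.csv.sql("stations.txt",
                            sql = "select * from file where V1 = 'ST001'",
                            header = FALSE, sep = "\t")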
2008 Aug 21
2
Large data sets with R (binding to hadoop available?)
Dear R community,
I find R fantastic and use it whenever I can for my data-analysis
needs. Certain data sets, however, are so large that other tools
seem to be needed to pre-process the data so that it can be brought
into R for further analysis.
Questions I have for the many expert contributors on this list:
1. How do others handle large data sets (gigabytes,
terabytes)
2007 Sep 07
3
Delete query in sqldf?
Dear All,
Is sqldf equipped to run delete queries? I have tried them but
with no success.
Thanks in advance,
Paul
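sqldf builds a fresh database for each call, so a DELETE on its own has no
visible effect; the pattern in the sqldf FAQ is to pass a vector of
statements, where the result of the last one is returned (the main. prefix
stops sqldf from re-importing the unmodified data frame). A minimal sketch
on a copy of iris:
library(sqldf)
df <- iris
out <- sqldf(c("delete from df where Species = 'setosa'",
               "select * from main.df"))
nrow(out)  # 100: the setosa rows are gone from the returned copy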
2009 Mar 16
1
errors when install RSQLite
Dear all,
I am trying to install the RSQLite package because I want to install "sqldf".
I used
> install.packages("RSQLite")
first, which gave the error message below:
make: *** [RS-DBI.o] Error 1
chmod: cannot access `/usr/lib/R/library/RSQLite/libs/*': No such file or
directory
ERROR: compilation failed for package 'RSQLite'
** Removing
2009 Jan 16
5
Value Lookup from File without Slurping
Dear all,
I have a repository file (let's call it repo.txt)
that contains two columns like this:
# tag value
AAA 0.2
AAT 0.3
AAC 0.02
AAG 0.02
ATA 0.3
ATT 0.7
Given another query vector
> qr <- c("AAC", "ATT")
I would like to find the corresponding value for each query above,
yielding:
0.02
0.7
However, I want to avoid slurping whole repo.txt
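One way to keep the file out of R is to let SQLite do the lookup via sqldf,
returning only the matching rows. A sketch, assuming the "# tag value"
comment line has been stripped and the columns are single-space separated:
library(sqldf)
qr <- c("AAC", "ATT")
hits <- read.csv.sql("repo.txt",
                     sql = sprintf("select * from file where V1 in (%s)",
                                   paste0("'", qr, "'", collapse = ",")),
                     header = FALSE, sep = " ")
hits$V2  # 0.02 0.7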
2007 Jul 19
1
package NULL not found
When I run Rcmd check, the output mentions an ignored empty argument
and a "package NULL not found" message, and the check stops with an error:
* using log directory 'C:/Rpkgs/sqldf.Rcheck'
* using ARGUMENT '
' __ignored__ R version 2.5.1 (2007-06-27)
* checking for file 'sqldf/DESCRIPTION' ... OK
* this is package 'sqldf' version '0.1-0'
* checking package
2009 Mar 30
1
Importing csv file with character values into sqlite3 and subsequent problem in R / RSQLite
Dear all,
I'm trying to import a csv file into sqlite3 and from there into
R. Everything looks fine except that R outputs the character values in
an odd fashion: they are shown as "\"CHARACTER\"" instead of
"CHARACTER", but only if I display the character variable as a
vector. Does anyone know why this happens? Below is sample
code. The first part is written in
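A likely cause is sqlite3's .import running in the default list mode, which
treats the CSV quote characters as part of the value (issuing .mode csv
before .import avoids this). If the quotes are already stored, they can be
stripped after fetching; the database, table, and column names below are
placeholders:
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "mydb.sqlite")
d <- dbGetQuery(con, "select * from mytable")
d$charvar <- gsub('^"|"$', '', d$charvar)  # drop leading/trailing quotes
dbDisconnect(con)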
2010 Nov 01
1
sqldf hanging on macintosh - works on windows
I have a long script that runs fine on Windows (32-bit). When I try to run it
on two different Macs (64-bit), however, it hangs with identical behavior.
I start with:
library(sqldf)
This results in messages:
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required
2009 Nov 30
1
RSQLite does not read very large values correctly
Hello,
I am trying to import data from an SQLite database to R.
Unfortunately, I seem to get wrong data when I try to import very large
numbers.
For example:
I look at the database via SQLiteStudio(v.1.1.3) and I see the following
values:
OrderID Day TimeToclose
1 2009-11-25 29467907000
2 2009-11-25 29467907000
3 2009-11-25 29467907000
Now I run this R Code:
>
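29467907000 is far beyond .Machine$integer.max (2147483647), so any path
that forces the column through a 32-bit integer will mangle it. Casting to
REAL (or TEXT) in the query sidesteps that; the table name below is an
assumption:
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "orders.sqlite")
d <- dbGetQuery(con,
       "select OrderID, Day, cast(TimeToclose as real) as TimeToclose
        from orders")
dbDisconnect(con)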
2007 Sep 07
5
SQL like function?
Hi RUsers,
I wonder if I can select observations whose IDs match any of the
values in another vector, as in MySQL. While I am learning MySQL for
future database management, I would appreciate it if anyone could give me a
hint.
Suppose I have one 5*1 vector containing observation IDs and
frequencies, and one 3*1 vector containing observation IDs.
observation<-c(1,2,3,4,5)
ID<-c(1,3,4)
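Base R's %in% operator does what SQL's IN does. A sketch using the vectors
from the post, with made-up frequencies attached to the IDs:
observation <- data.frame(id = 1:5, freq = c(10, 20, 30, 40, 50))
ID <- c(1, 3, 4)
observation[observation$id %in% ID, ]  # rows whose id matches any ID value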
2008 Nov 22
5
What's the BEST way in R to adapt this vector?
Goal:
Suppose you have a vector that is a discrete variable with values ranging
from 1 to 3 and a length of 10. We'll use this as the example:
y <- c(1,2,3,1,2,3,1,2,3,1)
...and suppose you want your new vector (y.new) to be equal in length to the
number of possible discrete values (3) times the length (10), formatted in
such a way that if y[1] == 1, then y.new[1:3] == c(1,0,0), and if y[2] ==
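A compact base-R way to build the 30-element indicator vector: row k of the
3x3 identity matrix is the indicator for value k, so index it by y and
flatten row by row:
y <- c(1,2,3,1,2,3,1,2,3,1)
y.new <- as.vector(t(diag(3)[y, ]))  # t() makes as.vector() read row-wise
y.new[1:6]  # 1 0 0 0 1 0, i.e. y[1] == 1 and y[2] == 2 expanded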
2007 Aug 01
1
New R package sqldf
sqldf is an R package for running SQL select
statements on one or more R data frames. It is
optimized for convenience, making it useful
for ad hoc queries against R data frames.
Given an SQL select statement whose tables
are the names of R data frames, it:
- sets up the database (by default it transparently
sets up an in-memory SQLite database using RSQLite;
however, MySQL, via RMySQL, can be
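A minimal example of the usage the announcement describes (iris is a
built-in data frame; the query is illustrative):
library(sqldf)
sqldf("select Species, count(*) as n from iris group by Species")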
2008 Aug 25
8
SQL Primer for R
Dear R wizards:
I decided to take the advice in the R data import/export manual and
want to learn how to work with SQL for large data sets. I am trying
SQLite with the DBI and RSQLite database interfaces. Speed is nice.
Alas, I am struggling to find a tutorial geared toward the kind
of standard operations I would want in R. Simple things:
* how to determine the number of rows in a
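The row count the post asks about, in DBI/RSQLite terms ("mytab" is a
placeholder table name):
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "mydb.sqlite")
dbGetQuery(con, "select count(*) as n from mytab")  # number of rows
dbListFields(con, "mytab")                          # column names
dbDisconnect(con)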
2007 Aug 31
2
size limitations in R
I am a SAS user currently evaluating R as a possible addition to, or even
replacement for, SAS. The problem I have come across
straight away is R's apparent difficulty in handling relatively large data
files. Whilst I would not expect it to handle
datasets with millions of records, I still really need to be able to work
with datasets with 100,000+ records and 100+
variables. Yet, when reading
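For scale, a back-of-the-envelope check in R: 100,000 rows of 100 numeric
variables at 8 bytes each is modest by modern standards.
100000 * 100 * 8 / 2^20  # about 76 MB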