similar to: Dealing With Extremely Large Files

Displaying 20 results from an estimated 8000 matches similar to: "Dealing With Extremely Large Files"

2009 May 09
5
Reading large files quickly
I'm finding that readLines() and read.fwf() take nearly two hours to work through a 3.5 GB file, even when reading in large (100 MB) chunks. The unix command wc by contrast processes the same file in three minutes. Is there a faster way to read files in R? Thanks!
2012 Mar 08
4
Reading in 9.6GB .DAT File - OK with 64-bit R?
Hi there, I wish to read a 9.6GB .DAT file into R (64-bit R on 64-bit Windows machine) - to then delete a substantial number of rows & then convert to a .csv file. Upon the first attempt the computer crashed (at some point last night). I'm rerunning this now & am closely monitoring Processor/CPU/Memory. Apart from this crash being a computer issue alone (possibly), is R equipped to
2010 Nov 15
5
How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which is two 1.3GB csv files. I have enough RAM to read one of them into memory,
2007 Sep 07
2
Automatic detachment of dependent packages
Dear All, When one loads certain packages, some other dependent packages are loaded as well. Is there some way of detaching them automatically when one detaches the first package loaded? For instance, > library(sqldf) Loading required package: RSQLite Loading required package: DBI Loading required package: gsubfn Loading required package: proto but > detach(package:sqldf) > >
2010 Jul 28
1
sqldf 0.3-5 package or tcltk problem
This is my first post. I am running Mac OS X version 10.6.3. I am running R 2.11.0 GUI 1.33 64 bit. This may or may not be related to sqldf, but I experienced this problem while attempting to use an sqldf query. The same code runs with no problem on my Windows machine. Here is what happens: > r=sqldf("select ... ") Loading required package: tcltk Loading Tcl/Tk interface ... Then
2012 Feb 02
9
sqldf for Very Large Tab Delimited Files
Hi All, I have a very (very) large tab-delimited text file without headers. There are only 8 columns and millions of rows. I want to make numerous pieces of this file by sub-setting it for individual stations. Station is given in the first column. I am trying to learn and use the sqldf package for this but am stuck in a couple of places. To simulate my requirement, I have taken iris dataset as an
2008 Aug 21
2
Large data sets with R (binding to hadoop available?)
Dear R community, I find R fantastic and use R whenever I can for my data analytic needs. Certain data sets, however, are so large that other tools seem to be needed to pre-process data such that it can be brought into R for further analysis. Questions I have for the many expert contributors on this list are: 1. How do others handle situations of large data sets (gigabytes, terabytes)
2007 Sep 07
3
Delete query in sqldf?
Dear All, Is sqldf equipped with delete queries? I have tried delete queries but with no success. Thanks in advance, Paul
2009 Mar 16
1
errors when install RSQLite
Dear all, I am trying to install RSQLite package since I want to install "sqldf", and I used >> install.packages("RSQLite") first, which gave Error message as below: make: *** [RS-DBI.o] Error 1 chmod: cannot access `/usr/lib/R/library/RSQLite/libs/*': No such file or directory ERROR: compilation failed for package 'RSQLite' ** Removing
2009 Jan 16
5
Value Lookup from File without Slurping
Dear all, I have a repository file (let's call it repo.txt) that contains two columns like this: # tag value AAA 0.2 AAT 0.3 AAC 0.02 AAG 0.02 ATA 0.3 ATT 0.7 Given another query vector > qr <- c("AAC", "ATT") I would like to find the corresponding value for each query above, yielding: 0.02 0.7 However, I want to avoid slurping whole repo.txt
2007 Jul 19
1
package NULL not found
In performing Rcmd check I am getting this output regarding using Argument '' and a NULL package not found and it stops with an error: * using log directory 'C:/Rpkgs/sqldf.Rcheck' * using ARGUMENT ' ' __ignored__ R version 2.5.1 (2007-06-27) * checking for file 'sqldf/DESCRIPTION' ... OK * this is package 'sqldf' version '0.1-0' * checking package
2009 Mar 30
1
Importing csv file with character values into sqlite3 and subsequent problem in R / RSQLite
Dear all, I'm trying to import a csv file into sqlite3 and from there into R. Everything looks fine except that R outputs the character values in an odd fashion: they are shown as "\"CHARACTER\"" instead of "CHARACTER", but only if I show the character variable as a vector. Does someone know why this happens? Below is a sample code. The first part is written in
2010 Nov 01
1
sqldf hanging on macintosh - works on windows
Have a long script that runs fine on windows (32 bit). When I try to run it on two different macs (64 bit), however, it hangs with identical behavior. I start with: library(sqldf) This results in messages: Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required
2009 Nov 30
1
RSQLite does not read very large values correctly
Hello, I am trying to import data from an SQLite database to R. Unfortunately, I seem to get wrong data when I try to import very large numbers. For example: I look at the database via SQLiteStudio(v.1.1.3) and I see the following values: OrderID Day TimeToclose 1 2009-11-25 29467907000 2 2009-11-25 29467907000 3 2009-11-25 29467907000 Now I run this R Code: >
2007 Sep 07
5
SQL like function?
Hi RUsers, I am wondering if I can search for observations whose IDs match any of the values in another vector, such as in MySQL. While I am learning MySQL for future database management, I would appreciate it if anyone could give me a hint. Suppose I have one 5*1 vector containing observation IDs and frequencies, and one 3*1 vector containing observation IDs. observation<-c(1,2,3,4,5) ID<-c(1,3,4)
2008 Nov 22
5
What's the BEST way in R to adapt this vector?
Goal: Suppose you have a vector that is a discrete variable with values ranging from 1 to 3, and length of 10. We'll use this as the example: y <- c(1,2,3,1,2,3,1,2,3,1) ...and suppose you want your new vector (y.new) to be equal in length to the possible discrete values (3) times the length (10), and formatted in such a way that if y[1] == 1, then y.new[1:3] == c(1,0,0), and if y[2] ==
2007 Aug 01
1
New R package sqldf
sqldf is an R package for running SQL select statements on one or more R data frames. It is optimized for convenience making it useful for ad hoc queries against R data frames. Given an SQL select statement whose tables are the names of R data frames it: - sets up the database (by default it transparently sets up an in memory SQLite database using RSQLite; however, MySQL via RMySQL, can be
2008 Aug 25
8
SQL Primer for R
Dear R wizards: I decided to take the advice in the R data import/export manual and want to learn how to work with SQL for large data sets. I am trying SQLite with the DBI and RSQLite database interfaces. Speed is nice. Alas, I am struggling to find a tutorial that is geared for the kind of standard operations that I would want in R. Simple things: * how to determine the number of rows in a
2007 Aug 31
2
size limitations in R
I am a SAS user currently evaluating R as a possible addition or even replacement for SAS. The difficulty I have come across straight away is R's apparent difficulty in handling relatively large data files. Whilst I would not expect it to handle datasets with millions of records, I still really need to be able to work with datasets with 100,000+ records and 100+ variables. Yet, when reading