Displaying 20 results from an estimated 8000 matches similar to: "Dealing With Extremely Large Files"
2009 May 09
5
Reading large files quickly
I'm finding that readLines() and read.fwf() take nearly two hours to
work through a 3.5 GB file, even when reading in large (100 MB) chunks.
The Unix command wc, by contrast, processes the same file in three
minutes. Is there a faster way to read files in R?
Thanks!
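A common fix, sketched below: readLines() on a file path reopens and
rescans the file on every call, whereas readLines() on an open connection
resumes where the previous call stopped. The file name and chunk size here
are placeholders.
con <- file("bigfile.txt", open = "r")
nlines <- 0
repeat {
  chunk <- readLines(con, n = 100000)  # resumes at the previous position
  if (length(chunk) == 0) break        # end of file reached
  nlines <- nlines + length(chunk)     # process each chunk here
}
close(con)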
2012 Mar 08
4
Reading in 9.6GB .DAT File - OK with 64-bit R?
Hi there,
I wish to read a 9.6GB .DAT file into R (64-bit R on a 64-bit Windows machine),
then delete a substantial number of rows and convert the result to a .csv file.
Upon the first attempt the computer crashed (at some point last night).
I'm rerunning this now and am closely monitoring Processor/CPU/Memory.
Apart from this crash being a computer issue alone (possibly), is R equipped to
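One memory-safe approach is to stream the file in chunks, drop the unwanted
rows, and append the survivors to the .csv, so the full 9.6GB never sits in
R's memory at once. A sketch, with a made-up column layout and row filter:
inp <- file("big.dat", open = "r")
first <- TRUE
repeat {
  chunk <- tryCatch(read.table(inp, nrows = 100000, header = FALSE),
                    error = function(e) NULL)  # NULL once input is exhausted
  if (is.null(chunk)) break
  keep <- chunk[chunk$V1 > 0, ]                # placeholder filter condition
  write.table(keep, "big.csv", sep = ",", row.names = FALSE,
              col.names = first, append = !first)
  first <- FALSE
}
close(inp)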
2010 Nov 15
5
How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to
insert a very large CSV file into a SQLite database. I'm pretty new to
working with databases in R, so I apologize if I'm overlooking something
obvious here.
I'm trying to work with the American Community Survey data, which is two
1.3GB csv files. I have enough RAM to read one of them into memory,
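One way to sidestep R's memory entirely is RSQLite's file-import form of
dbWriteTable(), which has SQLite read the CSV itself. A sketch; the file,
database, and table names are placeholders:
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "acs.sqlite")
dbWriteTable(con, "acs", "acs_part1.csv", header = TRUE, sep = ",")
dbGetQuery(con, "select count(*) from acs")  # sanity-check the row count
dbDisconnect(con)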
2007 Sep 07
2
Automatic detachment of dependent packages
Dear All,
When one loads certain packages, some other dependent packages are
loaded as well. Is there some way of detaching them automatically when
one detaches the first package loaded? For instance,
> library(sqldf)
Loading required package: RSQLite
Loading required package: DBI
Loading required package: gsubfn
Loading required package: proto
but
> detach(package:sqldf)
>
>
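There is no built-in for this, but a small helper can parse the package's
Depends field and detach each entry that is on the search path. A
hypothetical sketch (it ignores Imports and dependencies of dependencies):
detach_with_depends <- function(pkg) {
  dep <- packageDescription(pkg)$Depends
  dep <- if (is.null(dep)) character(0)
         else trimws(sub("\\(.*\\)", "", strsplit(dep, ",")[[1]]))
  for (p in c(pkg, setdiff(dep, "R"))) {
    entry <- paste0("package:", p)
    if (entry %in% search()) detach(entry, character.only = TRUE)
  }
}
detach_with_depends("sqldf")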
2010 Jul 28
1
sqldf 0.3-5 package or tcltk problem
This is my first post. I am running Mac OS X version 10.6.3. I am running R 2.11.0 GUI 1.33 64 bit.
This may or may not be related to sqldf, but I experienced this problem while attempting to use an sqldf query. The same code runs with no problem on my Windows machine. Here is what happens:
> r=sqldf("select ... ")
Loading required package: tcltk
Loading Tcl/Tk interface ...
Then
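One workaround noted in the sqldf FAQ for exactly this symptom on Macs is to
tell gsubfn to use its pure-R engine before loading sqldf, so tcltk is never
loaded at all. The query below is illustrative:
options(gsubfn.engine = "R")  # avoid the tcltk dependency entirely
library(sqldf)
r <- sqldf("select * from iris limit 5")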
2012 Feb 02
9
sqldf for Very Large Tab Delimited Files
Hi All,
I have a very (very) large tab-delimited text file without headers. There
are only 8 columns and millions of rows. I want to make numerous pieces of
this file by sub-setting it for individual stations. The station is given in
the first column. I am trying to learn and use the sqldf package for this
but am stuck in a couple of places.
To simulate my requirement, I have taken the iris dataset as an
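A sketch of the sqldf route: read.csv.sql() imports the file into a
temporary SQLite database and returns only the rows the SQL keeps, so R
never holds the whole file. The file name and station value are
placeholders; with header = FALSE the columns are typically named V1, V2, ...:
library(sqldf)
one_station <- read.csv.sql("stations.txt",
                            sql = "select * from file where V1 = 'ST001'",
                            header = FALSE, sep = "\t")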
2008 Aug 21
2
Large data sets with R (binding to hadoop available?)
Dear R community,
I find R fantastic and use it whenever I can for my data-analysis
needs. Certain data sets, however, are so large that other tools
seem to be needed to pre-process the data so that it can be brought
into R for further analysis.
Questions I have for the many expert contributors on this list:
1. How do others handle large data sets (gigabytes,
terabytes)
2007 Sep 07
3
Delete query in sqldf?
Dear All,
Is sqldf equipped to run delete queries? I have tried them but
with no success.
Thanks in advance,
Paul
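sqldf builds a fresh database for each call, so a DELETE on its own has no
visible effect; the pattern in the sqldf FAQ is to pass a vector of
statements, where the result of the last one is returned (the main. prefix
stops sqldf from re-importing the unmodified data frame). A minimal sketch
on a copy of iris:
library(sqldf)
df <- iris
out <- sqldf(c("delete from df where Species = 'setosa'",
               "select * from main.df"))
nrow(out)  # 100: the setosa rows are gone from the returned copy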
2009 Mar 16
1
errors when install RSQLite
Dear all,
I am trying to install the RSQLite package because I want to install "sqldf".
I used
> install.packages("RSQLite")
first, which gave the error message below:
make: *** [RS-DBI.o] Error 1
chmod: cannot access `/usr/lib/R/library/RSQLite/libs/*': No such file or
directory
ERROR: compilation failed for package 'RSQLite'
** Removing
2009 Jan 16
5
Value Lookup from File without Slurping
Dear all,
I have a repository file (let's call it repo.txt)
that contains two columns like this:
# tag value
AAA 0.2
AAT 0.3
AAC 0.02
AAG 0.02
ATA 0.3
ATT 0.7
Given another query vector
> qr <- c("AAC", "ATT")
I would like to find the corresponding value for each query above,
yielding:
0.02
0.7
However, I want to avoid slurping whole repo.txt
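One way to keep the file out of R is to let SQLite do the lookup via sqldf,
returning only the matching rows. A sketch, assuming the "# tag value"
comment line has been stripped and the columns are single-space separated:
library(sqldf)
qr <- c("AAC", "ATT")
hits <- read.csv.sql("repo.txt",
                     sql = sprintf("select * from file where V1 in (%s)",
                                   paste0("'", qr, "'", collapse = ",")),
                     header = FALSE, sep = " ")
hits$V2  # 0.02 0.7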
2007 Jul 19
1
package NULL not found
When I run Rcmd check, the output mentions an ignored empty argument
and a "package NULL not found" message, and the check stops with an error:
* using log directory 'C:/Rpkgs/sqldf.Rcheck'
* using ARGUMENT '
' __ignored__ R version 2.5.1 (2007-06-27)
* checking for file 'sqldf/DESCRIPTION' ... OK
* this is package 'sqldf' version '0.1-0'
* checking package
2009 Mar 30
1
Importing csv file with character values into sqlite3 and subsequent problem in R / RSQLite
Dear all,
I'm trying to import a csv file into sqlite3 and from there into
R. Everything looks fine except that R outputs the character values in
an odd fashion: they are shown as "\"CHARACTER\"" instead of
"CHARACTER", but only if I display the character variable as a
vector. Does anyone know why this happens? Below is sample
code. The first part is written in
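A likely cause is sqlite3's .import running in the default list mode, which
treats the CSV quote characters as part of the value (issuing .mode csv
before .import avoids this). If the quotes are already stored, they can be
stripped after fetching; the database, table, and column names below are
placeholders:
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "mydb.sqlite")
d <- dbGetQuery(con, "select * from mytable")
d$charvar <- gsub('^"|"$', '', d$charvar)  # drop leading/trailing quotes
dbDisconnect(con)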
2010 Nov 01
1
sqldf hanging on macintosh - works on windows
I have a long script that runs fine on Windows (32-bit). When I try to run it
on two different Macs (64-bit), however, it hangs with identical behavior.
I start with:
library(sqldf)
This results in messages:
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required
2009 Nov 30
1
RSQLite does not read very large values correctly
Hello,
I am trying to import data from an SQLite database to R.
Unfortunately, I seem to get wrong data when I try to import very large
numbers.
For example:
I look at the database via SQLiteStudio(v.1.1.3) and I see the following
values:
OrderID Day TimeToclose
1 2009-11-25 29467907000
2 2009-11-25 29467907000
3 2009-11-25 29467907000
Now I run this R Code:
>
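29467907000 is far beyond .Machine$integer.max (2147483647), so any path
that forces the column through a 32-bit integer will mangle it. Casting to
REAL (or TEXT) in the query sidesteps that; the table name below is an
assumption:
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "orders.sqlite")
d <- dbGetQuery(con,
       "select OrderID, Day, cast(TimeToclose as real) as TimeToclose
        from orders")
dbDisconnect(con)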
2007 Sep 07
5
SQL like function?
Hi RUsers,
I wonder if I can select observations whose IDs match any of the
values in another vector, as in MySQL. While I am learning MySQL for
future database management, I would appreciate it if anyone could give me a
hint.
Suppose I have one 5*1 vector containing observation IDs and
frequencies, and one 3*1 vector containing observation IDs.
observation<-c(1,2,3,4,5)
ID<-c(1,3,4)
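Base R's %in% operator does what SQL's IN does. A sketch using the vectors
from the post, with made-up frequencies attached to the IDs:
observation <- data.frame(id = 1:5, freq = c(10, 20, 30, 40, 50))
ID <- c(1, 3, 4)
observation[observation$id %in% ID, ]  # rows whose id matches any ID value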
2008 Nov 22
5
What's the BEST way in R to adapt this vector?
Goal:
Suppose you have a vector that is a discrete variable with values ranging
from 1 to 3 and a length of 10. We'll use this as the example:
y <- c(1,2,3,1,2,3,1,2,3,1)
...and suppose you want your new vector (y.new) to be equal in length to the
number of possible discrete values (3) times the length (10), formatted in
such a way that if y[1] == 1, then y.new[1:3] == c(1,0,0), and if y[2] ==
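A compact base-R way to build the 30-element indicator vector: row k of the
3x3 identity matrix is the indicator for value k, so index it by y and
flatten row by row:
y <- c(1,2,3,1,2,3,1,2,3,1)
y.new <- as.vector(t(diag(3)[y, ]))  # t() makes as.vector() read row-wise
y.new[1:6]  # 1 0 0 0 1 0, i.e. y[1] == 1 and y[2] == 2 expanded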
2007 Aug 01
1
New R package sqldf
sqldf is an R package for running SQL select
statements on one or more R data frames. It is
optimized for convenience, making it useful
for ad hoc queries against R data frames.
Given an SQL select statement whose tables
are the names of R data frames, it:
- sets up the database (by default it transparently
sets up an in-memory SQLite database using RSQLite;
however, MySQL, via RMySQL, can be
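A minimal example of the usage the announcement describes (iris is a
built-in data frame; the query is illustrative):
library(sqldf)
sqldf("select Species, count(*) as n from iris group by Species")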
2008 Aug 25
8
SQL Primer for R
Dear R wizards:
I decided to take the advice in the R data import/export manual and
want to learn how to work with SQL for large data sets. I am trying
SQLite with the DBI and RSQLite database interfaces. Speed is nice.
Alas, I am struggling to find a tutorial geared toward the kind
of standard operations I would want in R. Simple things:
* how to determine the number of rows in a
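The row count the post asks about, in DBI/RSQLite terms ("mytab" is a
placeholder table name):
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "mydb.sqlite")
dbGetQuery(con, "select count(*) as n from mytab")  # number of rows
dbListFields(con, "mytab")                          # column names
dbDisconnect(con)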
2007 Aug 31
2
size limitations in R
I am a SAS user currently evaluating R as a possible addition to, or even
replacement for, SAS. The problem I have come across
straight away is R's apparent difficulty in handling relatively large data
files. Whilst I would not expect it to handle
datasets with millions of records, I still really need to be able to work
with datasets with 100,000+ records and 100+
variables. Yet, when reading
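For scale, a back-of-the-envelope check in R: 100,000 rows of 100 numeric
variables at 8 bytes each is modest by modern standards.
100000 * 100 * 8 / 2^20  # about 76 MB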