thr3ads.net - R help - [R] Scanning only specific columns into R from a VERY large file [Apr 2010]

If this information is useful, please help other people find it:
Share via:

Josh B

2010-Apr-16 22:12 UTC

[R] Scanning only specific columns into R from a VERY large file

Hi,

I turn to you, the R Sages, once again for help. You've never let me down!

(1) Please make the following toy files:

x <- read.table(textConnection("var.1 var.2 var.3 var.1000
indv.1 1 5 9 7
indv.210000 2 9 3 8"), header = TRUE)

y <- read.table(textConnection("var.3 var.1000"), header = TRUE)

write.csv(x, file = "x.csv")
write.csv(y, file = "y.csv")

(2) Pretend you are starting with the files "x.csv" and
"y.csv." They come from another source -- an online database. Pretend
that these files are much, much, much larger. Specifically:
    (a) Pretend that "x.csv" contains 1000 columns by 210,000 rows. 
    (b) "y.csv" contains just header titles. Pretend that there are 90
header titles in "y.csv" in total. These header titles are a subset of
the header titles in "x.csv."

(3) What I want to do is scan (or import, or whatever the appropriate word is)
only a subset of the columns from "x.csv" into an R. Specifically, I
only want to scan the columns of data from "x.csv" into R that are
indicated in the file "y.csv." I still want to scan in all 210000 rows
from "x.csv," but only for the aforementioned columns listed in
"y.csv."

Can you guys recommend a strategy for me? I think I need to use the scan
command, based on the hugeness of "x.csv," but I don't know what
exactly to do. Specific code that gets the job done would be the most useful.

Thank you very much in advance!
Josh



      
	[[alternative HTML version deleted]]

Sharpie

2010-Apr-16 22:21 UTC

head link

[R] Scanning only specific columns into R from a VERY large file

Josh B-3 wrote:> 
> Hi,
> 
> I turn to you, the R Sages, once again for help. You've never let me
down!
> 
> (1) Please make the following toy files:
> 
> x <- read.table(textConnection("var.1 var.2 var.3 var.1000
> indv.1 1 5 9 7
> indv.210000 2 9 3 8"), header = TRUE)
> 
> y <- read.table(textConnection("var.3 var.1000"), header =
TRUE)
> 
> write.csv(x, file = "x.csv")
> write.csv(y, file = "y.csv")
> 
> (2) Pretend you are starting with the files "x.csv" and
"y.csv." They come
> from another source -- an online database. Pretend that these files are
> much, much, much larger. Specifically: 
>     (a) Pretend that "x.csv" contains 1000 columns by 210,000
rows.
>     (b) "y.csv" contains just header titles. Pretend that there
are 90
> header titles in "y.csv" in total. These header titles are a
subset of the
> header titles in "x.csv."
> 
> (3) What I want to do is scan (or import, or whatever the appropriate word
> is) only a subset of the columns from "x.csv" into an R.
Specifically, I
> only want to scan the columns of data from "x.csv" into R that
are
> indicated in the file "y.csv." I still want to scan in all 210000
rows
> from "x.csv," but only for the aforementioned columns listed in
"y.csv."
> 
> Can you guys recommend a strategy for me? I think I need to use the scan
> command, based on the hugeness of "x.csv," but I don't know
what exactly
> to do. Specific code that gets the job done would be the most useful. 
> 
> Thank you very much in advance!
> Josh
> 
read.csv.sql() from the sqldf package looks like it may do what you want- it
allows you to filter what gets read in from a CSV file using SQL statements,
something like:

  SELECT list,of,column,names FROM file


Hope this helps!

-Charlie

-----
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context:
http://n4.nabble.com/Scanning-only-specific-columns-into-R-from-a-VERY-large-file-tp2013646p2013653.html
Sent from the R help mailing list archive at Nabble.com.

Rubén Roa

2010-Apr-19 06:39 UTC

head link

[R] Scanning only specific columns into R from a VERY large file

-----Mensaje original-----
De: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] En
nombre de Josh B
Enviado el: s?bado, 17 de abril de 2010 0:12
Para: R Help
Asunto: [R] Scanning only specific columns into R from a VERY large file

Hi,

I turn to you, the R Sages, once again for help. You've never let me down!

(1) Please make the following toy files:

x <- read.table(textConnection("var.1 var.2 var.3 var.1000
indv.1 1 5 9 7
indv.210000 2 9 3 8"), header = TRUE)

y <- read.table(textConnection("var.3 var.1000"), header = TRUE)

write.csv(x, file = "x.csv")
write.csv(y, file = "y.csv")

(2) Pretend you are starting with the files "x.csv" and
"y.csv." They come from another source -- an online database. Pretend
that these files are much, much, much larger. Specifically:
    (a) Pretend that "x.csv" contains 1000 columns by 210,000 rows. 
    (b) "y.csv" contains just header titles. Pretend that there are 90
header titles in "y.csv" in total. These header titles are a subset of
the header titles in "x.csv."

(3) What I want to do is scan (or import, or whatever the appropriate word is)
only a subset of the columns from "x.csv" into an R. Specifically, I
only want to scan the columns of data from "x.csv" into R that are
indicated in the file "y.csv." I still want to scan in all 210000 rows
from "x.csv," but only for the aforementioned columns listed in
"y.csv."

Can you guys recommend a strategy for me? I think I need to use the scan
command, based on the hugeness of "x.csv," but I don't know what
exactly to do. Specific code that gets the job done would be the most useful.

Thank you very much in advance!
Josh

-----------
Try with something like

do.call("cbind",scan(file="yourfile.csv",what=list(NULL,NULL,....,0,NULL,0,NULL,NULL,...,NULL),flush=TRUE))

you have to work out how to set up the list of parameter 'what' to read
the headers of 'y'. In the above the only columns read are those
indicated by a '0'.

HTH

Ruben 


____________________________________________________________________________________

Dr. Rub?n Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN

Apparently Analagous Threads

Search for more maybe matching threads

R help - Apr 2010 - Scanning only specific columns into R from a VERY large file

[R] Scanning only specific columns into R from a VERY large file

[R] Scanning only specific columns into R from a VERY large file

[R] Scanning only specific columns into R from a VERY large file

Apparently Analagous Threads