thr3ads.net - R help - [R] Reading a specific column of a csv file in a loop [Nov 2011]

If this information is useful, please help other people find it:
Share via:

Sergio René Araujo Enciso

2011-Nov-08 10:04 UTC

[R] Reading a specific column of a csv file in a loop

Dear all:

I have two larges files with 2000 columns. For each file I am
performing a loop to extract the "i"th element of each file and create
a data frame with both "i"th elements in order to perform further
analysis. I am not extracting all the "i"th elements but only certain
which I am indicating on a vector called "d".

See  an example of my  code below

### generate an example for the CSV files, the original files contain
more than 2000 columns, here for the sake of simplicity they have only
10 columns
M1<-matrix(rnorm(1000), nrow=100, ncol=10,
dimnames=list(seq(1:100),letters[1:10]))
M2<-matrix(rnorm(1000), nrow=100, ncol=10,
dimnames=list(seq(1:100),letters[1:10]))
write.table(M1, file="M1.csv", sep=",")
write.table(M2, file="M2.csv", sep=",")

### the vector containing the "i" elements to be read
d<-c(1,4,7,8)
P1<-read.table("M1.csv", header=TRUE)
P2<-read.table("M1.csv", header=TRUE)
for (i in d) {
M<-data.frame(P1[i],P2[i])
rm(list=setdiff(ls(),"d"))
}

As the files are quite large, I want to include "read.table" within
the loop so as it only read the "i"th element. I know that there is
the option "colClasses" for which I have to create a vector with zeros
for all the columns I do not want to load. Nonetheless I have no idea
how to make this vector to change in the loop, so as the only element
with no zeros is the "i"th element following the vector "d".
Any ideas
how to do this? Or is there anz other approach to load only an
specific element?

best regards,

Sergio Ren?

Gabor Grothendieck

2011-Nov-08 10:47 UTC

head link

[R] Reading a specific column of a csv file in a loop

2011/11/8 Sergio Ren? Araujo Enciso <araujo.enciso at
gmail.com>:> Dear all:
>
> I have two larges files with 2000 columns. For each file I am
> performing a loop to extract the "i"th element of each file and
create
> a data frame with both "i"th elements in order to perform further
> analysis. I am not extracting all the "i"th elements but only
certain
> which I am indicating on a vector called "d".
>
> See ?an example of my ?code below
>
> ### generate an example for the CSV files, the original files contain
> more than 2000 columns, here for the sake of simplicity they have only
> 10 columns
> M1<-matrix(rnorm(1000), nrow=100, ncol=10,
> dimnames=list(seq(1:100),letters[1:10]))
> M2<-matrix(rnorm(1000), nrow=100, ncol=10,
> dimnames=list(seq(1:100),letters[1:10]))
> write.table(M1, file="M1.csv", sep=",")
> write.table(M2, file="M2.csv", sep=",")
>
> ### the vector containing the "i" elements to be read
> d<-c(1,4,7,8)
> P1<-read.table("M1.csv", header=TRUE)
> P2<-read.table("M1.csv", header=TRUE)
> for (i in d) {
> M<-data.frame(P1[i],P2[i])
> rm(list=setdiff(ls(),"d"))
> }
>
> As the files are quite large, I want to include "read.table"
within
> the loop so as it only read the "i"th element. I know that there
is
> the option "colClasses" for which I have to create a vector with
zeros
> for all the columns I do not want to load. Nonetheless I have no idea
> how to make this vector to change in the loop, so as the only element
> with no zeros is the "i"th element following the vector
"d". Any ideas
> how to do this? Or is there anz other approach to load only an
> specific element?
>
Its a bit messy if there are row names so lets generate M1.csv like this:

write.csv(M1, file = "M1.csv", row.names = FALSE)

Then we can do this:

nc <- ncol(read.csv("M1.csv", nrows = 1))
colClasses <- replace(rep("NULL", nc), d, NA)
M1.subset <- read.csv("M1.csv", colClasses = colClasses)

or using the same M1.csv that we just generated try this which uses
sqldf with the H2 backend:

library(sqldf)
library(RH2)

M0 <- read.csv("M1.csv", nrows = 1)[0L, ]
M1.subset.h2 <- sqldf(c("insert into M0 (select * from
csvread('M1.csv'))",
        "select a, d, g, h from M0"))

This is referred to as Alternative 3 in FAQ#10 Example 6a on the sqldf
home page:
http://sqldf.googlecode.com
Alternative 1 and Alternative 2 listed there could also be tried.

(Note that although sqldf has a read.csv.sql command we did not use it
here since that command only works with the sqlite back end and the
RSQLite driver has a max of 999 columns.)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Jim Holtman

2011-Nov-08 10:54 UTC

head link

[R] Reading a specific column of a csv file in a loop

have you considered reading in the data and then creating objects for each
column and then saving (save) each to disk.  That way you incur the expense of
the read once and now have quick access (?load) to the column as you need them.

You could also use a database for this.

On Nov 8, 2011, at 5:04, Sergio Ren? Araujo Enciso <araujo.enciso at
gmail.com> wrote:
> Dear all:
> 
> I have two larges files with 2000 columns. For each file I am
> performing a loop to extract the "i"th element of each file and
create
> a data frame with both "i"th elements in order to perform further
> analysis. I am not extracting all the "i"th elements but only
certain
> which I am indicating on a vector called "d".
> 
> See  an example of my  code below
> 
> ### generate an example for the CSV files, the original files contain
> more than 2000 columns, here for the sake of simplicity they have only
> 10 columns
> M1<-matrix(rnorm(1000), nrow=100, ncol=10,
> dimnames=list(seq(1:100),letters[1:10]))
> M2<-matrix(rnorm(1000), nrow=100, ncol=10,
> dimnames=list(seq(1:100),letters[1:10]))
> write.table(M1, file="M1.csv", sep=",")
> write.table(M2, file="M2.csv", sep=",")
> 
> ### the vector containing the "i" elements to be read
> d<-c(1,4,7,8)
> P1<-read.table("M1.csv", header=TRUE)
> P2<-read.table("M1.csv", header=TRUE)
> for (i in d) {
> M<-data.frame(P1[i],P2[i])
> rm(list=setdiff(ls(),"d"))
> }
> 
> As the files are quite large, I want to include "read.table"
within
> the loop so as it only read the "i"th element. I know that there
is
> the option "colClasses" for which I have to create a vector with
zeros
> for all the columns I do not want to load. Nonetheless I have no idea
> how to make this vector to change in the loop, so as the only element
> with no zeros is the "i"th element following the vector
"d". Any ideas
> how to do this? Or is there anz other approach to load only an
> specific element?
> 
> best regards,
> 
> Sergio Ren?
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Jan van der Laan

2011-Nov-15 20:06 UTC

head link

[R] Reading a specific column of a csv file in a loop

Yet another solution. This time using the LaF package:

library(LaF)
d<-c(1,4,7,8)
P1 <- laf_open_csv("M1.csv", column_types=rep("double",
10), skip=1)
P2 <- laf_open_csv("M2.csv", column_types=rep("double",
10), skip=1)
for (i in d) {
   M<-data.frame(P1[, i],P2[, i])
}

(The skip=1 is needed as laf_open_csv doesn't read headers)

Jan



On 11/08/2011 11:04 AM, Sergio Ren? Araujo Enciso wrote:> Dear all:
>
> I have two larges files with 2000 columns. For each file I am
> performing a loop to extract the "i"th element of each file and
create
> a data frame with both "i"th elements in order to perform further
> analysis. I am not extracting all the "i"th elements but only
certain
> which I am indicating on a vector called "d".
>
> See  an example of my  code below
>
> ### generate an example for the CSV files, the original files contain
> more than 2000 columns, here for the sake of simplicity they have only
> 10 columns
> M1<-matrix(rnorm(1000), nrow=100, ncol=10,
> dimnames=list(seq(1:100),letters[1:10]))
> M2<-matrix(rnorm(1000), nrow=100, ncol=10,
> dimnames=list(seq(1:100),letters[1:10]))
> write.table(M1, file="M1.csv", sep=",")
> write.table(M2, file="M2.csv", sep=",")
>
> ### the vector containing the "i" elements to be read
> d<-c(1,4,7,8)
> P1<-read.table("M1.csv", header=TRUE)
> P2<-read.table("M1.csv", header=TRUE)
> for (i in d) {
> M<-data.frame(P1[i],P2[i])
> rm(list=setdiff(ls(),"d"))
> }
>
> As the files are quite large, I want to include "read.table"
within
> the loop so as it only read the "i"th element. I know that there
is
> the option "colClasses" for which I have to create a vector with
zeros
> for all the columns I do not want to load. Nonetheless I have no idea
> how to make this vector to change in the loop, so as the only element
> with no zeros is the "i"th element following the vector
"d". Any ideas
> how to do this? Or is there anz other approach to load only an
> specific element?
>
> best regards,
>
> Sergio Ren?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Reasonably Related Threads

Search for more possibly parallel threads

R help - Nov 2011 - Reading a specific column of a csv file in a loop

[R] Reading a specific column of a csv file in a loop

[R] Reading a specific column of a csv file in a loop

[R] Reading a specific column of a csv file in a loop

[R] Reading a specific column of a csv file in a loop

Reasonably Related Threads