I have a set of files that I am reading into R one at a time and passing to a function that I have written. Each file is a 'table' of n (columns) x 10000 (rows); n varies across the files, and most of the rows only have data in the first few columns.
Currently I am reading them in with the command:

dat <- read.table(file="2.75.0.997.1", header=FALSE, sep="",
                  skip=13, fill=TRUE,
                  row.names=1, nrows=10000)
***and it works fine.
However, we are now working with a huge table, and I was wondering if there is a more efficient way to read it in.
Ideally, I would like to have it as a list in which each element is a row from the input file, eliminating all of the NAs that the above approach produces, so that I would have a list with 10000 elements, each of variable length from 1 to n.
Any help greatly appreciated.
jimi adams
Department of Sociology
The Ohio State University
300 Bricker Hall
190 N. Oval Mall
Columbus, OH 43210-1353
614-688-4261
Our mind has a remarkable ability to think of contents as being independent of the act of thinking.
- Georg Simmel
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Mon, 29 Apr 2002, jimi adams wrote:

> i have a set of files that i am reading into R one at a time [...]
> IDEALLY i would like to have it as a list where each element is a row from
> the input file, eliminating all of the NA's that the above approach results
> in, such that i would have a list with 10000 elements and each of variable
> length from 1:n

You could declare a list with 10000 elements:

data <- vector("list", 10000)

and then open a connection to the file and read one line at a time:

a <- file("2.75.0.997.1")
open(a)
for (i in 1:10000) data[[i]] <- scan(a, nlines=1)
close(a)

I don't know if that would be more efficient, but it would use less memory.

-thomas

Thomas Lumley
Asst. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle
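The connection-based approach above can be wrapped into a small helper; a minimal sketch (the function name `read_rows` is invented here, and the skip count of 13 and row count of 10000 are taken from the original post):

```r
# Read each line of a whitespace-separated file into its own list element,
# so rows with fewer fields simply yield shorter numeric vectors (no NA padding).
read_rows <- function(path, skip = 13, n = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))                      # ensure the connection is closed on exit
  if (skip > 0) readLines(con, n = skip)   # discard the header lines
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- scan(con, nlines = 1, quiet = TRUE)
  }
  out
}
```

Each element of the returned list is the numeric vector scanned from one line, so rows of different widths come back with different lengths.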
I previously sent in the message below, and I got several responses back that work; however, now I am running into a different problem.
I used the following line to read in the file:

temp.file <- readLines("2.75.0.997.1")

I was then recommended to use:

lapply(strsplit(temp.file, "*"), as.numeric)

to convert this to a list.
The only problem is that the file I am reading in has values ranging from 1:10000, and this splits them out into individual numeric characters rather than the initial values (e.g., 876 comes back as 8, 7, & 6).
I think I figured out how to do this if the values were all of the same length, but they are not, so I am wondering if there is some sort of split pattern that is equivalent to what sep="" does in read.table, splitting on whitespace rather than on a specific character.
Ultimately, what I want is for an initial file that looks like:

1 412 2000
2 4
3 8888
...

to become a list:

[[1]]
412 2000
[[2]]
4
[[3]]
8888
...
thanks in advance.
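A split pattern equivalent to read.table's sep="" is a regular expression matching runs of whitespace; a minimal sketch (the sample lines below stand in for the real readLines() output, and dropping the first field assumes it is the row label, as in the example above):

```r
temp.file <- c("1 412 2000", "2 4", "3 8888")     # stand-in for readLines("2.75.0.997.1")
split.rows <- strsplit(temp.file, "[[:space:]]+") # split each line on runs of whitespace
result <- lapply(split.rows, function(x) as.numeric(x[-1]))  # drop the leading row label
# result[[1]] is c(412, 2000); result[[2]] is 4; result[[3]] is 8888
```

Because the pattern matches one-or-more whitespace characters, multi-digit values like 876 stay intact instead of being split character by character.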
***************************
(original message of 29 Apr 2002, quoted in full above)