I have a comma-delimited file with 62 fields, some of which are comments. There are about 1.5 million records (lines). Some of the comment fields, which I do not need, are 40 characters long. Of the 62 fields, I will need at most 12. What's the best way to read in only the fields I need? If I read the entire file at once, I will run out of memory. Could anyone please suggest a solution?

Thanks,
Babu.
Reply to Babu Guha's message of 02.08.2013 05:29:

See ?read.table and its argument colClasses. Because your file is comma-delimited, set sep = "," (read.csv() does this for you):

  read.table(file, sep = ",", colClasses = c("numeric", "NULL", "factor"))

will read the first column as numeric, skip the second column, and read the third one as a factor.

Best,
Uwe Ligges

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
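To apply colClasses to the 62-column file from the question, one option is to build a vector with one entry per column: "NULL" everywhere except at the positions you want to keep. An unnamed colClasses vector is recycled, so a short three-element vector would simply repeat across all 62 columns. The sketch below assumes, hypothetically, that the wanted fields sit at the positions in `wanted` and are all numeric. It writes a small stand-in file to a temporary path, so the file name and contents are illustrative only.

```r
# Stand-in for the real file: 3 rows x 62 numeric columns (V1..V62),
# written to a temporary CSV. The real file's name and contents differ.
n_fields <- 62
demo <- as.data.frame(matrix(seq_len(3 * n_fields), nrow = 3))
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)

# Hypothetical positions of the 12 fields to keep.
wanted <- c(1, 2, 5, 10, 20, 30, 40, 50, 55, 60, 61, 62)

# One class per column: a "NULL" column is skipped and not stored.
col_classes <- rep("NULL", n_fields)
col_classes[wanted] <- "numeric"  # use "character", "factor", ... as appropriate

# read.csv() is read.table() with sep = "," and header = TRUE.
# For the real file, an nrows value (a mild over-estimate is fine)
# may also help memory use; see ?read.table.
result <- read.csv(path, colClasses = col_classes)
str(result)
```

Only the 12 selected columns end up in `result`; the skipped comment fields are never stored in the data frame.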
Reply to Babu Guha's message of 08/02/2013 01:29 PM:

Hi Babu,

Assuming that you know which fields you want, you could process the file line by line:

  # say your file is "mydata.csv" and you want fields 1 to 12
  mycon <- file("mydata.csv", open = "r")
  # assume you have at most 1.5 million lines
  mydata <- matrix(NA_character_, nrow = 1500000, ncol = 12)
  lineindex <- 1
  repeat {
    # read a line; readLines() returns character(0) at the end of the file
    inputline <- readLines(mycon, 1)
    if (length(inputline) == 0) break
    # split on commas and keep the first 12 fields
    # (a comma inside a quoted field will also be split)
    mydata[lineindex, ] <- strsplit(inputline, ",")[[1]][1:12]
    lineindex <- lineindex + 1
  }
  close(mycon)
  # drop unused rows if the file had fewer than 1.5 million lines
  mydata <- mydata[seq_len(lineindex - 1), , drop = FALSE]

Jim
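Reading one line per readLines() call means 1.5 million separate calls. A variation on the same idea reads the file in blocks of `chunk_size` lines and splits each block with a single vectorised strsplit() call. Because the result is built from the chunks actually read, the line count does not need to be known in advance. This is a sketch, not a tested recipe for the real file. The helper name `read_fields_chunked`, the chunk size, and the demo file are all hypothetical, and like the loop above it assumes no field contains a quoted comma.

```r
# Hypothetical helper: keep only the fields at positions `keep` from a
# comma-delimited file, reading `chunk_size` lines at a time.
# Assumes no field contains a quoted comma.
read_fields_chunked <- function(path, keep, chunk_size = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0) break            # end of file
    fields <- strsplit(chunk, ",", fixed = TRUE)
    # one row per line; a position past the end of a line gives NA
    pieces[[length(pieces) + 1]] <- do.call(rbind, lapply(fields, `[`, keep))
  }
  do.call(rbind, pieces)                     # character matrix
}

# Demo: 25 lines of 62 fields such as "r3f17", written to a temp file.
n_fields <- 62
path <- tempfile(fileext = ".csv")
demo_lines <- vapply(seq_len(25), function(i)
  paste0("r", i, "f", seq_len(n_fields), collapse = ","), character(1))
writeLines(demo_lines, path)

keep <- c(1:10, 61, 62)   # hypothetical field positions
res <- read_fields_chunked(path, keep, chunk_size = 10)
dim(res)
```

For the real file this would be called as, for example, read_fields_chunked("mydata.csv", keep = 1:12). The result is a character matrix, so numeric fields still need converting with as.numeric() or similar afterwards.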