Colleagues,

This past week, I asked the following question:

I have a file that looks like this:

TABLE NO. 1
      PTID       TIME        AMT       FORM     PERIOD      IPRED      CWRES       EVID         CP       PRED        RES       WRES
2.0010E+03 3.9375E-01 5.0000E+03 2.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 1.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
2.0010E+03 8.9583E-01 5.0000E+03 2.0000E+00 0.0000E+00 3.3389E+00 0.0000E+00 1.0000E+00 0.0000E+00 3.5321E+00 0.0000E+00 0.0000E+00
2.0010E+03 1.4583E+00 5.0000E+03 2.0000E+00 0.0000E+00 5.8164E+00 0.0000E+00 1.0000E+00 0.0000E+00 5.9300E+00 0.0000E+00 0.0000E+00
2.0010E+03 1.9167E+00 5.0000E+03 2.0000E+00 0.0000E+00 8.3633E+00 0.0000E+00 1.0000E+00 0.0000E+00 8.7011E+00 0.0000E+00 0.0000E+00
2.0010E+03 2.4167E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.0092E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.0324E+01 0.0000E+00 0.0000E+00
2.0010E+03 2.9375E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.1490E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.1688E+01 0.0000E+00 0.0000E+00
2.0010E+03 3.4167E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.2940E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.3236E+01 0.0000E+00 0.0000E+00
2.0010E+03 4.4583E+00 5.0000E+03 2.0000E+00 0.0000E+00 1.1267E+01 0.0000E+00 1.0000E+00 0.0000E+00 1.1324E+01 0.0000E+00 0.0000E+00

The file is reasonably large (> 10^6 lines) and the two-line header is repeated periodically in the file.
I need to read this file in as a data frame. Note that the number of columns, the column headers, and the number of replicates of the headers are not known in advance.

I received a number of replies, many of them quite useful. Of these, one beat out all the others in my benchmarking using files ranging from 10^5 to 10^6 lines. That version, provided by Jim Holtman, was:

x <- read.table(FILE, as.is = TRUE, skip = 1, fill = TRUE, header = TRUE)
x[] <- lapply(x, as.numeric)
x <- x[!is.na(x[, 1]), ]

Other versions involved readLines, followed by edits, followed by cat (or write) to a temp file, then read.table again.
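The readLines-based alternative is only described in words, so here is a minimal sketch of what such a version might look like; it is a reconstruction, not any respondent's actual code. It assumes that every title line starts with "TABLE NO." (taken from the sample) and that the first column-header line is the one to keep; it uses writeLines in place of cat/write. The toy input reuses rows from the sample, cut down to four columns.

```r
# Hypothetical stand-in for the real file: the two-line header appears twice,
# around rows taken from the post (first four of the twelve columns only).
src <- tempfile(fileext = ".txt")
writeLines(c("TABLE NO. 1",
             "PTID TIME AMT FORM",
             "2.0010E+03 3.9375E-01 5.0000E+03 2.0000E+00",
             "2.0010E+03 8.9583E-01 5.0000E+03 2.0000E+00",
             "TABLE NO. 1",
             "PTID TIME AMT FORM",
             "2.0010E+03 1.4583E+00 5.0000E+03 2.0000E+00"), src)

txt_lines <- readLines(src)

# Edit step. Assumption: every title line starts with "TABLE NO.".
txt_lines <- txt_lines[!grepl("^[[:space:]]*TABLE NO\\.", txt_lines)]

# Keep the first column-header line and drop later copies of it,
# so the column names never have to be known in advance.
hdr  <- txt_lines[1]
body <- txt_lines[-1]
body <- body[trimws(body) != trimws(hdr)]

# Write the cleaned lines to a temporary file and parse them again.
tmp <- tempfile(fileext = ".txt")
writeLines(c(hdr, body), tmp)
y <- read.table(tmp, header = TRUE)
str(y)
```

Every line passes through R three times here (readLines, writeLines, read.table), which is consistent with the extra overhead reported below.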
The overhead of invoking readLines, write/cat, and read.table was substantially larger than that of the read.table / as.numeric / indexing strategy.

Thanks for the input from many folks.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
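To see what each of Jim Holtman's three lines does, here is a minimal sketch running them on a small inline sample instead of FILE. The sample is hypothetical: four of the twelve columns from the rows quoted above, with the two-line header repeated once. The suppressWarnings() and rownames() calls are additions for tidiness and are not part of the original.

```r
# Hypothetical cut-down sample: four of the twelve columns from the rows
# quoted above, with the two-line header repeated once mid-file.
txt <- c("TABLE NO. 1",
         "PTID TIME AMT IPRED",
         "2.0010E+03 3.9375E-01 5.0000E+03 0.0000E+00",
         "2.0010E+03 8.9583E-01 5.0000E+03 3.3389E+00",
         "TABLE NO. 1",
         "PTID TIME AMT IPRED",
         "2.0010E+03 1.4583E+00 5.0000E+03 5.8164E+00")

# skip = 1 drops the first title line and header = TRUE takes the next line
# as column names. fill = TRUE pads the short repeated "TABLE NO. 1" lines,
# and as.is = TRUE keeps the now-mixed columns as character, not factor.
x <- read.table(text = txt, as.is = TRUE, skip = 1, fill = TRUE, header = TRUE)

# Coerce every column to numeric; leftover header text turns into NA.
# suppressWarnings() hides the "NAs introduced by coercion" warnings.
x[] <- suppressWarnings(lapply(x, as.numeric))

# Keep only rows whose first column parsed as a number.
x <- x[!is.na(x[, 1]), ]
rownames(x) <- NULL
print(x)
```

One consequence of the last step: any genuine data row whose first column is missing would be dropped along with the header rows.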
All,

Can someone describe what

x[] <- lapply(x, as.numeric)

does? I see that it is putting the list elements into a data frame. The result of lapply is a list, so how does this become a data frame?

Thanks,

Juliet

On Mon, Dec 3, 2012 at 5:49 PM, Fisher Dennis <fisher at plessthan.com> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hello,

Because assigning to x[] keeps the dimensions (and the data frame class) of x, whereas assigning to plain x would simply replace x with the list that lapply returns.

Hope this helps,

Rui Barradas

On 06-12-2012 16:24, Juliet Hannah wrote:
> [...]
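Rui's point can be checked directly. A minimal sketch, using a hypothetical two-column data frame of character columns like the ones read.table(as.is = TRUE) produces when numbers are mixed with repeated header text: lapply returns a plain list, binding that list with <- gives a list, and assigning it into z[] keeps the data frame.

```r
# Hypothetical toy input: character columns, as read.table(as.is = TRUE)
# returns when a column mixes numbers with repeated header text.
x <- data.frame(PTID = c("2.0010E+03", "PTID"),
                TIME = c("3.9375E-01", "TIME"),
                stringsAsFactors = FALSE)

# lapply() always returns a plain list, one element per column.
# suppressWarnings() hides the "NAs introduced by coercion" warnings.
res <- suppressWarnings(lapply(x, as.numeric))
class(res)            # "list"

# Ordinary assignment just binds the name to that list.
plain <- res
class(plain)          # "list"

# Assignment into z[] writes the list's elements into the existing
# columns, so z keeps its class, names and row names.
z <- x
z[] <- res
class(z)              # "data.frame"
dim(z)                # 2 2
```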
Thanks, it does help. Could you elaborate on specifically why this syntax preserves the dimensions? Is it correct to say that even though lapply returns a list, x[] forces x to keep the same dimensions?

On Thu, Dec 6, 2012 at 11:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> [...]
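One way to see why: x[] <- value is a call to the replacement function `[<-` with an empty index, and for a data frame that dispatches to the data-frame method, which writes each element of value into the corresponding existing column while leaving the class, names, and row names of x untouched. (The same idiom keeps the dim attribute of a matrix, e.g. m[] <- 0.) A rough equivalent is a loop that replaces one column at a time. The sketch below, on hypothetical toy values, checks that the two give the same result; the loop is an illustration of the behaviour, not how R implements it internally.

```r
# Hypothetical toy data frame with character columns.
x <- data.frame(AMT   = c("5.0000E+03", "AMT"),
                IPRED = c("3.3389E+00", "IPRED"),
                stringsAsFactors = FALSE)

# Idiom from the thread: replace the contents of every column at once.
x1 <- x
x1[] <- suppressWarnings(lapply(x, as.numeric))

# Roughly what that amounts to: convert and store each column in turn,
# always writing into the existing data frame rather than rebinding x2.
x2 <- x
for (nm in names(x2)) {
  x2[[nm]] <- suppressWarnings(as.numeric(x2[[nm]]))
}

all.equal(x1, x2)   # TRUE
```

So it is fair to say that x[] makes the list's elements land inside the existing object: the list only supplies new column contents, and the container (class, names, row names) is the one x already had.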