On Sun, Feb 12, 2012 at 10:35 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> Folks:
>
> Suppose I wish to input a text file with variable length lines and
> possible whitespace as is and then parse the resulting character
> vector in R. Each line of text is terminated with "\n" (newline
> character).
>
> Is there any reason to prefer one or the other of:
>
> scan(filename, what = "a", sep = "\n")  ## or
> readLines(filename)
>
> If it makes a difference, I'm on Windows.
>
> Many thanks for any advice/insight.

It depends on whether we need to retain the information about which
elements were on the same line. In the first case below we retain
that information, and in the second case we lose it:
> text <- "1 2\n3 4 5"
> lapply(readLines(textConnection(text)), function(x) scan(text = x))
Read 2 items
Read 3 items
[[1]]
[1] 1 2

[[2]]
[1] 3 4 5

> out <- scan(text = text); out
Read 5 items
[1] 1 2 3 4 5
If we did want to recover the line information lost in the second
case, we need to re-read the input to count the fields on each line:
> num.flds <- count.fields(textConnection(text))
> tapply(out, rep(seq_along(num.flds), num.flds), c)
$`1`
[1] 1 2

$`2`
[1] 3 4 5

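On the direct comparison in the original question, one difference
that may matter is how blank lines are handled. The sketch below
assumes a small made-up input (txt is hypothetical, not from your
file) and relies on scan's default of blank.lines.skip = TRUE; check
?scan and ?readLines for your R version before depending on it:

```r
# Hypothetical sample input: two non-empty lines separated by a blank line.
txt <- "alpha beta\n\ngamma"

# scan() with sep = "\n" returns one element per line, but with the
# default blank.lines.skip = TRUE the empty line is dropped.
via_scan <- scan(text = txt, what = "", sep = "\n", quiet = TRUE)

# readLines() returns every line, including the empty one.
con <- textConnection(txt)
via_readLines <- readLines(con)
close(con)

length(via_scan)
length(via_readLines)
```

If the blank lines carry meaning, readLines is probably the simpler
choice; passing blank.lines.skip = FALSE to scan should also keep
them.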
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com