On Thu, 29 Sep 2022, Nick Wray writes:
> ---------- Forwarded message ---------
> From: Nick Wray <nickmwray at gmail.com>
> Date: Thu, 29 Sept 2022 at 15:32
> Subject: Re: [R] Reading very large text files into R
> To: Ben Tupper <btupper at bigelow.org>
>
>
> Hi Ben
> Below is an example of the text (also attached). It is the "B", of which
> there are quite a few scattered throughout the text doc, that causes the
> error message on reading (btw I don't need the "RAIN" column, the 1's
> after it, or the last four elements). I have also attached the snippet
> as a text file.
>
> 1980-01-01 10:00, 225620, RAIN, 1, 1, WAHRAIN, 5091, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 226918, RAIN, 1, 1, WAHRAIN, 5124, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 228562, RAIN, 1, 1, WAHRAIN, 491, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 231581, RAIN, 1, 1, WAHRAIN, 5213, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 232671, RAIN, 1, 1, WAHRAIN, 487, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 232913, RAIN, 1, 1, WAHRAIN, 5243, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 234362, RAIN, 1, 1, WAHRAIN, 5265, 1001, 0, , 10009, 0, , , B
> 1980-01-01 10:00, 234682, RAIN, 1, 1, WAHRAIN, 5271, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 235389, RAIN, 1, 1, WAHRAIN, 5279, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 236466, RAIN, 1, 1, WAHRAIN, 497, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 243350, RAIN, 1, 1, SREW, 484, 1001, 0, , 9, 0, , ,
> 1980-01-01 10:00, 243350, RAIN, 1, 1, WAHRAIN, 484, 1001, 0, 0, 9, 9, , ,
>
> Thanks Nick
>
> On Thu, 29 Sept 2022 at 15:12, Ben Tupper <btupper at bigelow.org> wrote:
>
>> Hi Nick,
>>
>> It's hard to know without seeing at least a snippet of the data.
>> Could you do the following and paste the result into a plain text
>> email? If you don't set your email client to plain text (from rich
>> text or html) then we are apt to see a jumble of output on our email
>> clients.
>>
>>
>> ## start
>> x <- readLines(filename, n = 20)
>> cat(x, sep = "\n")
>> ## end
>>
>> Cheers,
>> Ben
>>
>>
>> On Thu, Sep 29, 2022 at 9:54 AM Nick Wray <nickmwray at gmail.com> wrote:
>> >
>> > Hello, I may be offending the R purists with this question, but it is
>> > linked to R, as will become clear. I have very large data sets from the
>> > UK Met Office in notepad form. Unfortunately, I can't read them directly
>> > into R because, for some reason, although most lines in the text doc
>> > consist of 15 elements, every so often there is a sixteenth one. R
>> > doesn't like this and gives me an error message, because it has assumed
>> > that every line has 15 elements and doesn't like finding one with more.
>> > I have tried playing around with the text document, inserting an extra
>> > element into the top line etc., but to no avail.
>> >
>> > Also, unfortunately, you need access permission from the Met Office to
>> > get the files in question, so this link probably won't work:
>> >
>> > https://catalogue.ceda.ac.uk/uuid/bbd6916225e7475514e17fdbf11141c1
>> >
>> > So what I have done is simply to copy and paste the text docs into Excel
>> > csv and then read them in, which is time-consuming but works. However,
>> > the later datasets are over the Excel limit of 1048576 lines. I can paste
>> > in the first 1048576 lines, but then trying to isolate the remainder of
>> > the text doc to paste it into a second csv doc is proving very difficult:
>> > the only way I have found is to scroll down by hand, and that's taking
>> > ages. I cannot find another way of editing the notepad text doc to get
>> > rid of the part which I have already copied and pasted.
>> >
>> > Can anyone help with a) ideally being able to simply read the text
>> > tables into R, or b) suggest a way of editing out the bits of the text
>> > file I have already pasted in without laborious scrolling?
>> >
>> > Thanks Nick Wray
>> >
[...]
>>
>> --
>> Ben Tupper (he/him)
>> Bigelow Laboratory for Ocean Science
>> East Boothbay, Maine
>> http://www.bigelow.org/
>> https://eco.bigelow.org
>>
>
Maybe I have missed it, but could you please show how
you tried to read the table?
When I use your file with
read.table("sample text.txt", header = FALSE, sep = ",")
I get
##                 V1     V2    V3 V4 V5       V6   V7   V8 V9 V10   V11 V12 V13 V14 V15
## 1 1980-01-01 10:00 225620  RAIN  1  1  WAHRAIN 5091 1001  0  NA     9   0  NA  NA
## 2 1980-01-01 10:00 226918  RAIN  1  1  WAHRAIN 5124 1001  0  NA     9   0  NA  NA
## .....
## 7 1980-01-01 10:00 234362  RAIN  1  1  WAHRAIN 5265 1001  0  NA 10009   0  NA  NA   B
## 8 1980-01-01 10:00 234682  RAIN  1  1  WAHRAIN 5271 1001  0  NA     9   0  NA  NA
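
If some lines in the full file really do have 16 fields, the following
may help. It is only a sketch under assumptions: I have not seen the
full Met Office files, and the 16-field line (ending in "X") is made up
for illustration, since every line in the posted snippet has 15 fields.
read.table decides on the number of columns from the first five lines
of input, unless 'col.names' is longer. So one option is to count the
fields per line with count.fields, then pass enough column names and
set fill = TRUE.

```r
## Illustrative data: three lines from the posted snippet plus one
## HYPOTHETICAL line with a 16th field ("X"), written to a temp file.
lines <- c(
  "1980-01-01 10:00, 225620, RAIN, 1, 1, WAHRAIN, 5091, 1001, 0, , 9, 0, , ,",
  "1980-01-01 10:00, 226918, RAIN, 1, 1, WAHRAIN, 5124, 1001, 0, , 9, 0, , ,",
  "1980-01-01 10:00, 234362, RAIN, 1, 1, WAHRAIN, 5265, 1001, 0, , 10009, 0, , , B",
  "1980-01-01 10:00, 234682, RAIN, 1, 1, WAHRAIN, 5271, 1001, 0, , 9, 0, , , B, X"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

## Count the comma-separated fields on every line and tabulate them;
## this shows whether (and how often) longer lines occur.
n <- count.fields(tmp, sep = ",")
print(table(n))

## Supply as many column names as the longest line needs and let
## fill = TRUE pad the shorter lines; strip.white = TRUE removes the
## blanks that follow each comma.
dat <- read.table(tmp, header = FALSE, sep = ",",
                  col.names = paste0("V", seq_len(max(n))),
                  fill = TRUE, strip.white = TRUE,
                  stringsAsFactors = FALSE)
str(dat)
```

For the real data, replace 'tmp' with the path to the text file. The
unwanted columns can be dropped afterwards by subsetting 'dat'.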
--
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net