thr3ads.net - R help - [R] Reading very large text files into R [Sep 2022]

Nick Wray

2022-Sep-29 13:54 UTC

[R] Reading very large text files into R

Hello. I may be offending the R purists with this question, but it is
linked to R, as will become clear. I have very large data sets from the UK
Met Office as plain text (Notepad) files. Unfortunately, I can't read them
directly into R because, although most lines in the text file consist of
15 elements, every so often there is a sixteenth. R gives me an error
message because it has assumed that every line has 15 elements and then
finds one with more. I have tried playing around with the text file,
inserting an extra element into the top line etc., but to no avail.

Also, unfortunately, you need access permission from the Met Office to get
the files in question, so this link probably won't work:

https://catalogue.ceda.ac.uk/uuid/bbd6916225e7475514e17fdbf11141c1

So what I have done is simply to copy and paste the text docs into Excel
CSV files and then read them in, which is time-consuming but works. However,
the later datasets are over the Excel limit of 1048576 rows. I can paste in
the first 1048576 lines, but isolating the remainder of the text doc to
paste it into a second CSV file is proving very difficult; the only way I
have found is to scroll down by hand, and that's taking ages. I cannot find
another way of editing the text doc to get rid of the part which I have
already copied and pasted.

Can anyone help with (a) ideally, being able to simply read the text tables
into R, or (b) a way of editing out the parts of the text file I have
already pasted in, without laborious scrolling?

Thanks Nick Wray

Ivan Krylov

2022-Sep-29 14:09 UTC

On Thu, 29 Sep 2022 14:54:10 +0100
Nick Wray <nickmwray at gmail.com> wrote:
> although most lines in the text doc consist of 15 elements, every so
> often there is a sixteenth one and R doesn't like this and gives me
> an error message
Does the fill = TRUE argument of read.table() help?
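
A minimal sketch of that suggestion, under stated assumptions: the sample
lines, the comma separator and the V1, V2, ... column names below are made up
for illustration and are not taken from the Met Office files. read.table()
works out the number of columns from the first five lines of input unless
col.names says otherwise, so fill = TRUE on its own may not be enough when the
longer row appears later in the file. Finding the widest row first with
count.fields() is one way around that.

```r
# Made-up stand-in for the real file: rows of 3 fields, with one 4-field row
# appearing only after the first five lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1,2,3", "4,5,6", "7,8,9", "10,11,12", "13,14,15", "16,17,18,B"),
           tmp)

# Count the fields on every line and find the widest row.
n_fields   <- count.fields(tmp, sep = ",")
max_fields <- max(n_fields)

# Supply enough column names for the widest row; fill = TRUE pads the
# shorter rows with blank fields instead of raising an error.
dat <- read.table(tmp, sep = ",", header = FALSE, fill = TRUE,
                  col.names = paste0("V", seq_len(max_fields)),
                  stringsAsFactors = FALSE)
str(dat)
```

With the real data, the separator and header settings would need to match the
actual file layout.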

If not, could you construct and share a small file with the same kind
of problem (16th field) but without the data one has to apply for
access to? (E.g. cut out a few lines from the original file, then
replace all digits.)
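
One possible way to build such a file, as a sketch only: the input lines, the
comma separator and the expected field count of 4 are placeholders (the real
files reportedly have 15 fields per line), and counting commas ignores any
quoting in the data.

```r
# Stand-in for the real data file; replace 'infile' with the actual path.
infile <- tempfile(fileext = ".txt")
writeLines(c("1901,3,12.5,A", "1902,4,13.1,A",
             "1903,5,11.9,A,B", "1904,6,10.2,A"), infile)

x <- readLines(infile)

# Fields per line = number of commas + 1 (assumes no quoted commas).
n_fields <- lengths(regmatches(x, gregexpr(",", x, fixed = TRUE))) + 1L

expected <- 4L                      # 15 for the real files
odd      <- which(n_fields != expected)

# Keep the first two lines plus every line with an unexpected field count.
keep <- sort(unique(c(1:2, odd)))

# Replace every digit so that no real values are shared.
anon <- gsub("[0-9]", "9", x[keep])

outfile <- tempfile(fileext = ".txt")
writeLines(anon, outfile)
cat(anon, sep = "\n")
```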

-- 
Best regards,
Ivan

Ben Tupper

2022-Sep-29 14:12 UTC

Hi Nick,

It's hard to know without seeing at least a snippet of the data.
Could you do the following and paste the result into a plain text
email?  If you don't set your email client to plain text (from rich
text or html) then we are apt to see a jumble of output on our email
clients.


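## start
# 'filename' is the path to your text file
x <- readLines(filename, n = 20)  # read only the first 20 lines
cat(x, sep = "\n")                # print them as plain text
## end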

Cheers,
Ben




-- 
Ben Tupper (he/him)
Bigelow Laboratory for Ocean Science
East Boothbay, Maine
http://www.bigelow.org/
https://eco.bigelow.org

Jan van der Laan

2022-Sep-29 14:38 UTC

You're sure the extra column is indeed an extra column? According to the 
documentation 
(https://artefacts.ceda.ac.uk/badc_datadocs/ukmo-midas/RH_Table.html) 
there should be 15 columns.

Could it, for example, be that one of the columns contains records with 
commas?
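
A quick illustration of how that could happen, using two made-up lines rather
than real data: an unquoted comma inside a field is counted as a separator,
while a quoted one is not.

```r
# Two hypothetical records with the same content; only the quoting differs.
tmp <- tempfile(fileext = ".txt")
writeLines(c('1,"a,b",3',   # comma protected by quotes
             '1,a,b,3'),    # same comma, unquoted
           tmp)

# count.fields() honours quotes by default, so the first line has
# 3 fields and the second has 4.
n_fields <- count.fields(tmp, sep = ",")
n_fields
```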

Jan




Bert Gunter

2022-Sep-29 15:16 UTC

I had no trouble reading your text snippet with
read.csv(text = "... your text... ")

There were 15 columns. The last column was all empty except for the row
containing the "B".

So there seems to be some confusion here.
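
For illustration, a made-up three-column analogue of that result. The layout
is an assumption: most rows end in a trailing comma, so their last field is
empty, and one row carries a "B" there. header = FALSE is also an assumption
about the data.

```r
# Hypothetical snippet: two rows with an empty last field, one with "B".
txt <- "1,2,\n3,4,\n5,6,B\n"

d <- read.csv(text = txt, header = FALSE, stringsAsFactors = FALSE)

ncol(d)   # the trailing comma already implies a third column
d$V3      # empty strings except for the row containing "B"
```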

-- Bert







Richard O'Keefe

2022-Sep-30 04:07 UTC

If I had this problem, in the old days I'd've whipped up
a tiny AWK script.  These days I might use xsv or qsv.
BUT
first I would want to know why these extra fields are
present and what they signify.  Are they good data that
happen not to be described in the documentation?  Do
they represent a defect in the generation process?  What
other discrepancies are there?  If the data *format*
cannot be fully trusted, what does that say about the
data *content*?  Do other data sets from the same source
have the same issue?  Is it possible to compare this
version of the data with an earlier version?
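
The same kind of first check can be sketched in R itself, without AWK, by
streaming the file in chunks and tallying how many fields each line has. This
is only a sketch: tally_fields() and its arguments are hypothetical names, the
comma separator is an assumption, and counting separators ignores any quoting.

```r
# Tally the number of fields per line, reading 'chunk_size' lines at a time
# so that the whole file never has to be held in memory.
tally_fields <- function(path, sep = ",", chunk_size = 100000L,
                         max_fields = 50L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  counts <- integer(max_fields)
  repeat {
    x <- readLines(con, n = chunk_size)
    if (length(x) == 0L) break
    # Fields per line = number of separators + 1.
    n <- lengths(regmatches(x, gregexpr(sep, x, fixed = TRUE))) + 1L
    # Lines with more than 'max_fields' fields are silently ignored here.
    counts <- counts + tabulate(n, nbins = max_fields)
  }
  names(counts) <- seq_len(max_fields)
  counts[counts > 0L]
}

# Demo on a made-up file: four 3-field lines and one 4-field line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1,2,3", "4,5,6", "7,8,9,B", "10,11,12", "13,14,15"), tmp)
res <- tally_fields(tmp, chunk_size = 2L)
res
```

The result is a named vector: the names are field counts and the values are
the number of lines with that many fields, which shows how common the extra
field is before deciding what it means.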

