thr3ads.net - R help - [R] removing non-table lines [Sep 2022]

Nick Wray

2022-Sep-18 19:39 UTC

[R] removing non-table lines

Hello - I am having to download lots of rainfall and temperature data in
csv form from the UK Met Office.  The data itself isn't a problem - it's in
nice columns and can be read into R easily.  The problem is that in each csv
there are 60 or so lines of information first which are not part of the
columnar data.  If I read the whole csv into R the column data is no
longer in columns but in some disorganised form.  If I manually delete all
the text lines above the table and then read the file in, I get a nice neat
data table.  As the text lines can't be identified in R by line numbers etc
I can't find a way of deleting them in R, and at the moment I have to do it
by hand, which is slow.  It might be possible to write a complicated and
dirty algorithm to rearrange the meteorological data back into columns, but
I suspect that it would be hard to get right and consistent across every
csv sheet, and any errors might be hard to spot.  I can't find anything on
the net about this - has anyone else had to deal with this problem, and if
so do they have any solutions using R?
Thanks, Nick Wray

CALUM POLWART

2022-Sep-18 19:45 UTC

Can you provide a sample: say, the first 3 lines of the file, then the last
2 lines before the CSV data starts?

Are there always the same number of lines at the top? Or can it vary
depending on what nonsense the Met Office decided to contaminate it with?

This should be solvable with some sample data.

Base R or Tidyverse? Any limitations on packages (e.g. stringr?)

Jeff Newmiller

2022-Sep-18 21:31 UTC

Use the skip parameter if the number of header lines is always the same.
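
As a minimal sketch of the skip approach: the file contents, column names
and the preamble length of 3 below are made up for illustration and are not
the actual Met Office layout.

```r
# Build a small hypothetical file: 3 preamble lines, then a header and data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Station: Example",
  "Source: illustrative only",
  "Notes: free text, with, stray, commas",
  "date,rainfall_mm,temp_c",
  "2022-01-01,1.2,4.5",
  "2022-01-02,0.0,5.1"
), tmp)

# skip = 3 drops the preamble; the next line is then used as the header.
dat <- read.csv(tmp, skip = 3)
str(dat)
```

This only works if every file has the same number of preamble lines.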

Rui Barradas

2022-Sep-18 22:18 UTC

Hello,

Unfortunately, there are many files with a non-tabular section
followed by the data. R's read.table has a skip argument:

skip: integer: the number of lines of the data file to skip before
beginning to read data.

If you do not know how many lines to skip because it's not always the 
same number, here are some ideas.

Is there a pattern in the initial section? Maybe an end-of-section line,
or maybe the text lines come in a specified order and the last line in
that order can be detected with a regex.

Is there a pattern in the tables' column headers? Once again a regex 
might be the solution.

Is the number of initial lines variable because there are different file
versions? If so, did the versions evolve over time, as is often the case?
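
A minimal sketch of the regex idea, under stated assumptions: the
end-of-section marker "data" and the header starting with "date," are
hypothetical patterns chosen for this example, so substitute whatever the
real files contain.

```r
# Hypothetical file with a preamble whose length is not known in advance.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Some provider, free text",
  "More notes",
  "data",                       # assumed end-of-section marker
  "date,rainfall_mm,temp_c",    # assumed column header
  "2022-01-01,1.2,4.5",
  "2022-01-02,0.0,5.1"
), tmp)

lines <- readLines(tmp)

# Option 1: find the column header line and skip everything before it.
hdr <- grep("^date,", lines)[1]
stopifnot(!is.na(hdr))          # fail loudly if the pattern is not found
dat_hdr <- read.csv(tmp, skip = hdr - 1)

# Option 2: find the end-of-section marker and skip up to and including it.
end <- grep("^data$", lines)[1]
stopifnot(!is.na(end))
dat_end <- read.csv(tmp, skip = end)
```

Both options feed a computed value to skip, so the table is still read by
read.csv itself and nothing has to be rearranged by hand.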

What you describe is not infrequent. It's always a nuisance and
error-prone, but it should be solvable once patterns are found. Inspect a
small number of files with a text editor and try to find both common points
and differences. That's halfway to a solution.
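
The same inspection can be roughed out in R. This is only a sketch: the
helper names find_header and make_file, the header pattern "^date," and
the two generated files are all hypothetical stand-ins for the real
downloads.

```r
# Return the line number of the first line matching the assumed header
# pattern, or NA if the file has no such line.
find_header <- function(path, pattern = "^date,") {
  hits <- grep(pattern, readLines(path))
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# Create a hypothetical file with a given number of preamble lines.
make_file <- function(n_preamble) {
  f <- tempfile(fileext = ".csv")
  writeLines(c(rep("free text", n_preamble),
               "date,rainfall_mm",
               "2022-01-01,1.2"), f)
  f
}
files <- c(make_file(2), make_file(5))

# Where does the header sit in each file? An NA flags a file to look at by hand.
header_at <- vapply(files, find_header, integer(1), USE.NAMES = FALSE)
print(header_at)

# Read every file, skipping its own preamble.
tables <- Map(function(f, h) read.csv(f, skip = h - 1), files, header_at)
```

Printing header_at across a batch shows quickly whether the preamble length
really varies, and by how much.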

Hope this helps,

Rui Barradas
