Dear R-users,

I have to read data from a worksheet that is available on the Internet. I have been doing this by copying the worksheet from the browser, but I would like to be able to copy the data automatically using the url command.

When using the "url" command, however, the result is the source code, I mean, HTML code. I see that the data I need is in the source code, but before thinking about reading the data from the HTML code I wonder if there is a package or another way to extract these data, since reading from the code would demand a lot of work and may not be very accurate.

Below one can see the page from which I am trying to export the data:

dados <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", "r")

I am looking forward to any help. Thanks in advance,
Nilza Barros
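[For context, a minimal sketch of what the call above does, using only base R: url() opens a connection, so reading from it yields the page's raw HTML source rather than the worksheet values, which is the behaviour described in the question.]

# url() gives a connection; readLines() returns raw HTML lines, not the data
con <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", open = "r")
html_source <- readLines(con)   # character vector of HTML markup
close(con)
head(html_source)               # shows tags and script, hence the need for a parser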
CIURANA EUGENE (R)
2012-Feb-12 12:44 UTC
[R] [R-sig-DB] Reading data from a worksheet on the Internet
On Sat, 11 Feb 2012 22:49:07 -0200, Nilza BARROS wrote:

> I have to read data from a worksheet that is available on the Internet. I
> have been doing this by copying the worksheet from the browser, but I
> would like to be able to copy the data automatically using the url
> command.
>
> But when using the "url" command the result is the source code, I mean,
> HTML code. I see that the data I need is in the source code, but before
> thinking about reading the data from the HTML code I wonder if there is a
> package or another way to extract these data, since reading from the code
> will demand a lot of work and may not be so accurate.
>
> Below one can see the page from which I am trying to export the data:
>
> dados <- url("http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet002.htm", "r")

Hi Nilza,

The URL that you posted points at a document that has another document within it, in a frame. These files are Excel dumps into HTML. To view the actual data you need the URIs for each data set. Those appear at the bottom of the listing, under sc1201_arquivos/sheet001.htm and sheet002.htm. Your code must fetch these files, not the one at http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm [1], which only "wraps" them. Most of what you see in the file that you linked isn't HTML; it's JavaScript and style information for the data living in the two separate HTML documents.

You can do this in R using the RCurl and XML libraries, by pulling the specific files for each data source. If this is a one-time thing, I'd suggest just coding something simple that loads the data for each file. If this is something you'll execute periodically, you'll need a bit more code to extract the internal data sheets (e.g. the "planilhas" at the bottom), then extract the actual data.

Let me know if you want this as a one-time thing or as a reusable program. If you don't know how to use RCurl and XML to parse HTML, I'll be happy to help with that too. I'd just like to know more about the scope of your question.

Cheers,

pr3d

--
pr3d4t0r at #R, ##java, #awk, #pyton irc.freeenode.net

Links:
------
[1] http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm
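[A minimal sketch of the RCurl/XML approach described in the reply above, for the one-time case. It assumes the XML package's readHTMLTable() can parse the Excel-generated HTML of the inner sheet; the sheet URL comes from the thread, and the table index may need adjusting.]

library(RCurl)
library(XML)

# Fetch one of the inner sheets directly, not the wrapper page sc1202.htm
sheet_url <- "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1201_arquivos/sheet001.htm"
raw_html  <- getURL(sheet_url)

# Parse the Excel-generated HTML and pull out its tables
doc    <- htmlParse(raw_html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

dados <- tables[[1]]   # first table on the sheet; adjust the index if needed
str(dados)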
Nilza BARROS
2012-Feb-12 18:24 UTC
[R] [R-sig-DB] Reading data from a worksheet on the Internet
Hi,

I really appreciate your help. I definitely need a reusable program, since I have been asking someone to extract these data from the Internet every day; that is why I am trying to write a program to do it.

Regarding the URL I sent: I have just realized that although I wrote the one referring to only one worksheet (PLANILHA2), when I paste it into my browser it shows the page with both worksheets.

I am going to read about the RCurl and XML libraries, but I hope you can help me too.

Thanks in advance,
Nilza Barros

On Sun, Feb 12, 2012 at 10:42 AM, CIURANA EUGENE (R) <r.user@ciurana.eu> wrote:

> [quoted reply snipped]

--
Abraço,
Nilza Barros
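[A hedged sketch of what the reusable version suggested in the reply might look like: it discovers the sheet URLs from the wrapper page instead of hard-coding them. The assumption that sc1202.htm lists the sheets in <frame> tags, and the XPath used here, follow from the reply's description and may need adjusting if the page uses links or tabs instead.]

library(RCurl)
library(XML)

wrapper_url <- "http://www.mar.mil.br/dhn/chm/meteo/prev/dados/pnboia/sc1202.htm"
wrapper_doc <- htmlParse(getURL(wrapper_url), asText = TRUE)

# Collect the src attribute of every frame and resolve it against the wrapper's directory
frame_srcs <- xpathSApply(wrapper_doc, "//frame", xmlGetAttr, "src")
sheet_urls <- paste(dirname(wrapper_url), frame_srcs, sep = "/")

# Read the first HTML table of every sheet into a list of data frames
sheets <- lapply(sheet_urls, function(u) {
  readHTMLTable(htmlParse(getURL(u), asText = TRUE), stringsAsFactors = FALSE)[[1]]
})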