thr3ads.net - R help - [R] Grap Element from Web Page [Aug 2013]


Sparks, John James

2013-Aug-14 05:34 UTC

[R] Grap Element from Web Page

Dear R Helpers,

I would like to pull the CIK number from the web page

http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany

If you put this web page into your browser you will see the CIK number in
red on the left side of the page near the top.

When I try the basic
require(scrapeR)
require(XML)
require(RCurl)
doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
str(doc)

I get a large number of items in the data frame that I don't know how to
interpret.  Both
tables <- readHTMLTable(doc)

and

list<-xmlToList(doc)

result in errors.

Any (positive) guidance would be much appreciated.

--John J. Sparks, Ph.D.

Jeffrey Dick

2013-Aug-14 09:19 UTC


[R] Grap Element from Web Page

Hi,

There are many occurrences of the CIK number in the page source. This pulls
out the first node containing it:

node <- getNodeSet(doc[[1]], "//link[@rel='alternate']" )
From there you can extract the number. Here's one way to do it.
strsplit(strsplit(unlist(node)[[5]], "CIK=")[[1]][2], "&type")[[1]][1]
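
The nested strsplit() call above relies on the href being the fifth element of the unlisted node and on "&type" following the number. A minimal sketch of an alternative, assuming the href string has already been pulled out of the node, is to capture the digits after "CIK=" with a regular expression. The href value and the helper name extract_cik() are hypothetical placeholders for illustration, not taken from the live page.

```r
# Hypothetical href, shaped like the one in the <link rel="alternate"> node.
# The CIK value is a placeholder, not the real number from the SEC page.
href <- "/cgi-bin/browse-edgar?action=getcompany&CIK=0000123456&type=&dateb=&owner=exclude"

# Hypothetical helper: return the run of digits that follows "CIK=",
# or NA if the string contains no such pattern.
extract_cik <- function(x) {
  m <- regmatches(x, regexec("CIK=([0-9]+)", x))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

cik <- extract_cik(href)
print(cik)
```

This keeps the leading zeros because the result stays a character string, and it does not depend on which query parameter comes after the CIK.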

Jeff


