?getNodeSet may help
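As a rough sketch of how getNodeSet() and xpathSApply() could be used here: the snippet below parses a trimmed-down copy of the markup quoted in the question instead of fetching the live page. It assumes the address link is an <a> element inside the "adr" div (the archived copy of the markup seems to have lost its anchor tags), and the helper to_num() is a made-up name for illustration. On the real page the bubble markup appears to sit inside a JavaScript string, so it would probably have to be pulled out of the script text and unescaped before XPath can see it.

```r
library(XML)

# Trimmed-down copy of the markup from the question (assumption: the
# address link is an <a> inside div.adr; this is not fetched live).
html <- '<div class="property-info">
  <div class="adr"><a href="/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/">236 Arundel Ave, Horsham, PA</a></div>
  <ul class="value-info"><li class="type-allHomes">Zestimate: $239,000</li></ul>
  <ul class="attributes">
    <li class="prop-cola">Beds: 3<br /> Baths: 1.0</li>
    <li class="prop-colb">Sqft: 1,630<br /> Lot: 21,745</li>
  </ul>
</div>'

# htmlParse() builds an internal document, which XPath queries require.
doc <- htmlParse(html, asText = TRUE)

# Attribute and text lookups by class name.
href      <- xpathSApply(doc, "//div[@class='adr']/a", xmlGetAttr, "href")
price_txt <- xpathSApply(doc, "//li[@class='type-allHomes']", xmlValue)
items     <- sapply(getNodeSet(doc, "//ul[@class='attributes']/li"), xmlValue)

# Hypothetical helper: keep digits and the decimal point, then convert.
to_num <- function(x) as.numeric(gsub("[^0-9.]", "", x))

price <- to_num(price_txt)
beds  <- to_num(sub(".*Beds: *([0-9.,]+).*",  "\\1", items[1]))
baths <- to_num(sub(".*Baths: *([0-9.,]+).*", "\\1", items[1]))
sqft  <- to_num(sub(".*Sqft: *([0-9.,]+).*",  "\\1", items[2]))
```

Selecting on class names rather than on position in the tree should hold up a little better if the surrounding layout changes, but the selectors are still tied to this particular page structure.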
steve
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of eric
Sent: Saturday, May 28, 2011 5:03 PM
To: r-help at r-project.org
Subject: [R] newbie xml parsing question
I am trying to read some data off the Zillow site. I am new to XML, HTML,
parsing, and the XML package. I've been able to load the web page I'm
interested in with the following code, but I'm not sure of the next step to
get the information I want into R:
library(XML)
# Keep the address on a single line so no line break ends up in the URL.
url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb"
doc <- htmlTreeParse(url, isURL=TRUE)
doc
I'd like to be able to pull the following information into R:
href (home details string): /homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}
Zestimate / Price: $239,000
Beds: 3
Baths: 1.0
Sqft: 1630
I noticed that all of this information is in "doc". The section of doc that
contains it is shown below. How do I go about extracting this information
and getting it into R for the general case where the address in the url
will change?
LatLong.createFromDegrees(40.187567, -75.125861),
"<div class=\"map-bubble property-bubble\"> <div
class=\"search-result\">
<div class=\"plisting\"> <div
id=\"bubble-photoex-up\" class=\"photoex
hide\"> <div class=\"photoex-photos\"> </div>
<div class=\"mapsViews
hide\">
</div> </div> <div id=\"property-zpid\"
class=\"hide\">9933810</div> <div
id=\"property-home-info\"> <div id=\"pinfo-block\"
class=\"property-info\">
<div class=\"adr\">
\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}\"
236 Arundel Ave, Horsham, PA </div> <ul
class=\"value-info\"> <li
class=\"type-allHomes\">
Zestimate<sup>®</sup>: $239,000 \"#\"
<div id=\"zest-tip-bubble_toggleArea\" class=\"tooltip
hide\"> Close <dl>
<dt>Zestimate</dt> <dd> A
<strong>Zestimate®</strong> home valuation is
Zillow's estimated market value. It is not an appraisal. Use it as a
starting point to determine a home's value. <a
href=\"/wikipages/What-is-a-Zestimate/\"
href=\"#\">Learn more </dd> </dl>
</div> </li> <li
class=\"secondary monthly-payment\"> Mortgage payment: $963/mo
<ul
class=\"carrot view-rates-aftertext\"> <li>
\"/mortgage-rates/#{scid=mor-site-mapbubrates}\" See rates
</li></ul>
</li>
</ul> <ul class=\"attributes\"> <li
class=\"prop-cola\">Beds: 3<br /> Baths:
1.0</li> <li class=\"prop-colb\">Sqft: 1,630<br />
Lot: 21,745</li> </ul>
</div> <ul class=\"has-photo actions clearfix\"> <li
class=\"hinfo ztsa\">
\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-details}\"
Details </li> <li class=\"mapHome ztsa\"
zpid=\"9933810\"> \"#\" Views
</li> <li class=\"faves ztsa\"> <a
onclick=\"trackLink(this, 'Save',
{ 'events': 'event18', 'eVar4': 'Map Bubble' });
return
favoriteManager.addFavorite(9933810, favoriteManager.doneSaving(this),
event, true);\" class=\"not-saved\"
rel=\"nofollow\">Save </li> </ul>
</div> Close <div
id=\"bubble-photoex-down\" class=\"photoex hide\">
<div
class=\"photoex-photos\"> </div> <div
class=\"mapsViews hide\">
</div>
</div> </div> </div> <div
class=\"bubble-beak\"> </div></div>"
)
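For the general case where the address changes, one option is to build the URL from the address string. This is only a sketch: zillow_url() is a hypothetical helper name, and the "/homes/<address>_rb" pattern is taken from the single example URL above, so it may not hold for every address. URLencode() from the utils package escapes the spaces.

```r
# Hypothetical helper: builds a search URL following the pattern of the
# example URL in this message (an assumption, not a documented Zillow API).
zillow_url <- function(address) {
  paste0("http://www.zillow.com/homes/", URLencode(address), "_rb")
}

u <- zillow_url("511 W Lafayette St, Norristown, PA")
```

The result could then be passed to htmlParse() or htmlTreeParse() in place of the hard-coded url.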
--
View this message in context:
http://r.789695.n4.nabble.com/newbie-xml-parsing-question-tp3558067p3558067.html
Sent from the R help mailing list archive at Nabble.com.