You can use htmlTreeParse and xpathApply from the XML package. Something like:

    xpathApply(htmlTreeParse("http://blabla", useInternalNodes = TRUE), "//td", function(x) xmlValue(x))

should do it.

Gamma wrote:
> Would anyone care to explain how to read an HTML table? It is streaming data
> (updated every second) and I am looking for a suitable function.
>
> The imported HTML table looks like this:
>
> [1] "<body><html><table>"
> [2] "<tr><td>SEQUENCE</td> <td>EXCHANGE</td> <td>BOARD</td> <td>TIME</td>
> <td>PAPER</td> <td>BID</td> <td>BID-DEPTH</td> <td>BID-DEPTH-TOTAL</td>
> <td>BID-NUMBER</td> <td>OFFER</td> <td>OFFER-DEPTH</td>
> <td>OFFER-DEPTH-TOTAL</td> <td>OFFER-NUMBER</td> <td>OPEN</td>
> <td>HIGH</td> <td>LOW</td> <td>LAST</td> <td>CHANGE</td>
> <td>CHANGE-PERCENT</td> <td>VOLUME</td> <td>VALUE</td> <td>TRADES</td>
> <td>STATUS</td></tr>"
> [3] "<tr><td>184311995</td><td>ST</td><td></td><td>174336</td><td>SX50PI</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>953.9600</td><td>937.9800</td><td>947.5900</td><td>2.6000</td><td>0.2751</td><td></td><td></td><td></td><td></td></tr>"
>
> and so on to the closing table tags.
>
> [15] "</table></html></body>"
>
> I tried a few commands but I only get HTML code back, like above:
> readLines(url("")), socketConnection() and url(), and nothing seemingly
> useful comes up with apropos("html") either.
>
> Regards

--
View this message in context: http://www.nabble.com/Read-HTML-table-tf4832010.html#a13825471
Sent from the R help mailing list archive at Nabble.com.
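A minimal sketch of the suggestion above, run on an in-memory string instead of a live URL so it can be tried offline. The three-column table is a hypothetical, shortened stand-in for the feed quoted in the question, and xpathSApply (the simplifying variant of xpathApply in the XML package) is used so the result comes back as a character vector rather than a list.

```r
library(XML)

# Hypothetical, shortened stand-in for the streamed table (not the real feed).
html <- paste0(
  "<html><body><table>",
  "<tr><td>SEQUENCE</td><td>PAPER</td><td>LAST</td></tr>",
  "<tr><td>184311995</td><td>SX50PI</td><td>947.5900</td></tr>",
  "</table></body></html>"
)

# asText = TRUE tells the parser that `html` is content, not a file name or URL.
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# One string per <td>, in document order (row by row).
cells <- xpathSApply(doc, "//td", xmlValue)
print(cells)
```

For the real feed, the string would be replaced by the URL (dropping asText = TRUE); the rest should stay the same, assuming the page is a plain table like the one shown.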
f.jamitzky wrote:
> You can use htmlTreeParse and xpathApply from the XML library.
> something like:
>
> xpathApply( htmlTreeParse("http://blabla", useInt=T), "//td", function(x)
> xmlValue(x))
>
> should do it.

Thank you. Any further ideas on how to transform the result into a matrix, something that R could easily search for values? I want to use the imported data in various calculations (Rmetrics) and hope to automate the process somewhat.

Another thing: htmlTreeParse takes a while to complete. For a 15-row table it takes about 10-15 seconds. Since I am planning to use this method on multiple (15-20) tables with up to 1000 rows, it might not be the ideal solution?
For a fixed number of columns you can use

    data.frame(matrix(data, nrow, ncol, byrow = TRUE))

to arrange the extracted cell values into a table. byrow = TRUE is needed because the cells come back row by row, while matrix() fills column by column by default.

htmlTreeParse should be rather quick, but in case it is too slow you could use curl for downloading the data and xmlstarlet for transformation to XML. Then you can use xmlTreeParse or even read.csv to read the file into R.

Gamma wrote:
> f.jamitzky wrote:
>> You can use htmlTreeParse and xpathApply from the XML library.
>> something like:
>>
>> xpathApply( htmlTreeParse("http://blabla", useInt=T), "//td", function(x)
>> xmlValue(x))
>>
>> should do it.
>
> Thank you, any further ideas how to transform the result into a matrix,
> something that R easily could search and find values, i want to use the
> imported data in various calculations (Rmetrics) and hope to automate the
> process somewhat.
>
> Another thing, the htmlTreeParse takes a while to complete, for a 15 row
> table it takes about 10-15 seconds, considering i am planning to use this
> method on multiple (15-20) tables with up to 1000 rows it might not be the
> ideal solution?
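The reshaping step can be sketched in base R as follows. The cell values and the column count are hypothetical stand-ins for what a "//td" query would return, and the sketch assumes every cell yields exactly one value and that the first table row holds the column names, as in the quoted feed.

```r
# Hypothetical cell values in document order, i.e. row by row.
cells <- c("SEQUENCE",  "PAPER",  "LAST",
           "184311995", "SX50PI", "947.5900",
           "184311996", "SX50PI", "947.6100")
n_cols <- 3  # fixed and known in advance

# byrow = TRUE: consecutive cells belong to the same table row.
m <- matrix(cells, ncol = n_cols, byrow = TRUE)

# First row is the header; the remaining rows are data.
quotes <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(quotes) <- m[1, ]

# Everything arrives as text, so numeric columns are converted explicitly.
quotes$LAST <- as.numeric(quotes$LAST)
print(quotes)
```

Once the data is in a data frame, lookups such as quotes$LAST[quotes$PAPER == "SX50PI"] work in the usual way.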
f.jamitzky wrote:
> For fixed numbers of columns you can use
>
> data.frame(matrix(data, nrow, ncol))
>
> in order to parse the XML data.
>
> htmlTreeParse should be rather quick, but in case it is too slow you could
> use curl for downloading
> the data and xmlstarlet for transformation to XML. Then you can use
> xmlTreeParse or even read.csv to read the file into R.

Reading real-time data into R for further computation is a side project; I guess I am just curious whether it is possible at all. I know there are full-fledged trading clients written in Matlab, for example.

Thank you for helping, much appreciated.
theta wrote:
> f.jamitzky wrote:
>> You can use htmlTreeParse and xpathApply from the XML library.
>> something like:
>>
>> xpathApply( htmlTreeParse("http://blabla", useInt=T), "//td", function(x)
>> xmlValue(x))
>>
>> should do it.
>
> Thank you, any further ideas how to transform the result into a matrix,
> something that R easily could search and find values, i want to use the
> imported data in various calculations (Rmetrics) and hope to automate the
> process somewhat.
>
> Another thing, the htmlTreeParse takes a while to complete, for a 15 row
> table it takes about 10-15 seconds, considering i am planning to use this
> method on multiple (15-20) tables with up to 1000 rows it might not be the
> ideal solution?

I doubt the parsing is taking very long at all. On a Linux box running virtually on my Mac, I can parse a 4566-line HTML file in 0.3 seconds. If you pass a URL rather than a local file, then you have to separate the download time from the parsing time to figure out where the time is consumed.

And if you are going to download multiple tables from the same server in rapid succession, then you might want to use some advanced features of HTTP such as persistent connections or multiple interleaved requests. These can all be done via the RCurl package and the results fed to htmlTreeParse(). There is a paper on the RCurl web site that describes some of these advanced features.

D.
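One way to separate the two costs, as a rough sketch rather than a benchmark: download the page as text first, then parse the text, timing each step with system.time. The helper name time_fetch_and_parse and its fetch argument are hypothetical; the default assumes RCurl's getURL is available, and any function that takes a URL and returns the page as a string can be substituted.

```r
library(XML)

# Hypothetical helper: times the download and the parse separately.
# `fetch` is any function mapping a URL to the page content as one string.
time_fetch_and_parse <- function(url, fetch = RCurl::getURL) {
  download <- system.time(html <- fetch(url))[["elapsed"]]
  parse <- system.time(
    doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)
  )[["elapsed"]]
  list(doc = doc, seconds = c(download = download, parse = parse))
}

# Usage against a real server (URL is the placeholder from the thread):
# res <- time_fetch_and_parse("http://blabla")
# res$seconds
```

If the download dominates, reusing a single RCurl handle across requests (created with getCurlHandle() and passed as the curl argument of getURL) is the usual way to keep one connection open to the same server; that detail is an assumption about how the advice above would be applied, not something spelled out in the thread.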