Hi,

I am trying to import HTML text and need to split the fields at each <td> or </td> entry.

How can I succeed? sep = '<td>' doesn't yield the right result.

Thanks for any hints.
Christoph Lehmann wrote:
> Hi
> I try to import html text and I need to split the fields at each <td> or
> </td> entry
>
> How can I succeed? sep = '<td>' doesn't yield the right result

If it fits pairwise together, use
sep=c("<td>", "</td>")

If not, you can read the whole lot with readLines and then strsplit on both patterns, for example.

Uwe Ligges

> thanks for hints
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
You can import the whole thing and then use "strsplit" on it:

?strsplit

Eric

Eric Lecoutre
UCL / Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
lecoutre at stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward Tufte

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Christoph Lehmann
> Sent: Monday, 4 April 2005 16:51
> To: r-help at stat.math.ethz.ch
> Subject: [R] scan html: sep = "<td>"
>
> Hi
> I try to import html text and I need to split the fields at
> each <td> or </td> entry
>
> How can I succeed? sep = '<td>' doesn't yield the right result
>
> thanks for hints
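A minimal sketch of the strsplit idea, assuming one table row is already held in a character string. The shortened sample row and the variable names are illustrative and not from the original post.

```r
# One table row as a single string (shortened, hypothetical sample input).
row <- "<tr><td>BM</td><td> 0.952</td><td> 0.136</td></tr>"

# strsplit takes a regular expression, so "</?td>" matches both the
# opening <td> and the closing </td>.
pieces <- strsplit(row, "</?td>")[[1]]

# Drop the leftover <tr>/</tr> fragments and the empty strings that
# appear between adjacent tags, then trim surrounding whitespace.
pieces <- pieces[!grepl("^</?tr>$", pieces) & nzchar(pieces)]
fields <- trimws(pieces)
print(fields)
```

Unlike the sep argument of scan, the split pattern here may be longer than one character, which is what makes this route workable for tags.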
Christoph Lehmann wrote:
> entry from html:
>
> <tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td>
> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
> <tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td>
> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>
>
> using
> left.data <- scan(paste(path, left.file, sep = ""), what = 'character',
>                   sep=c("<td>", "</td>"))
>
> yields
>
> > left.data
>  [1] " "                   "tr bgcolor=#9090f0>" "td align=right>"
>  [4] "b>BM"                "/b>"                 "/td>"
>  [7] "td> 0.952"           "/td>"                "td> 0.136"
> [10] "/td>"                "td> 6.984"           "/td>"
> [13] "td>0.000000"         "/td>"                "/tr>"
> [16] " "                   "tr bgcolor=#9090f0>" "td align=right>"
> [19] "b>BH"                "/b>"                 "/td>"
> [22] "td> 1.338"           "/td>"                "td> 0.136"
> [25] "/td>"                "td> 9.821"           "/td>"
> [28] "td>0.000000"         "/td>"                "/tr>"
>
> why doesn't it detect the whole '<tr>' as sep?
>
> Uwe Ligges wrote:
>
>> Christoph Lehmann wrote:
>>
>>> Hi
>>> I try to import html text and I need to split the fields at each <td>
>>> or </td> entry
>>>
>>> How can I succeed? sep = '<td>' doesn't yield the right result
>>
>> If it fits pairwise together, use
>> sep=c("<td>", "</td>")

Apologies, one should not send untested code. "sep" must be a single character rather than a string containing more than one character. So you may want to try out my second suggestion.

Uwe Ligges

>> if not, you can read the whole lot with readLines and strsplit for
>> both patterns after that, for example.
>>
>> Uwe Ligges
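The scan output quoted above shows the text split at every "<", which suggests that only the first character of sep was used. The "second suggestion" (readLines followed by strsplit) could look roughly like the sketch below. It is only an illustration under stated assumptions: the two sample rows are held in a character vector instead of being read from the poster's file, the helper parse_row and the column names are hypothetical, and every row is assumed to have the same number of non-empty cells.

```r
# The two sample rows from the thread, held in memory here. With a real
# file, the result of readLines() on that file would take the place of
# this vector.
html <- c(
  '<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# Hypothetical helper: turn one <tr> line into a vector of cell contents.
parse_row <- function(line) {
  # Split at <td ...> and </td>; [^>]* allows attributes such as align="right".
  cells <- strsplit(line, "</?td[^>]*>")[[1]]
  # Remove any tags still present in the pieces (<tr ...>, <b>, </b>, </tr>).
  cells <- gsub("<[^>]+>", "", cells)
  # Trim whitespace and drop the empty strings left between adjacent tags.
  # Note: this would also drop genuinely empty cells, hence the assumption
  # that none occur.
  cells <- trimws(cells)
  cells[nzchar(cells)]
}

rows <- lapply(html, parse_row)

# Stack the rows into a data frame; the column names are illustrative only.
tab <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(tab) <- c("label", "v1", "v2", "v3", "v4")
tab[-1] <- lapply(tab[-1], as.numeric)
print(tab)
```

Because the tags are removed with regular expressions, this only suits simple, regular tables like the one posted; it makes no attempt to handle nested tables or cells that span several lines.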