thr3ads.net - R help - [R] scan html: sep = "<td>" [Apr 2005]

Christoph Lehmann

2005-Apr-04 14:50 UTC

[R] scan html: sep = "<td>"

Hi
I am trying to import HTML text and need to split the fields at each <td> or
</td> entry.

How can I do this? sep = '<td>' doesn't yield the right result.

Thanks for any hints

Uwe Ligges

2005-Apr-04 14:57 UTC

Christoph Lehmann wrote:
> Hi
> I try to import html text and I need to split the fields at each <td> or
> </td> entry
>
> How can I succeed? sep = '<td>' doesn't yield the right result
If the tags always come in matching pairs, use
   sep=c("<td>", "</td>")

If not, you can read the whole lot with readLines and then strsplit on both
patterns, for example.
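
A minimal sketch of the readLines/strsplit route, not tested against the poster's actual file. The input lines are hypothetical stand-ins for what readLines() would return, and the regular expression "</?td>" is an assumption that matches only tags without attributes:

```r
# Hypothetical input; in practice this would come from something like
#   html <- readLines("yourfile.html")
html <- c("<tr><td>a</td><td> 1.5</td></tr>",
          "<tr><td>b</td><td> 2.5</td></tr>")

# Split each line at every <td> or </td> tag. The pattern is a regular
# expression, so one call covers both the opening and the closing tag.
pieces <- strsplit(html, "</?td>")

# strsplit leaves empty strings and the surrounding <tr> tags behind;
# trim whitespace and drop those leftovers.
cells <- lapply(pieces, function(p) {
  p <- gsub("^[[:space:]]+|[[:space:]]+$", "", p)
  p[nzchar(p) & !grepl("^</?tr", p)]
})

print(cells)
```

Each element of cells then holds the field values of one table row as character strings.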

Uwe Ligges


> thanks for hints
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html

Eric Lecoutre

2005-Apr-04 15:11 UTC

You can import the whole thing and then use "strsplit" on it:

?strsplit

Eric

Eric Lecoutre
UCL /  Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
lecoutre at stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward
Tufte   

Uwe Ligges

2005-Apr-04 15:30 UTC

Christoph Lehmann wrote:
> entry from html:
>
>   <tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
>   <tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>
>
>
>
>  using
> left.data<- scan(paste(path, left.file, sep = ""), what = 'character',
>                sep=c("<td>", "</td>"))
>
>
> yields
>
>  > left.data
>  [1] "  "                  "tr bgcolor=#9090f0>" "td align=right>"
>  [4] "b>BM"                "/b>"                 "/td>"
>  [7] "td> 0.952"           "/td>"                "td> 0.136"
> [10] "/td>"                "td> 6.984"           "/td>"
> [13] "td>0.000000"         "/td>"                "/tr>"
> [16] "  "                  "tr bgcolor=#9090f0>" "td align=right>"
> [19] "b>BH"                "/b>"                 "/td>"
> [22] "td> 1.338"           "/td>"                "td> 0.136"
> [25] "/td>"                "td> 9.821"           "/td>"
> [28] "td>0.000000"         "/td>"                "/tr>"
>
> why doesn't it detect the whole '<tr> as sep?
> 
> 
> Uwe Ligges wrote:
> 
>> Christoph Lehmann wrote:
>>
>>> Hi
>>> I try to import html text and I need to split the fields at each <td>
>>> or </td> entry
>>>
>>> How can I succeed? sep = '<td>' doesn't yield the right result
>>
>>
>> If it fits pairwise together, use
>>   sep=c("<td>", "</td>")
Apologies, one should not send untested code.
"sep" must be a single character, not a string containing more than one
character.

So you may want to try out my second suggestion.
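
Judging from the output quoted above, scan appears to have split on "<" alone, which would explain why every field starts with a tag remnant such as "td>". Below is a rough sketch of the second suggestion applied to the two sample rows from this thread. The helper parse_row and the column names v1 to v4 are made up for illustration, and the rows are typed in directly rather than read from the poster's file:

```r
# The two sample rows from the thread; with a real file one would use e.g.
#   rows <- readLines(paste(path, left.file, sep = ""))
rows <- c(
  '<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# Hypothetical helper: return the cell contents of one table row.
parse_row <- function(line) {
  # Split at any opening or closing td tag, with or without attributes.
  parts <- strsplit(line, "</?td[^>]*>")[[1]]
  # Strip whatever tags remain (<tr ...>, <b>, </b>, </tr>).
  parts <- gsub("<[^>]+>", "", parts)
  # Trim whitespace and drop the empty leftovers.
  parts <- gsub("^[[:space:]]+|[[:space:]]+$", "", parts)
  parts[nzchar(parts)]
}

# One row of the matrix per table row; assumes every row has the same
# number of cells.
cells <- t(sapply(rows, parse_row, USE.NAMES = FALSE))

# First column is the label, the rest are numeric.
values <- matrix(as.numeric(cells[, -1]), nrow = nrow(cells))
colnames(values) <- paste0("v", seq_len(ncol(values)))  # made-up names
result <- data.frame(label = cells[, 1], values, stringsAsFactors = FALSE)

print(result)
```

Regular expressions are fragile on general HTML; this only relies on each row sitting on one line, as in the sample.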

Uwe Ligges




>> if not, you can read the whole lot with readLines and strsplit for 
>> both pattern after that, for example.
>>
>> Uwe Ligges