Vikram Reddy
2020-Aug-10 16:44 UTC
[R] How to auto-generate "anchor links" and directory path for a text search function
I have a tokenized text document in which each sentence is wrapped in a 'div' tag with an 'id' attribute:
library(quanteda)
library(htmltools)
library(tidyverse)
# One element per tokenized sentence, each wrapped in a div with its id
text <- c(
  '<div id="4">But how do you do?</div>',
  '<div id="5">I see I have frightened you\u2014sit...”</div>',
  '<div id="6">It was in July, 1805, and the speaker..</div>',
  '<div id="7">With these words she greeted Prince Vasíli Kurágin...</div>',
  '<div id="8">Anna Pávlovna had had a cough for some days...</div>',
  '<div id="9">She was, as she said, suffering from la grippe....</div>',
  '<div id="10">Petersburg, used only by the elite.</div>',
  '<div id="11">All her invitations without exception, written in French...</div>',
  '<div id="12">“If you have nothing better to do, Count (or Prince)..</div>',
  '<div id="13">“Heavens!</div>',
  '<div id="14">what a virulent attack!”</div>',
  # ... (intervening divs omitted) ...
  '<div id="2107">It was plain that this “well?”</div>'
)
To finish it up, I need to auto-generate output of this form:
<a href="C:\Users\John\Desktop\final_tokens.html#div number"> text-sentence </a>
For example, when I search for the word 'good':
<a href="C:\Users\John\Desktop\final_tokens.html#49"> Our good and wonderful sovereign has to </a>
<a href="C:\Users\John\Desktop\final_tokens.html#73">He is one of the good ones.</a>
<a href="C:\Users\John\Desktop\final_tokens.html#138">She is rich and of good family..</a>
The div id number should go right after the #, as shown above.
Previously I used:
make_sentences <- function(word) {
  # Return every element of 'text' that contains 'word'
  grep(word, text, value = TRUE)
}
The grep above worked fine on plain text, but I now need to modify it,
probably with a lot more regex, so that it also returns the anchor link
with the directory path and the div number. Is there a solution to this?
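
One possible direction is sketched below, as a rough idea and not a tested solution for the full file. It assumes 'text' is a character vector with exactly one complete <div id="...">...</div> per element, with no line breaks inside an element. The names make_links and base_path are made up for this illustration. The id and the sentence are split out with sub() first, so that the search only looks at the sentence and not at the tag itself.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package)
make_links <- function(word, text,
                       base_path = "C:\\Users\\John\\Desktop\\final_tokens.html") {
  # Assumed layout: one <div id="...">sentence</div> per element
  pattern <- '^<div id="([^"]+)">(.*)</div>$'
  text <- text[grepl(pattern, text)]       # drop anything that is not a div line
  ids <- sub(pattern, "\\1", text)         # the div id
  sentences <- sub(pattern, "\\2", text)   # the sentence inside the div
  # fixed = TRUE treats 'word' as a literal string, not a regex
  hits <- grepl(word, sentences, fixed = TRUE)
  # Build one anchor per matching sentence: path, then #, then the div id
  sprintf('<a href="%s#%s">%s</a>', base_path, ids[hits], sentences[hits])
}

# Small sample built from sentences quoted in this post
text <- c(
  '<div id="10">Petersburg, used only by the elite.</div>',
  '<div id="49">Our good and wonderful sovereign has to</div>',
  '<div id="73">He is one of the good ones.</div>',
  '<div id="138">She is rich and of good family..</div>'
)
links <- make_links("good", text)
writeLines(links)
```

Because the match is a literal substring match, 'good' would also hit words such as 'goodness'. Matching whole words only would need a regex with word boundaries instead of fixed = TRUE.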