Vikram Reddy
2020-Aug-10 16:44 UTC
[R] How to auto-generate “anchor links” and directory paths in a text search function
I have a tokenized .txt document with 'div' tags, each carrying an 'id':

library(quanteda)
library(htmltools)
library(tidyverse)

text <- c(
  '<div id="4">But how do you do?</div>',
  '<div id="5">I see I have frightened you?sit... ?</div>',
  '<div id="6">It was in July, 1805, and the speaker..</div>',
  '<div id="7">With these words she greeted Prince Vas?li Kur?gin...</div>',
  '<div id="8">Anna P?vlovna had had a cough for some days...</div>',
  '<div id="9">She was, as she said, suffering from la grippe....</div>',
  '<div id="10">Petersburg, used only by the elite.</div>',
  '<div id="11">All her invitations without exception, written in French...</div>',
  '<div id="12">?If you have nothing better to do, Count (or Prince).. </div>',
  '<div id="13">?Heavens!</div>',
  '<div id="14">what a virulent attack!?</div>',
  # ...
  '<div id="2107">It was plain that this ?well??</div>'
)

I need to auto-generate output of this form:

<a href="C:\Users\John\Desktop\final_tokens.html#[div id]">[sentence text]</a>

For example, when I search for the word 'good':

<a href="C:\Users\John\Desktop\final_tokens.html#49">Our good and wonderful sovereign has to</a>
<a href="C:\Users\John\Desktop\final_tokens.html#73">He is one of the the good ones.</a>
<a href="C:\Users\John\Desktop\final_tokens.html#138">She is rich and of good family..</a>

The div id number should go after the # as shown above.

Previously I used:

make_sentences <- function(word) {
  grep(word, text, value = TRUE)
}

That grep() call worked fine on plain text, but now that every line carries HTML markup I need to modify it (probably with some regex) so that it also returns the anchor link with the directory path and the div number. Is there a way to do this?
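One possible approach, sketched under assumptions rather than offered as a definitive answer: instead of grepping the raw lines, first split each div into its id and its sentence text, match the search word against the text only, then paste the pieces into an anchor tag. The function name make_links and its path default are hypothetical (the path is copied from the example output in the question). It uses only base R regex functions with perl = TRUE, and it assumes the divs are not nested and that the search word contains no regex metacharacters.

```r
# A minimal base-R sketch (not a quanteda/htmltools API): find every
# <div id="N">...</div> whose text contains `word` and turn it into an anchor.
# `make_links` and its `path` default are hypothetical, taken from the post.
make_links <- function(word, text,
                       path = "C:\\Users\\John\\Desktop\\final_tokens.html") {
  # Join everything into one string so it works whether `text` holds one div
  # per element or the whole file as a single string
  doc <- paste(text, collapse = "\n")

  # Pull out each complete div; (?s) lets . cross newlines, .*? is non-greedy
  divs <- regmatches(doc, gregexpr('(?s)<div id="[0-9]+">.*?</div>', doc,
                                   perl = TRUE))[[1]]

  # Split each div into its id number and its inner sentence text
  ids  <- sub('(?s)^<div id="([0-9]+)">.*$', "\\1", divs, perl = TRUE)
  body <- sub('(?s)^<div id="[0-9]+">(.*)</div>$', "\\1", divs, perl = TRUE)

  # Whole-word, case-insensitive match against the sentence text only,
  # so words inside the markup (e.g. "div", "id") are never matched
  hit <- grepl(paste0("\\b", word, "\\b"), body, ignore.case = TRUE, perl = TRUE)

  # Build <a href="path#id">sentence</a> for each matching div
  sprintf('<a href="%s#%s">%s</a>', path, ids[hit], trimws(body[hit]))
}

# Usage with a few of the sample lines from the post
text <- c('<div id="4">But how do you do?</div>',
          '<div id="13">?Heavens!</div>',
          '<div id="14">what a virulent attack!?</div>')
writeLines(make_links("attack", text))
# <a href="C:\Users\John\Desktop\final_tokens.html#14">what a virulent attack!?</a>
```

Because the match is made on the extracted sentence text rather than on the raw line, searching for "div" or "id" will not hit the markup. If the search term may contain characters such as "." or "?", wrapping it in \\Q...\\E (PCRE literal quoting) before pasting it into the pattern would keep them from being treated as regex.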