Dr Eberhard W Lisse
2023-Dec-29 20:25 UTC
[R] Help request: Parsing docx files for key words and appending to a spreadsheet
I would also look at https://pandoc.org perhaps which can
export a number of formats...

And for spreadsheets https://github.com/jqnatividad/qsv is my
goto weapon. Can also read and write XLSX and others.

A sample document or two would always be helpful...

el

On 29/12/2023 21:01, CALUM POLWART wrote:
> It sounded like he looked at officeR but I would agree
>
> content <- officer::docx_summary("filename.docx")
>
> Would get the text content into an object called content.
>
> That object is a data.frame so you can then manipulate it.
> To be more specific, we might need an example of the DF
[...]
>> On Fri, Dec 29, 2023 at 10:14 AM Andy <phaedrusv at gmail.com>
>> wrote:
[...]
>>> I'd like to be able to accomplish the following:
>>>
>>> (1) Append the title, the month, the author, the number of
>>> words, and page number(s) to a spreadsheet
>>>
>>> (2) Read each article and extract keywords (in the docs,
>>> these are listed in 'Subject' section as a list of
>>> keywords with a percentage showing the extent to which the
>>> keyword features in the article (e.g., FAST FASHION (72%))
>>> and to append the keyword and the % coverage to the same
>>> row in the spreadsheet. However, I want to ensure that
>>> the keyword coverage meets the threshold of >= 50%; if
>>> not, then pass onto the next article in the directory.
>>> Rinse and repeat for the entire directory.
[...]
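The keyword step in (2) above could be sketched roughly as follows. This is only a minimal illustration in base R, not a tested solution for the real files: the helper name `extract_keywords` is hypothetical, and it assumes the 'Subject' text looks like `FAST FASHION (72%); RETAILERS (55%)`, with entries separated by semicolons or commas. The actual layout cannot be confirmed without a sample document.

```r
# Hypothetical helper: pull "KEYWORD (NN%)" pairs out of a Subject string
# and keep only those whose coverage is at or above the threshold.
# Assumption: entries are separated by ";" or "," and any label such as
# "Subject:" ends with a colon.
extract_keywords <- function(subject_text, threshold = 50) {
  pattern <- "([^;:,()]+?)\\s*\\((\\d{1,3})%\\)"
  hits <- regmatches(subject_text,
                     gregexpr(pattern, subject_text, perl = TRUE))[[1]]
  if (length(hits) == 0) {
    # No "KEYWORD (NN%)" entries found: return an empty result
    return(data.frame(keyword = character(0), coverage = integer(0),
                      stringsAsFactors = FALSE))
  }
  # Split each hit into its keyword part and its numeric percentage
  keyword  <- trimws(sub("\\s*\\(\\d{1,3}%\\)$", "", hits, perl = TRUE))
  coverage <- as.integer(sub("^.*\\((\\d{1,3})%\\)$", "\\1", hits, perl = TRUE))
  out <- data.frame(keyword = keyword, coverage = coverage,
                    stringsAsFactors = FALSE)
  # Apply the >= threshold rule from the original request
  out[out$coverage >= threshold, , drop = FALSE]
}

kw <- extract_keywords("FAST FASHION (72%); RETAILERS (55%); TEXTILES (40%)")
```

With the default threshold of 50, `kw` keeps FAST FASHION and RETAILERS and drops TEXTILES. An article whose result has zero rows would be the one to skip.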
Andy
2023-Dec-29 20:37 UTC
[R] Help request: Parsing docx files for key words and appending to a spreadsheet
Thanks - I'll have a look at these options too.

I'm happy to send over a sample document, but wasn't aware if attachments are allowed. The documents come from Lexis+, so require user credentials to log in, but I could upload the file somewhere if that would help? Any ideas for a good location to do so?

On 29/12/2023 20:25, Dr Eberhard W Lisse wrote:
> I would also look at https://pandoc.org perhaps which can
> export a number of formats...
>
> And for spreadsheets https://github.com/jqnatividad/qsv is my
> goto weapon. Can also read and write XLSX and others.
>
> A sample document or two would always be helpful...
>
> el
>
> On 29/12/2023 21:01, CALUM POLWART wrote:
>> It sounded like he looked at officeR but I would agree
>>
>> content <- officer::docx_summary("filename.docx")
>>
>> Would get the text content into an object called content.
>>
>> That object is a data.frame so you can then manipulate it.
>> To be more specific, we might need an example of the DF
> [...]
>>> On Fri, Dec 29, 2023 at 10:14 AM Andy <phaedrusv at gmail.com>
>>> wrote:
> [...]
>>>> I'd like to be able to accomplish the following:
>>>>
>>>> (1) Append the title, the month, the author, the number of
>>>> words, and page number(s) to a spreadsheet
>>>>
>>>> (2) Read each article and extract keywords (in the docs,
>>>> these are listed in 'Subject' section as a list of
>>>> keywords with a percentage showing the extent to which the
>>>> keyword features in the article (e.g., FAST FASHION (72%))
>>>> and to append the keyword and the % coverage to the same
>>>> row in the spreadsheet. However, I want to ensure that
>>>> the keyword coverage meets the threshold of >= 50%; if
>>>> not, then pass onto the next article in the directory.
>>>> Rinse and repeat for the entire directory.
> [...]
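The "rinse and repeat for the entire directory" part might look something like the sketch below. Treat it as a starting point under several assumptions: `read_docx_text` and `process_directory` are hypothetical names, a CSV file stands in for the spreadsheet, and the keyword format is guessed from the `FAST FASHION (72%)` example. As far as I know, `officer::docx_summary()` expects the object returned by `officer::read_docx()` rather than a file name, so the reader wraps both calls. Title, month, author, word count and page numbers are left out because their position in the Lexis+ files is not shown in this thread.

```r
# Hypothetical reader: paragraph text of one .docx file.
# Assumes the 'officer' package is installed.
read_docx_text <- function(path) {
  officer::docx_summary(officer::read_docx(path))$text
}

# Hypothetical driver: one CSV row per article that has at least one
# keyword at or above the threshold; other articles are skipped.
process_directory <- function(dir, out_csv, threshold = 50,
                              reader = read_docx_text) {
  files <- list.files(dir, pattern = "\\.docx$", full.names = TRUE)
  for (f in files) {
    text <- reader(f)
    text <- text[!is.na(text)]
    # Assumption: keywords appear as "KEYWORD (NN%)" somewhere in the text
    hits <- unlist(regmatches(
      text, gregexpr("[^;:,()]+\\(\\d{1,3}%\\)", text, perl = TRUE)))
    if (length(hits) == 0) next
    coverage <- as.integer(sub("^.*\\((\\d{1,3})%\\)$", "\\1", hits, perl = TRUE))
    keyword  <- trimws(sub("\\(\\d{1,3}%\\)$", "", hits, perl = TRUE))
    keep <- coverage >= threshold
    if (!any(keep)) next  # nothing meets the threshold: next article
    row <- data.frame(
      file     = basename(f),
      keywords = paste(sprintf("%s (%d%%)", keyword[keep], coverage[keep]),
                       collapse = "; "),
      stringsAsFactors = FALSE
    )
    # Write the header only when the CSV does not exist yet, then append
    first_write <- !file.exists(out_csv)
    write.table(row, out_csv, sep = ",", row.names = FALSE,
                col.names = first_write, append = !first_write,
                qmethod = "double")
  }
  invisible(out_csv)
}
```

A call such as `process_directory("articles", "keywords.csv")` (both names are placeholders) would then build up the CSV one article at a time. The `reader` argument is there so the loop can be tried on plain text before pointing it at real documents.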