It's my understanding that docx and xlsx files are zipped containers that store their data in XML files. You should try unzipping one and examining the XML with a viewer. You may then be able to use the XML package (pkg:XML).

--
David.

> On Jul 1, 2016, at 3:13 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> No, sorry -- all I would do is search.
>
> -- Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Fri, Jul 1, 2016 at 2:33 PM, John <miaojpm at gmail.com> wrote:
>> Yes, I have done some searching (e.g., tm, markdown, etc.), but I can't find
>> this function.
>> If you know of any package that works for this purpose, that would be quite
>> helpful.
>> Thanks,
>>
>> John
>>
>> 2016-06-28 16:50 GMT-07:00 Bert Gunter <bgunter.4567 at gmail.com>:
>>>
>>> Did you try searching before posting here? -- e.g. a web search or on
>>> rseek.org?
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> On Tue, Jun 28, 2016 at 3:53 PM, John <miaojpm at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> From time to time I highlight words in my Word documents with red/blue
>>>> color or italic/bold fonts, and I also add comments to a file. Is there a
>>>> package/function that lets R extract the italic/bold and blue/red words,
>>>> and the comments, from a docx/doc file?
>>>>
>>>> I am aware that there are a few packages for reading Word files, but I
>>>> don't know which one is able to do this.
>>>>
>>>> Thanks,
>>>>
>>>> John
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
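Following David's suggestion, here is a minimal sketch for John's original question about bold, italic and colored words. In a .docx, the body text lives in `word/document.xml`; each run of text is a `<w:r>` element whose direct formatting sits in `<w:rPr>` (`<w:b/>` for bold, `<w:i/>` for italic, `<w:color w:val="FF0000"/>` for font color, `<w:highlight>` for highlighter). The sketch uses the xml2 package rather than XML, and the function names `read_docx_part()`, `toggle_on()` and `docx_runs()` are hypothetical. It only sees formatting applied directly to a run, not bold or color inherited from a style, and legacy binary .doc files are not zip/XML containers, so they would need to be saved as .docx first.

```r
library(xml2)

# WordprocessingML namespace, bound to the "w" prefix used in the XPath below
W_NS <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")

# Extract one XML part (default: the main text) from a .docx zip and parse it
read_docx_part <- function(docx_path, part = "word/document.xml") {
  exdir <- tempfile("docx_")
  utils::unzip(docx_path, files = part, exdir = exdir)
  read_xml(file.path(exdir, part))
}

# TRUE if an on/off property such as <w:b/> or <w:i/> is set directly on the run
toggle_on <- function(run, prop) {
  node <- xml_find_first(run, paste0("./w:rPr/w:", prop), W_NS)
  if (inherits(node, "xml_missing")) return(FALSE)
  val <- xml_attr(node, "w:val", W_NS)
  # a bare <w:b/> means "on"; w:val can switch it off explicitly
  is.na(val) || !(val %in% c("0", "false", "off"))
}

# One row per text run in the body, with its direct (not style-inherited) formatting
docx_runs <- function(doc) {
  runs <- xml_find_all(doc, "//w:body//w:r[w:t]", W_NS)
  rpr_attr <- function(prop) {
    xml_attr(xml_find_first(runs, paste0("./w:rPr/w:", prop), W_NS), "w:val", W_NS)
  }
  data.frame(
    text      = xml_text(runs),
    bold      = vapply(runs, toggle_on, logical(1), prop = "b"),
    italic    = vapply(runs, toggle_on, logical(1), prop = "i"),
    color     = rpr_attr("color"),      # hex such as "FF0000", or NA
    highlight = rpr_attr("highlight"),  # named color such as "yellow", or NA
    stringsAsFactors = FALSE
  )
}

# Usage ("notes.docx" is a placeholder file name):
# runs <- docx_runs(read_docx_part("notes.docx"))
# subset(runs, bold | italic | !is.na(color))
```

The exact hex values for "red" and "blue" depend on which colors were picked in Word, and Word may also write `auto` as a color, so inspect the `color` column before filtering on it.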
I just added `docx_extract_all_cmnts()` (and a couple of other comment-related things) to the dev version of `docxtractr` (https://github.com/hrbrmstr/docxtractr). You can install it with `devtools::install_github("hrbrmstr/docxtractr")`. There's an example in the help for that function.

Give it a go, and file detailed issues for any other functionality you need.

On Fri, Jul 1, 2016 at 11:14 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> It's my understanding that docx and xlsx files are zipped containers that have their data in XML files. [...]
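If you would rather stay with the unzip-and-parse route, comments are stored in their own part, `word/comments.xml`, as `<w:comment>` elements carrying `w:id`, `w:author` and `w:date` attributes. A minimal xml2 sketch that tabulates them follows; `docx_comments()` is a hypothetical name, this is not docxtractr's code, and for everyday use `docx_extract_all_cmnts()` is the simpler option.

```r
library(xml2)

# WordprocessingML namespace, bound to the "w" prefix used in the XPath below
W_NS <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")

# One row per <w:comment> in a parsed word/comments.xml part
docx_comments <- function(comments_doc) {
  cmnts <- xml_find_all(comments_doc, "//w:comment", W_NS)
  data.frame(
    id     = xml_attr(cmnts, "w:id", W_NS),     # matches w:id markers in document.xml
    author = xml_attr(cmnts, "w:author", W_NS),
    date   = xml_attr(cmnts, "w:date", W_NS),
    # a comment may hold several paragraphs; keep them on separate lines
    text   = vapply(cmnts, function(cm) {
      paste(xml_text(xml_find_all(cm, ".//w:p", W_NS)), collapse = "\n")
    }, character(1)),
    stringsAsFactors = FALSE
  )
}

# Usage ("notes.docx" is a placeholder file name; a document without
# comments has no word/comments.xml part at all):
# exdir <- tempfile("docx_")
# unzip("notes.docx", files = "word/comments.xml", exdir = exdir)
# docx_comments(read_xml(file.path(exdir, "word/comments.xml")))
```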
Thank you, David and Bert, for the info. Thank you, Bob, for this excellent function.

Allow me to request a feature. In your example, you highlighted the following text and attached the comment "This is the first comment":

"Lorem ipsum dolor sit amet, cu sit modus voluptua accommodare, meis disputando voluptatibus eu nec, qui te modo solum delicata. Eam scripta maluisset urbanitas et, numquam disputationi in pri, vis tibique deserunt accusamus ut. Vis movet admodum probatus cu, ex pri ludus possit. Molestiae efficiendi at vix, eu labore elaboraret deterruisset mei, et eos persius nominati."

Could the function also output the text each comment is attached to, such as the passage above (along with the comment itself, of course)?

Thanks,

John

2016-07-02 14:12 GMT-07:00 boB Rudis <bob at rudis.net>:
> I just added `docx_extract_all_cmnts()` (and a couple of other
> comment-related things) to the dev version of `docxtractr` [...]
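For the feature John asks about: in `word/document.xml`, the text a comment is attached to sits between `<w:commentRangeStart w:id="0"/>` and `<w:commentRangeEnd w:id="0"/>` markers, whose `w:id` matches the `w:id` of the corresponding `<w:comment>` in `word/comments.xml`. The sketch below (xml2 again; `docx_commented_text()` is a hypothetical name, not part of docxtractr) walks those markers and the `<w:t>` text nodes in document order and collects the text inside each range. It simplifies on purpose: text from consecutive paragraphs is joined without a separator, and overlapping comments each get a copy of the shared text.

```r
library(xml2)

# WordprocessingML namespace, bound to the "w" prefix used in the XPath below
W_NS <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")

# For each comment id, collect the body text lying between its
# <w:commentRangeStart> and <w:commentRangeEnd> markers in word/document.xml
docx_commented_text <- function(doc) {
  # one location path (not a union), so nodes come back in document order
  nodes <- xml_find_all(
    doc,
    "//w:body//*[self::w:commentRangeStart or self::w:commentRangeEnd or self::w:t]",
    W_NS
  )
  open_ids <- character(0)  # comments whose range we are currently inside
  pieces <- list()          # comment id -> text fragments seen so far
  for (node in nodes) {
    kind <- xml_name(node)  # local name, e.g. "commentRangeStart" or "t"
    if (kind == "commentRangeStart") {
      id <- xml_attr(node, "w:id", W_NS)
      open_ids <- union(open_ids, id)
      if (is.null(pieces[[id]])) pieces[[id]] <- character(0)
    } else if (kind == "commentRangeEnd") {
      open_ids <- setdiff(open_ids, xml_attr(node, "w:id", W_NS))
    } else {
      # a text node: credit it to every comment range that is still open
      for (oid in open_ids) pieces[[oid]] <- c(pieces[[oid]], xml_text(node))
    }
  }
  data.frame(
    id   = as.character(names(pieces)),
    text = unname(vapply(pieces, paste, character(1), collapse = "")),
    stringsAsFactors = FALSE
  )
}
```

Because the `id` column uses the same ids as `word/comments.xml`, the result can be joined to a table of comment texts with base R's `merge()` on `id`.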