Search!

On rseek.org, the query "modify pdf documents in R" brought up the staplr
package. A quick web search with the same query brought up the pdftools
package.

These were cursory efforts, so you may well find more. You will have to
determine whether, and to what degree, any of them meet your needs.

-- Bert

On Sat, Jun 1, 2024 at 9:16 AM Leo Mada via R-help <r-help at r-project.org> wrote:
> Dear R-Users,
>
> Are there any packages that enable the modifications of highlighted areas
> / annotations in pdf documents?
>
> It seems feasible - I have explored some R code (see below). However, I
> would rather avoid reinventing the wheel.
>
> The problem:
> When highlighting pdf documents with Microsoft Edge, the bounding box is
> sometimes misplaced, and quite badly so. Edge also lacks the ability to
> draw lines or arrows.
>
> On the other hand, I did not get used to Acrobat Reader: it usually
> involves much more effort to add specific highlights. Lines can be drawn,
> but are NOT straight!
>
> Are there tools to change the size/position of highlights?
> Or to add highlights and underline words?
> Changing the position/size manually by editing the data in the pdf document
> is possible. Changing the color is trickier (it is somehow possible in
> Microsoft Edge, though the direct approach of rewriting the actual stream
> is better). Maybe there are some tools to do it?
>
> Some R code is below.
>
> Sincerely,
>
> Leonard
> #########
>
> library(zip)
>
> con = file("_some_pdf_.pdf", "rb")
>
> NL = 0
> # - very dirty hack;
> # - assumes Annotations are in the last fragment/chunk;
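> while(TRUE) {
>   tmp = readBin(con, "raw", 1024*128 + 515);
>   if(length(tmp) == 0) break;
>   x = tmp; # only the last chunk read is kept
>   # Mark each CR (13) that is directly followed by LF (10);
>   # a CR LF pair split across two chunks is not detected;
>   isNL = (x == 13);
>   idCR = which(isNL);
>   # index the CR positions only, so that the lengths match;
>   isNL[idCR] = (x[idCR + 1] == 10);
>   NL = NL + sum(isNL);
> }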
>
> close(con)
>
> idP = which(isNL)
>
> idS = 935; # will vary with pdf and Annotations and ...;
> nLast = 4; # usually 2 chunks
> idx = idP[seq(idS, length.out = nLast)]
>
> # Check: Right position?
> # tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
> # intToUtf8(tmp)
>
> tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
> intToUtf8(tmp$output)
>
> # Output of inflate: an Example
> # "/GS gs .56078434 .87058824 .97647059 rg\n
> # 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"
>
> # Note: /BBox[ 337.298 171.83 364.322 183.836]
>
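
The hard-coded index (idS = 935) could perhaps be avoided by searching for the
stream / endstream keywords directly. Below is a minimal sketch in base R,
under several assumptions: the stream is Flate-compressed, the line breaks are
CR LF, and the compressed bytes do not themselves contain the keywords.
extract.streams is a hypothetical helper, not a function from any package, and
the small "pdf" vector is only a stand-in built in memory. memCompress(type =
"gzip") writes zlib-format data, which is used here in place of zip::inflate.

```r
# Stand-in for a PDF form XObject with a Flate-compressed stream.
# The operators are the example output quoted above; the object is simplified.
ops = paste0("/GS gs .56078434 .87058824 .97647059 rg\n",
  "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n")
z = memCompress(charToRaw(ops), type = "gzip")
pdf = c(
  charToRaw(paste0("1949 0 obj\r\n<</Filter/FlateDecode/Length ",
    length(z), ">>stream\r\n")),
  z,
  charToRaw("\r\nendstream\r\nendobj\r\n"))

# Hypothetical helper: returns the decoded text of every stream in x.
extract.streams = function(x) {
  idE = grepRaw("endstream", x, fixed = TRUE, all = TRUE)
  idS = grepRaw("stream\r\n", x, fixed = TRUE, all = TRUE)
  # drop the matches that are only the tail of an "endstream" keyword
  idS = setdiff(idS, idE + 3L)
  lapply(idS, function(s) {
    first = s + 8L                       # first byte after "stream\r\n"
    last  = min(idE[idE > first]) - 1L   # last byte before "endstream"
    dat = x[seq(first, last)]
    n = length(dat)
    # remove the CR LF that precedes "endstream", if present
    if (n >= 2 && dat[n - 1] == 13 && dat[n] == 10) {
      dat = dat[seq_len(n - 2)]
    }
    rawToChar(memDecompress(dat, type = "gzip"))
  })
}

res = extract.streams(pdf)
cat(res[[1]])
```

A more robust version would read the byte count from /Length instead of
searching for endstream, and would skip streams that use other filters.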
> The raw pdf data:
>
> 1948 0 obj
> <</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F
> 4/PDFIUM_HasGeneratedAP true/QuadPoints[ 337.298 186 364.322 186 337.298
> 174.6 364.322 174.6]/Rect[ 337.298 174.6 364.322
> 186]/Subtype/Highlight/Type/Annot>>
> endobj
> 1949 0 obj
> <</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType
> 1/Length 86/Matrix[ 1 0 0 1 0 0]/Resources<</ExtGState<</GS<</AIS
> false/BM/Multiply/CA 1/Type/ExtGState/ca
> 1>>>>>>/Subtype/Form/Type/XObject>>stream
> [86 bytes of Flate-compressed stream data; not representable as text]
> endstream
> endobj
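
In the data above, the /Rect of the annotation (174.6 to 186) and the /BBox of
its appearance stream (171.83 to 183.836) differ vertically. If the goal is to
redraw the filled rectangle for a chosen box, the operators could be
regenerated and recompressed roughly as below. This is only a sketch:
highlight.ops is a hypothetical helper, the operator layout simply mimics the
example output of inflate, and the /BBox and /Length entries of the object (as
well as the cross-reference table of the file) would still have to be updated
separately.

```r
# Hypothetical helper: builds the drawing operators for a filled rectangle.
# bbox = c(x1, y1, x2, y2); col = RGB fill color with components in [0, 1];
# gs = name of the ExtGState resource (assumed to exist in the object).
highlight.ops = function(bbox, col, gs = "GS") {
  stopifnot(length(bbox) == 4, length(col) == 3)
  num = function(v) paste(as.character(v), collapse = " ")
  x1 = bbox[1]; y1 = bbox[2]; x2 = bbox[3]; y2 = bbox[4]
  paste0(
    "/", gs, " gs ", num(col), " rg\n",
    num(c(x1, y2)), " m ", num(c(x2, y2)), " l ",
    num(c(x2, y1)), " l ", num(c(x1, y1)), " l h f\n")
}

# Use the coordinates of /Rect for the new box (an assumption about intent)
new.ops = highlight.ops(
  bbox = c(337.298, 174.6, 364.322, 186),
  col  = c(0.56078434, 0.87058824, 0.97647059))
cat(new.ops)

# Recompress in zlib format; the byte count is the new value for /Length
new.z = memCompress(charToRaw(new.ops), type = "gzip")
new.length = length(new.z)
```

Changing the values passed in col would change the fill color in the same way.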
>
>