Dear R-Users,

Are there any packages that can modify highlighted areas / annotations in PDF documents?

It seems feasible: I have explored some R code (see below). However, I would rather not reinvent the wheel.

The problem:
When highlighting PDF documents with Microsoft Edge, the bounding box is sometimes misplaced, and quite badly so. Edge also lacks the ability to draw lines or arrows.

On the other hand, I never got used to Acrobat Reader: it usually takes much more effort to add specific highlights. Lines can be drawn, but are NOT straight!

Are there tools to change the size/position of highlights? Or to add highlights and underline words?
Changing the position/size manually by editing the data in the PDF document is possible. Changing the color is trickier (it is somehow possible in Microsoft Edge, though the direct approach of rewriting the actual stream is better). Maybe there are some tools to do it?

Some R code is below.

Sincerely,

Leonard

#########

library(zip)  # provides inflate()

con = file("_some_pdf_.pdf", "rb")

NL = 0
# - very dirty hack;
# - assumes Annotations are in the last fragment/chunk;
while(TRUE) {
    tmp = readBin(con, "raw", 1024*128 + 515);
    if(length(tmp) == 0) break;
    x = tmp;
    # isNL = (x == 10) | (x == 13);
    # Mark each CR (13) that is immediately followed by LF (10);
    # the shifted copy of x keeps both vectors the same length;
    isNL = (x == 13) & (c(x[-1], as.raw(0)) == 10);
    NL = NL + sum(isNL);
}

close(con)

# CR-LF positions within the last chunk read;
idP = which(isNL)

idS = 935; # will vary with pdf and Annotations and ...;
nLast = 4; # usually 2 chunks
idx = idP[seq(idS, length.out = nLast)]

# Check: Right position?
# tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
# intToUtf8(tmp)

# Skip the CR-LF (2 bytes) and decompress the stream data;
tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
intToUtf8(tmp$output)

# Output of inflate: an Example
# "/GS gs .56078434 .87058824 .97647059 rg\n
# 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"

# Note: /BBox[ 337.298 171.83 364.322 183.836]

The raw pdf data:

1948 0 obj
<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4/PDFIUM_HasGeneratedAP true/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>
endobj
1949 0 obj
<</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType 1/Length 86/Matrix[ 1 0 0 1 0 0]/Resources<</ExtGState<</GS<</AIS false/BM/Multiply/CA 1/Type/ExtGState/ca 1>>>>>>/Subtype/Form/Type/XObject>>stream
[86 bytes of Flate-compressed data; not printable]
endstream
endobj
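A minimal sketch of the "direct approach" mentioned above, i.e. rewriting the appearance stream itself. It rebuilds the stream text in the same operator order as the inflated example and compresses it again. The helper name highlight_stream and the colour values are hypothetical. Base R's memCompress() is swapped in here for the zip package; as far as I know its "gzip" type writes zlib-format data, which is what /FlateDecode expects, but please verify this on a copy of your own file.

```r
# Hypothetical helper (not from the original post): build the uncompressed
# content stream of a rectangular highlight. Operator order follows the
# inflated example: graphics state, fill colour, box path, close and fill.
highlight_stream = function(x0, y0, x1, y1, rgb) {
    num = function(v) sprintf("%.3f", v)
    col = paste(sprintf("%.6f", rgb), collapse = " ")
    path = paste(
        num(x0), num(y1), "m",      # start at top-left
        num(x1), num(y1), "l",      # top-right
        num(x1), num(y0), "l",      # bottom-right
        num(x0), num(y0), "l h f")  # bottom-left, close path, fill
    paste0("/GS gs ", col, " rg\n", path, "\n")
}

# Same box as the /BBox in the post; the yellow colour is an assumption.
txt = highlight_stream(337.298, 171.83, 364.322, 183.836,
                       rgb = c(1, 0.9, 0.2))

# Compress again; the compressed size is the value needed for /Length.
raw_z = memCompress(charToRaw(txt), type = "gzip")
new_length = length(raw_z)

# Round trip as a sanity check before writing anything into the pdf.
back = rawToChar(memDecompress(raw_z, type = "gzip"))
identical(back, txt)
```

If the new stream differs in size from the old one, /Length has to be updated, and the byte offsets in the cross-reference table no longer match. Some viewers repair this silently, but that should not be relied on. /BBox, /Rect, /QuadPoints and /C in the two objects would also need to be kept consistent by hand.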
Search!

On rseek.org, the query "modify pdf documents in R" brought up the staplr package. A quick web search with the same query brought up the pdftools package. These were cursory efforts, so you may well find more. You will have to determine whether, and to what degree, any of them meet your needs.

-- Bert

On Sat, Jun 1, 2024 at 9:16 AM Leo Mada via R-help <r-help at r-project.org> wrote:
> Dear R-Users,
>
> Are there any packages that enable the modifications of highlighted areas
> / annotations in pdf documents?
On Sat, 1 Jun 2024 16:16:23 +0000, Leo Mada via R-help <r-help at r-project.org> wrote:

> When highlighting pdf-documents with Microsoft Edge, the bounding box
> is sometimes misplaced, and quite ugly so. It also lacks the ability
> to draw lines or arrows.
>
> On the other hand, I did not get used to Acrobat Reader: it usually
> involves much more effort to add specific highlights. Lines can be
> drawn, but are NOT straight!

Sorry for answering a different question, but have you considered using a different PDF viewer + annotation application? Okular <https://okular.kde.org/> is free and available on Windows (including from outside the Microsoft Store). Its annotation features include all kinds of highlights, arrows and lines, both straight and arbitrarily shaped, quickly available from the "annotations" panel.

--
Best regards,
Ivan