Dear All,

Perhaps what I am asking is impossible, but I am asking it anyway.

I have got several PDF files with rows of colored rectangles: red rectangles should be read as 0; green rectangles as 1. No other color exists. Is there some way to have R read the colored rectangles into a matrix or data frame, converting the colors of the rectangles to sequences of 0s and 1s?

Thanks in advance,

Paul
On Tue, May 12, 2009 at 12:20:56PM +0100, Paul Smith wrote:
>
> I have got several pdf files with rows of colored rectangles: red
> rectangles should be read as 0; green rectangles as 1. No other color
> exists. Is there some way to have R reading the colored rectangles to
> a matrix or data frame converting the color of the rectangles to
> sequences of 01?
>

I would not do it with R, but here's the general approach:

1. Convert the PDF to some raster format at high enough resolution (DPI), without any kind of compression or anti-aliasing.
2. Use some image manipulation program to replace red/green with black/white.
3. Save the resulting picture in ASCII PBM format.
4. Parse the resulting PBM and find the 0->1 and 1->0 transitions, which will give you the rectangle boundaries.
5. You did not specify the kind of rectangles, nor whether rows are of uniform height, so I assume a uniform grid. Otherwise, position and size might also be relevant [1]; the interpretation is completely up to you.

[1] As in, for example, http://educ.queensu.ca/~fmc/october2001/GoldenArt3.gif
(Imagine that there are only two colors instead of 4 + black lines.)
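Step 4 above could be sketched roughly as follows. This is only an illustration in Python (the reply suggests not using R for this part), and it assumes the image has already been converted to ASCII PBM (magic number P1). The function names parse_ascii_pbm, transitions and runs are made up for this example, and the sample picture is synthetic.

```python
from itertools import groupby


def parse_ascii_pbm(text):
    """Parse the contents of an ASCII (P1) PBM file into rows of 0/1 ints."""
    # Comments run from '#' to the end of the line; drop them before tokenizing.
    lines = [line.split("#", 1)[0] for line in text.splitlines()]
    tokens = " ".join(lines).split()
    if not tokens or tokens[0] != "P1":
        raise ValueError("not an ASCII PBM (P1) file")
    width, height = int(tokens[1]), int(tokens[2])
    # Pixel digits may or may not be separated by whitespace, so join them
    # and read one character per pixel.
    bits = [int(c) for c in "".join(tokens[3:])]
    if len(bits) != width * height:
        raise ValueError("pixel count does not match the header")
    return [bits[r * width:(r + 1) * width] for r in range(height)]


def transitions(row):
    """Indices i where row[i] differs from row[i - 1] (0->1 or 1->0)."""
    return [i for i in range(1, len(row)) if row[i] != row[i - 1]]


def runs(row):
    """Run-length encode a row as (value, length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


# Synthetic 8x2 picture, for illustration only.
sample = """P1
# two rows of pixels
8 2
0 0 1 1 1 1 0 0
1 1 1 1 0 0 0 0
"""

rows = parse_ascii_pbm(sample)
print(transitions(rows[0]))  # [2, 6]
print(runs(rows[1]))         # [(1, 4), (0, 4)]
```

On a uniform grid, dividing each run length by the known rectangle width (in pixels) would give the number of consecutive rectangles of the same color in that run.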
On Tue, May 12, 2009 at 2:35 PM, Zeljko Vrba <zvrba at ifi.uio.no> wrote:
> Aha, so you DON'T have only red and green rectangles in the picture, there is
> also white in between, and some pale delimiter lines. Nevertheless, both
> things ease the job slightly, and the approach I described should work. You
> can even script it by using ImageMagick and ghostscript.
>
> If all pictures are of the same size and layout, you could even manually
> (or programmatically) construct a "mask" of rectangle midpoints, so you
> don't have to detect transitions between red/green and white, which will
> enormously simplify the job. [Note that the distance between rectangle
> centers is uniform in both directions.]

I thank both Zeljko and Baptiste again. Meanwhile, I found a more or less simple way of solving the problem:

1. edited the PDF file with Inkscape, inserting 0 or 1 over the rectangles;
2. saved the PDF file;
3. opened it with Acrobat Reader and selected, as text, the inserted 0s and 1s;
4. pasted the 0s and 1s into a text editor.

And that was done!

Paul
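For the record, the "mask" of rectangle midpoints that Zeljko describes could look something like the sketch below. It is an illustration only, in Python, and assumes the page has already been rasterized and loaded as rows of (r, g, b) tuples. The names classify and read_grid, the color thresholds, and the offsets x0, y0, dx, dy are all hypothetical and would have to be measured from the actual pictures.

```python
def classify(pixel):
    """Map an (r, g, b) pixel to 0 (red), 1 (green) or None (anything else)."""
    r, g, b = pixel
    # Assumed thresholds: one channel clearly dominates the other two.
    if r > 2 * g and r > 2 * b:
        return 0
    if g > 2 * r and g > 2 * b:
        return 1
    return None  # white background or a pale delimiter line


def read_grid(pixels, x0, y0, dx, dy, ncols, nrows):
    """Sample the image at the rectangle midpoints only.

    (x0, y0) is the midpoint of the top-left rectangle; dx and dy are the
    uniform distances between rectangle centers in each direction.
    """
    return [
        [classify(pixels[y0 + j * dy][x0 + i * dx]) for i in range(ncols)]
        for j in range(nrows)
    ]


def make_image(bits, cell=4, gap=2):
    """Build a synthetic test image: colored cells separated by white gaps."""
    red, green, white = (255, 0, 0), (0, 255, 0), (255, 255, 255)
    pitch = cell + gap
    height, width = len(bits) * pitch, len(bits[0]) * pitch
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            inside = x % pitch < cell and y % pitch < cell
            bit = bits[y // pitch][x // pitch]
            row.append((green if bit else red) if inside else white)
        image.append(row)
    return image


expected = [[0, 1, 1], [1, 0, 0]]
image = make_image(expected)
print(read_grid(image, x0=2, y0=2, dx=6, dy=6, ncols=3, nrows=2))
# [[0, 1, 1], [1, 0, 0]]
```

Because only the midpoints are sampled, the white gaps and delimiter lines never need to be detected at all, which is the simplification the quoted message points out.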