This may be an inappropriate forum for this question. If so, please point me in a better direction.

A current project includes scatter plots with thousands of points. Saved as PDF files they display slowly using a pdf viewer or when included in the PDF output of a LaTeX document.

Is there a process by which these plots can be 'thinned' so they show the same overall patterns but with fewer points so they display more quickly?

Rasterizing them to .jpg files using 'convert' allows them to load immediately, but the bit-mapped resolution is, of course, much lower than the vector PDF format.

Rich
1. Plot a random sample of the points (e.g. of the rows of a matrix/data frame containing "x" and "y" columns).

2. See the hexbin package.

3. Check out the Graphics task view on CRAN:
https://cran.r-project.org/web/views/Graphics.html
(though it may be somewhat dated by now).

4. Internet search, e.g. on "display scatterplots with thousands of points". A typical hit:
https://stackoverflow.com/questions/7714677/scatterplot-with-too-many-points

5. Search/post on stats.stackexchange.com instead.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Sep 3, 2018 at 10:45 AM Rich Shepard <rshepard at appl-ecosys.com> wrote:
> Is there a process by which these plots can be 'thinned' so they show the
> same overall patterns but with fewer points so they display more quickly?

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
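Bert's first suggestion, plotting a random sample of the points, might look roughly like this in base R. This is only a sketch: the data frame `pts`, its simulated contents, and the sample size of 2000 are assumptions made for illustration and do not come from the original post.

```r
# Hypothetical data: 50,000 simulated points standing in for the real scatter data.
set.seed(42)
pts <- data.frame(x = rnorm(50000), y = rnorm(50000))

# Draw a simple random sample of row indices (without replacement).
n_keep <- 2000
idx <- sample(nrow(pts), n_keep)
thinned <- pts[idx, ]

# Write the thinned plot to a PDF; the file now holds 2,000 points, not 50,000.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)
plot(thinned$x, thinned$y, pch = 16, cex = 0.4, xlab = "x", ylab = "y")
dev.off()
```

A simple random sample keeps the overall density pattern but can drop rare extreme points, so it may be worth plotting extremes separately if they matter.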
On Mon, 3 Sep 2018, Bert Gunter wrote:

> 4. Internet search: e.g. on "display scatterplots with thousands of points"
> typical hit:
> https://stackoverflow.com/questions/7714677/scatterplot-with-too-many-points

Bert,

I did a web search without finding useful information. Probably not the best search terms. Will implement your suggestions.

Thanks,

Rich
If the plot is being displayed on a monitor, it is being bitmapped to the resolution of the display device regardless of how you save it. Most computer monitors are about 100 dpi.

If the problem is that the points are overprinting, Bert's suggestion to use hexbin() is the way to go.

If the points are not substantially overprinting, you could just save the plot in raster format using an LZW-compressed tiff() or png() at the maximum likely resolution of the display device (take zooming into account by going up to 600 dpi or 1200 dpi, for example). Don't use jpg since it is lossy and you will get halos when you zoom in.

You can always preserve a vector version for publication. If you have Adobe Acrobat (not Reader), you can Save As Other | Image | TIFF (or PNG) and set the resolution before exporting.

----------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Rich Shepard
Sent: Monday, September 3, 2018 12:45 PM
To: r-help at r-project.org
Subject: [R] Display time of PDF plots

Is there a process by which these plots can be 'thinned' so they show the same overall patterns but with fewer points so they display more quickly?
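David's suggestion of saving a high-resolution raster copy could be sketched as follows. The simulated data, the 6 x 4 inch size, and the 600 dpi setting are assumptions for illustration; the guard on capabilities("png") is only there because not every R build includes PNG support.

```r
# Hypothetical data standing in for the real scatter plot.
set.seed(1)
x <- rnorm(20000)
y <- x + rnorm(20000)

# Physical size in inches times res (pixels per inch) gives the pixel dimensions.
width_in  <- 6
height_in <- 4
dpi       <- 600
px_width  <- width_in * dpi
out_file  <- tempfile(fileext = ".png")

# png() is lossless, so points keep sharp edges (no JPEG-style halos).
if (capabilities("png")) {
  png(out_file, width = width_in, height = height_in, units = "in", res = dpi)
  plot(x, y, pch = 16, cex = 0.3, xlab = "x", ylab = "y")
  dev.off()
}
```

The same call pattern should work with tiff(..., compression = "lzw") if a TIFF is preferred. The resulting file can be included in a LaTeX document with \includegraphics like any other image.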
On Mon, 3 Sep 2018, David L Carlson wrote:

> If the plot is being displayed on a monitor, it is being bitmapped to the
> resolution of the display device regardless of how you save it. Most
> computer monitors are about 100dpi.

David,

I'm looking at the report on the monitor. I suspect that most readers will, too. But, some will print it.

> If the problem is that the points are overprinting, Bert's suggestion to
> use hexbin() is the way to go.

Most look like overprints, but at the top there are discrete print characters.

> If the points are not substantially overprinting, you could just save the
> plot in raster format using an LZW-compressed tiff() or png() to the
> maximum likely resolution of the display device (take zooming into account
> by going up to 600dpi or 1200dpi, for example). Don't use jpg since it is
> lossy and you will get halos when you zoom in.

I used convert to produce .png images but, of course, bit-maps of plots and text are less sharp than are vector images.

> You can always preserve a vector version for publication. If you have
> Adobe Acrobat (not Reader), you can Save As Other | Image | tiff (or png)
> and set the resolution before exporting.

'convert', the ImageMagick tool, does this, too.

Thanks,

Rich
Hi

Another option is to just rasterize the points (but leave the rest of the plot vector). See ...

https://www.stat.auckland.ac.nz/~paul/Reports/rasterize/rasterize.html

Paul

On 04/09/18 06:20, Bert Gunter wrote:
> 1. Plot a random sample of the points (e.g. of rows of matrix/dataframe
> containing "x" and "y" columns)
>
> 2. See the hexbin package
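The idea of rasterizing only the points while keeping axes and labels as vector graphics can be illustrated in base R. To be clear, this is not the method from Paul's linked report; it is a simplified stand-in in which each point becomes a single pixel of an image drawn with rasterImage(). The helper name `point_mask`, the grid size, and the simulated data are all assumptions for illustration.

```r
# Turn (x, y) points into a character matrix of colours, one cell per pixel.
# Row 1 of the matrix is the TOP of the image, as rasterImage() expects.
point_mask <- function(x, y, xlim, ylim, nx = 600, ny = 400, col = "black") {
  ix <- findInterval(x, seq(xlim[1], xlim[2], length.out = nx + 1),
                     rightmost.closed = TRUE)
  iy <- findInterval(y, seq(ylim[1], ylim[2], length.out = ny + 1),
                     rightmost.closed = TRUE)
  keep <- ix >= 1 & ix <= nx & iy >= 1 & iy <= ny   # drop out-of-range points
  mask <- matrix("transparent", nrow = ny, ncol = nx)
  mask[cbind(ny + 1 - iy[keep], ix[keep])] <- col
  mask
}

# Hypothetical data standing in for the real scatter plot.
set.seed(7)
x <- rnorm(50000)
y <- x + rnorm(50000)
xlim <- range(x)
ylim <- range(y)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)
# xaxs/yaxs = "i" makes the plot region span exactly xlim and ylim,
# so the image corners line up with the data coordinates.
plot(xlim, ylim, type = "n", xaxs = "i", yaxs = "i", xlab = "x", ylab = "y")
rasterImage(as.raster(point_mask(x, y, xlim, ylim)),
            xlim[1], ylim[1], xlim[2], ylim[2], interpolate = FALSE)
dev.off()
```

The PDF then contains one embedded image plus vector axes and text, so its size depends on the grid dimensions rather than on the number of points.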
On Mon, 3 Sep 2018, Rich Shepard wrote:

> Is there a process by which these plots can be 'thinned' so they show the
> same overall patterns but with fewer points so they display more quickly?

Bert/Paul/David/John:

Thanks very much for the suggestions.

I think an appropriate way to illustrate the patterns is to plot the median and maximum for each month (for all sites). That's the important information and plotting each daily point over 13 years obscures that information.

The dataframe is structured this way:

    str(rainfall)
    'data.frame':   113569 obs. of  6 variables:
     $ name    : chr  "Headworks Portland Water" "Headworks Portland Water" "Headworks Portland Water" "Headworks Portland Water" ...
     $ easting : num  2370575 2370575 2370575 2370575 2370575 ...
     $ northing: num  199338 199338 199338 199338 199338 ...
     $ elev    : num  228 228 228 228 228 228 228 228 228 228 ...
     $ sampdate: Date, format: "2005-01-01" "2005-01-02" ...
     $ prcp    : num  0.59 0.08 0.1 0 0 0.02 0.05 0.1 0 0.02 ...

There are probably multiple ways of extracting the monthly median and maximum 'prcp' and I don't know how to identify the appropriate one. Is there a task view for this type of data manipulation? I've not before done anything like this and would appreciate a pointer to where I start to learn.

Regards,

Rich
(this is somewhat a change of subject from the original question)

Rich, there are functions such as aggregate() in base R. There are also many options in CRAN packages. But I tend to have difficulty getting them to do exactly what I want, and usually end up rolling my own.

The idea is to split the data into groups by station and month, then calculate summary stats for each group, then recombine into a new data frame.

    ## untested with your data, but this kind of approach works well for me
    ## note that this code assumes easting, northing, and elevation are in fact
    ## unique within each group; if they are not, you will get an ERROR

    ## add a 'month' variable
    raindf <- rainfall
    raindf$mon <- format(raindf$sampdate, '%Y-%m')

    mysum <- function(df) {
      data.frame(
        name     = unique(df$name),
        easting  = unique(df$easting),
        northing = unique(df$northing),
        elev     = unique(df$elev),
        mon      = unique(df$mon),
        pr.med   = median(df$prcp),
        pr.max   = max(df$prcp)
      )
    }

    tmpdf <- split(raindf, paste(raindf$name, raindf$mon))

    ## at this point, you can check your summary stats function with, for example,
    mysum(tmpdf[[1]])
    mysum(tmpdf[[2]])

    ## when satisfied with mysum(), do this
    tmpsum <- lapply(tmpdf, mysum)

    ## recombine
    rain.by.mon <- do.call(rbind, tmpsum)

    ## might still want to create a numeric month to facilitate plotting
    ## or maybe assign each month to the first of the month, or the 15th,
    ## or end or whatever makes sense
    rain.by.mon$mondt <- as.Date(paste0(rain.by.mon$mon, '-01'))

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509

On 9/4/18, 9:41 AM, "R-help on behalf of Rich Shepard" <r-help-bounces at r-project.org on behalf of rshepard at appl-ecosys.com> wrote:

    There are probably multiple ways of extracting the monthly median and
    maximum 'prcp' and I don't know how to identify the appropriate one.
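For comparison, the aggregate() route mentioned at the start of Don's reply might look roughly like this. The miniature `rainfall` data frame below is hypothetical (two made-up stations, two months, only the columns needed), so treat it as a sketch to adapt rather than code tested on the real data.

```r
# Hypothetical miniature of the 'rainfall' data frame (two stations, two months).
rainfall <- data.frame(
  name     = rep(c("Station A", "Station B"), each = 4),
  sampdate = as.Date(rep(c("2005-01-01", "2005-01-02",
                           "2005-02-01", "2005-02-02"), 2)),
  prcp     = c(0.59, 0.08, 0.10, 0.00, 0.00, 0.02, 0.05, 0.10)
)
rainfall$mon <- format(rainfall$sampdate, "%Y-%m")

# One row per station and month; the anonymous function returns both statistics.
rain.by.mon <- aggregate(prcp ~ name + mon, data = rainfall,
                         FUN = function(v) c(med = median(v), max = max(v)))

# aggregate() stores the two statistics as a matrix column; flatten it
# into ordinary columns named prcp.med and prcp.max.
rain.by.mon <- do.call(data.frame, rain.by.mon)
```

Dropping `name` from the formula (prcp ~ mon) would give the monthly median and maximum across all sites instead of per station.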