jiho
2007-Sep-27 10:52 UTC
[R] Plotting from different data sources on the same plot (with ggplot2)
Hello everyone (and Hadley in particular),

I often need to plot data from multiple datasets on the same graph. A common example is mapping some values: I want to plot the underlying map and then add the points. I currently do it with base graphics, by recording the maximum region in which my map and points will fit, plotting both with these xlim and ylim parameters, adding par(new=T) between plot calls, and setting the graphical parameters (to draw axes and titles, to set the aspect ratio) by hand. This is neither easy nor practical when the plots become more and more complicated.

The ggplot book specifies that "[ggplot] makes it easy to combine data from multiple sources". Since I use ggplot2 as much as I can (thanks, it's really really great!) I thought I would try producing such a plot with ggplot2.

NB: If this is possible/easy with another plotting package please let me know. I am not looking for something specific to maps but rather for a generic mechanism to throw several pieces of data at a graph and have the plotting routine take care of setting up axes that will fit all data on the same scale.

So, now for the ggplot2 part. I have two data sources: the coordinates of the coastlines in a region of interest and the coordinates of sampling stations in a subset of this region. I want to plot the coastline as a line and the stations as points, on the same graph. I can plot them independently easily:

    p1 = ggplot(coast, aes(x=lon, y=lat)) + geom_path() + coord_equal(ratio=1)
    p1$aspect.ratio = 1

    p2 = ggplot(coords, aes(x=lon, y=lat)) + geom_point() + coord_equal(ratio=1)
    p2$aspect.ratio = 1

but I cannot find how to combine the two graphs. I suspect this has to be done via different layers but I really can't find how. In particular, I would like to know how to deal with the scales: can ggplot take care of plotting the two datasets on the same coordinate system, or do I have to manually record the maximal range of x and y and force ggplot to use it on both layers, as I did with base graphics? (Of course I would prefer the former ;) ).

To test it further with real data, here is my code and data:
http://jo.irisson.free.fr/dropbox/test_ggplot2.zip

One small additional clarification: I would like the two datasets to stay separate. I could probably combine them and plot everything in one step by clever use of ggplot arguments. However, this is just a simple example and I would like to add more in the future (trajectories at each station, points proportional to some value at each station, etc.), so I really want the different data sources to be separate and to produce the plot in several steps; otherwise it will soon become too complicated to manage.

Thank you very much in advance for your help.

JiHO
---
http://jo.irisson.free.fr/
hadley wickham
2007-Sep-30 16:35 UTC
[R] Plotting from different data sources on the same plot (with ggplot2)
Hi JiHO,

> The ggplot book specifies that "[ggplot] makes it easy to combine
> data from multiple sources". Since I use ggplot2 as much as I can
> (thanks it's really really great!) I thought I would try producing
> such a plot with ggplot2.
>
> NB: If this is possible/easy with an other plotting package please
> let me know. I am not looking for something specific to maps but
> rather for a generic mechanism to throw several pieces of data to a
> graph and have the plotting routine take care of setting up axes that
> will fit all data on the same scale.

I don't think it's easy with any other plotting system (although I'd be happy to be proven wrong), and this was one of the motivations for the construction of ggplot.

> So, now for the ggplot2 part. I have two data sources: the
> coordinates of the coastlines in a region of interest and the
> coordinates of sampling stations in a subset of this region. I want
> to plot the coastline as a line and the stations as points, on the
> same graph. I can plot them independently easily:
>
> p1 = ggplot(coast,aes(x=lon,y=lat)) + geom_path() + coord_equal(ratio=1)
> p1$aspect.ratio = 1
>
> p2 = ggplot(coords,aes(x=lon,y=lat)) + geom_point() + coord_equal(ratio=1)
> p2$aspect.ratio = 1

There are a few ways you could describe the graph you want. Here's the one that I'd probably choose:

    ggplot(mapping = aes(x = lon, y = lat)) +
      geom_path(data = coast) +
      geom_point(data = coords) +
      coord_equal()

We don't define a default dataset in the ggplot call, but instead explicitly define the dataset in each of the layers. By default, ggplot will make sure that all the data is displayed on the plot, i.e. the x and y scales show the union of the ranges over all datasets.

Does that make sense?

Hadley

--
http://had.co.nz/
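[Editor's note] For readers who want to check the scale behaviour described in the reply above, here is a minimal sketch. The two data frames are made-up stand-ins (the real coast and coords data are in the linked zip and are not reproduced here), and layer_scales() is taken from a current ggplot2 release, not the 2007 version discussed in this thread.

```r
library(ggplot2)

# Hypothetical stand-ins for the poster's data: a short coastline polyline
# and two stations that extend beyond the coastline's lon/lat range.
coast  <- data.frame(lon = c(7.0, 7.2, 7.4, 7.6),
                     lat = c(43.6, 43.7, 43.65, 43.8))
coords <- data.frame(lon = c(7.5, 7.9),
                     lat = c(43.2, 43.5),
                     station = c("A", "B"))

# No default dataset: the mapping is shared, each layer brings its own data.
p <- ggplot(mapping = aes(x = lon, y = lat)) +
  geom_path(data = coast) +
  geom_point(data = coords) +
  coord_equal()

# The x scale is trained on every layer, so its range should span both
# datasets rather than just the first one.
x_range <- layer_scales(p)$x$range$range
print(x_range)
```

Printing p draws both layers on one set of axes; nothing has to be computed by hand to make the stations fit.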
jiho
2007-Oct-01 05:51 UTC
[R] Plotting from different data sources on the same plot (with ggplot2)
This was meant to be sent on the list:

On 2007-September-30, at 23:12, jiho wrote:

> On 2007-September-30, at 21:01, hadley wickham wrote:
>>>> [...]
>>> As expected there is nothing in the data part of the p object
>>> > p$data
>>> NULL
>>>
>>> But there is no data specification either in the layers
>>> > p$layers
>>> [[1]]
>>> geom_path: (colour=black, size=1, linetype=1) + ()
>>> stat_identity: (...=) + ()
>>> position_identity: ()
>>> mapping: ()
>>>
>>> [[2]]
>>> geom_point: (shape=19, colour=black, size=2) + ()
>>> stat_identity: (...=) + ()
>>> position_identity: ()
>>> mapping: ()
>>
>> Compare geom_point(data=mtcars) with str(geom_point(data=mtcars))
>> (which throws an error but you should be able to see enough). So the
>> layers aren't printing out their dataset if they have one - another
>> bug. I'll add it to my todo.
>
> I see. I did not know the `str` function. Very useful.
>
>>> [...]
>>> About the other solution:
>>>
>>>>> When tinkering a bit more with this I thought that the more
>>>>> natural and "ggplot" way to do it, IMHO, would be to have a new
>>>>> addition (`+`) method for the ggplot class and be able to do:
>>>>> p = p1 + p2
>>>>> and have p containing both plots, on the same scale (the union
>>>>> of the
>>>>
>>>> You were obviously pretty close to the solution already! - you
>>>> just need to remove the elements that p2 already has in common
>>>> with p1 and just add on the components that are different.
>>>
>>> I would love to be able to do so because this way I can define
>>> custom plot functions that all return me a ggplot object and then
>>> combine these at will to get final plots (Ex: one function for the
>>> coastline, another for stations coordinates, another one which
>>> gets one data value, yet another for bathymetry contours etc etc.).
>>> This modular design would be more efficient than having to
>>> predefine all combinations in ad hoc functions (e.g. one function
>>> for coast+bathy+stations, another for coast+stations only, another
>>> for coast+bathy+stations+data1, another for... you get the point).
>>> However I don't see what to add and what to remove from the
>>> objects. Specifically, there is only one "data" element in the
>>> ggplot object while my two objects (p1 and p2) both contain
>>> something different in $data. Should I define p$data as a list
>>> with p$data[[1]]=p1$data and p$data[[2]]=p2$data?
>>
>> You can do this already:
>>
>> sample <- c(geom_point(data = coast), geom_path(data = streams),
>>             coord_equal())
>> p + sample
>>
>> I think the thing you are missing is that the elements in ggplot()
>> are just defaults that can be overridden in the individual layers
>> (although the bug above means that isn't working quite right at the
>> moment). So just specify the dataset in the layer that you are
>> adding.
>>
>> You can do things like:
>>
>> p <- ggplot(mapping = aes(x = lon, y = lat)) + geom_point()
>> # no data so there's nothing to plot:
>> p
>>
>> # add on data
>> p %+% coast
>> p %+% coords
>
> That's great!
> In fact I think I found exactly what I was looking for. I can just do:
>     p = ggplot() + coord_equal()
>     p$aspect.ratio = 1
> to set up the plot, and then add the layers and have ggplot take
> care of resizing and laying out everything automagically:
>     p = p + geom_path(data=coast, mapping=aes(x=lon, y=lat))
>     p = p + geom_point(data=coords, mapping=aes(x=lon, y=lat))
>     p = p + geom_text(data=coords, mapping=aes(x=lon, y=lat,
>                                                label=station))
>     etc...
> Oh, I love ggplot ;) !
>
>> The data is completely independent of the plot specification.
>> This is very different from the other plotting models in R, so it
>> may take a while to get your head around it.
>
> Yes, indeed. That's a completely new way of thinking (especially
> given my MATLAB, Scilab background) but how powerful! I found the
> whole "data mapping" concept very elegant but did not grasp all the
> flexibility behind it. I wonder how mainstream it can get since so
> many people are used to another graphics paradigm.
>
> Anyway, I just need to define a new geom_arrow now, to plot wind
> velocity arrows at several locations, and I'll be a happy man. Is
> there a specific reason why '...' arguments are not passed to grid
> functions or is it just to keep the complexity under control? I am
> thinking in particular that:
>     p = ggplot(coords) + geom_segment(mapping=aes(x=lon, y=lat,
>         xend=lon+0.03, yend=lat-0.02),
>         arrow=arrow(length=unit(0.1,"inches")))
> would do exactly what I want provided that the 'arrow' argument is
> passed on to segmentsGrob which is used in geom_segment.

JiHO
---
http://jo.irisson.free.fr/
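[Editor's note] The modular design discussed in this message (one function per map component, combined at will) can be sketched roughly as follows. This assumes a current ggplot2, where a plain list of layers can be added to a plot with `+`. The helper names coast_layer() and station_layers() and the sample data are hypothetical and do not come from the thread.

```r
library(ggplot2)

# Made-up sample data standing in for the poster's files.
coast  <- data.frame(lon = c(7.0, 7.2, 7.4, 7.6),
                     lat = c(43.6, 43.7, 43.65, 43.8))
coords <- data.frame(lon = c(7.5, 7.9),
                     lat = c(43.2, 43.5),
                     station = c("A", "B"))

# Hypothetical helpers: each returns a list of layers carrying its own
# data and mapping, rather than a complete plot object.
coast_layer <- function(coast) {
  list(geom_path(data = coast, mapping = aes(x = lon, y = lat)))
}

station_layers <- function(coords) {
  list(
    geom_point(data = coords, mapping = aes(x = lon, y = lat)),
    geom_text(data = coords,
              mapping = aes(x = lon, y = lat, label = station),
              vjust = -1)
  )
}

# An empty plot that only fixes the coordinate system.
base <- ggplot() + coord_equal()

# Components are combined as needed, without one function per combination.
p_coast_only <- base + coast_layer(coast)
p_full       <- base + coast_layer(coast) + station_layers(coords)
```

Returning lists of layers instead of whole ggplot objects sidesteps the question of how to merge two plots' $data elements: each layer keeps its own dataset.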
jiho
2007-Oct-01 05:53 UTC
[R] Plotting from different data sources on the same plot (with ggplot2)
This would probably also be interesting to some:

On 2007-October-01, at 00:48, hadley wickham wrote:

>> That's great!
>> In fact I think I found exactly what I was looking for. I can just
>> do:
>>     p = ggplot() + coord_equal()
>>     p$aspect.ratio = 1
>> to set up the plot, and then add the layers and have ggplot take care
>> of resizing and laying out everything automagically:
>>     p = p + geom_path(data=coast, mapping=aes(x=lon, y=lat))
>>     p = p + geom_point(data=coords, mapping=aes(x=lon, y=lat))
>>     p = p + geom_text(data=coords, mapping=aes(x=lon, y=lat,
>>                                                label=station))
>>     etc...
>> Oh, I love ggplot ;) !
>
> Or even less verbosely:
>
> p = ggplot(mapping = aes(x=lon, y=lat)) + coord_equal()
> p = p + geom_path(data=coast)
> p = p + geom_point(data=coords)
> p = p + geom_text(data=coords, aes(label = station))
>
>>> The data is completely independent of the plot specification.
>>> This is very different from the other plotting models in R, so it
>>> may take a while to get your head around it.
>>
>> Yes, indeed. That's a completely new way of thinking (especially
>> given my MATLAB, Scilab background) but how powerful! I found the
>> whole "data mapping" concept very elegant but did not grasp all the
>> flexibility behind it. I wonder how mainstream it can get since so
>> many people are used to another graphics paradigm.
>
> It's definitely a big change, but I hope that people will see the
> potential benefits and invest some time learning it. I definitely
> have a lot to improve on the documentation though!
>
>> Anyway, I just need to define a new geom_arrow now, to plot wind
>> velocity arrows at several locations, and I'll be a happy man. Is
>> there a specific reason why '...' arguments are not passed to grid
>> functions or is it just to keep the complexity under control? I am
>> thinking in particular that:
>>     p = ggplot(coords) + geom_segment(mapping=aes(x=lon, y=lat,
>>         xend=lon+0.03, yend=lat-0.02),
>>         arrow=arrow(length=unit(0.1,"inches")))
>> would do exactly what I want provided that the 'arrow' argument is
>> passed on to segmentsGrob which is used in geom_segment.
>
> In general, ... doesn't get passed on to the underlying grid function
> because there isn't a one-to-one mapping from geoms to grobs (take
> geom_boxplot for example). However, funnily enough, I have just added
> the arrows argument to geom_segment for my sister, so if you let me
> know what platform you're on, I can send you an updated version.

JiHO
---
http://jo.irisson.free.fr/
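[Editor's note] A minimal sketch of the arrow plot discussed above, assuming a current ggplot2 release in which geom_segment() accepts an `arrow` argument built with grid's arrow() and unit(). The station coordinates are made up, and the fixed offsets simply mirror the ones in the quoted code; real wind components would replace them.

```r
library(ggplot2)
library(grid)  # arrow() and unit()

# Made-up station positions standing in for the poster's coords data.
coords <- data.frame(lon = c(7.5, 7.9),
                     lat = c(43.2, 43.5))

# One segment per station, starting at the station and ending at a fixed
# offset, with an arrow head drawn at the end point.
p <- ggplot(coords) +
  geom_segment(
    mapping = aes(x = lon, y = lat,
                  xend = lon + 0.03, yend = lat - 0.02),
    arrow = arrow(length = unit(0.1, "inches"))
  )
```

Because this is an ordinary layer, it can be added to the coastline and station layers from earlier in the thread like any other geom.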