Greetings! I am trying plot some data on a map in R. Here's the scenario. I have a variable called probworkinghealthy which contains a predicted probability of employment for every individual in my sample (about 100,000 observations). I have another variable, called a001ter, which contains the subject of residency in the Russian Federation (akin to a US state) for every individual in the sample. I have a shape file with the boundaries of all the subjects, called russia.shp. I can plot boxplots of the probability by Federal Subject using plot(probworkinghealthy ~ a001ter). I can also plot the map using plot(russia.shp) Now, I would like to plot the mean probability of employment (i.e. mean(probworkinghealthy)) on a map of Russia using color coding all the Federal Subjects. Does anyone know how to do something like that? Much appreciated, Aleks Andreev Duke University
Aleksandr Andreev <aleksandr.andreev <at> duke.edu> writes:> > Greetings! > > I am trying plot some data on a map in R. Here's the scenario. > > I have a variable called probworkinghealthy which contains a predicted > probability of employment for every individual in my sample (about > 100,000 observations). > I have another variable, called a001ter, which contains the subject of > residency in the Russian Federation (akin to a US state) for every > individual in the sample. > I have a shape file with the boundaries of all the subjects, called russia.shp. > > I can plot boxplots of the probability by Federal Subject using > plot(probworkinghealthy ~ a001ter). I can also plot the map using > plot(russia.shp)So what you need to do is to aggregate the input data frame for individuals, and assign the summary value, here mean, to the correct shapefile geometries. Probably most of what you need is in the maptools package. There is a question about the IDs used to identify the Federal Subject geometries in the shapefile. Assuming that they are named as in the individual data frame, something like: FS <- readShapePoly("russia.shp", IDvar="a001ter") will associate the geometries in the SpatialPolygonsDataFrame with the correct ID - change the IDvar argument value to suit the shapefile. If you do not have IDs in the shapefile to match the unique values of a001ter, you will need to create them.>From there, create a new data frame with row names matching the unique IDs:agg1 <- tapply(probworkinghealthy, a001ter, mean) agg2 <- data.frame(mean_probworkinghealthy=agg1, row.names=names(agg1)) and check row.names(agg2) and sapply(slot(FS, "polygons"), function(x) slot(x, "ID")) for obvious blunders. Merge using: FS1 <- spCbind(FS, agg2) (usually takes a number of tries to get the matching correct). The reason for the complications with IDs is that it is easy to merge data with geometries in the wrong order, and having to provide positive matching should help prevent this.> > Now, I would like to plot the mean probability of employment (i.e. > mean(probworkinghealthy)) on a map of Russia using color coding all > the Federal Subjects. Does anyone know how to do something like that?Then spplot(FS1, "mean_probworkinghealthy") gives a map with default class intervals and colour palette. I suggest that you also consider assigning a coordinate reference system to the geometries as Russia has a large extent, and the correct interpretation of the geometry coordinates may matter. Roger PS. You could consider following up on the R-sig-geo list, or exploring information in the Spatial Task View, or in the list archives.> > Much appreciated, > > Aleks Andreev > Duke University >
Roger Bivand Roger.Bivand at nhh.no writes:> Merge using: > FS1 <- spCbind(FS, agg2)This call fails, because: Error in spCbind(FS, agg2) : different numbers of rows The reason is because I have data in a001ter for 79 Federal Subjects, but russia.shp contains 193 labeled objects (93 Federal Subjects plus 100 Arctic Ocean Islands and other things). Is there a way I can merge without an equal number of rows, labeling the data for the Arctic islands as missing? Thanks, Aleks Andreev Duke University 2008/3/29, Aleksandr Andreev <aleksandr.andreev at duke.edu>:> Greetings! > > I am trying plot some data on a map in R. Here's the scenario. > > I have a variable called probworkinghealthy which contains a predicted > probability of employment for every individual in my sample (about > 100,000 observations). > I have another variable, called a001ter, which contains the subject of > residency in the Russian Federation (akin to a US state) for every > individual in the sample. > I have a shape file with the boundaries of all the subjects, called russia.shp. > > I can plot boxplots of the probability by Federal Subject using > plot(probworkinghealthy ~ a001ter). I can also plot the map using > plot(russia.shp) > > Now, I would like to plot the mean probability of employment (i.e. > mean(probworkinghealthy)) on a map of Russia using color coding all > the Federal Subjects. Does anyone know how to do something like that? > > Much appreciated, > > Aleks Andreev > Duke University >
Aleksandr Andreev <aleksandr.andreev <at> duke.edu> writes:> > Roger Bivand Roger.Bivand at nhh.no writes: > > > Merge using: > > FS1 <- spCbind(FS, agg2) > > This call fails, because: > Error in spCbind(FS, agg2) : different numbers of rowsThis was exactly why I emphasised care. One way to try to do this is to extract the FS data slot: FSd <- as(FS, "data.frame") and then merge() FSd and agg2, using - untried - something like: FS1d <- merge(FSd, agg2, by="row.names", all=TRUE) and check str(FS1d) for sanity, and possibly eyeball: match(row.names(FS1d), sapply(slot(FS, "polygons"), function(x) slot(x, "ID"))) before moving on to: FS1 <- SpatialPolygonsDataFrame(as(FS, "SpatialPolygons"), data=FSD1) The alternative is to subset FS to remove the geometries for which there is no data, but using merge to insert NAs ought to work too. Roger> > The reason is because I have data in a001ter for 79 Federal Subjects, > but russia.shp contains 193 labeled objects (93 Federal Subjects plus > 100 Arctic Ocean Islands and other things). Is there a way I can merge > without an equal number of rows, labeling the data for the Arctic > islands as missing? > > Thanks, > > Aleks Andreev > Duke University >