Hello,

Thank you for your reply. My original data has 3109 FIPS codes. Is there a way to merge only this data into the shapefiles? I hope I am clear.

Thank you also for the link, I am trying to do something like this: https://gist.github.com/reubano/1281134.

Thanks again!

Sincerely,

Shouro

On Tue, May 5, 2015 at 5:21 PM, Anthony Damico <ajdamico at gmail.com> wrote:

> hi, after running each individual line of code above, check that the
> object still has the expected number of records and unique county fips
> codes. it looks like length( shapes$GEOID ) == 3233 but nrow( merged_data
> ) == 3109. the way for you to debug this is for you to go through line by
> line after creating each new object :)
>
> i'm also not sure it's safe to work with gis objects as you're doing,
> there are some well-documented examples of working with tiger files here
> https://github.com/davidbrae/swmap
>
> On Tue, May 5, 2015 at 11:00 AM, Shouro Dasgupta <shouro at gmail.com> wrote:
>
>> I am trying to plot data by FIPS code using county shapes files.
>>
>>     library(data.table)
>>     library(rgdal)
>>     library(colourschemes)
>>     library(RColorBrewer)
>>     library(maptools)
>>     library(maps)
>>     library(ggmap)
>>
>> I have data by FIPS code which looks like this:
>>
>>     dput(head(max_change))
>>     structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009",
>>     "01011"), pred_hist = c(5.68493780563595e-06, 5.87686839563543e-06,
>>     5.68493780563595e-06, 5.84476370329784e-06, 5.89156133294344e-06,
>>     5.68493780563595e-06), pred_sim = c(5.60128903156804e-06,
>>     5.82369276823497e-06,
>>     5.60128903156804e-06, 5.75205304048323e-06, 5.80322399836766e-06,
>>     5.60128903156804e-06), change = c(-1.47141054005866, -0.904829303986895,
>>     -1.47141054005866, -1.58621746782168, -1.49938750670105,
>>     -1.47141054005866
>>     )), .Names = c("FIPS", "pred_hist", "pred_sim", "change"), class =
>>     c("data.table",
>>     "data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer:
>>     0x0000000000110788>)
>>
>> I add leading zeroes by:
>>
>>     max_change <- as.data.table(max_change)
>>     max_change$FIPS <- sprintf("%05d",as.numeric(max_change$FIPS))
>>
>> I downloaded shapefiles from here:
>> ftp://ftp2.census.gov/geo/tiger/TIGER2014/COUNTY/.
>>
>> I obtain the FIPS codes from the shapefiles and order them using:
>>
>>     shapes_fips <- shapes$GEOID
>>     shapes_fips <- as.data.table(shapes_fips)
>>     setnames(shapes_fips, "shapes_fips", "FIPS")
>>     shapes_fips <- shapes_fips[with(shapes_fips, order(FIPS)), ]
>>     shapes_fips$FIPS <- as.character(shapes_fips$FIPS)
>>
>> Then I merge the FIPS codes with my original dataset using:
>>
>>     merged_data <- merge(shapes_fips,max_change,by="FIPS",all.X=T, all.y=T)
>>     merged_data <- as.data.table(merged_data)
>>
>> Which looks like this:
>>
>>     structure(list(FIPS = c("01001", "01003", "01005", "01007", "01009",
>>     "01011"), pred_hist = c(5.68493780563595e-06, 5.87686839563543e-06,
>>     5.68493780563595e-06, 5.84476370329784e-06, 5.89156133294344e-06,
>>     5.68493780563595e-06), pred_sim = c(5.60128903156804e-06,
>>     5.82369276823497e-06,
>>     5.60128903156804e-06, 5.75205304048323e-06, 5.80322399836766e-06,
>>     5.60128903156804e-06), change = c(-1.47141054005866, -0.904829303986895,
>>     -1.47141054005866, -1.58621746782168, -1.49938750670105,
>>     -1.47141054005866
>>     )), .Names = c("FIPS", "pred_hist", "pred_sim", "change"), sorted =
>>     "FIPS", class = c("data.table",
>>     "data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer:
>>     0x0000000000110788>)
>>
>> But when I try to merge the data back to the SpatialPolygonsDataFrame called
>> shapes, I get the following error:
>>
>>     shapes$change <- merged_data$change
>>
>>     Error in `[[<-.data.frame`(`*tmp*`, name, value = c(-1.47141054005866, :
>>       replacement has 3109 rows, data has 3233
>>
>> Apologies for the messy example, what am I doing wrong? Any help will be
>> greatly appreciated. Thank you!
>>
>> Sincerely,
>>
>> Shouro
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

--
Shouro Dasgupta
PhD Candidate | Department of Economics
Ca' Foscari University of Venezia
------------------------------
Junior Researcher | Fondazione Eni Enrico Mattei (FEEM)
Isola di San Giorgio Maggiore, 8 | 30124 Venice, Italy
Tel: +39 041 2700 436
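Editorial sketch, not part of the original message: one way to see which polygons have data, and which codes fail to match in either direction, is to compare the two key vectors with %in% and setdiff(). The vectors below are small hypothetical stand-ins for shapes$GEOID and max_change$FIPS. With the real SpatialPolygonsDataFrame, the same logical vector could presumably be used for row subsetting (shapes[keep, ]), but that step is an assumption and is not run here.

```r
# Hypothetical stand-ins: GEOIDs in file order, and the FIPS codes in the data.
shape_geoid <- c("72001", "01003", "01001", "02013")
data_fips   <- c("01001", "01003")

# TRUE for polygons whose code also appears in the data
keep <- shape_geoid %in% data_fips

matched    <- shape_geoid[keep]               # polygons that have data
no_data    <- setdiff(shape_geoid, data_fips) # polygons without data
no_polygon <- setdiff(data_fips, shape_geoid) # data rows without a polygon

print(matched)
print(no_data)
print(no_polygon)
```

Looking at no_data and no_polygon first makes it clear whether the 3233 versus 3109 gap comes from extra polygons only or also from codes that fail to match.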
so check the unique number of fips codes in the objects before and after

    merged_data <- merge(shapes_fips,max_change,by="FIPS",all.X=T, all.y=T)

also note that all.X should be all.x and you might want to use FALSE for one or both of those

On Tue, May 5, 2015 at 11:40 AM, Shouro Dasgupta <shouro at gmail.com> wrote:

> Hello,
>
> Thank you for your reply. My original data has 3109 FIPS codes. Is there a
> way to merge only this data into the shapefiles? I hope I am clear.
>
> Thank you also for the link, I am trying to do something like this:
> https://gist.github.com/reubano/1281134.
>
> Thanks again!
>
> Sincerely,
>
> Shouro
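Editorial sketch, not part of the original message: the effect of the all.X typo can be reproduced on toy data. The example uses base data frames with hypothetical values rather than the poster's data.table objects. In base merge(), an unrecognised argument such as all.X is absorbed by `...` and ignored, so all.x keeps its default of FALSE and rows of the first table without a match are dropped.

```r
# Hypothetical toy data: four polygon codes, two of which have values.
shapes_fips <- data.frame(FIPS = c("01001", "01003", "02013", "72001"),
                          stringsAsFactors = FALSE)
max_change  <- data.frame(FIPS = c("01001", "01003"),
                          change = c(-1.47, -0.90),
                          stringsAsFactors = FALSE)

# Misspelled argument: all.X is ignored, so unmatched polygon rows vanish.
typo  <- merge(shapes_fips, max_change, by = "FIPS", all.X = TRUE, all.y = TRUE)

# Correct spelling: every polygon code is kept, with NA where no data exist.
fixed <- merge(shapes_fips, max_change, by = "FIPS", all.x = TRUE, all.y = FALSE)

# Count rows and unique keys after each step, as suggested above.
print(c(typo = nrow(typo), fixed = nrow(fixed)))
print(length(unique(fixed$FIPS)))
print(sum(is.na(fixed$change)))
```

Checking nrow() and the number of unique keys after every step is what shows where the row count changes.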
Joining data the way you're doing it is dangerous. Roger Bivand and others describe a standard way to do this process here:

http://r-sig-geo.2731867.n2.nabble.com/Merging-shapefiles-and-csv-td7586839.html

And I do an example using US Census data here, using merge():

http://spatialdemography.org/wp-content/uploads/2013/04/9.-Sparks.pdf

Look at page 134 of that pdf.

Hope this helps

-----
Corey Sparks, PhD
Assistant Professor
Department of Demography
University of Texas at San Antonio
501 West César E. Chávez Blvd
Monterey Building 2.270C
San Antonio, TX 78207
210-458-3166
corey.sparks 'at' utsa.edu
coreysparks.weebly.com
--
View this message in context: http://r.789695.n4.nabble.com/Plot-by-FIPS-Code-using-Shapefiles-tp4706830p4706840.html
Sent from the R help mailing list archive at Nabble.com.
Corey Sparks <corey.sparks <at> utsa.edu> writes:

> Joining data the way you're doing it is dangerous, Roger Bivand and others
> describes a standard way to do this process here:
> http://r-sig-geo.2731867.n2.nabble.com/Merging-shapefiles-and-csv-td7586839.html

Quite right - the chunks Corey is referring to are:

    Please do refer to the vignette in the maptools package, and to
    previous threads which have advised that merge() should not be used,
    and that the row.names of the data frames be used as ID keys.
    Typically using match() on the row.names of the two objects will show
    which are not correctly aligned.

and

    Beware that the data from the objects may be jumbled - never use
    merge, always use match() on the row.names vectors of the objects to
    ensure that the key-IDs agree. Jumbled data happens, it is important
    not to think "shapefile" but to think DBMS with the ID key your way
    of staying sane.

The maptools vignette is at:

http://cran.r-project.org/web/packages/maptools/vignettes/combine_maptools.pdf

or:

    library(maptools)
    vignette("combine_maptools")

Here I also suspect that you'll find that there are non-unique FIPS in the county polygons file, so may need to go through maptools::unionSpatialPolygons() first.

Roger
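Editorial sketch, not part of the original message: the match() idea can be shown on toy data. A plain data frame stands in for the attribute table of the polygons, all values are hypothetical, and the key is a FIPS column rather than row.names. The duplicate check mirrors the concern about non-unique FIPS codes, since match() only returns the first hit for each key.

```r
# Hypothetical attribute table in file order (not sorted by code).
shape_attrs <- data.frame(GEOID = c("72001", "01003", "01001", "02013"),
                          stringsAsFactors = FALSE)
max_change  <- data.frame(FIPS = c("01001", "01003"),
                          change = c(-1.47, -0.90),
                          stringsAsFactors = FALSE)

# Keys must be unique on both sides for a one-to-one alignment.
stopifnot(anyDuplicated(shape_attrs$GEOID) == 0,
          anyDuplicated(max_change$FIPS) == 0)

# For each polygon, the row of max_change holding its value (NA if none).
idx <- match(shape_attrs$GEOID, max_change$FIPS)

# Values arrive in polygon order; the polygon table is never re-sorted.
shape_attrs$change <- max_change$change[idx]
print(shape_attrs)
```

Because the polygon table keeps its order and only the incoming values are rearranged, the result always has one value per polygon, so a positional assignment of the kind attempted in the original post cannot fail on length or end up shifted.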
Dear Anthony,

Thanks again for your reply. The following worked for merge:

    merged_data <- merge(shapes_fips,max_change,by="FIPS",all.x=T, all.y=F)

However, I think I am doing something wrong: I have 3109 FIPS codes in my original data, but when I merge with the shapes file SpatialPolygonsDataFrame, it's not merging properly and there are many NAs. Is it a good idea to convert the shapefiles into a data.frame/data.table for merging and then transform it back to shapefiles? This is what I have been doing:

    shapes <- readShapePoly("F:/GCM//tl_2014_us_county/tl_2014_us_county.shp")
    shapes <- as.data.frame(shapes)
    setnames(shapes, "GEOID", "FIPS")

    shapes_fips <- shapes$GEOID
    shapes_fips <- as.data.table(shapes_fips)
    setnames(shapes_fips, "shapes_fips", "FIPS")
    shapes_fips <- shapes_fips[with(shapes_fips, order(FIPS)), ]
    shapes_fips$FIPS <- as.character(shapes_fips$FIPS)

    merged_data <- merge(shapes_fips,max_change,by="FIPS",all.x=F, all.y=T)
    merged_data <- as.data.table(merged_data)
    merged_data <- merged_data[with(merged_data, order(FIPS)), ]

    shapes$change <- merged_data$change

Thanks again!

Sincerely,

Shouro

On Tue, May 5, 2015 at 6:00 PM, Anthony Damico <ajdamico at gmail.com> wrote:

> so check the unique number of fips codes in the objects before and after
>
>     merged_data <- merge(shapes_fips,max_change,by="FIPS",all.X=T, all.y=T)
>
> also note that all.X should be all.x and you might want to use FALSE for
> one or both of those
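Editorial sketch, not part of the original message: if merge() is used at all, one common precaution is to carry an explicit row index through the merge and restore the original order afterwards, because merge() sorts its result by the key by default. The data and the row_id column below are hypothetical, and a plain data frame stands in for the attribute table of the polygons.

```r
# Hypothetical attribute table in file order, plus the values to attach.
shape_attrs <- data.frame(FIPS = c("72001", "01003", "01001", "02013"),
                          stringsAsFactors = FALSE)
max_change  <- data.frame(FIPS = c("01001", "01003"),
                          change = c(-1.47, -0.90),
                          stringsAsFactors = FALSE)

# Remember the original position of every polygon row (hypothetical column).
shape_attrs$row_id <- seq_len(nrow(shape_attrs))

# Keep all polygons; merge() returns the rows sorted by FIPS.
merged <- merge(shape_attrs, max_change, by = "FIPS",
                all.x = TRUE, all.y = FALSE)

# Put the rows back into file order before any positional assignment.
merged <- merged[order(merged$row_id), ]

# Only assign by position once the keys agree row for row.
stopifnot(identical(merged$FIPS, shape_attrs$FIPS))
print(merged)
```

The final identical() check is the important part: sorting one table but not the other is enough to attach values to the wrong rows without raising any error.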