Hi,

In another thread ("PBSmapping and shapefiles") I asked for an easy way to read "shapefiles" and transform them into data that PBSmapping could use. One person is exploring some ways of doing this, but it is possible I'll have to do this "manually".

With package "maptools" I am able to extract the information I need from a shapefile, but it is formatted like this:

[[1]]
           [,1]     [,2]
 [1,] -55.99805 51.68817
 [2,] -56.00222 51.68911
 [3,] -56.01694 51.68911
 [4,] -56.03781 51.68606
 [5,] -56.04639 51.68759
 [6,] -56.04637 51.69445
 [7,] -56.03777 51.70207
 [8,] -56.02301 51.70892
 [9,] -56.01317 51.71578
[10,] -56.00330 51.73481
[11,] -55.99805 51.73840
attr(,"pstart")
attr(,"pstart")$from
[1] 1

attr(,"pstart")$to
[1] 11

attr(,"nParts")
[1] 1
attr(,"shpID")
[1] NA

[[2]]
          [,1]     [,2]
[1,] -57.76294 50.88770
[2,] -57.76292 50.88693
[3,] -57.76033 50.88163
[4,] -57.75668 50.88091
[5,] -57.75551 50.88169
[6,] -57.75562 50.88550
[7,] -57.75932 50.88775
[8,] -57.76294 50.88770
attr(,"pstart")
attr(,"pstart")$from
[1] 1

attr(,"pstart")$to
[1] 8

attr(,"nParts")
[1] 1
attr(,"shpID")
[1] NA

I do not quite understand the structure of this data object (a list of lists, I think), but at this point I resorted to printing it on the console and importing that text into Excel for further cleaning, which is easy enough. I'd like to complete the process within R to save time and to circumvent Excel's limit of around 64000 lines. But I have a hard time figuring out how to clean up this text in R.

What I need to produce for PBSmapping is a file where each block of coordinates shares one ID number, called PID, and a variable POS indicates the position of each coordinate within a "shape". All other lines must disappear.
So the above would become:

PID POS         X        Y
  1   1 -55.99805 51.68817
  1   2 -56.00222 51.68911
  1   3 -56.01694 51.68911
  1   4 -56.03781 51.68606
  1   5 -56.04639 51.68759
  1   6 -56.04637 51.69445
  1   7 -56.03777 51.70207
  1   8 -56.02301 51.70892
  1   9 -56.01317 51.71578
  1  10 -56.00330 51.73481
  1  11 -55.99805 51.73840
  2   1 -57.76294 50.88770
  2   2 -57.76292 50.88693
  2   3 -57.76033 50.88163
  2   4 -57.75668 50.88091
  2   5 -57.75551 50.88169
  2   6 -57.75562 50.88550
  2   7 -57.75932 50.88775
  2   8 -57.76294 50.88770

First I imported this text file into R:

test <- read.csv2("test file.txt", header = FALSE, sep = ";",
                  colClasses = "character")

I used sep=";" to ensure there would be only one variable in this file, as it contains no ";".

To remove lines that do not contain coordinates, I used the fact that longitudes are expressed as negative numbers, so with my very limited knowledge of grep searches I thought of this, which is probably not the best way to go:

a <- rep("-", length(test$V1))
b <- grep(a, test$V1)

This gives me a warning:

Warning message:
the condition has length > 1 and only the first element will be used
in: if (is.na(pattern)) {

but it seems to do what I need anyway.

c <- seq(1, length(test$V1))
d <- c %in% b
e <- test$V1[d]

Partial victory: now I only have lines that look like

[1,] -57.76294 50.88770

But I don't know how to go further: the number in square brackets can be used for variable POS, after removing the square brackets and the comma, but this requires a better knowledge of grep than I have. Furthermore, I don't know how to add a PID (polygon ID) variable, i.e. all lines of a polygon must have the same ID, as in the example above (each time POS == 1, a new polygon starts and PID needs to be incremented by 1, and PID is kept constant for lines where POS != 1).

Any help will be much appreciated.

Sincerely,

Denis Chabot
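[Editor's note: the text-scraping route described above can in fact be finished in R with sub() and cumsum(); the following is a minimal sketch, assuming the cleaned lines look like the sample above. The vector e and all names are illustrative, not taken from the thread's replies.]

```r
# Lines as they look after keeping only the coordinate rows
e <- c("[1,] -55.99805 51.68817",
       "[2,] -56.00222 51.68911",
       "[1,] -57.76294 50.88770",
       "[2,] -57.76292 50.88693")

# POS: the number inside the square brackets, captured with a backreference
POS <- as.integer(sub("^\\[([0-9]+),\\].*", "\\1", e))

# X and Y: the two numeric fields left after stripping the bracket prefix
coords <- read.table(text = sub("^\\[[0-9]+,\\]", "", e),
                     col.names = c("X", "Y"))

# PID: increments each time POS restarts at 1
PID <- cumsum(POS == 1)

result <- data.frame(PID, POS, coords)
```

Here cumsum() on the logical vector POS == 1 does exactly the "increment PID at each new polygon" step the question asks about.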
Dear Denis,

I don't believe that anyone fielded your question -- my apologies if I missed a response.

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Denis Chabot
> Sent: Monday, July 25, 2005 9:46 PM
> To: R list
> Subject: [R] grep help needed
>
> [data example snipped]
>
> I do not quite understand the structure of this data object
> (list of lists I think)

Actually, it looks like a list of matrices, each with some attributes (which, I gather, aren't of interest to you).
> But I have a hard time
> figuring out how to clean up this text in R.

If I understand correctly what you want, this seems a very awkward way to proceed. Why not just extract the matrices from the list, stick on the additional columns that you want, stick the matrices together, name the columns, and then output the data to a file?

M1 <- Data[[1]]  # assuming that the original list is named Data
M2 <- Data[[2]]
M1 <- cbind(1, 1:nrow(M1), M1)
M2 <- cbind(2, 1:nrow(M2), M2)
M <- rbind(M1, M2)
colnames(M) <- c("PID", "POS", "X", "Y")
write.table(M, "Data.txt", row.names=FALSE, quote=FALSE)

It wouldn't be hard to generalize this to any number of matrices and to automate the process.

I hope that this helps,
 John

> [rest of original message snipped]
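[Editor's note: the generalization John mentions is a one-liner with lapply() and do.call(); a minimal sketch, where Data stands in for the list of two-column coordinate matrices returned by maptools (toy values shown here in its place).]

```r
# Stand-in for the list of n x 2 coordinate matrices from the shapefile
Data <- list(
  matrix(c(-55.99805, 51.68817,
           -56.00222, 51.68911,
           -56.01694, 51.68911), ncol = 2, byrow = TRUE),
  matrix(c(-57.76294, 50.88770,
           -57.76292, 50.88693), ncol = 2, byrow = TRUE)
)

# For polygon i: PID = i, POS = 1..nrow, then the X, Y columns
blocks <- lapply(seq_along(Data), function(i)
  cbind(i, seq_len(nrow(Data[[i]])), Data[[i]]))
M <- do.call(rbind, blocks)
colnames(M) <- c("PID", "POS", "X", "Y")

write.table(M, "Data.txt", row.names = FALSE, quote = FALSE)
```

The scalar i is recycled down the rows by cbind(), so each block carries its polygon number automatically, for any number of polygons.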
Thanks for your help, the proposed solutions were much more elegant than what I was attempting. I adopted a slight modification of Tom Mulholland's solution with a piece from John Fox's solution, but many of you had very similar solutions.

require(maptools)
nc <- read.shape(system.file("shapes/sids.shp", package = "maptools")[1])
mappolys <- Map2poly(nc, as.character(nc$att.data$FIPSNO))
selected.shapes <- which(nc$att.data$SID74 > 20)  # just to make it a smaller example
submap <- subset(mappolys, nc$att.data$SID74 > 20)

final.data <- NULL
for (j in 1:length(selected.shapes)) {
    temp.verts <- matrix(as.vector(submap[[j]]), ncol = 2)
    n <- length(temp.verts[, 1])
    temp.order <- 1:n
    temp.data <- cbind(rep(j, n), temp.order, temp.verts)
    final.data <- rbind(final.data, temp.data)
}
colnames(final.data) <- c("PID", "POS", "X", "Y")
final.data

my.data <- as.data.frame(final.data)
class(my.data) <- c("PolySet", "data.frame")
attr(my.data, "projection") <- "LL"

meta <- nc[2]$att.data[selected.shapes, ]
PID <- seq(1, length(submap))
meta.data <- cbind(PID, meta)
class(meta.data) <- c("PolyData", "data.frame")
attr(meta.data, "projection") <- "LL"

It would be nice if a variant of this was incorporated into PBSmapping to make it easier to import data from shapefiles!

Thanks again for your help,

Denis Chabot

Le 05-07-26 à 00:48, Mulholland, Tom a écrit :

> [quoted message snipped]
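[Editor's note: for large shapefiles, growing final.data with rbind() inside the loop as above becomes slow, since the table is copied on every iteration. The loop can be wrapped into a one-pass helper; a minimal sketch, where the function name toPolySet and the toy polygons are illustrative, not from the thread.]

```r
# Convert a list of two-column vertex matrices (like submap above)
# into a PBSmapping-style PolySet with PID/POS columns, in one pass.
toPolySet <- function(polys) {
  blocks <- lapply(seq_along(polys), function(j) {
    verts <- matrix(as.vector(polys[[j]]), ncol = 2)
    cbind(PID = j, POS = seq_len(nrow(verts)),
          X = verts[, 1], Y = verts[, 2])
  })
  out <- as.data.frame(do.call(rbind, blocks))  # single rbind at the end
  class(out) <- c("PolySet", "data.frame")
  attr(out, "projection") <- "LL"
  out
}

# Example with two toy polygons:
polys <- list(matrix(c(0, 0, 1, 0, 1, 1), ncol = 2, byrow = TRUE),
              matrix(c(2, 2, 3, 2), ncol = 2, byrow = TRUE))
ps <- toPolySet(polys)
```

do.call(rbind, blocks) performs a single allocation instead of one per polygon, which matters once a coastline file has tens of thousands of vertices.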