Dear all,

I have two files, both of similar formats. In column 1 are Latitude values (real numbers, e.g. -179.25), column 2 has Longitude values (also real numbers) and in one of the files, column 3 has Population Density values (integers); there is no column 3 in the other file.

However, the main difference between these two files is that one has fewer rows than the other. So what I'm looking to do is 'pad out' the shorter file by adding in the rows that are 'missing' from it (i.e. if a particular coordinate isn't present in the shorter file but is in the 'longer/master' file), with 'zero' as the Population Density value (column 3) for each added row.

This should result in the shorter file becoming the same length as the initially longer file, and with each file having the same coordinate values (latitude and longitude on each line).

How would I do this in R?

Thanks for any help offered,

Steve
Steve,

See ?merge and its collection of 'all' arguments (all, all.x, all.y).

Peter Alspach

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Steve Murray
> Sent: Friday, 18 July 2008 8:43 a.m.
> To: r-help at r-project.org
> Subject: [R] Matching Up Values
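A minimal sketch of how that suggestion might be applied, assuming both files have already been read into data frames whose coordinate columns are named Latitude and Longitude. The names `master`, `short` and `PopDens`, and all the values, are invented for illustration. `all.x = TRUE` keeps every row of the first data frame and leaves NA where the second has no matching coordinate.

```r
# Hypothetical toy data: 'master' holds every coordinate, 'short' only some.
master <- data.frame(Latitude  = c(10.25, 10.25, 10.75, 11.25),
                     Longitude = c(-1.25, -0.75, -1.25, -1.25))
short  <- data.frame(Latitude  = c(10.25, 10.75),
                     Longitude = c(-1.25, -1.25),
                     PopDens   = c(12L, 40L))

# all.x = TRUE keeps every row of 'master'; coordinates that are absent
# from 'short' get NA in the PopDens column.
padded <- merge(master, short, by = c("Latitude", "Longitude"), all.x = TRUE)

# The padded result has one row per 'master' row, provided 'short'
# contains no duplicated coordinate pairs.
nrow(padded) == nrow(master)
```

Using `all = TRUE` instead would also keep any rows of `short` that have no partner in `master`, so the result can end up longer than `master`.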
On 18/07/2008, at 8:42 AM, Steve Murray wrote:

<snip>

> So what I'm looking to do is, 'pad out' the shorter file, by adding
> in the rows with those that are 'missing' from the longer file (ie.
> if a particular coordinate isn't present in the shorter file but is
> in the 'longer/master' file), and having 'zero' as its Population
> Density value (column C).

Aaaaaaarrrrrrghhhhhhhhhh!!! Zero is not the same as a missing value. Surely to gumdrops you should say ``having NA as its Population Density value''. Elsewise you are lying about your data.

cheers,

Rolf Turner
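A small illustration of why the distinction matters, using invented density values: a summary computed over zeros differs from one computed over the observed cells only, and with NA the number of unrecorded cells stays visible.

```r
# Hypothetical densities: two observed cells, two cells with no record.
dens_na   <- c(12, 40, NA, NA)
dens_zero <- c(12, 40, 0, 0)

mean(dens_na, na.rm = TRUE)  # 26: average over the observed cells only
mean(dens_zero)              # 13: treats the unrecorded cells as empty
sum(is.na(dens_na))          # 2: the missing cells can still be counted
```

If zeros really are what a later step needs, they can be substituted at that point with `dens_na[is.na(dens_na)] <- 0`, which keeps the decision explicit.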
I think the approach is ok. I'm having difficulties though...!

I've managed to get 'merge' working (using the 'all' argument as suggested), but for some strange reason, the output file has 12 extra rows! So now the shorter file isn't the same length as the 'master' file, it's longer! The files are fairly sizeable (~60000 rows), so it's difficult to pinpoint manually where it's lost track.

Is there an obvious solution to this? I was wondering if the best thing might be to 'de-merge' the now-longer file, so that the surplus rows are removed. Is there a command which will enable me to compare the now-longer file to the master file, so that any coordinate pairs which are present in the longer file but not in the (now-shorter) master file are removed?

Thanks again,

Steve
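One possible explanation, not confirmed in the thread, is that the shorter file contains a few coordinate pairs that are absent from the master file; `all = TRUE` keeps those as well. A sketch of how such rows might be located, with invented data and a hypothetical helper `key()` that pastes the two coordinates into one string:

```r
# Hypothetical toy data: 'short' has one coordinate that 'master' lacks.
master <- data.frame(Latitude  = c(10.25, 10.75, 11.25),
                     Longitude = c(-1.25, -1.25, -1.25))
short  <- data.frame(Latitude  = c(10.25, 12.25),
                     Longitude = c(-1.25, -1.25),
                     PopDens   = c(12L, 7L))

# Hypothetical helper: one string key per coordinate pair.
key <- function(d) paste(d$Latitude, d$Longitude)

# Rows of 'short' whose coordinates do not occur in 'master'.
extra <- short[!key(short) %in% key(master), ]
extra

# all.x = TRUE (rather than all = TRUE) keeps only the coordinates of
# 'master', so the surplus rows never enter the result.
padded <- merge(master, short, by = c("Latitude", "Longitude"), all.x = TRUE)
nrow(padded) == nrow(master)
```

Duplicated coordinate pairs in either file would be another way to gain rows, so checking `anyDuplicated()` on the coordinate columns may also be worthwhile.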
Hmm, I'm having a fair few difficulties using 'merge' now. I managed to get it to work successfully before, but in this case I'm trying to shorten (as opposed to lengthen, as before) a file in relation to a 'master' file.

These are the commands I've been using, followed by the dimensions of the files in question. As you can see, the number of rows in the merged file doesn't match that of the 'coordinates' file (which is what I'm aiming to get 'merged' equal to):

> merge(PopDens.long, coordinates, by=c("Latitude","Longitude"), all = TRUE) -> merged
> dim(PopDens.long); dim(coordinates); dim(merged)
[1] 67870 3
[1] 67420 2
[1] 69849 3

One thing I tried was swapping the order of the files in the merge command, but 'merged' still has the same number of rows (69849).

Something else I tried was to leave out the 'all = TRUE' argument, as I'm essentially attempting to shorten the file, but this makes the output file *too* short (65441 rows as opposed to the intended 67420)! Again, the same applies when the order of the input files is swapped.

> merge(PopDens.long, coordinates, by=c("Latitude","Longitude")) -> merged
> dim(PopDens.long); dim(coordinates); dim(merged)
[1] 67870 3
[1] 67420 2
[1] 65441 3

Am I doing something obviously wrong? I'm pretty certain that 'coordinates' is a subset of 'PopDens.long', so there should be equal numbers of common values when merged. Is there perhaps a more suitable function I could use, or a way of performing checks to see where I might be going wrong?!

Many thanks,

Steve
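The reported row counts fit a simple pattern: 67870 + 67420 - 65441 = 69849, which is exactly the `all = TRUE` result. That is what one would expect if coordinate pairs are unique within each file and only 65441 pairs are common to both, in which case 67420 - 65441 = 1979 rows of 'coordinates' would have no partner in 'PopDens.long', so it would not be a subset after all. This is an inference from the numbers, not something established in the thread. A sketch of checks that might confirm it, using invented stand-ins for the two data frames:

```r
# Toy stand-ins for the thread's data frames (all values invented).
PopDens.long <- data.frame(Latitude  = c(10.25, 10.75, 11.25, 11.75),
                           Longitude = c(-1.25, -1.25, -1.25, -1.25),
                           PopDens   = c(12L, 40L, 3L, 9L))
coordinates  <- data.frame(Latitude  = c(10.25, 10.75, 12.25),
                           Longitude = c(-1.25, -1.25, -1.25))
by_cols <- c("Latitude", "Longitude")

# 1. Duplicated coordinate pairs would multiply rows in a merge;
#    anyDuplicated() returns 0 when there are none.
anyDuplicated(PopDens.long[by_cols])
anyDuplicated(coordinates[by_cols])

# 2. With unique pairs, outer rows = rows(x) + rows(y) - inner rows.
inner <- merge(PopDens.long, coordinates, by = by_cols)
outer <- merge(PopDens.long, coordinates, by = by_cols, all = TRUE)
nrow(outer) == nrow(PopDens.long) + nrow(coordinates) - nrow(inner)

# 3. Number of rows in 'coordinates' with no match in 'PopDens.long'.
n_unmatched <- nrow(coordinates) - nrow(inner)

# 4. Keep exactly the rows of 'coordinates'; unmatched ones get NA.
result <- merge(coordinates, PopDens.long, by = by_cols, all.x = TRUE)
nrow(result) == nrow(coordinates)
```

Putting 'coordinates' first with `all.x = TRUE` gives a result with the same number of rows as 'coordinates' whenever 'PopDens.long' has no duplicated pairs, whether or not every coordinate finds a match.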