Steve Murray
2009-Aug-14 17:48 UTC
[R] Assigning values based on a separate reference (lookup) table
Dear R Users, I have a data frame of 360 rows by 720 columns (259200 values). For each value in this grid I am hoping to apply an equation to, to generate a new grid. One of the parts of the equation (called 'p') relies on reading from a separate reference table. This is Table 4 at: http://www.fao.org/docrep/s2022e/s2022e07.htm#3.1.3%20blaney%20criddle%20method (scroll down a little). Therefore, 'p' relies on the latitude of the values in the initial 360 x 720 data frame. The row names of the data frame contain the latitude values and these range from between 89.75 to -89.75 (the latter being South of the Equator). My question is, how do I go about forming a loop to read each of the 259200 values and assign it a 'p' value (from the associated reference table), based on it's latitude? My thinking was to do a series of 'if' statements, but this soon got very, very messy - any ideas which get the job done (and aren't a riddle to follow), would be most welcome. Many thanks for any advice, Steve _________________________________________________________________ [[elided Hotmail spam]]
jim holtman
2009-Aug-14 18:28 UTC
[R] Assigning values based on a separate reference (lookup) table
You might want to look at 'findInterval'. This will give you back an index based on the ranges you specify; e.g., the breaks in you table that has the 'p' in it. Here is an example if your table had 100 values between -89.5 & 89.5 and was split evenly:> x <- seq(-89.5, 89.5, length=50) > x[1] -89.500000 -85.846939 -82.193878 -78.540816 -74.887755 -71.234694 -67.581633 -63.928571 -60.275510 -56.622449 -52.969388 [12] -49.316327 -45.663265 -42.010204 -38.357143 -34.704082 -31.051020 -27.397959 -23.744898 -20.091837 -16.438776 -12.785714 [23] -9.132653 -5.479592 -1.826531 1.826531 5.479592 9.132653 12.785714 16.438776 20.091837 23.744898 27.397959 [34] 31.051020 34.704082 38.357143 42.010204 45.663265 49.316327 52.969388 56.622449 60.275510 63.928571 67.581633 [45] 71.234694 74.887755 78.540816 82.193878 85.846939 89.500000> z <- runif(100, -89.5, 89.5) > z[1] -41.9739493 -22.8898220 13.0407520 73.0691944 -53.3989343 71.3117536 79.5968731 28.7828049 23.1114139 -78.4402576 [11] -52.6305511 -57.8963413 33.4770896 -20.7454344 48.3016142 -0.4118357 38.9537130 88.0511910 -21.4737029 49.6626946 [21] 77.8122364 -51.5264887 27.1496041 -67.0256378 -41.6675003 -20.3855774 -87.1031304 -21.0525557 66.1746614 -28.5775296 [31] -3.2076593 17.8222828 -1.1561060 -56.1670493 58.5998240 30.1555461 52.6689351 -70.1780910 40.0442593 -15.8818771 [41] 57.4493866 26.3237747 50.6449644 9.4934998 5.3198049 51.7947655 -85.3237148 -4.0758184 41.5841592 34.4989486 [51] -4.0060876 64.6564964 -11.0806178 -45.6812874 -76.8484506 -71.6955573 -32.8873644 3.3355331 28.9989087 -16.6773965 [61] 73.9047904 -36.9449963 -7.3272350 -30.0013533 27.0058136 -43.3149963 -3.8404006 47.6696100 -74.4198023 67.1825181 [71] -28.8059441 60.7598227 -27.4436554 -29.7542874 -4.2331271 70.2035021 65.2167652 -19.6918717 49.6404051 82.4506215 [81] -11.6959522 38.0401275 -17.9010080 -31.2619648 46.0185995 -53.2180863 37.7906988 -67.7171461 -45.5575560 -63.8485161 [91] -46.6063347 -78.9507465 25.4695983 67.3521891 49.9257273 53.2182798 -8.0058728 -16.0949493 55.6457735 18.7830590> findInterval(z, x)[1] 14 19 29 45 10 45 47 33 31 4 11 9 34 19 38 25 36 49 19 39 46 11 32 7 14 19 1 19 43 17 24 30 25 10 41 33 39 6 36 21 41 32 [43] 39 28 26 39 2 24 36 34 24 43 22 12 4 5 16 26 33 20 45 15 23 17 32 13 24 38 5 43 17 42 17 17 24 44 43 20 39 48 22 35 20 16 [85] 38 10 35 6 13 8 12 3 32 43 39 40 23 21 40 30>>From the last statement you have the index into the 'p' and can usethat in the rest of your equation. On Fri, Aug 14, 2009 at 1:48 PM, Steve Murray<smurray444 at hotmail.com> wrote:> > Dear R Users, > > I have a data frame of 360 rows by 720 columns (259200 values). For each value in this grid I am hoping to apply an equation to, to generate a new grid. One of the parts of the equation (called 'p') relies on reading from a separate reference table. This is Table 4 at: http://www.fao.org/docrep/s2022e/s2022e07.htm#3.1.3%20blaney%20criddle%20method (scroll down a little). > > Therefore, 'p' relies on the latitude of the values in the initial 360 x 720 data frame. The row names of the data frame contain the latitude values and these range from between 89.75 to -89.75 (the latter being South of the Equator). > > My question is, how do I go about forming a loop to read each of the 259200 values and assign it a 'p' value (from the associated reference table), based on it's latitude? > > My thinking was to do a series of 'if' statements, but this soon got very, very messy - any ideas which get the job done (and aren't a riddle to follow), would be most welcome. > > Many thanks for any advice, > > Steve > > > _________________________________________________________________ > > [[elided Hotmail spam]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Daniel Malter
2009-Aug-14 18:49 UTC
[R] Assigning values based on a separate reference (lookup) table
This is just one suggestion for a solution, and I think there may be much more elegant ones out there. However, this should work. 1. Reshape your data in long format 2. Merge the table that holds the values you want to multiply your data with with your reshaped dataset (this may require transformation of the table that holds the values that are to be multiplied). 3. Perform multiplication 4. Reshape the data in long format back in wide format. Here is a stepwise example: ##Simulate random data x=rnorm(36*72) dim(x)=c(36,72) x=data.frame(x) names(x)=1:72 ##Reshape data new.x=reshape(x,idvar="index",ids=row.names(x),direction="long",times=names( x),timevar="Longitude",varying=list(names(x))) head(new.x) ##Merge data Then merge new.x with with the columns of the table you want to multiply with (see ?merge). This may require extracting data/reshaping of the multiplier table and/or transforming variables in new.x to make new.x conformable with table that it is to be merged with. my.multiplier=data.frame(1:72,rnorm(72)) names(my.multiplier)=c("index","value") new.x2=merge(new.x,my.multiplier,by.x="Longitude",by.y="index") ##Perform multiplication names(new.x2) new.x2$"1"=new.x2$"1"*new.x2$"value" ##Reshape new.x2 back to the initial wide format of the matrix new.x2=new.x2[,1:3] new.x=reshape(new.x2,idvar="index",ids=row.names(x),direction="wide",times=n ames(x),timevar="Longitude",varying=list(names(x))) head(new.x) HTH, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Urspr?ngliche Nachricht----- Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im Auftrag von Steve Murray Gesendet: Friday, August 14, 2009 1:49 PM An: r-help at r-project.org Betreff: [R] Assigning values based on a separate reference (lookup) table Dear R Users, I have a data frame of 360 rows by 720 columns (259200 values). For each value in this grid I am hoping to apply an equation to, to generate a new grid. One of the parts of the equation (called 'p') relies on reading from a separate reference table. This is Table 4 at: http://www.fao.org/docrep/s2022e/s2022e07.htm#3.1.3%20blaney%20criddle%20met hod (scroll down a little). Therefore, 'p' relies on the latitude of the values in the initial 360 x 720 data frame. The row names of the data frame contain the latitude values and these range from between 89.75 to -89.75 (the latter being South of the Equator). My question is, how do I go about forming a loop to read each of the 259200 values and assign it a 'p' value (from the associated reference table), based on it's latitude? My thinking was to do a series of 'if' statements, but this soon got very, very messy - any ideas which get the job done (and aren't a riddle to follow), would be most welcome. Many thanks for any advice, Steve _________________________________________________________________ [[elided Hotmail spam]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Daniel Malter
2009-Aug-14 18:56 UTC
[R] Assigning values based on a separate reference (lookup) table
This is just one suggestion for a solution, and I think there may be much more elegant ones out there. However, this should work. 1. Reshape your data in long format 2. Merge the table that holds the values you want to multiply your data with with your reshaped dataset (this may require transformation of the table that holds the values that are to be multiplied). 3. Perform multiplication 4. Reshape the data in long format back in wide format. Here is a stepwise example: ##Simulate random data x=rnorm(36*72) dim(x)=c(36,72) x=data.frame(x) names(x)=1:72 ##Reshape data new.x=reshape(x,idvar="index",ids=row.names(x),direction="long",times=names( x),timevar="Longitude",varying=list(names(x))) head(new.x) ##Merge data Then merge new.x with with the columns of the table you want to multiply with (see ?merge). This may require extracting data/reshaping of the multiplier table and/or transforming variables in new.x to make new.x conformable with table that it is to be merged with. my.multiplier=data.frame(1:72,rnorm(72)) names(my.multiplier)=c("index","value") new.x2=merge(new.x,my.multiplier,by.x="Longitude",by.y="index") ##Perform multiplication names(new.x2) new.x2$"1"=new.x2$"1"*new.x2$"value" ##Reshape new.x2 back to the initial wide format of the matrix new.x2=new.x2[,1:3] new.x=reshape(new.x2,idvar="index",ids=row.names(x),direction="wide",times=n ames(x),timevar="Longitude",varying=list(names(x))) head(new.x) HTH, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Urspr?ngliche Nachricht----- Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im Auftrag von Steve Murray Gesendet: Friday, August 14, 2009 1:49 PM An: r-help at r-project.org Betreff: [R] Assigning values based on a separate reference (lookup) table Dear R Users, I have a data frame of 360 rows by 720 columns (259200 values). For each value in this grid I am hoping to apply an equation to, to generate a new grid. One of the parts of the equation (called 'p') relies on reading from a separate reference table. This is Table 4 at: http://www.fao.org/docrep/s2022e/s2022e07.htm#3.1.3%20blaney%20criddle%20met hod (scroll down a little). Therefore, 'p' relies on the latitude of the values in the initial 360 x 720 data frame. The row names of the data frame contain the latitude values and these range from between 89.75 to -89.75 (the latter being South of the Equator). My question is, how do I go about forming a loop to read each of the 259200 values and assign it a 'p' value (from the associated reference table), based on it's latitude? My thinking was to do a series of 'if' statements, but this soon got very, very messy - any ideas which get the job done (and aren't a riddle to follow), would be most welcome. Many thanks for any advice, Steve _________________________________________________________________ [[elided Hotmail spam]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Steve Murray
2009-Aug-14 19:05 UTC
[R] Assigning values based on a separate reference (lookup) table
Thanks to you both for your responses. 3 I think these approaches will *nearly* do the trick, however, the problem is that the reference/lookup table is based on 'bins' of latitude values, eg.>61, 60-56, 55-51, 50-46 etc. whereas the actual data (in my 720 x 360 data frame) are not binned, e.g. 89.75, 89.25, 88.75, 88.25, 87.75 etc. - instead they 'increment' by -0.5 each time, and therefore many of the 367200 values which are in the data frame will have latitude values falling into the same 'reference' bin. It's for this reason that I think the 'merge' approach might fall down, unless there's a way of telling 'merge' that latitude can still be considered to match if they fall within a range. For example, if my 720 x 360 data frame has values whose corresponding latitude (row name) values are, say, 56.3, 55.9, 58.2, 56.8 and 57.3, then the original value in the grid needs to be assigned a 'p' value which corresponds with what is read off of the reference table from the bin 56-60. Hope this makes sense! If not, please feel free to ask for clarification. Many thanks again, Steve _________________________________________________________________ oticons.