Hi all.

I have a recurring problem that I don't know how to solve efficiently.

The situation is the following: I have a number of datasets (A, B, C, ...), each consisting of an identifier (e.g. 11, 12, 13, ..., 20) and a measurement (e.g. in the range 100-120). I want to compile a large table, with all available identifiers across all datasets in the rows, and a column for every dataset.

Now, not all datasets have a measurement for every identifier, so I want NA if the set does not contain the identifier.

An example for a single dataset:

#all identifiers
> rep <- c(10:20)
#identifiers in my dataset (a subset of rep)
> rep1 <- c(12,13,15,16,17,18)
#measurements in this dataset
> rep1.r <- c(112,113,115,116,117,118)
#a vector which should become a column in the final table, now containing all NAs
#(one entry per identifier; 10:20 has 11 elements)
> res <- rep(NA,11)
#the IDs and values of my dataset together
> data <- cbind(rep1, rep1.r)

data looks like this:

     rep1 rep1.r
[1,]   12    112
[2,]   13    113
[3,]   15    115
[4,]   16    116
[5,]   17    117
[6,]   18    118

Now, I want to put the values 112, 113, 115, ... in the correct rows of the final table, using the identifiers to indicate which row each value belongs in, so that I finally obtain:

rep  res
 10   NA
 11   NA
 12  112
 13  113
 14   NA
 15  115
 16  116
 17  117
 18  118
 19   NA
 20   NA

I am trying to avoid calling 'which' repeatedly and filling in every identifier's observation one by one, since I will be doing this for thousands of rows at once. There must be an efficient way using factors, tapply etc., but I have trouble finding it. Ideally this could be done in one go, instead of looping.

Any suggestions?

Thanks,

Piet

merge() may be just what you want.

Cheers,
Pierre

Piet van Remortel wrote:
> [...]
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Pierre Kleiber, Ph.D
Fishery Biologist
NOAA Fisheries Service - Honolulu Laboratory
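
For readers following along, here is a minimal sketch of how the merge() suggestion might be applied to the example from the original post. The data frame names (ids, set1) and the column names (id, res) are assumptions chosen for illustration; ids is used instead of rep so that the base function rep() is not masked.

```r
# All identifiers that should appear as rows of the final table.
# (Named 'ids' here, an assumed name, to avoid masking base::rep.)
ids <- data.frame(id = 10:20)

# One dataset: identifiers and measurements taken from the original post.
set1 <- data.frame(id  = c(12, 13, 15, 16, 17, 18),
                   res = c(112, 113, 115, 116, 117, 118))

# all.x = TRUE keeps every row of 'ids'; identifiers that do not
# occur in 'set1' get NA in the 'res' column.
tab <- merge(ids, set1, by = "id", all.x = TRUE)
print(tab)
```

With all.x = TRUE the result has one row per identifier in ids, which is the NA-padded layout asked for; all = TRUE would instead keep identifiers from both inputs.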

You can use merge, but to do so you will need to define the common key first. This can be the row names in the case of a matrix, or the names in the case of a vector. With by=0, merge matches on the row names.

  v1 <- 1:10
  names(v1) <- LETTERS[1:10]

  v2 <- 101:105
  names(v2) <- sample( LETTERS[1:10], 5 )

  > merge( v1, v2, by=0, all=TRUE )
     Row.names  x   y
  1          A  1  NA
  2          B  2 102
  3          C  3 104
  4          D  4 103
  5          E  5 105
  6          F  6  NA
  7          G  7  NA
  8          H  8 101
  9          I  9  NA
  10         J 10  NA

(Because the names of v2 are drawn with sample(), the rows that match will differ from run to run.)

Regards, Adai

On Tue, 2005-03-29 at 22:47 +0200, Piet van Remortel wrote:
> [...]
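
Since the original question asks for one column per dataset, here is one possible sketch for several datasets at once. It uses match() rather than repeated merge() calls, which is a swapped-in alternative and not something proposed in the replies above. Dataset A is the example from the original post; dataset B and the names ids, sets, cols and tab are invented purely for illustration.

```r
# Every identifier that should appear as a row of the final table.
ids <- 10:20

# Dataset A is the example from the original post;
# dataset B is made up for this sketch.
sets <- list(
  A = data.frame(id    = c(12, 13, 15, 16, 17, 18),
                 value = c(112, 113, 115, 116, 117, 118)),
  B = data.frame(id    = c(10, 13, 20),
                 value = c(110, 119, 120))
)

# match(ids, d$id) gives, for each identifier, its position in the
# dataset, or NA when the identifier is absent. Indexing the values
# with those positions yields an NA-padded column in one vectorised step.
cols <- lapply(sets, function(d) d$value[match(ids, d$id)])

# Bind the identifier column and one column per dataset.
tab <- data.frame(id = ids, as.data.frame(cols))
print(tab)
```

This assumes each identifier occurs at most once per dataset, since match() only returns the first position it finds. No explicit which() calls or per-identifier loops are needed; lapply() only iterates over the datasets.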