Daniel Lee Rabosky
2006-Dec-05 19:41 UTC
[R] dynamic variable creation in lists and data frames
Hi

I have a question about the creation of variables within lists in R. I am running simulations and am interested in two parameters, ESM and ESMM (the similarity of these names is important for my question). I do simulations to generate ESMM, then plug these values into a second simulation function to get ESM:

    x <- list()

    for (i in 1:nsimulations)
    {
      x$ESMM[i] <- do_simulation1()
      x$ESM[i] <- do_simulation2(x$ESMM[i])
    }

and I return everything as a dataframe, x <- as.data.frame(x)

When I do this, I find that x$ESMM is overwritten by x$ESM for the first simulation. However, x$ESM is nonetheless correctly generated using x$ESMM.

Thus, x$ESM[1] = x$ESMM[1], but for the other n-thousand simulations, ESMM is not overwritten; the error only occurs on the first instance of ESM.

I think I know why this is occurring: I am creating a new variable in a list and assigning it a value, but when R can't find the variable, it overwrites the next most similar variable (ESMM). But it still proceeds to create the new variable ESM, having overwritten x$ESMM[1]. And it doesn't happen for subsequent simulations, because both variables then exist in the list.

My questions are:

1) how different do variable names have to be to avoid this problem? What exactly is R using to decide that ESMM is the same as ESM?

or

2) is there something fundamentally flawed with the manner in which I dynamically create variables in lists, without initializing them in some fashion? This approach worked fine until I noticed this issue with variables having similar names.

Thanks very much in advance for your help.

Dan Rabosky

Dan Rabosky
Department of Ecology and Evolutionary Biology
Corson Hall
Cornell University
Ithaca, NY 14853
Marc Schwartz
2006-Dec-05 20:21 UTC
[R] dynamic variable creation in lists and data frames
On Tue, 2006-12-05 at 14:41 -0500, Daniel Lee Rabosky wrote:
> Hi
>
> I have a question about the creation of variables within lists in R. I am
> running simulations and am interested in two parameters, ESM and ESMM (the
> similarity of these names is important for my question). I do simulations
> to generate ESMM, then plug these values into a second simulation function
> to get ESM:
>
> x <- list()
>
> for (i in 1:nsimulations)
> {
>   x$ESMM[i] <- do_simulation1()
>   x$ESM[i] <- do_simulation2(x$ESMM[i])
> }
>
> and I return everything as a dataframe, x <- as.data.frame(x)
>
> When I do this, I find that x$ESMM is overwritten by x$ESM for the first
> simulation. However, x$ESM is nonetheless correctly generated using
> x$ESMM.
>
> Thus, x$ESM[1] = x$ESMM[1], but for the other n-thousand simulations,
> ESMM is not overwritten; the error only occurs on the first instance of
> ESM.
>
> I think I know why this is occurring: I am creating a new variable in a
> list and assigning it a value, but when R can't find the variable, it
> overwrites the next most similar variable (ESMM). But it still proceeds
> to create the new variable ESM, having overwritten x$ESMM[1]. And it
> doesn't happen for subsequent simulations, because both variables then
> exist in the list.
>
> My questions are:
> 1) how different do variable names have to be to avoid this problem? What
> exactly is R using to decide that ESMM is the same as ESM?
>
> or
>
> 2) is there something fundamentally flawed with the manner in which I
> dynamically create variables in lists, without initializing them in some
> fashion? This approach worked fine until I noticed this issue with
> variables having similar names.
>
> Thanks very much in advance for your help.
>
> Dan Rabosky

This has to do with partial matching when indexing data frame columns and list elements. It is the default behavior in R, and if you search the archives using:

    RSiteSearch("partial matching")

you will find prior discussions on this.
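As a rough illustration of the partial matching described above, the sketch below reads from a list that has an element named ESMM but none named ESM. It assumes a current version of R, where `[[` matches names exactly unless `exact = FALSE` is given; the variable names `via_dollar`, `via_brackets` and `via_partial` are made up for this example.

```r
# A list with a single element; there is no element called "ESM"
x <- list(ESMM = c(1, 2, 3))

# `$` partially matches names on extraction: "ESM" is an unambiguous
# prefix of "ESMM", so this returns the ESMM element
via_dollar <- x$ESM

# `[[` with a character name matches exactly by default (current R),
# so nothing is found and NULL is returned
via_brackets <- x[["ESM"]]

# Partial matching can be requested explicitly with exact = FALSE
via_partial <- x[["ESM", exact = FALSE]]
```

A prefix only matches when it is unambiguous, so the names do not need to be "similar" in any looser sense: one name just has to be a leading substring of exactly one existing name.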
A simple example:

    > x <- list()
    > x
    list()

    > x$ESMM[1] <- 1
    > x
    $ESMM
    [1] 1

    > x$ESM[1] <- 2
    > x
    $ESMM
    [1] 2

    $ESM
    [1] 2

Both values are changed, since x$ESM does not yet exist and the assignment partially matches x$ESMM. Then x$ESM is created.

I think that in this particular situation, you might want to try:

    # Create a simple function that returns pairs of random samples from
    # 'letters', which is a:z
    Sim <- function()
    {
      list(ESMM = letters[sample(26, 1)], ESM = letters[sample(26, 1)])
    }

    # Run it once
    > Sim()
    $ESMM
    [1] "l"

    $ESM
    [1] "z"

Now use replicate() to do this 10 times. Note the default behavior is to simplify the returned values into a matrix.

    > x <- replicate(10, Sim())
    > x
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    ESMM "x"  "q"  "c"  "f"  "e"  "f"  "y"  "d"  "z"  "h"
    ESM  "u"  "c"  "j"  "v"  "u"  "j"  "o"  "p"  "g"  "g"

So, in your case create a function Sim() like this:

    Sim <- function()
    {
      ESMM <- do_simulation1()
      ESM <- do_simulation2(ESMM)
      list(ESMM = ESMM, ESM = ESM)
    }

and then use replicate() as above.

See ?replicate for more information.

HTH,

Marc Schwartz
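For anyone who prefers to keep the original loop, one possible alternative (a sketch, not something proposed in the thread) is to create both list elements before the loop and index them with `[[`, so that neither name is ever missing when the other is assigned. The bodies of do_simulation1() and do_simulation2() below are hypothetical stand-ins, and nsimulations is set to an arbitrary small value.

```r
# Hypothetical stand-ins for the poster's simulation functions
do_simulation1 <- function() runif(1)
do_simulation2 <- function(esmm) 2 * esmm + 1

nsimulations <- 5  # arbitrary value for illustration

# Create both elements up front so "ESM" and "ESMM" always exist
x <- list(ESMM = numeric(nsimulations),
          ESM  = numeric(nsimulations))

for (i in seq_len(nsimulations)) {
  # `[[` with a full name avoids relying on `$` name matching
  x[["ESMM"]][i] <- do_simulation1()
  x[["ESM"]][i]  <- do_simulation2(x[["ESMM"]][i])
}

# Both elements have the same length, so this gives one row per simulation
x <- as.data.frame(x)
```

Preallocating also avoids growing the vectors one element at a time, which can matter when there are many thousands of simulations.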