Daniel Lee Rabosky
2006-Dec-05 19:41 UTC
[R] dynamic variable creation in lists and data frames
Hi
I have a question about the creation of variables within lists in R. I am
running simulations and am interested in two parameters, ESM and ESMM (the
similarity of these names is important for my question). I do simulations
to generate ESMM, then plug these values into a second simulation function
to get ESM:
x <- list()
for (i in 1:nsimulations)
{
x$ESMM[i] <- do_simulation1()
x$ESM[i] <- do_simulation2(x$ESMM[i])
}
and I return everything as a dataframe, x <- as.data.frame(x)
When I do this, I find that x$ESMM is overwritten by x$ESM for the first
simulation. However, x$ESM is nonetheless correctly generated using
x$ESMM.
Thus, x$ESM[1] = x$ESMM[1], but for the other n-thousand simulations,
ESMM is not overwritten; the error only occurs on the first instance of
ESM.
I think I know why this is occurring: I am creating a new variable in a
list and assigning it a value, but when R can't find the variable, it
overwrites the next most similar variable (ESMM). But it still proceeds
to create the new variable ESM, having overwritten x$ESMM[1]. And it
doesn't happen for subsequent simulations, because both variables then
exist in the list.
My questions are:
1) how different do variable names have to be to avoid this problem? What
exactly is R using to decide that ESMM is the same as ESM?
or
2) is there something fundamentally flawed with the manner in which I
dynamically create variables in lists, without initializing them in some
fashion? This approach worked fine until I noticed this issue with
variables having similar names.
Thanks very much in advance for your help.
Dan Rabosky
Department of Ecology and Evolutionary Biology
Corson Hall
Cornell University
Ithaca, NY 14853
Marc Schwartz
2006-Dec-05 20:21 UTC
[R] dynamic variable creation in lists and data frames
On Tue, 2006-12-05 at 14:41 -0500, Daniel Lee Rabosky wrote:
> Hi
>
> I have a question about the creation of variables within lists in R. I am
> running simulations and am interested in two parameters, ESM and ESMM (the
> similarity of these names is important for my question). I do simulations
> to generate ESMM, then plug these values into a second simulation function
> to get ESM:
>
> x <- list()
>
> for (i in 1:nsimulations)
> {
> x$ESMM[i] <- do_simulation1()
> x$ESM[i] <- do_simulation2(x$ESMM[i])
> }
>
> and I return everything as a dataframe, x <- as.data.frame(x)
>
> When I do this, I find that x$ESMM is overwritten by x$ESM for the first
> simulation. However, x$ESM is nonetheless correctly generated using
> x$ESMM.
>
> Thus, x$ESM[1] = x$ESMM[1], but for the other n-thousand simulations,
> ESMM is not overwritten; the error only occurs on the first instance of
> ESM.
>
> I think I know why this is occurring: I am creating a new variable in a
> list and assigning it a value, but when R can't find the variable, it
> overwrites the next most similar variable (ESMM). But it still proceeds
> to create the new variable ESM, having overwritten x$ESMM[1]. And it
> doesn't happen for subsequent simulations, because both variables then
> exist in the list.
>
> My questions are:
> 1) how different do variable names have to be to avoid this problem? What
> exactly is R using to decide that ESMM is the same as ESM?
>
> or
>
> 2) is there something fundamentally flawed with the manner in which I
> dynamically create variables in lists, without initializing them in some
> fashion? This approach worked fine until I noticed this issue with
> variables having similar names.
>
> Thanks very much in advance for your help.
>
> Dan Rabosky

This has to do with partial matching when indexing data frame columns and
list elements. It is the default behavior in R, and if you search the
archives using:

RSiteSearch("partial matching")

you will note prior discussions on this.
A simple example:

> x <- list()
> x
list()

> x$ESMM[1] <- 1
> x
$ESMM
[1] 1

> x$ESM[1] <- 2
> x
$ESMM
[1] 2

$ESM
[1] 2

Both values are changed, since x$ESM does not yet exist and the assignment
partially matches x$ESMM. Then x$ESM is created.

I think that in this particular situation, you might want to try:

# Create a simple function that returns pairs of random samples from
# 'letters', which is a:z
Sim <- function()
{
  list(ESMM = letters[sample(26, 1)],
       ESM = letters[sample(26, 1)])
}

# Run it once
> Sim()
$ESMM
[1] "l"

$ESM
[1] "z"

Now use replicate() to do this 10 times. Note the default behavior is to
simplify the returned values into a matrix.

> x <- replicate(10, Sim())
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ESMM "x"  "q"  "c"  "f"  "e"  "f"  "y"  "d"  "z"  "h"
ESM  "u"  "c"  "j"  "v"  "u"  "j"  "o"  "p"  "g"  "g"

So, in your case create a function Sim() like this:

Sim <- function()
{
  ESMM <- do_simulation1()
  ESM <- do_simulation2(ESMM)
  list(ESMM = ESMM, ESM = ESM)
}

and then use replicate() as above.

See ?replicate for more information.

HTH,

Marc Schwartz
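
As a complement to the replicate() approach above, here is a minimal sketch of the original loop with both vectors preallocated as ordinary variables, so that no list element is ever created through `$` on a name that does not exist yet. The bodies of do_simulation1() and do_simulation2() are hypothetical stand-ins (the real ones are not shown in this thread), and nsimulations = 5 is an arbitrary choice. The last lines illustrate that `[[` with a character name matches exactly by default, which may be a safer way to test whether a list element exists.

```r
# Hypothetical stand-ins for the poster's simulation functions (assumptions,
# not the real simulations).
do_simulation1 <- function() runif(1)
do_simulation2 <- function(esmm) esmm + rnorm(1)

nsimulations <- 5  # arbitrary value for illustration

# Preallocate both result vectors before the loop, so neither name has to be
# created on the fly inside a list.
ESMM <- numeric(nsimulations)
ESM  <- numeric(nsimulations)

for (i in seq_len(nsimulations)) {
  ESMM[i] <- do_simulation1()
  ESM[i]  <- do_simulation2(ESMM[i])
}

# Build the data frame once, with both column names given explicitly.
x <- data.frame(ESMM = ESMM, ESM = ESM)

# `[[` with a character name uses exact matching by default, so a missing
# element is reported as NULL instead of being matched to a longer name.
y <- list(ESMM = 1)
esm_exists <- !is.null(y[["ESM"]])
```

Because ESMM and ESM are separate vectors here, similar names cannot interfere with each other, and the same pattern applies for any number of simulations.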