Luigi Marongiu
2021-Aug-09 13:33 UTC
[R] substitute column data frame based on name stored in variable in r
You are right, vect will contain the names of the columns of the real dataframe, but the actual simulation of the real case is more like this:

```
> df = data.frame(A = 1:5, B = c(1, 2, NA, 2, NA),
+                 C = c("value is blue", "Value is red", "empty",
+                       " value is blue", " Value is green"),
+                 D = 9:13,
+                 E = c("light", "light", "heavy", "heavy", "heavy")); df
  A  B               C  D     E
1 1  1   value is blue  9 light
2 2  2    Value is red 10 light
3 3 NA           empty 11 heavy
4 4  2   value is blue 12 heavy
5 5 NA  Value is green 13 heavy
> vect = LETTERS[1:5]
> df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"; df
  A  B               C  D     E vect[2]
1 1  1   value is blue  9 light    <NA>
2 2  2    Value is red 10 light    <NA>
3 3 NA           empty 11 heavy    <NA>
4 4  2   value is blue 12 heavy    <NA>
5 5 NA  Value is green 13 heavy    <NA>
> df[df[[vect[2]]] == 2, vect[2]] <- "No"; df
Error in `[<-.data.frame`(`*tmp*`, df[[vect[2]]] == 2, vect[2], value = "No") :
  missing values are not allowed in subscripted assignments of data frames
```

But still, I get an extra column instead of working on column B directly, and I can't dispense with the quotation marks...

On Mon, Aug 9, 2021 at 1:31 PM Ivan Krylov <krylov.r00t at gmail.com> wrote:
>
> On Mon, 9 Aug 2021 13:16:02 +0200
> Luigi Marongiu <marongiu.luigi at gmail.com> wrote:
>
> > df = data.frame(VAR = ..., VAL = ...)
> > vect = letters[1:5]
>
> What is the relation between vect and the column names of the data
> frame? Is it your intention to choose rows or columns using `vect`?
>
> > df[df[['vect[2]']] == 2, 'vect[2]']
>
> '...' creates a string literal. If you want to evaluate an R
> expression, don't wrap it in quotes.
>
> I had assumed you wanted to put column names in the vector `vect`, but
> now I'm just confused: `vect` is the same as df$VAR, not colnames(df).
> What do you want to achieve?
>
> Again, you can access the second column with much less typing by
> addressing it directly: df[[2]]
>
> Does it help if you consult [**] or some other tutorial on subsetting
> in R?
>
> --
> Best regards,
> Ivan
>
> [**]
> https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors
> https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Lists

--
Best regards,
Luigi
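
A minimal sketch of why the quoted subscript adds a column, using a cut-down copy of the data frame from this message (columns C to E are left out, which is an assumption made only for brevity; the names `quoted`, `unquoted` and `df2` are illustrative). It is meant to show the difference between the string 'vect[2]' and the expression vect[2], not to prescribe a fix:

```r
# Reduced version of the example data: only columns A and B
df <- data.frame(A = 1:5, B = c(1, 2, NA, 2, NA))
vect <- LETTERS[1:5]

# Quoted: "vect[2]" is a literal column name, and df has no such column
quoted <- df[["vect[2]"]]
is.null(quoted)

# Unquoted: vect[2] is evaluated first and yields "B"
unquoted <- df[[vect[2]]]
identical(unquoted, df$B)

# NULL == 2 is a zero-length logical, so no row is selected,
# yet the assignment still creates a column literally named "vect[2]"
df2 <- df
df2[df2[["vect[2]"]] == 2, "vect[2]"] <- "No"
names(df2)
```

Because no row is selected, the new column holds only NA, which matches the output shown in the transcript above.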
Ivan Krylov
2021-Aug-09 15:18 UTC
[R] substitute column data frame based on name stored in variable in r
Thanks for providing a reproducible example!

On Mon, 9 Aug 2021 15:33:53 +0200
Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

> df[df[['vect[2]']] == 2, 'vect[2]'] <- "No"

Please don't quote R expressions that you want to evaluate. 'vect[2]'
is just a string, like 'hello world' or 'I want to create a new column
named "vect[2]" instead of accessing the second one'.

> Error in `[<-.data.frame`(`*tmp*`, df[[vect[2]]] == 2, vect[2], value
> = "No") : missing values are not allowed in subscripted assignments
> of data frames

Since df[[2]] contains NAs, comparisons with it also contain NAs. While
it's possible to subset data.frames with NAs (the rows corresponding to
the NAs are returned filled with NAs of corresponding types),
assignment to undefined rows is not allowed.

A simple way to remove the NAs and only leave the cases where
df[[vect[2]]] == 2 is TRUE would be to use which(). Compare:

    df[df[[vect[2]]] == 2,]
    df[which(df[[vect[2]]] == 2),]

--
Best regards,
Ivan
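
For completeness, a minimal sketch of the which() suggestion applied to the assignment itself, on a cut-down copy of the example data (columns C to E omitted; the helper names `col` and `rows` are illustrative and not from the thread):

```r
# Reduced version of the example data: only columns A and B
df <- data.frame(A = 1:5, B = c(1, 2, NA, 2, NA))
vect <- LETTERS[1:5]
col <- vect[2]                  # evaluates to "B"; no quotes around vect[2]

# The raw comparison is NA wherever B is NA
cmp <- df[[col]] == 2

# which() keeps only the positions where the comparison is TRUE
rows <- which(cmp)

# Assignment now succeeds because the row index contains no NA
df[rows, col] <- "No"
df
```

Note that assigning the string "No" turns column B into a character column, so the remaining numbers are stored as text and the NAs stay NA.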