Hello,

I have a data frame in which some rows contain strings that include digits. How do I select those rows and change their values?

In essence, I have a data frame with different values assigned to the column "val". I am formatting everything to either "POS" or "NEG", but values entered as numbers should get the value "NUM". How do I change such values?

```
df = data.frame(id = runif(10, 1, 100),
                val = c("", "POs", "Pos", "P", "Y", "13.6", "Neg", "N", "0.5", "58.4"),
                stringsAsFactors = FALSE)
df$val[df$val == ""] = NA
df$val[df$val == "POs"] = "POS"
df$val[df$val == "Pos"] = "POS"
df$val[df$val == "P"] = "POS"
df$val[df$val == "Y"] = "POS"
df$val[df$val == "Neg"] = "NEG"
df$val[df$val == "N"] = "NEG"
```

--
Best regards,
Luigi

On Wed, 30 Nov 2022 13:40:50 +0100, Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

> I am formatting everything to either "POS" and "NEG",
> but values entered as number should get the value "NUM".
> How do I change such values?

Thanks for providing an example!

One idea would be to use a regular expression to locate numbers. For example, grepl('[0-9]', df$val) will return a logical vector indexing the rows containing digits. Alternatively, grepl('^[0-9.]+$', df$val, perl = TRUE) will index all strings consisting solely of digits and decimal separators.

Another idea would be to parse all of the strings as numbers and filter out those that didn't succeed. Use as.numeric() to perform the parsing, suppressWarnings() to silence the messages telling you that the parsing failed for some of the strings, and is.na() to get the logical vector indexing those entries that failed to parse.

--
Best regards,
Ivan
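
The three checks described above do not always agree, so here is a small side-by-side sketch. The test strings are hypothetical (only "13.6" comes from the original post) and were picked to show where the checks differ; the variable names are illustrative too.

```r
# Hypothetical test strings chosen to expose where the checks disagree.
val <- c("13.6", "1e3", "-2", "1.2.3", "P53", "")

# Any digit anywhere in the string.
has_digit <- grepl("[0-9]", val)

# Only digits and dots, at least one character.
digits_dots <- grepl("^[0-9.]+$", val)

# Parses as a number; suppressWarnings() hides the coercion warnings.
parses <- !is.na(suppressWarnings(as.numeric(val)))

# Show the three logical vectors next to the input.
data.frame(val, has_digit, digits_dots, parses)
```

Here "1.2.3" passes the digits-and-dots pattern but does not parse, "1e3" and "-2" parse but fail that pattern, and "P53" only counts under the "contains a digit" check. Which behaviour is wanted depends on what counts as "entered as a number" in the real data.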

At 12:40 on 30/11/2022, Luigi Marongiu wrote:

> [...]
> I am formatting everything to either "POS" and "NEG",
> but values entered as number should get the value "NUM".
> How do I change such values?

Hello,

Here is a way with grep.

    i <- grep("^P|^Y", df$val, ignore.case = TRUE)
    df$val[i] <- "POS"

    i <- grep("^N", df$val, ignore.case = TRUE)
    df$val[i] <- "NEG"

    i <- grep("\\d+", df$val)
    df$val[i] <- "NUM"

    is.na(df$val) <- df$val == ""
    df

Hope this helps,

Rui Barradas
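
The grep steps above run in sequence on a column that is being modified, so their order matters (running the "^N" step after the "NUM" step would turn "NUM" into "NEG"). One possible variant, sketched below, computes every match against the untouched input. The helper name recode_val and the stricter number pattern are assumptions for illustration, not part of the reply above.

```r
# Hypothetical helper: recode a character vector to POS / NEG / NUM / NA.
# All patterns are matched against the original input x, so the order of
# the first two assignments does not affect the result for this data.
recode_val <- function(x) {
  out <- x
  out[grepl("^[PY]", x, ignore.case = TRUE)] <- "POS"
  out[grepl("^N", x, ignore.case = TRUE)] <- "NEG"
  # Assumed definition of a number: digits with an optional decimal part.
  out[grepl("^[0-9]+(\\.[0-9]+)?$", x)] <- "NUM"
  # Empty strings become missing values; %in% never yields NA.
  out[x %in% ""] <- NA
  out
}

val <- c("", "POs", "Pos", "P", "Y", "13.6", "Neg", "N", "0.5", "58.4")
recode_val(val)
```

Strings that match none of the patterns are left unchanged, which makes unexpected entries easy to spot afterwards with table() or unique().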

On Wed, 30 Nov 2022 13:40:50 +0100, Luigi Marongiu <marongiu.luigi at gmail.com> wrote:

> [...]
> I am formatting everything to either "POS" and "NEG",
> but values entered as number should get the value "NUM".
> How do I change such values?

What I do in such circumstances:

    suppressWarnings(X$val[!is.na(as.numeric(X$val))] <- "NUM")

The "suppressWarnings()" bit is just included due to my OCD.

This avoids fooling about with regular expressions, which always requires a huge amount of trial and error, and a great diminishment of the amount of hair on one's head (as a result of tearing out).

Note that I have changed the name of your data frame from "df" to "X", since df() is a built-in R function (density of the F-distribution).

See fortunes::fortune("might clash").

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
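
For completeness, a minimal end-to-end sketch of the as.numeric() approach on the data from the original post, using the name X as suggested above. Splitting the mask (is_num, an illustrative name) from the assignment and using %in% for the recoding are my own choices here, not something from the thread.

```r
# Data from the original post, stored as X rather than df.
X <- data.frame(id = runif(10, 1, 100),
                val = c("", "POs", "Pos", "P", "Y", "13.6", "Neg", "N", "0.5", "58.4"),
                stringsAsFactors = FALSE)

# Flag the entries that parse as numbers while the raw strings are intact;
# suppressWarnings() hides the "NAs introduced by coercion" warning.
is_num <- !is.na(suppressWarnings(as.numeric(X$val)))
X$val[is_num] <- "NUM"

# Recode the remaining spellings; %in% never returns NA, so it is safe
# to use as a subscript even after missing values have been introduced.
X$val[X$val %in% c("POs", "Pos", "P", "Y")] <- "POS"
X$val[X$val %in% c("Neg", "N")] <- "NEG"
X$val[X$val %in% ""] <- NA
X
```

Keeping the warning suppression around as.numeric() only, rather than around the whole assignment, means any other warning raised by the assignment would still be visible.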

Thank you, those are all viable solutions.

On Wed, Nov 30, 2022 at 8:59 PM Rolf Turner <r.turner at auckland.ac.nz> wrote:

> What I do in such circumstances:
>
> suppressWarnings(X$val[!is.na(as.numeric(X$val))] <- "NUM")
> [...]

--
Best regards,
Luigi