See comment inline below:

On 8/18/2018 10:06 PM, Rui Barradas wrote:
> Hello,
>
> It also works with class "factor":
>
> df <- data.frame(variable = c("12.6%", "30.9%", "61.4%"))
> class(df$variable)
> #[1] "factor"
>
> as.numeric(gsub(pattern = "%", "", df$variable))
> #[1] 12.6 30.9 61.4
>
>
> This is because sub() and gsub() return a character vector and the
> instruction becomes an equivalent of what the help page ?factor
> documents in section Warning:
>
> To transform a factor f to approximately its original numeric values,
> as.numeric(levels(f))[f] is recommended and slightly more efficient than
> as.numeric(as.character(f)).
>
>
> Also, I would still prefer
>
> as.numeric(sub(pattern = "%$","",df$variable))
> #[1] 12.6 30.9 61.4
>
> The pattern is more strict and there is no need to search&replace
> multiple occurrences of '%'.

The pattern is more strict, and that could cause the conversion to fail
if the process that created the strings resulted in trailing spaces.
Without the '$' the conversion succeeds.

df <- data.frame(variable = c("12.6% ", "30.9%", "61.4%"))
as.numeric(sub('%$', '', df$variable))
[1]   NA 30.9 61.4
Warning message:
NAs introduced by coercion

<<<snip>>>

Dan

--
Daniel Nordlund
Port Townsend, WA
USA
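A minimal base-R sketch of the points in this exchange: converting the factor column, the ?factor idiom (adapted here by stripping the "%" from the levels before converting them, which is an assumption of this sketch), and why the anchored "%$" pattern fails on a trailing space. Note that since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the factor case has to be requested explicitly to reproduce the 2018 behaviour.

```r
# Reproduce the 2018 setup: since R 4.0.0, data.frame() keeps strings as
# character unless stringsAsFactors = TRUE is given.
df <- data.frame(variable = c("12.6%", "30.9%", "61.4%"),
                 stringsAsFactors = TRUE)
f <- df$variable

# gsub() coerces the factor to character, so the result parses directly.
via_gsub <- as.numeric(gsub("%", "", f))

# The ?factor idiom, adapted: clean and convert the (few) levels once,
# then index by the factor, which uses its integer codes.
via_levels <- as.numeric(sub("%$", "", levels(f)))[f]

# Trailing space: "%$" cannot match "12.6% ", so that element stays
# unparsable and becomes NA (the coercion warning is suppressed here).
x <- c("12.6% ", "30.9%", "61.4%")
anchored <- suppressWarnings(as.numeric(sub("%$", "", x)))

# Unanchored: "12.6% " becomes "12.6 ", and as.numeric() accepts
# surrounding whitespace, so all three values parse.
unanchored <- as.numeric(sub("%", "", x))

print(via_gsub)
print(via_levels)
print(anchored)
print(unanchored)
```

Converting the levels once and then indexing is what makes the ?factor idiom cheaper on long vectors with few distinct values.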
Hello,

Inline.

On 20/08/2018 01:08, Daniel Nordlund wrote:
> See comment inline below:
>
> On 8/18/2018 10:06 PM, Rui Barradas wrote:
>> <<<snip>>>
>
> The pattern is more strict, and that could cause the conversion to fail
> if the process that created the strings resulted in trailing spaces.

That's true, and I had thought of that, but it wasn't in the OP's problem
description.
The '$' could still be used with something like "%\\s*$":

as.numeric(sub('%\\s*$', '', df$variable))
#[1] 12.6 30.9 61.4


Rui Barradas

> <<<snip>>>

---
This email has been checked for viruses by AVG.
https://www.avg.com
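A short sketch comparing the "%\\s*$" pattern with a trim-first alternative. The trimws() variant is not from the thread; it is one assumed option that keeps the strict "%$" pattern. The last lines show why a strict, anchored pattern can be worth keeping: an unanchored gsub() silently turns a malformed value into a wrong number.

```r
x <- c("12.6% ", "30.9%", "61.4%\t")  # trailing space and trailing tab

# Pattern from the reply: "%" followed by optional whitespace at the end of
# the string. In an R string literal, "\\s" is the regex \s (whitespace).
rui <- as.numeric(sub("%\\s*$", "", x))

# Alternative, not from the thread: trim first with base R's trimws(),
# then keep the strict "%$" pattern.
trimmed <- as.numeric(sub("%$", "", trimws(x)))

# Why strictness can help: an unanchored gsub() silently turns a
# malformed "12%6" into 126, while the anchored pattern leaves it
# unparsable, so it shows up as NA (warning suppressed here).
loose_odd  <- as.numeric(gsub("%", "", "12%6"))
strict_odd <- suppressWarnings(as.numeric(sub("%\\s*$", "", "12%6")))

print(rui)
print(trimmed)
print(c(loose = loose_odd, strict = strict_odd))
```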
Have you considered "Ecfun::asNumericChar" (and "Ecfun::asNumericDF")?

DF <- data.frame(variable = c("12.6% ", "30.9%", "61.4%", "1"))
Ecfun::asNumericChar(DF$variable)
[1] 0.126 0.309 0.614 1.000

If you read the documentation, including the examples, you will see that
many of these issues and others are handled automatically in the way that
I thought was the most sensible. If you disagree, we can discuss other
examples and perhaps modify the code for those functions.

Spencer Graves

On 2018-08-20 00:26, Rui Barradas wrote:
> <<<snip>>>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
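Note that the Ecfun result differs in scale from the sub()/gsub() answers: "12.6%" becomes 0.126 rather than 12.6, while a bare "1" stays 1. For readers without Ecfun installed, that particular behaviour can be approximated in base R. The helper pct_to_num() below is hypothetical; it is an assumption based only on the output quoted in this message, not a description of how Ecfun::asNumericChar works, and it does not cover the other cases that function's documentation describes.

```r
# Hypothetical helper, NOT Ecfun's implementation: it only mimics the
# percent-to-fraction result quoted above for this particular input.
pct_to_num <- function(x) {
  x <- trimws(as.character(x))          # handles factors and stray spaces
  is_pct <- grepl("%$", x)              # which entries end in "%"
  num <- as.numeric(sub("%$", "", x))   # strip the sign and parse
  ifelse(is_pct, num / 100, num)        # scale percentages to fractions
}

DF <- data.frame(variable = c("12.6% ", "30.9%", "61.4%", "1"))
res <- pct_to_num(DF$variable)
print(res)
```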