See comment inline below:

On 8/18/2018 10:06 PM, Rui Barradas wrote:
> Hello,
>
> It also works with class "factor":
>
> df <- data.frame(variable = c("12.6%", "30.9%", "61.4%"))
> class(df$variable)
> #[1] "factor"
>
> as.numeric(gsub(pattern = "%", "", df$variable))
> #[1] 12.6 30.9 61.4
>
>
> This is because sub() and gsub() return a character vector and the
> instruction becomes an equivalent of what the help page ?factor
> documents in section Warning:
>
> To transform a factor f to approximately its original numeric values,
> as.numeric(levels(f))[f] is recommended and slightly more efficient than
> as.numeric(as.character(f)).
>
>
> Also, I would still prefer
>
> as.numeric(sub(pattern = "%$","",df$variable))
> #[1] 12.6 30.9 61.4
>
> The pattern is more strict and there is no need to search&replace
> multiple occurrences of '%'.

The pattern is more strict, and that could cause the conversion to fail
if the process that created the strings resulted in trailing spaces.
Without the '$' the conversion succeeds.

df <- data.frame(variable = c("12.6% ", "30.9%", "61.4%"))
as.numeric(sub('%$', '', df$variable))
[1]   NA 30.9 61.4
Warning message:
NAs introduced by coercion

<<<snip>>>

Dan

--
Daniel Nordlund
Port Townsend, WA USA
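The ?factor recommendation quoted above can be combined with the pattern replacement. The following is only a sketch: the factor `f` and the repeated value in it are made up for illustration, and the factor is built explicitly because data.frame() no longer converts strings to factors by default since R 4.0.0.

```r
# Illustrative factor with one repeated value (not from the original post).
f <- factor(c("12.6%", "30.9%", "61.4%", "30.9%"))

# sub() coerces its input with as.character(), so it accepts a factor
# and returns a character vector.
via_character <- as.numeric(sub("%$", "", f))

# Same idea as as.numeric(levels(f))[f] from ?factor: clean each distinct
# level once, then index by the factor's integer codes.
via_levels <- as.numeric(sub("%$", "", levels(f)))[f]

# Both routes should give the same numeric vector.
identical(via_character, via_levels)
```

The second form runs the regular expression once per level instead of once per element, which may matter for long vectors with few distinct values.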
Hello,

Inline.

On 20/08/2018 01:08, Daniel Nordlund wrote:
> On 8/18/2018 10:06 PM, Rui Barradas wrote:
>> Also, I would still prefer
>>
>> as.numeric(sub(pattern = "%$","",df$variable))
>> #[1] 12.6 30.9 61.4
>>
>> The pattern is more strict and there is no need to search&replace
>> multiple occurrences of '%'.
>
> The pattern is more strict, and that could cause the conversion to fail
> if the process that created the strings resulted in trailing spaces.

That's true, and I had thought of that but it wasn't in the OP's problem
description.
The '$' could still be used with something like "%\\s*$":

as.numeric(sub('%\\s*$', '', df$variable))
#[1] 12.6 30.9 61.4

Rui Barradas

> Without the '$' the conversion succeeds.
>
> df <- data.frame(variable = c("12.6% ", "30.9%", "61.4%"))
> as.numeric(sub('%$', '', df$variable))
> [1]   NA 30.9 61.4
> Warning message:
> NAs introduced by coercion
>
> <<<snip>>>
>
> Dan
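To compare the two anchored patterns side by side, something like the sketch below may help. It reuses Dan's input with the trailing space; the variable names `strict`, `tolerant` and `failed` are only illustrative.

```r
# Dan's example input: the first value has a trailing space after the '%'.
x <- c("12.6% ", "30.9%", "61.4%")

# Strict pattern: '%' must be the very last character. The coercion
# warning is suppressed here because the NA is inspected explicitly below.
strict <- suppressWarnings(as.numeric(sub("%$", "", x)))

# Tolerant pattern from this message: '%' followed by optional whitespace.
tolerant <- as.numeric(sub("%\\s*$", "", x))

# List the raw strings that the strict pattern failed to convert;
# encodeString() adds quotes so the trailing space is visible.
failed <- x[is.na(strict)]
print(encodeString(failed, quote = '"'))

print(tolerant)
```

Checking `x[is.na(...)]` after the conversion is a cheap way to find out which inputs the pattern did not anticipate, instead of relying on the coercion warning alone.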
      Have you considered "Ecfun::asNumericChar" (and
"Ecfun::asNumericDF")?
DF <- data.frame(variable = c("12.6% ", "30.9%",
"61.4%", "1"))
Ecfun::asNumericChar(DF$variable)
[1] 0.126 0.309 0.614 1.000
      If you read the documentation including the examples, you will
see that many of these issues and others are handled automatically in
the way that I thought was the most sensible.  If you disagree, we can
discuss other examples and perhaps modify the code for those functions.

      Spencer Graves
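For readers without Ecfun installed, the output shown above (percentages returned as proportions, plain numbers left unchanged) can be approximated in base R. This is only a sketch under that one assumption: `percent_to_proportion` is a hypothetical helper, it is not how Ecfun implements asNumericChar, and it handles none of the other cases the Ecfun documentation describes.

```r
# Hypothetical helper, not part of Ecfun: strings ending in '%' are
# divided by 100, everything else is converted as-is.
percent_to_proportion <- function(x) {
  # as.character() also covers factor input; trimws() drops stray
  # leading/trailing whitespace such as the space in "12.6% ".
  x <- trimws(as.character(x))
  is_percent <- grepl("%$", x)
  value <- as.numeric(sub("%$", "", x))
  ifelse(is_percent, value / 100, value)
}

DF <- data.frame(variable = c("12.6% ", "30.9%", "61.4%", "1"))
result <- percent_to_proportion(DF$variable)
print(result)
```

Whether "12.6%" should become 12.6 or 0.126 depends on what the downstream code expects, so it seems worth deciding that explicitly rather than leaving it to whichever helper happens to be used.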
On 2018-08-20 00:26, Rui Barradas wrote:
> Hello,
>
> Inline.
>
> On 20/08/2018 01:08, Daniel Nordlund wrote:
>> The pattern is more strict, and that could cause the conversion to
>> fail if the process that created the strings resulted in trailing
>> spaces.
>
> That's true, and I had thought of that but it wasn't in the OP's
> problem description.
> The '$' could still be used with something like "%\\s*$":
>
> as.numeric(sub('%\\s*$', '', df$variable))
> #[1] 12.6 30.9 61.4
>
>
> Rui Barradas
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.