thr3ads.net - R help - [R] Best way to test for numeric digits? [Oct 2023]

Rui Barradas

2023-Oct-18 15:53 UTC

[R] Best way to test for numeric digits?

At 15:59 on 18/10/2023, Leonard Mada via R-help wrote:
> Dear List members,
> 
> What is the best way to test for numeric digits?
> 
> suppressWarnings(as.double(c("Li", "Na", "K", "2", "Rb", "Ca", "3")))
> # [1] NA NA NA  2 NA NA  3
> The above requires the use of the suppressWarnings function. Are there 
> any better ways?
> 
> I was working to extract chemical elements from a formula, something 
> like this:
> split.symbol.character = function(x, rm.digits = TRUE) {
>     # Perl is partly broken in R 4.3, but this works:
>     regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
>     # stringi::stri_split(x, regex = regex);
>     s = strsplit(x, regex, perl = TRUE);
>     if(rm.digits) {
>         s = lapply(s, function(s) {
>             isNotD = is.na(suppressWarnings(as.numeric(s)));
>             s = s[isNotD];
>         });
>     }
>     return(s);
> }
> 
> split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))
> 
> 
> Sincerely,
> 
> 
> Leonard
> 
> 
> Note:
> # works:
> regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
> strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)
> 
> 
> # broken in R 4.3.1
> # only slightly "erroneous" with stringi::stri_split
> regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
> strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello,

If you want to extract chemical element symbols, the following might work.
It uses the periodic table from the GitHub package chemr and a function
from the stringr package.


devtools::install_github("paleolimbot/chemr")



split_chem_elements <- function(x) {
   data(pt, package = "chemr", envir = environment())
   el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
   pat <- paste(el, collapse = "|")
   stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#>  [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"


It is also possible to rewrite the function without calls to non-base
packages, but that will take some more work.
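
For illustration, here is a minimal base-R sketch of the same idea. It is not the version from the chemr-based function above: the name split_chem_elements_base and the short `symbols` vector are assumptions made for the example, and a real run would need the full list of element symbols.

```r
# Base-R sketch: extract element symbols with regmatches() + gregexpr().
# `symbols` is a small illustrative subset, NOT the full periodic table.
split_chem_elements_base <- function(x, symbols) {
  # Try longer symbols first, so that "Cl" is matched before "C"
  el <- symbols[order(nchar(symbols), decreasing = TRUE)]
  pat <- paste(el, collapse = "|")
  regmatches(x, gregexpr(pat, x, perl = TRUE))
}

symbols <- c("H", "Li", "C", "O", "F", "Al", "Si", "P", "Cl")
mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
res <- split_chem_elements_base(mol, symbols)
res[[1]]
# [1] "C"  "Cl" "F"
```

Any symbol missing from `symbols` is silently skipped, so the result is only as complete as the vector passed in.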

Hope this helps,

Rui Barradas



Leonard Mada

2023-Oct-18 16:24 UTC

[R] Best way to test for numeric digits?

Dear Rui,

Thank you for your reply.

I actually do have access to the chemical symbols: I have started to
refactor and enhance the Rpdb package; see Rpdb::elements:
https://github.com/discoleo/Rpdb

However, the regex that you have constructed is quite heavy, as it needs
to iterate through all chemical symbols (in decreasing nchar). Elements
like C, and especially O, P or S, appear late in the regex, yet they are
very common in chemistry.

The alternative regex is (in this respect) simpler. It actually works 
(once you know about the workaround).
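
As a point of comparison only, a pattern-based extraction can be sketched without any list of symbols. This is not the split regex from the original post: it is a swapped-in, simpler technique that assumes well-formed formulas in which every symbol is one uppercase letter optionally followed by one lowercase letter. The function name extract_symbols is hypothetical.

```r
# Sketch: match "one uppercase letter, optionally one lowercase letter".
# Uses the default regex engine with POSIX classes, so perl = TRUE is not needed.
extract_symbols <- function(x) {
  regmatches(x, gregexpr("[[:upper:]][[:lower:]]?", x))
}

out <- extract_symbols(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))
out[[2]]
# [1] "Li" "Al" "H"
```

The pattern does not check that a token is a real element (it would accept "Xx"), so it trades validation for a much shorter regex.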

Q: My question was whether there is anything like is.numeric that
tests each element of a vector individually.
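
One possible element-wise test, sketched here with base R's grepl, avoids both the coercion and the suppressWarnings call. The helper name is_digits is an assumption for the example, and the pattern only recognises unsigned runs of digits, so strings such as "1.5" or "1e3" would not count as numeric.

```r
# Sketch: TRUE for strings made up entirely of the digits 0-9.
is_digits <- function(x) grepl("^[0-9]+$", x)

x <- c("Li", "Na", "K", "2", "Rb", "Ca", "3")
is_digits(x)
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

# Keep only the non-digit tokens, as in the rm.digits step
x[!is_digits(x)]
# [1] "Li" "Na" "K"  "Rb" "Ca"
```

Because no conversion to numeric takes place, no coercion warnings are raised.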

Sincerely,


Leonard


