Peter Bishop
2020-Aug-24 10:27 UTC
[R] Matching backslash in a table's column using R language
In SQL, I'm using R as a way to filter data based on: - 20 characters in the range <space> to <tilde> - excluding <quote>, <apostrophe>, <comma>, <question mark>, <backslash>, <backtick> Given a SQL column containing the data: code ---- A\BCDEFG and the T-SQL script: EXEC [sys].[sp_execute_external_script] @language=N'R', @script=N' pattern1 = "^[\x20-\x7e]{1,20}$" pattern2 = "[\x22\x27\x2c\x3f\x5c\x60]" outData <- subset(inData, grepl(pattern1, code, perl=TRUE) & !grepl(pattern2, code, perl=TRUE))', @input_data_1 = N'SELECT [code] FROM [dbo].[products]', @input_data_1_name = N'inData', @output_data_1_name = N'outData' WITH RESULT SETS (AS OBJECT [dbo].[products]); GO why does the row detailed above get returned? I know that backslash is a special character but not in the SQL table. Consequently, the T-SQL code: SELECT ASCII(SUBSTRING([value], 2, 1)) FROM [table] returns 92 (the ASCII code for <backslash>) which shows that this is being recognised as a backslash character and not as an escape indicator for the following "B". Can anyone advise how I can filter out the <backslash> in the way that the other identified characters are being successfully filtered? As the data is being retrieved from a table, I can?t ask the data provider to use ?\\? instead of ?\? as that will be invalid for other uses. Thanks. [[alternative HTML version deleted]]
Bert Gunter
2020-Aug-25 18:26 UTC
[R] Matching backslash in a table's column using R language
1. I am far from an expert on such matters 2. It is unclear to me what your input is -- I assume a file. The problem, as you indicate, is that R's parser sees "\B" as an incorrect escape character, so, for example:> cat("\B")Error: '\B' is an unrecognized escape in character string starting ""\B" In any case, I think you should look at ?scan. Here is an example where I scan from the keyboard first and then remove the "\". You may have to scan from a file to do this.> z <-scan(file = "", what = "character")1: A\BCDEFG 2: #CR terminates input Read 1 item> cat(z)A\BCDEFG> nchar(z)[1] 8 ## scan read in the "\" as a single character from the console.> sub("\\\\","",z) ## Yes, 4 backslashes[1] "ABCDEFG" There may be better ways to do this, but as I said, I'm no expert. BTW, in posting here, please post in *plain text,* as the server can mangle html. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Aug 25, 2020 at 9:02 AM Peter Bishop <bishop_peterj at hotmail.com> wrote:> In SQL, I'm using R as a way to filter data based on: > - 20 characters in the range <space> to <tilde> > - excluding <quote>, <apostrophe>, <comma>, <question mark>, > <backslash>, <backtick> > > Given a SQL column containing the data: > > code > ---- > A\BCDEFG > > and the T-SQL script: > > EXEC [sys].[sp_execute_external_script] > @language=N'R', > @script=N' > pattern1 = "^[\x20-\x7e]{1,20}$" > pattern2 = "[\x22\x27\x2c\x3f\x5c\x60]" > > outData <- subset(inData, grepl(pattern1, code, perl=TRUE) & > !grepl(pattern2, code, perl=TRUE))', > @input_data_1 = N'SELECT [code] FROM [dbo].[products]', > @input_data_1_name = N'inData', > @output_data_1_name = N'outData' > WITH > RESULT SETS (AS OBJECT [dbo].[products]); > GO > > why does the row detailed above get returned? I know that backslash is a > special character but not in the SQL table. Consequently, the T-SQL code: > > SELECT ASCII(SUBSTRING([value], 2, 1)) FROM [table] > > returns 92 (the ASCII code for <backslash>) which shows that this is being > recognised as a backslash character and not as an escape indicator for the > following "B". > > Can anyone advise how I can filter out the <backslash> in the way that the > other identified characters are being successfully filtered? As the data is > being retrieved from a table, I can?t ask the data provider to use ?\\? > instead of ?\? as that will be invalid for other uses. > > Thanks. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Peter Bishop
2020-Aug-25 19:16 UTC
[R] Matching backslash in a table's column using R language
The feed is coming from a SQL table and this is using the embedded support for R which comes with SQL 2016. The source is therefore a SELECT statement. As an aside, I found a workaround by changing the pattern from: "[\x22\x27\x2c\x3f\x5c\x60]" to: "[\x22\x27\x2c\x3f\x5c\x5c\x60]" This seems to be escaping the backslash in the R script rather than in the data - which confuses me. ________________________________ From: Bert Gunter <bgunter.4567 at gmail.com> Sent: Wednesday, 26 August 2020 4:26 AM To: Peter Bishop <bishop_peterj at hotmail.com> Cc: r-help at r-project.org <r-help at r-project.org> Subject: Re: [R] Matching backslash in a table's column using R language 1. I am far from an expert on such matters 2. It is unclear to me what your input is -- I assume a file. The problem, as you indicate, is that R's parser sees "\B" as an incorrect escape character, so, for example:> cat("\B")Error: '\B' is an unrecognized escape in character string starting ""\B" In any case, I think you should look at ?scan. Here is an example where I scan from the keyboard first and then remove the "\". You may have to scan from a file to do this.> z <-scan(file = "", what = "character")1: A\BCDEFG 2: #CR terminates input Read 1 item> cat(z)A\BCDEFG> nchar(z)[1] 8 ## scan read in the "\" as a single character from the console.> sub("\\\\","",z) ## Yes, 4 backslashes[1] "ABCDEFG" There may be better ways to do this, but as I said, I'm no expert. BTW, in posting here, please post in *plain text,* as the server can mangle html. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Aug 25, 2020 at 9:02 AM Peter Bishop <bishop_peterj at hotmail.com<mailto:bishop_peterj at hotmail.com>> wrote: In SQL, I'm using R as a way to filter data based on: - 20 characters in the range <space> to <tilde> - excluding <quote>, <apostrophe>, <comma>, <question mark>, <backslash>, <backtick> Given a SQL column containing the data: code ---- A\BCDEFG and the T-SQL script: EXEC [sys].[sp_execute_external_script] @language=N'R', @script=N' pattern1 = "^[\x20-\x7e]{1,20}$" pattern2 = "[\x22\x27\x2c\x3f\x5c\x60]" outData <- subset(inData, grepl(pattern1, code, perl=TRUE) & !grepl(pattern2, code, perl=TRUE))', @input_data_1 = N'SELECT [code] FROM [dbo].[products]', @input_data_1_name = N'inData', @output_data_1_name = N'outData' WITH RESULT SETS (AS OBJECT [dbo].[products]); GO why does the row detailed above get returned? I know that backslash is a special character but not in the SQL table. Consequently, the T-SQL code: SELECT ASCII(SUBSTRING([value], 2, 1)) FROM [table] returns 92 (the ASCII code for <backslash>) which shows that this is being recognised as a backslash character and not as an escape indicator for the following "B". Can anyone advise how I can filter out the <backslash> in the way that the other identified characters are being successfully filtered? As the data is being retrieved from a table, I can?t ask the data provider to use ?\\? instead of ?\? as that will be invalid for other uses. Thanks. [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org<mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help<https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=02%7C01%7C%7C21350c4700bf48cb035008d849245d6c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339767813946809&sdata=v79GeIK6IXhKuF4q0qJ3uopMdJeYzKKvdi5a4NqgdeM%3D&reserved=0> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html<https://nam05.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=02%7C01%7C%7C21350c4700bf48cb035008d849245d6c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637339767813956802&sdata=iKgAGe23wPBrIK4iI3q0Vqk19q%2B%2FgSgiRUV858XOU3A%3D&reserved=0> and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]