thr3ads.net - R help - [R] Removing a space from a string [Jul 2020]


Dennis Fisher

2020-Jul-28 20:20 UTC

[R] Removing a space from a string

R 4.0.2
OS X

Colleagues

I have strings that contain a space in an unexpected location.  The intended
string is:
	"STRING 01.  Remainder of the string"
However, variants are:
	"STR ING 01.  Remainder of the string"
	"STRIN G 01.  Remainder of the string"

I would like a general approach to deleting a space, but only if it appears
before the period.  Any suggestions on a regular expression for this?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone / Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com <http://www.plessthan.com/>


cpolwart at chemo.org.uk

2020-Jul-28 20:29 UTC

[R] Removing a space from a string

You aren't deleting the space before 01? Is that part of the requirement?

Dennis Fisher

2020-Jul-28 20:31 UTC

[R] Removing a space from a string

Only the spaces in STRING.  However, if I inadvertently delete the space between
STRING and NN, I can add it back in.


Dennis Fisher

2020-Jul-28 20:34 UTC

[R] Removing a space from a string

It is possible that there will be more than one space, but most likely only one
(i.e., a solution for one space will suffice; a solution for more than one space
would be even better).
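A possible one-call sketch for that wish (one or more stray spaces, all before the period) uses `gsub()` with a Perl lookahead. It assumes the space to keep is always the one directly in front of the digits that precede the first period; the sample vector `x` is just the thread's examples plus one entry with extra spaces.

```r
x <- c("STRING 01.  Remainder of the string",
       "STR ING 01.  Remainder of the string",
       "STRIN G 01.  Remainder of the string",
       "STR  IN G 01.  Remainder of the string")

# Drop a space only when, without passing a period, it is followed by another
# space, one or more digits and a period.  The space right in front of "01"
# is not followed by such a pattern, so it is kept.
fixed <- gsub(" (?=[^.]* [0-9]+\\.)", "", x, perl = TRUE)
fixed
```

Spaces after the first period are left alone unless the remainder itself contains something like " 12." further on; in that case splitting at the first dot is the safer route.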


Rasmus Liland

2020-Jul-28 21:00 UTC

[R] Removing a space from a string

Dear Dennis,

Perhaps use gregexpr to find the 
dots, remove the spaces from the substring up to the first 
one, then paste the pieces back together.

	strings <- 
	c("STRING 01.  Remainder of the string.",
	  "STR ING 01.  Remainder of the string.",
	  "STRIN G 01.  Remainder of the string.")
	
	search <- gregexpr("\\.", strings)
	lens <- nchar(strings)
	FUN <- function(i, strings, search, lens) {
	  before.dot <- substr(strings[i], 1, search[[i]][1])
	  before.dot <- gsub(" ", "", before.dot)
	  after.dot <- substr(strings[i], search[[i]][1]+1, lens[i])
	  return(paste0(before.dot, after.dot))
	}
	simplify2array(parallel::mclapply(
	  X=1:length(strings),
	  FUN=FUN,
	  strings=strings,
	  search=search,
	  lens=lens))

yields

	[1] "STRING01.  Remainder of the string."
	[2] "STRING01.  Remainder of the string."
	[3] "STRING01.  Remainder of the string."

Yes, I know, the space just before 01 
also disappears ... 

Best,
Rasmus
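Dennis mentioned that the space in front of the number can simply be added back. A vectorised sketch of the same split-at-the-first-dot idea (not Rasmus's code) that removes the spaces and then re-inserts one before the digits could look like this; it assumes each string has a number immediately before its first period.

```r
strings <- c("STRING 01.  Remainder of the string.",
             "STR ING 01.  Remainder of the string.",
             "STRIN G 01.  Remainder of the string.")

dot    <- regexpr(".", strings, fixed = TRUE)     # position of the first dot
before <- gsub(" ", "", substr(strings, 1, dot))  # e.g. "STRING01."
after  <- substr(strings, dot + 1, nchar(strings))

# Put a single space back in front of the digits that end the first part
before <- sub("([0-9]+\\.)$", " \\1", before)
result <- paste0(before, after)
result
```

Strings without a dot pass through unchanged: `regexpr()` returns -1 for them, so `before` is empty and `after` is the whole string.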

Rasmus Liland

2020-Jul-28 21:25 UTC

[R] Removing a space from a string

I forgot about regexpr ... this is 
simpler I think:

	strings <- 
	c("STRING 01.  Remainder of the string.",
	  "STR ING 01.  Remainder of the string.",
	  "STRIN G 01.  Remainder of the string.")
	
	search <- regexpr("...\\.", strings)  # first dot and the three chars in front of it
	ml <- attr(search, "match.length")
	paste0(
	  gsub(" ", "", substr(strings, 1, search - 1)),  # text before the match, spaces removed
	  substr(strings, search, search + ml - 1),       # the match itself, e.g. " 01."
	  substr(strings, search + ml, nchar(strings))    # everything after the dot
	)

/Rasmus


Bert Gunter

2020-Jul-28 21:53 UTC

[R] Removing a space from a string

1. Thanks for the nice reprex.
2. However, I thought there was still a bit of ambiguity. I interpreted
your specification to mean: "any number of spaces could occur in the
beginning alphabetic part of the strings before one or more digits occur
followed by a '.' (a period) and then more stuff after."
3. My strategy was simply to split the strings into the first part
consisting of the alphabetic characters and spaces and the second part with
the numbers and everything else. Then I just removed the spaces in the
first part. You can then concatenate them together again (using paste())
however you like. Thus

> x
[1] "STRING 01.  Remainder of the string"  "STR ING 01.  Remainder of the string"
[3] "STRIN G 01.  Remainder of the string"
> p1 <- gsub(" ", "", gsub("([^[:digit:]]+)[[:digit:]]+\\..*$", "\\1", x))
> p2 <- gsub("[^[:digit:]]+([[:digit:]]+\\..*$)", "\\1", x)
> p1
[1] "STRING" "STRING" "STRING"
> p2
[1] "01.  Remainder of the string" "01.  Remainder of the string"
[3] "01.  Remainder of the string"

I look forward to better approaches using basic regex's (no additional
packages), however.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Bert Gunter

2020-Jul-29 00:06 UTC

[R] Removing a space from a string

Note that my previous strategy can be expressed slightly more clearly as:

x <- c("STRING 01.  Remainder of the string",
       "STR ING 01.  Remainder of the string",
       "STRIN G 01.  Remainder of the string",
       "STR  IN G 01.  Remainder of the string") ## more spaces in this last example entry

rx <- "([^[:digit:]]+)([[:digit:]]+.+)"
> gsub(" ", "", gsub(rx, "\\1", x))
[1] "STRING" "STRING" "STRING" "STRING"
> gsub(rx, "\\2", x)
[1] "01.  Remainder of the string" "01.  Remainder of the string"
[3] "01.  Remainder of the string" "01.  Remainder of the string"

Bert Gunter
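Bert's two pieces still need to be joined. One way, sketched here, is `paste()`, whose default `sep = " "` supplies exactly the one space that belongs before the digits. The `grepl()` guard and the fifth, non-matching entry are additions for illustration, so that strings which don't fit `rx` come back unchanged rather than being mangled.

```r
x <- c("STRING 01.  Remainder of the string",
       "STR ING 01.  Remainder of the string",
       "STRIN G 01.  Remainder of the string",
       "STR  IN G 01.  Remainder of the string",
       "no digits here")   # hypothetical non-matching entry

rx <- "([^[:digit:]]+)([[:digit:]]+.+)"
p1 <- gsub(" ", "", gsub(rx, "\\1", x))  # leading text with all spaces removed
p2 <- gsub(rx, "\\2", x)                 # digits, period and the remainder

# paste() puts exactly one space between the two parts; elements that do not
# match rx are returned as they were
fixed <- ifelse(grepl(rx, x), paste(p1, p2), x)
fixed
```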

Richard O'Keefe

2020-Jul-29 00:11 UTC

[R] Removing a space from a string

The first response has to be "how did the spaces get there
in the first place?"  Can you fix the process that creates
the data?  If the process sometimes generates one extra
space, are you sure it never generates two?

But let's treat this purely as a regular expression
problem, where, if there is a space before a dot, you want
to delete the first one.  In vi(1) you would do

s/^\([^  .]*\) \([^.]*\)/\1\2/

but apparently there is *supposed* to be a space before
the 01, so it is only when there are two or more spaces
that one should be deleted, so we'd want

s/^\([^  .]*\) \([^ .]* \)/\1\2/

I leave converting that to R as an exercise for the reader.
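Carried over to R's default (POSIX extended) regular expressions, the second substitution above loses the backslashes on the groups and uses `\\1\\2` for the back-references. Since it removes one space per call, this sketch wraps it in a loop, a hypothetical helper `drop_extra_spaces()`, that repeats until nothing changes, which also handles more than one stray space. The bracket expression is written as `[^ .]` (not a space, not a period).

```r
# Hypothetical helper (not from the thread): apply the substitution repeatedly
drop_extra_spaces <- function(x) {
  repeat {
    # Group 1: text from the start up to the first space.  Group 2: the next
    # (possibly empty) word plus the space after it.  The space between the
    # groups is dropped.  Once the only space left is the one before "01.",
    # the next word ends in a period, not a space, so nothing matches.
    y <- sub("^([^ .]*) ([^ .]* )", "\\1\\2", x)
    if (identical(y, x)) return(y)
    x <- y
  }
}

x <- c("STRING 01.  Remainder of the string",
       "STR ING 01.  Remainder of the string",
       "STRIN G 01.  Remainder of the string",
       "STR  IN G 01.  Remainder of the string")
drop_extra_spaces(x)
```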

Dennis Fisher

2020-Jul-29 00:19 UTC

[R] Removing a space from a string

Richard

In reply to your "first response", the text was originally in a Word document
and it did NOT contain the errant spaces.  I used read_docx in the textreadr
package to access the text.  The spaces were added during that step.  I am
copying the maintainer of that package to see if he has any idea as to the
source.

Thanks for your regular expression suggestion.

Dennis


