Richard

In reply to your "first response", the text was originally in a Word document and it did NOT contain the errant spaces. I used read_docx in the textreadr package to access the text. The spaces were added during that step. I am copying the maintainer of that package to see if he has any idea as to the source.

Thanks for your regular expression suggestion.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone / Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com <http://www.plessthan.com/>

> On Jul 28, 2020, at 5:11 PM, Richard O'Keefe <raoknz at gmail.com> wrote:
>
> The first response has to be "how did the spaces get there
> in the first place?" Can you fix the process that creates
> the data? If the process sometimes generates one extra
> space, are you sure it never generates two?
>
> But let's treat this purely as a regular expression
> problem, where if there is a space before a dot you want
> to delete the first. In vi(1) you would do
>
> s/^\([^ .]*\) \([^.]*\)/\1\2/
>
> but apparently there is *supposed* to be a space before
> the 01, so it is only when there are two or more spaces
> that one should be deleted, so we'd want
>
> s/^\([^ .]*\) \([^ .]* \)/\1\2/
>
> I leave converting that to R as an exercise for the reader.
>
> On Wed, 29 Jul 2020 at 08:20, Dennis Fisher <fisher at plessthan.com> wrote:
> R 4.0.2
> OS X
>
> Colleagues
>
> I have strings that contain a space in an unexpected location. The intended string is:
> "STRING 01. Remainder of the string"
> However, variants are:
> "STR ING 01. Remainder of the string"
> "STRIN G 01. Remainder of the string"
>
> I would like a general approach to deleting a space, but only if it appears before the period. Any suggestions on a regular expression for this?
>
> Dennis
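Richard's second vi substitution carries over to R almost unchanged, because R's default (extended) regular expressions write groups with plain parentheses and back-references as \\1 in the replacement. A minimal sketch, using Dennis's three example strings plus a hypothetical fourth string with two stray spaces:

```r
x <- c("STRING 01. Remainder of the string",
       "STR ING 01. Remainder of the string",
       "STRIN G 01. Remainder of the string",
       "STR IN G 01. Remainder of the string")  # hypothetical: two stray spaces

# vi: s/^\([^ .]*\) \([^ .]* \)/\1\2/
# Group 1: the run before the first space; group 2: the next space-free run
# plus the space after it. The space between the groups is dropped only when
# another space follows before the period, so the space before "01" survives.
fixed <- sub("^([^ .]*) ([^ .]* )", "\\1\\2", x)
fixed
```

As Richard warns, one pass removes only one stray space: the fourth string comes out as "STRIN G 01. Remainder of the string". Calling sub() again until the result stops changing would handle strings with several stray spaces.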
The spaces may not have been VISIBLE in the Word document, but that does not mean that there wasn't anything THERE.

- What happens if you open the document in Word and save it as plain text?
- What happens if you open the document in Word and save it as RTF, then read that using read_rtf?
- If you do that, what does the RTF look like?
- Was the Word document typed by hand, or was it the result of some other process?

The thing is, "our troubles come not as single spies, but as whole battalions", so I'm wondering what _else_ is going wrong in the conversion process.
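One way to check whether something invisible is really THERE is to look at the code points of the text before the period instead of at how it prints. A minimal sketch; the second string, containing a no-break space (U+00A0), is a hypothetical example and not something reported in this thread, and codepoints_before_dot is an illustrative helper name:

```r
# Hypothetical inputs: an ordinary space and a no-break space (U+00A0),
# which print identically in most consoles.
x <- c("STR ING 01. Remainder of the string",
       "STR\u00a0ING 01. Remainder of the string")

# Return the code points (as "U+XXXX") of the characters before the first period
codepoints_before_dot <- function(s) {
  cp <- utf8ToInt(s)                                   # integer code points of one string
  dot <- match(utf8ToInt("."), cp, nomatch = length(cp) + 1L)
  sprintf("U+%04X", cp[seq_len(dot - 1L)])
}

lapply(x, codepoints_before_dot)
```

An ordinary space shows up as U+0020; anything else in that position points to a character that a regular expression matching " " would miss.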
Hi!

How about this?

--- snip ---
x <- c("STRING 01. Remainder of the string", "STR ING 01. Remainder of the string",
       "STRIN G 01. Remainder of the string", "STR IN G 01. Remainder of the string")
# Split at the period; assumes each string contains exactly one period
x1 <- unlist(strsplit(x, "\\."))
# Odd elements are the parts before the period: drop all their spaces and rejoin
for (i in seq(1, length(x1), 2)) {
  x[(i + 1) %/% 2] <- paste(gsub(" ", "", x1[i]), x1[i + 1], sep = ".")
}
x
# [1] "STRING01. Remainder of the string" "STRING01. Remainder of the string" "STRING01. Remainder of the string"
# [4] "STRING01. Remainder of the string"
--- snip ---

Or do I miss something?

Best,
Kimmo

On Tue, 2020-07-28 at 17:19 -0700, Dennis Fisher wrote:
> Richard
>
> In reply to your "first response", the text was originally in a Word
> document and it did NOT contain the errant spaces. I used read_docx
> in the textreadr package to access the text. The spaces were added
> during that step. I am copying the maintainer of that package to see
> if he has any idea as to the source.
Richard

Per your requests:
1. Plain text: no spaces
2. read_docx: spaces
3. read_rtf: no spaces
4. Not requested by you: copying from the Word document, then pasting into vim: no spaces

The Word document was created by hand, but #1, #3, and #4 confirm that it contains no spaces. The offending entity here is textreadr::read_docx.

That addresses how the spaces arose. But my question was not about that; rather, I was looking for a general fix when that situation arises.

Dennis

On Jul 28, 2020, at 8:07 PM, Richard O'Keefe <raoknz at gmail.com> wrote:
>
> The spaces may not have been VISIBLE in the Word document,
> but that does not mean that there wasn't anything THERE.
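For the general fix Dennis asks for, one possible sketch keeps only the last space before the first period and deletes any others. It assumes, as in his examples, that the intended prefix is a label, exactly one space, and a number; fix_label is a hypothetical helper name, and the fourth test string (with a second sentence) is made up to show that later text is left alone:

```r
# Hypothetical helper: in the text before the first period, delete every space
# except the last one, so "STR IN G 01. ..." becomes "STRING 01. ...".
# Assumes the intended prefix is "<LABEL> <NUMBER>" with exactly one space.
fix_label <- function(x) {
  dot <- regexpr(".", x, fixed = TRUE)        # position of the first period, -1 if none
  ok <- !is.na(dot) & dot > 0                 # strings without a period stay unchanged
  prefix <- substr(x[ok], 1, dot[ok] - 1)     # text before the first period
  rest <- substring(x[ok], dot[ok])           # the period and everything after it
  # A space followed later in the prefix by another space is a stray one
  prefix <- gsub(" (?=.* )", "", prefix, perl = TRUE)
  x[ok] <- paste0(prefix, rest)
  x
}

x <- c("STRING 01. Remainder of the string",
       "STR ING 01. Remainder of the string",
       "STRIN G 01. Remainder of the string",
       "STR IN G 01. Remainder. With a second sentence")
fix_label(x)
```

Splitting at the first period before removing spaces is a deliberate choice: a single gsub() with a lookahead over the whole string could also delete spaces in later sentences that happen to contain periods.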