Hi Iago,

This is not a bug; it is expected. Matches may not overlap: once a pair of vowels has been matched, both characters are consumed, so the second vowel cannot also start the next match. However, there is a way to get the result you want using perl = TRUE:

```R
gsub("([aeiouAEIOU])(?=[aeiouAEIOU])", "\\1_", "aerioue", perl = TRUE)
```

The specific change I made is called a positive lookahead; you can read more about it here:

https://www.regular-expressions.info/lookaround.html

It's a way to check for a piece of text without consuming it in the match.

Also, since you don't care about character case, it might be more legible to add ignore.case = TRUE and remove the upper-case characters:

```R
gsub("([aeiou])(?=[aeiou])", "\\1_", "aerioue", perl = TRUE, ignore.case = TRUE)

## or

gsub("(?i)([aeiou])(?=[aeiou])", "\\1_", "aerioue", perl = TRUE)
```

I hope this helps!

On Fri, Mar 1, 2024, 06:37 Iago Giné Vázquez <iago.gine at sjd.es> wrote:

> Hi all,
>
> I tested the following command:
>
> gsub("([aeiouAEIOU])([aeiouAEIOU])", "\\1_\\2", "aerioue")
>
> with the following output:
>
> [1] "a_eri_ou_e"
>
> So, there are two consecutive vowels where an underscore is not added.
>
> Could it be a bug? Is it expected (bug or not)? Is there any chance to get
> what I want (an underscore between each pair of consecutive vowels)?
>
> Thank you!
>
> Best regards,
>
> Iago
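To make the difference concrete, here is a small sketch (not part of the original messages) that lists what each pattern actually matches in "aerioue". The variable names are illustrative only; the functions used are base R's gregexpr(), regmatches() and gsub().

```R
x <- "aerioue"

# Original pattern: each match consumes TWO vowels, so the scan resumes
# after the second vowel and the pair "ou" is never examined.
pair_pattern <- "([aeiou])([aeiou])"
consumed <- regmatches(x, gregexpr(pair_pattern, x, ignore.case = TRUE))[[1]]
without_lookahead <- gsub(pair_pattern, "\\1_\\2", x, ignore.case = TRUE)

# Lookahead pattern: each match consumes ONE vowel and only peeks at the
# next character, so that character is still available to start a new match.
lookahead_pattern <- "([aeiou])(?=[aeiou])"
peeked <- regmatches(x, gregexpr(lookahead_pattern, x,
                                 perl = TRUE, ignore.case = TRUE))[[1]]
with_lookahead <- gsub(lookahead_pattern, "\\1_", x,
                       perl = TRUE, ignore.case = TRUE)

print(consumed)           # the two-character matches
print(without_lookahead)  # the output reported in the question
print(peeked)             # the one-character matches
print(with_lookahead)     # an underscore between every pair of adjacent vowels
```

Comparing `consumed` with `peeked` shows where the "missing" underscore went: in the first case "o" is swallowed as the second half of "io", so it cannot pair with "u".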
Hi Iris,

Thank you. Also, a very nice solution.

Best,

Iago

On 01/03/2024 12:49, Iris Simmons wrote:

> Hi Iago,
>
> This is not a bug. It is expected. Patterns may not overlap. However, there
> is a way to get the result you want using perl:
>
> [...]