Good afternoon.

sessionInfo()
#R version 3.5.3 (2019-03-11)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows >= 8 x64 (build 9200)

I am using the gsub function to remove the hyphen from a column of 9-digit values (stored as text such as "11-1352310") so that I can convert them to integers.

It works fine except where the second segment has a leading 0; then it also removes the 0. For example, "73-0700090" becomes "73700090" and "77-0633896" becomes "77633896".

Is there a remedy for this?

tb2a$TID2 <- gsub(tb2a$TID, pattern="-[0-0]{0,7}", replacement = "")

> head(tb2a$TID, n=10)
[1] "11-1352310" "45-2711804" "35-6001540" "77-0633896" "62-1762545" "61-1029768" "73-0700090" "47-0376604" "47-0486026" "38-3833117"
> head(tb2a$TID2, n=10)
[1] "111352310" "452711804" "356001540" "77633896" "621762545" "611029768" "73700090" "47376604" "47486026" "383833117"

I have googled the problem and have not found a solution.

http://www.endmemo.com/program/R/gsub.php
http://r.789695.n4.nabble.com/extracting-characters-from-string-td3298971.html

Thank you.

WHP

Confidentiality Notice This message is sent from Zelis. ...
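A minimal, self-contained reproduction of the problem, using a few values copied from the head() output above. The vector name `tid` is a hypothetical stand-in for tb2a$TID. regmatches() with regexpr() shows exactly which text the original pattern consumes in each string.

```r
# Hypothetical stand-in for tb2a$TID, copied from the head() output
tid <- c("11-1352310", "77-0633896", "73-0700090", "47-0376604")

# The original pattern: a hyphen followed by zero to seven *zeros*
# ([0-0] is a range containing only the digit 0)
pat <- "-[0-0]{0,7}"

# Show the text each match consumes
print(regmatches(tid, regexpr(pat, tid)))   # "-"  "-0" "-0" "-0"

# gsub() removes everything it matched, so the leading zero goes too
print(gsub(pat, "", tid))   # "111352310" "77633896" "73700090" "47376604"
```

The first value is unaffected only because no zero follows its hyphen, so the match is just "-".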
On Fri, 15 Mar 2019 19:45:27 +0000
Bill Poling <Bill.Poling at zelis.com> wrote:

Hello Bill,

> tb2a$TID2 <- gsub(tb2a$TID, pattern="-[0-0]{0,7}", replacement = "")

Is the pattern supposed to mean something besides the "-" you want to remove? For the problem you describe, pattern="-" should be enough. It will locate every hyphen in the string and replace it with an empty string, i.e. remove it.

--
Best regards,
Ivan
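A short sketch of Ivan's suggestion, followed by the integer conversion the original question was after. The vector `tid` is a hypothetical stand-in for tb2a$TID. Passing fixed = TRUE is an optional addition not mentioned in the thread: it makes gsub() treat "-" as a literal string rather than a regular expression.

```r
# Hypothetical stand-in for tb2a$TID
tid <- c("11-1352310", "77-0633896", "73-0700090", "47-0376604")

# Match only the hyphen itself; fixed = TRUE skips regex interpretation
tid2 <- gsub("-", "", tid, fixed = TRUE)
print(tid2)      # "111352310" "770633896" "730700090" "470376604"

# Nine-digit values fit within R's integer range (.Machine$integer.max is 2147483647)
tid_int <- as.integer(tid2)
print(tid_int)   # 111352310 770633896 730700090 470376604
```

One caution about the integer step: if an ID could begin with 0 (a hypothetical "01-2345678", say), as.integer() would drop that leading digit, so keeping the column as character may be safer when the values are identifiers rather than quantities.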
If you want to remove just the hyphen, why not do

sub("-", "", tb2a$TID)

> sub("-", "", "73-017323")
[1] "73017323"

Am I missing something?

Peter
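Peter uses sub() while the original code used gsub(). The sketch below shows the difference: sub() replaces only the first match, gsub() replaces every match. For IDs with a single hyphen they give the same result. The two-hyphen string `y` is a made-up example, not from the thread.

```r
# Peter's one-value example
x <- "73-017323"
print(sub("-", "", x))    # "73017323"

# sub() replaces the first match only; gsub() replaces all matches.
# Hypothetical string with two hyphens to show where they differ:
y <- "12-34-56"
print(sub("-", "", y))    # "1234-56"
print(gsub("-", "", y))   # "123456"
```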
Your pattern seems ... way overboard? Why not

gsub("-", "", tb2a$TID)

--
Sent from my phone. Please excuse my brevity.
Good morning Peter, yes, that works fine. My attempt was based on a Google search that looked promising but was obviously more complicated than it needed to be.

Thank you.

WHP

From: Peter Langfelder <peter.langfelder at gmail.com>
Sent: Friday, March 15, 2019 3:53 PM
To: Bill Poling <Bill.Poling at zelis.com>
Cc: r-help (r-help at r-project.org) <r-help at r-project.org>
Subject: Re: [R] Help with gsub function
Yep, thank you Jeff. That was a consequence of using the first URL I landed on and rushing off. All set now. Appreciate your help.

WHP

From: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Sent: Friday, March 15, 2019 4:00 PM
To: r-help at r-project.org; Bill Poling <Bill.Poling at zelis.com>; r-help (r-help at r-project.org) <r-help at r-project.org>
Subject: Re: [R] Help with gsub function
> tb2a$TID2 <- gsub(tb2a$TID, pattern="-[0-0]{0,7}", replacement = "")

Just to add something on why this didn't work ...

It looks like you were trying to match a hyphen followed by a number of up to seven digits. By mistake(?) you gave the digit range as [0-0], so it would match a hyphen followed by between zero and seven zeroes. When it met "-0" it matched that, and because it was gsub, it replaced what it matched. If you'd given it the right digit range ([0-9]) it would have replaced the whole of the number.

If you _really_ wanted to do that kind of thing (control the following pattern), you'd have needed something like (untested)

gsub("-([0-0]{0,7})", "\\1", tb2a$TID)

The () means "remember this bit"; the "\\1" means "put the first thing you remember here". And it needs to be written "\\1" because R's string parser turns that into \1, which is what the regex engine sees.

Steve E

This email and any attachments are confidential. Any use...
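A sketch of the capture-and-restore idea Steve describes, using [0-9] as the digit range. With [0-0] exactly as written above, the call also happens to give the right answer, because the only digits it captures are zeros and they are written back. The anchored variant assumes every ID is exactly 2 digits, a hyphen, and 7 digits. That format assumption, and the sample `tid`, are hypothetical and not stated in the thread.

```r
# Hypothetical stand-in for tb2a$TID
tid <- c("11-1352310", "77-0633896", "73-0700090")

# Capture up to seven digits after the hyphen as group 1, then write them
# back with "\\1" (the regex engine receives the two characters \1)
restored <- gsub("-([0-9]{0,7})", "\\1", tid)
print(restored)   # "111352310" "770633896" "730700090"

# Stricter, anchored variant: two captured groups joined with "\\1\\2".
# Values not matching the assumed shape are returned unchanged
strict <- sub("^([0-9]{2})-([0-9]{7})$", "\\1\\2", c(tid, "bad-value"))
print(strict)     # "111352310" "770633896" "730700090" "bad-value"
```

For simply deleting the hyphen the plain gsub("-", "", ...) from earlier in the thread is enough. The anchored form is only worth the extra complexity if malformed IDs should be left visibly untouched.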
Good morning Steve.

Terrific, so kind of you to follow up. I will add that to my ever-growing R bag of tips and tricks.

Cheers.

WHP

William H. Poling, Ph.D., MPH | Manager, Revenue Development
Data Intelligence & Analytics
Zelis Healthcare

-----Original Message-----
From: S Ellison <S.Ellison at LGCGroup.com>
Sent: Monday, March 18, 2019 8:32 AM
To: Bill Poling <Bill.Poling at zelis.com>; r-help (r-help at r-project.org) <r-help at r-project.org>
Subject: RE: Help with gsub function