Hey,

worked like a charm! :)

Could you please explain about

sub("^([0-9]*).*$", "\\1", fields)

Thanks,
Abhinaba

On Mon, Nov 30, 2015 at 4:47 PM, <phgrosjean at sciviews.org> wrote:

> fields <- c("2154333b-3208-4519-8b76-acaef5b5a479",
>             "980958a0-103b-4ba9-afaf-27b2f5c24e69",
>             "00966654-0dea-4899-b8cf-26e8300b262d")
> sub("^([0-9]*).*$", "\\1", fields)
>
> Best,
>
> Philippe Grosjean
>
> > On 30 Nov 2015, at 11:39, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
> >
> > Hi,
> >
> > I have a field with alpha numeric codes like,
> >
> > 2154333b-3208-4519-8b76-acaef5b5a479
> > 980958a0-103b-4ba9-afaf-27b2f5c24e69
> > 00966654-0dea-4899-b8cf-26e8300b262d
> >
> > I want a derived field which will contain ONLY the numeric part before the
> > first alphabet and the first '-',
> >
> > for example the derived field from the sample above will give me
> >
> > 2154333
> > 980958
> > 00966654
> >
> > How can this be achieved in R?
> >
> > P.S. I do not have much knowledge on regex. It would be of great help if
> > you could suggest some reading for beginners.
> >
> > Thanks,
> > Abhinaba
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Could you please explain about
>
> sub("^([0-9]*).*$", "\\1", fields)

See ?regex and the extensive online literature on regular expressions.

S Ellison
phgrosjean at sciviews.org
2015-Nov-30 13:57 UTC
[R] Extracting part of alpha numeric string
> On 30 Nov 2015, at 13:09, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
>
> Hey,
>
> worked like a charm! :)
>
> Could you please explain about
>
> sub("^([0-9]*).*$", "\\1", fields)

Yes.

sub() replaces substrings. The first argument is the pattern, which captures the interesting part of the string:

^ = start of the string,

([0-9]*) = capture of the interesting part of the string. [0-9] means any figure from 0 to 9. * means 1 or more of these characters, and () is used to capture the substring,

.* = all the rest. Dot (.) means any character, and * means again one or more of these characters,

$ = the end of the string.

The whole regular expression matches the whole string and captures the interesting part inside the (). The second argument is the replacement. \\1 means the first captured substring. Thus, globally, we replace the whole string by the captured substring.

Best,

Philippe Grosjean
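To see the pieces of this explanation at work, here is a minimal sketch in R that reuses the `fields` vector from the thread. The variable names `leading_digits`, `m` and `extracted` are only illustrative, and the `regexpr()`/`regmatches()` variant is an alternative that was not part of the original answer.

```r
fields <- c("2154333b-3208-4519-8b76-acaef5b5a479",
            "980958a0-103b-4ba9-afaf-27b2f5c24e69",
            "00966654-0dea-4899-b8cf-26e8300b262d")

# The pattern matches the whole string: ^ anchors the start, ([0-9]*)
# captures the leading run of digits, .* consumes the rest, $ anchors the end.
# The replacement "\\1" puts back only the first captured group.
leading_digits <- sub("^([0-9]*).*$", "\\1", fields)
print(leading_digits)
# [1] "2154333"  "980958"   "00966654"

# In R source code the backslash must be doubled: "\\1" is the
# two-character string \1 by the time the regex engine sees it.
nchar("\\1")
# [1] 2

# Alternative (not from the thread): extract the match directly
# instead of replacing the whole string.
m <- regexpr("^[0-9]*", fields)
extracted <- regmatches(fields, m)
print(extracted)
# [1] "2154333"  "980958"   "00966654"
```

Both approaches return character vectors, so the leading zeros in "00966654" are preserved; converting with as.numeric() would drop them.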
> On 30 Nov 2015, at 14:57, phgrosjean at sciviews.org wrote:
>
>> On 30 Nov 2015, at 13:09, Abhinaba Roy <abhinabaroy09 at gmail.com> wrote:
>>
>> Hey,
>>
>> worked like a charm! :)
>>
>> Could you please explain about
>>
>> sub("^([0-9]*).*$", "\\1", fields)
>
> Yes.
>
> sub() replaces substrings. The first argument captures the interesting part of the string:
>
> ^ = start of the string,
>
> ([0-9]*) = capture of the interesting part of the string. [0-9] means any figure from 0 to 9. * means 1 or more of these characters, and () is used to capture the substring,
>
> .* = all the rest. Dot (.) means any character, and * means again one or more of these characters,
>
> $ = the end of the string.

Small correction: * means zero or more (not one or more) of the preceding item, according to ?regex.

Berend
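The practical difference between "zero or more" (`*`) and "one or more" (`+`) shows up on strings that do not start with a digit. The following is a small sketch; the sample vector `x` and the names `star_result` and `plus_result` are made up for illustration and do not come from the thread.

```r
x <- c("2154333b-3208", "abc-123")

# '*' allows zero digits, so the pattern still matches "abc-123":
# the captured group is empty and the whole string is replaced by "".
star_result <- sub("^([0-9]*).*$", "\\1", x)
print(star_result)
# [1] "2154333" ""

# '+' requires at least one leading digit, so "abc-123" does not match
# at all, and sub() returns non-matching elements unchanged.
plus_result <- sub("^([0-9]+).*$", "\\1", x)
print(plus_result)
# [1] "2154333" "abc-123"
```

With `*`, a code without leading digits yields an empty string; with `+`, it comes back untouched, so which quantifier is appropriate depends on how such codes should be handled.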