Sunny Singha
2016-Apr-25 06:35 UTC
[R] Please assist -- Unable to remove '-' character from char vector--
Hi,

I have a character vector named 'end' that holds year values. Some cells hold a single year, such as '2001-', and some hold a range, such as 1996-2007. I need to remove the hyphen character '-' from every value in the vector. After removing the hyphens, I need the last number from each cell that holds a range; for example, if the cell holds 1996-2007, the code should return 2007.

How can I do this?

Here are the values in the vector:

> end
 [1] "2001-" "1992-" "2013-" "2013-" "2013-" "2013-"
 [7] "2003-" "2010-" "2009-" "1986-" "2012-" "2003-"
[13] "2005-" "2013-" "2003-" "2013-" "1993–2007, 2010-" "2012-"
[19] "1984–1992, 1996-" "2015-" "2009-" "2000-" "2005-" "1997-"
[25] "2012-" "1997-" "2002-" "2006-" "1992-" "2007-"
[31] "1997-" "1982-" "2015-" "2015-" "2010-" "1996–2007, 2011-"
[37] "2004-" "1999-" "2007-" "1996-" "2013-" "2012-"
[43] "2012-" "2010-" "2011-" "1994-" "2014-"

I tried the following command:

> gsub('[-|,]', '', end)

It removed the hyphens from the single-year values but not from the cells that hold ranges, as the result below shows. Please advise.

> gsub('[-|,]', '', end)
 [1] "2001" "1992" "2013" "2013" "2013" "2013" "2003"
 [8] "2010" "2009" "1986" "2012" "2003" "2005" "2013"
[15] "2003" "2013" "1993–2007 2010" "2012" "1984–1992 1996" "2015" "2009"
[22] "2000" "2005" "1997" "2012" "1997" "2002" "2006"
[29] "1992" "2007" "1997" "1982" "2015" "2015" "2010"
[36] "1996–2007 2011" "2004" "1999" "2007" "1996" "2013" "2012"
[43] "2012" "2010" "2011" "1994" "2014"

Regards,
Sunny Singha
Jim Lemon
2016-Apr-25 07:09 UTC
[R] Please assist -- Unable to remove '-' character from char vector--
Hi Sunny,
Try this:

# notice that I have replaced the fancy hyphens with real hyphens
end<-c("2001-","1992-","2013-","2013-","2013-","2013-",
 "1993-2007","2010-","2012-","1984-1992","1996-","2015-")
splitends<-sapply(end,strsplit,"-")
last_bit(x) return(x[length(x)])
sapply(splitends,last_bit)

Jim

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
PIKAL Petr
2016-Apr-25 07:59 UTC
[R] Please assist -- Unable to remove '-' character from char vector--
Hi

> splitends<-sapply(end,strsplit,"-")
> last_bit(x) return(x[length(x)])

You probably meant

last_bit <- function(x) return(x[length(x)])

> sapply(splitends,last_bit)

A good final step is to convert the result to numeric:

as.numeric(sapply(splitends,last_bit))

Cheers
Petr
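Putting Jim's suggestion and Petr's correction together, the approach might be sketched as below. This is only an illustration under a few assumptions: the sample vector is a shortened copy of Jim's data with plain ASCII hyphens, `last_year` is a name introduced here, and `strsplit()` is called directly with `vapply()` in place of the `sapply()` calls from the replies.

```r
# Shortened sample with plain ASCII hyphens, as in Jim's reply.
end <- c("2001-", "1992-", "2013-", "1993-2007", "2010-", "1984-1992", "1996-")

# Split each value on the literal hyphen; strsplit() returns a list with
# one character vector per element and drops the trailing empty piece.
splitends <- strsplit(end, "-", fixed = TRUE)

# Return the last piece of a character vector (Petr's corrected definition).
last_bit <- function(x) x[length(x)]

# vapply() checks that every result is a single string before the
# conversion to numeric that Petr suggests.
last_year <- as.numeric(vapply(splitends, last_bit, character(1)))
last_year
```

With ASCII hyphens only, each single-year value yields that year and each range yields its closing year. Values that contain other dash characters are not split by this pattern.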
Sunny Singha
2016-Apr-25 09:32 UTC
[R] Please assist -- Unable to remove '-' character from char vector--
Thank you Jim,

The code helped me get what I needed. I also learnt that there are different types of dashes (hyphen, en dash, em dash), as explained on this site:
http://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/

I got it working with the command below, after reading this page on Stack Overflow:
http://stackoverflow.com/questions/9223795/how-to-correctly-deal-with-escaped-unicode-characters-in-r-e-g-the-em-dash

splitends<-sapply(end,strsplit,"-|\u2013|,")

where '\u2013' is the Unicode escape for the en dash (U+2013) used in the range values. I had scraped the HTML table from this web page:
https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger
and its range values do contain en dashes.

The issue is resolved for now, but how does one find characters like '\u2013' in other possible special cases, so that they can be specified in the regex?

Regards,
Sunny Singha
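One possible way to approach the closing question is to list the code points of every non-ASCII character in the vector, and, as a different technique from the one used in the thread, to extract the four-digit years directly so that the separator no longer matters. The sketch below assumes a small sample vector `x` built with the `\u2013` escape; `code_points`, `odd`, `years`, and `last_year` are names introduced here for illustration.

```r
# Small sample containing en dashes (U+2013), written with escapes.
x <- c("2001-", "1993\u20132007, 2010-", "1984\u20131992, 1996-")

# utf8ToInt() returns the Unicode code points of a single string;
# anything above 127 lies outside ASCII.
code_points <- unlist(lapply(enc2utf8(x), utf8ToInt))
odd <- unique(code_points[code_points > 127L])

# Show each unusual character in the \uXXXX form used inside a pattern.
sprintf("\\u%04X", odd)

# Alternative: pull out every run of four digits and keep the last one,
# which avoids listing the separators at all.
years <- regmatches(x, gregexpr("[0-9]{4}", x))
last_year <- as.numeric(vapply(years, function(y) y[length(y)], character(1)))
last_year
```

For this sample the only non-ASCII character reported should be the en dash, U+2013. The digit-based extraction assumes every cell contains at least one four-digit year; a cell without one would make `vapply()` stop with an error.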