So I have the following data frame, and I want to know how I can remove all "NA" values from each string and also remove any "|" from the START of the string. The results should look something like "auto|insurance" or "auto|insurance|quote".

    one = data.frame(keyword=c("|auto", "NA|auto|insurance|quote",
                               "NA|auto|insurance", "NA|insurance",
                               "NA|auto|insurance", "<NA>"))
    one

Can anyone point me in the right direction? I'm still not familiar enough with regex or gsub to find a solution, and there doesn't seem to be anything helpful in the stringr package for this task.

Thanks

--
Abraham Mathew
Statistical Analyst
www.amathew.com
720-648-0108
@abmathewks
There are a couple of ambiguities in your request, but this should get you started:

    > one$keyword <- gsub("NA\\|", "", one$keyword)
    > one$keyword <- gsub("^\\|", "", one$keyword)
    > one
                   keyword
    1                 auto
    2 auto|insurance|quote
    3       auto|insurance
    4            insurance
    5       auto|insurance
    6                 <NA>

Note that this won't remove an entry that consists only of an NA marker, such as the "<NA>" in row 6.

Also note that your keyword values are factors rather than character strings. You may well want to add stringsAsFactors=FALSE to your data.frame() command.

Sarah

On Thu, Jul 19, 2012 at 3:21 PM, Abraham Mathew <abmathewks at gmail.com> wrote:
> So I have the following data frame and I want to know how I can remove all
> "NA" values from each string, and also
> remove all "|" values from the START of the string.

--
Sarah Goslee
http://www.functionaldiversity.org
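One of the ambiguities mentioned above is whether "NA" should be removed only when it is a whole "|"-separated token. The following is a minimal sketch under that assumption, not the method from the thread: the helper name `clean_keywords` is hypothetical, and it leaves the literal "<NA>" entry alone, just as the two gsub() calls do.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps keyword as character.
one <- data.frame(
  keyword = c("|auto", "NA|auto|insurance|quote", "NA|auto|insurance",
              "NA|insurance", "NA|auto|insurance", "<NA>"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not from the thread):
# drop "NA" only when it is a complete token, then tidy up the pipes.
clean_keywords <- function(x) {
  # Remove NA when preceded by start-of-string or "|" and followed by "|" or end.
  x <- gsub("(^|\\|)NA(?=\\||$)", "\\1", x, perl = TRUE)
  # Collapse runs of pipes that removing a middle token can leave behind.
  x <- gsub("\\|{2,}", "|", x)
  # Trim a pipe at the start or end of the string.
  gsub("^\\||\\|$", "", x)
}

one$keyword <- clean_keywords(one$keyword)
one$keyword
```

With this pattern a value such as "BANANA|split" is left untouched, whereas the plain "NA\\|" pattern would shorten it to "BANAsplit". Real missing values (NA_character_) pass through gsub() unchanged, so they stay NA.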
Hi,

Try this:

    one = data.frame(keyword=c("|auto", "NA|auto|insurance|quote",
                               "NA|auto|insurance", "NA|insurance",
                               "NA|auto|insurance", "<NA>"))
    onenew <- data.frame(keyword=gsub("(NA){0,1}\\|", "", one$keyword))
    onenew1 <- data.frame(keyword=gsub("(<NA>){0,1}", "", onenew$keyword))
    onenew1
                 keyword
    1               auto
    2 autoinsurancequote
    3      autoinsurance
    4          insurance
    5      autoinsurance
    6

A.K.

----- Original Message -----
From: Abraham Mathew <abmathewks at gmail.com>
To: r-help at r-project.org
Sent: Thursday, July 19, 2012 3:21 PM
Subject: [R] Removing values from a string

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,

This should make much more sense:

    onenew <- data.frame(keyword=gsub("(NA){0,1}\\|", " ", one$keyword))
    onenew1 <- data.frame(keyword=gsub("(<NA>){0,1}", "", onenew$keyword))
    onenew1
                   keyword
    1                 auto
    2 auto insurance quote
    3       auto insurance
    4            insurance
    5       auto insurance
    6

A.K.
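One detail the printed data frame hides: because a leading "|" or "NA|" is replaced by a space, each cleaned string starts with a space. A minimal sketch of the same idea with that space trimmed, assuming base R's trimws() is available (R 3.2.0 or later) and working on a plain character vector rather than a data frame:

```r
# Sample keywords from the question, as a plain character vector.
keyword <- c("|auto", "NA|auto|insurance|quote", "NA|auto|insurance",
             "NA|insurance", "NA|auto|insurance", "<NA>")

# Replace each "|" (with an optional "NA" in front of it) by a space;
# "(NA)?" matches the same text as "(NA){0,1}".
spaced <- gsub("(NA)?\\|", " ", keyword)

# Drop the literal "<NA>" marker; fixed = TRUE treats it as plain text.
spaced <- gsub("<NA>", "", spaced, fixed = TRUE)

# Trim the space left where a leading "|" or "NA|" used to be.
spaced <- trimws(spaced)
spaced
```

The result is "auto", "auto insurance quote", "auto insurance", "insurance", "auto insurance" and an empty string for the last entry.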