Let's say I have a data set that includes a column of answers to a question "What days of the week are you most likely to eat steak?". The answers provided are [1] "Friday", [2] "Wednesday", [3] "Friday, Saturday, Sunday", [4] "Saturday", [5] "Sat, Sun", [6] "Th, F, Sa"

How can I tell R to count "Friday, Saturday, Sunday", "Sat, Sun", and "Th, F, Sa" as three separate entries for each unique observation? And is there a way to simultaneously tell R that, for example, "Friday" is the same as "Fri" or "F"; "Saturday" is the same as "Sat" or "Sa"; etc.?

Thanks for your assistance.

--
View this message in context: http://r.789695.n4.nabble.com/help-with-text-patterns-in-strings-tp4669714.html
Sent from the R help mailing list archive at Nabble.com.

Hi,
Maybe this helps:

dat1 <- data.frame(Ans = c("Friday", "Wednesday", "Friday,Saturday,Sunday", "Saturday", "Sat,Sun", "Th,F,Sa"), stringsAsFactors = FALSE)
dat1
#                      Ans
# 1                 Friday
# 2              Wednesday
# 3 Friday,Saturday,Sunday
# 4               Saturday
# 5                Sat,Sun
# 6                Th,F,Sa

vec1 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))
vec2
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"
# [7] "Sat"       "Sun"       "Th"        "F"         "Sa"

sapply(vec1, function(x) length(vec2[grep(x, vec2)]))
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4

A.K.

----- Original Message -----
From: bcrombie <bcrombie at utk.edu>
To: r-help at r-project.org
Cc:
Sent: Monday, June 17, 2013 1:59 PM
Subject: [R] help with text patterns in strings

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
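
The grep() call above matches each code anywhere inside a response, which is safe here only because every response starts with a capital letter. A minimal sketch of a more forgiving variant follows, assuming each code is meant to match only the start of a response and that case should not matter. The "^" anchor and ignore.case = TRUE are additions for illustration, not part of the posted script, and the mixed-case data is made up to exercise them.

```r
# Example data with deliberately inconsistent capitalisation
dat1 <- data.frame(
  Ans = c("friday", "Wednesday", "Friday,saturday,Sunday",
          "Saturday", "sat,Sun", "Th,F,Sa"),
  stringsAsFactors = FALSE
)

vec1 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))

# "^" anchors each code at the start of a response, so "Tu" cannot match
# inside "saturday"; ignore.case = TRUE treats "sat" and "Sat" alike
counts <- sapply(vec1, function(x) {
  sum(grepl(paste0("^", x), vec2, ignore.case = TRUE))
})
counts
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4
```

A response shorter than its code (a bare "S" or "T", say) is still not counted by this approach, so such entries would need separate handling.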

Hi,

dat1$Ans <- tolower(dat1$Ans)

# But, if you do this:
vec1 <- c("su", "m", "tu", "w", "th", "f", "sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))
sapply(vec1, function(x) length(vec2[grep(x, vec2)]))
# su  m tu  w th  f sa
#  2  0  2  1  1  3  4
# which is incorrect: here "tu" got two matches, in sa"tu"rday

Instead:

# Suppose your data looks like this:
dat2 <- data.frame(Ans = c("friday", "wednesday", "Friday,Saturday,sunday", "saturday", "sat,Sun", "th,F,Sa"), stringsAsFactors = FALSE)
vec2 <- unlist(strsplit(dat2$Ans, ","))
library(Hmisc)
vec2New <- capitalize(vec2)
vec2New
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"
# [7] "Sat"       "Sun"       "Th"        "F"         "Sa"

vec1 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
sapply(vec1, function(x) length(vec2New[grep(x, vec2New)]))
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4

# Or using Bill's solution:
# (dayNames is not defined in this message; the definition below is an
# assumption, inferred from the output, that it holds the full day names)
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
dayNames[pmatch(vec2New, dayNames, duplicates.ok = TRUE)]
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"
# [7] "Saturday"  "Sunday"    "Thursday"  "Friday"    "Saturday"

table(dayNames[pmatch(vec2New, dayNames, duplicates.ok = TRUE)])
#    Friday  Saturday    Sunday  Thursday Wednesday
#         3         4         2         1         1

A.K.

----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Monday, June 17, 2013 4:12 PM
Subject: RE: [R] help with text patterns in strings

Arun, thanks. Your script achieves the goal I stated, but now I'm tweaking it as I see possible obstacles with my real data. I anticipate the responses, since they are handwritten, will be a mixture of upper- and lowercase text, so I decided to prevent issues by using the tolower() function. It did not work as I intended when editing your script (see below). How do I use tolower() so that it will save the modification of my variable in the data frame? Do I have to rename the original data frame in order to save my changes (create a new object)?

dat1 <- data.frame(Ans = c("Friday", "Wednesday", "Friday,Saturday,Sunday", "Saturday", "Sat,Sun", "Th,F,Sa"), stringsAsFactors = FALSE)
dat1
#                      Ans
# 1                 Friday
# 2              Wednesday
# 3 Friday,Saturday,Sunday
# 4               Saturday
# 5                Sat,Sun
# 6                Th,F,Sa

tolower(dat1$Ans)
dat1
# the output I want:
#                      Ans
# 1                 friday
# 2              wednesday
# 3 friday,saturday,sunday
# 4               saturday
# 5                sat,sun
# 6                th,f,sa
# but the real R output is not all lowercase

vec1 <- c("su", "m", "tu", "w", "th", "f", "sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))
vec2
# the output I want
# [1] "friday"    "wednesday" "friday"    "saturday"  "sunday"    "saturday"
# [7] "sat"       "sun"       "th"        "f"         "sa"
# but the real R output is not all lowercase

sapply(vec1, function(x) length(vec2[grep(x, vec2)]))
# su  m tu  w th  f sa
#  2  0  0  1  1  3  4

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Monday, June 17, 2013 3:15 PM
To: Crombie, Burnette N
Cc: R help
Subject: Re: [R] help with text patterns in strings

Hi Burnette,
As this is a continuation of the earlier thread, you could post it on the same thread by cc'ing r-help.
Try this:

res1 <- sapply(vec3, function(x) length(vec2New[grep(x, vec2New)]))
dat1 <- data.frame(res1, Name = names(vec3))
dat1$Name <- factor(dat1$Name, levels = c("early", "mid", "late", "wknd"))
with(dat1, tapply(res1, list(Name), FUN = sum))
# early   mid  late  wknd
#     0     1     4     6

# or
sapply(split(res1, names(vec3)), sum)
# early  late   mid  wknd
#     0     4     1     6

A.K.

----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Wednesday, June 19, 2013 3:55 PM
Subject: RE: [R] help with text patterns in strings

Arun, let me know if I should post this email separately, but it involves the script from our previous conversation. I've been messing around as I think of potential scenarios with my data and am unclear how I can recount vec3 after assigning range names to the different days of the week. For this example, I want my output to go from:

# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4

to:

# early   mid  late  wknd
#     0     1     4     6

Thanks for your help throughout, but, again, let me know if I should start a new thread.
Burnette

##########################################
# Begin script
##########################################
dat3 <- read.csv("~/Rburnette/TextStringMatch.csv", stringsAsFactors = FALSE)
dat3
# respondent.ID   response
# 1               Friday
# 2               Wednesday
# 3               Friday, saturday,Sunday
# 4               Saturday
# 5               Sat, sun
# 6               Th,F, Sa

# Rename the variable "response" to "Ans"
# to fit the script that's already been written
# fix(dat3) can be used to do this manually, but then you need to keep "dat3" as the data frame, not "dat3edit"
# if not familiar, fix() generates a popup window like a spreadsheet that can be edited, and character vs numeric property can be changed
# the data set being "fixed" is saved automatically upon closing, but I think only within the current R session
# I think you need to redefine the fix() as a new object to keep the changes outside the R session? (need to test this)
##########################################
library(gdata)
dat3edit <- rename.vars(dat3, from = "response", to = "Ans")
dat3edit
# respondent.ID   Ans
# 1               Friday
# 2               Wednesday
# 3               Friday, saturday,Sunday
# 4               Saturday
# 5               Sat, sun
# 6               Th,F, Sa

# get rid of the spaces embedded in text strings
##########################################
dat3edit$Ans2 <- gsub(" ", "", dat3edit$Ans)
dat3edit$Ans2
# [1] "Friday"                 "Wednesday"              "Friday,saturday,Sunday" "Saturday"
# [5] "Sat,sun"                "Th,F,Sa"

# split up multiple responses within an observation so they can be counted separately
##########################################
vec2 <- unlist(strsplit(dat3edit$Ans2, ","))
vec2
# [1] "Friday"    "Wednesday" "Friday"    "saturday"  "Sunday"    "Saturday"  "Sat"       "sun"
# [9] "Th"        "F"         "Sa"

# consistently format all (split up) responses to start with a capital letter
# for more accurate matching to a "universal" response code created in the next step
##########################################
library(Hmisc)
vec2New <- capitalize(vec2)
vec2New
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"  "Sat"       "Sun"
# [9] "Th"        "F"         "Sa"

# match capitalized data to a "universal" response code of choice
##########################################
vec3 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
sapply(vec3, function(x) length(vec2New[grep(x, vec2New)]))
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4

# assign range names to vec3
##########################################
names(vec3) <- c("wknd", "early", "early", "mid", "late", "late", "wknd")
vec3
#  wknd early early   mid  late  late  wknd
#  "Su"   "M"  "Tu"   "W"  "Th"   "F"  "Sa"
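
The whole pipeline above, from raw responses to counts per range, can also be sketched in base R alone, without gdata or Hmisc. This is only one possible arrangement under stated assumptions: the dayNames and ranges lookup vectors are defined here for illustration (they are not taken from the script), the responses are typed in rather than read from the CSV file, and every response is assumed to be an unambiguous prefix of one English day name.

```r
# Responses as they might arrive: mixed case, stray spaces
resp <- c("Friday", "Wednesday", "Friday, saturday,Sunday",
          "Saturday", "Sat, sun", "Th,F, Sa")

# Assumed lookup tables, introduced for this sketch
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")
ranges <- c(Sunday = "wknd", Monday = "early", Tuesday = "early",
            Wednesday = "mid", Thursday = "late", Friday = "late",
            Saturday = "wknd")

# Split on commas, trim surrounding whitespace, and normalise case
tokens <- tolower(trimws(unlist(strsplit(resp, ","))))

# Partial-match each token against the lowercase day names;
# ambiguous or unknown tokens (e.g. "s" or "t") come back as NA
days <- dayNames[pmatch(tokens, tolower(dayNames), duplicates.ok = TRUE)]

# Count by range; the factor keeps empty ranges and fixes their order
rangeCounts <- table(factor(ranges[days],
                            levels = c("early", "mid", "late", "wknd")))
rangeCounts
# early   mid  late  wknd
#     0     1     4     6
```

Checking anyNA(days) before tabulating is a cheap way to find responses that could not be resolved to a single day.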