Hi,
Maybe this helps:
dat1 <- data.frame(Ans = c("Friday", "Wednesday", "Friday,Saturday,Sunday",
                           "Saturday", "Sat,Sun", "Th,F,Sa"),
                   stringsAsFactors = FALSE)
dat1
#                      Ans
# 1                 Friday
# 2              Wednesday
# 3 Friday,Saturday,Sunday
# 4               Saturday
# 5                Sat,Sun
# 6                Th,F,Sa
vec1 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))
vec2
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"
# [7] "Sat"       "Sun"       "Th"        "F"         "Sa"
sapply(vec1, function(x) length(vec2[grep(x, vec2)]))
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4
A.K.
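The question also asks for each answer's days to be counted as separate entries per observation. A minimal sketch (the names `ans`, `pieces`, and `long` are illustrative, not from the thread) that splits on a comma followed by optional whitespace, so the spaced answers from the original question ("Sat, Sun") work too, and keeps a respondent index next to each day. It assumes R 3.2.0 or later for `lengths()`.

```r
# Answers typed as in the original question, with spaces after the commas
ans <- c("Friday", "Wednesday", "Friday, Saturday, Sunday",
         "Saturday", "Sat, Sun", "Th, F, Sa")
# Split on a comma plus any whitespace that follows it
pieces <- strsplit(ans, ",\\s*")
# One row per named day, remembering which answer (respondent) it came from
long <- data.frame(id = rep(seq_along(pieces), lengths(pieces)),
                   day = unlist(pieces),
                   stringsAsFactors = FALSE)
long
# Number of days each respondent named
table(long$id)
```

From `long`, `table(long$day)` gives overall counts, while `long$id` still records who named each day.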
----- Original Message -----
From: bcrombie <bcrombie at utk.edu>
To: r-help at r-project.org
Cc: 
Sent: Monday, June 17, 2013 1:59 PM
Subject: [R] help with text patterns in strings
Let's say I have a data set that includes a column of answers to the question
"What days of the week are you most likely to eat steak?"
The answers provided are [1] "Friday", [2] "Wednesday",
[3] "Friday, Saturday, Sunday", [4] "Saturday", [5] "Sat, Sun", [6] "Th, F, Sa".
How can I tell R to count "Friday, Saturday, Sunday", "Sat, Sun", and
"Th, F, Sa" as three separate entries for each unique observation?
And is there a way to simultaneously tell R that, for example, "Friday" is
the same as "Fri" or "F"; "Saturday" is the same as "Sat" or "Sa"; etc.?
Thanks for your assistance.
--
View this message in context:
http://r.789695.n4.nabble.com/help-with-text-patterns-in-strings-tp4669714.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,
dat1$Ans <- tolower(dat1$Ans)  # assign the result back to keep the change
# But if you do this:
vec1 <- c("su", "m", "tu", "w", "th", "f", "sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))
sapply(vec1, function(x) length(vec2[grep(x, vec2)]))
# su  m tu  w th  f sa
#  2  0  2  1  1  3  4
# This is incorrect: "tu" got two matches, from sa"tu"rday.
Instead, capitalize the first letter of each piece so that case-sensitive
matching works:
# Suppose your data look like this:
dat2 <- data.frame(Ans = c("friday", "wednesday", "Friday,Saturday,sunday",
                           "saturday", "sat,Sun", "th,F,Sa"),
                   stringsAsFactors = FALSE)
vec2 <- unlist(strsplit(dat2$Ans, ","))
library(Hmisc)
vec2New <- capitalize(vec2)
vec2New
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"
# [7] "Sat"       "Sun"       "Th"        "F"         "Sa"
vec1 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
sapply(vec1, function(x) length(vec2New[grep(x, vec2New)]))
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4
# Or, using Bill's solution:
dayNames[pmatch(vec2New, dayNames, duplicates.ok = TRUE)]
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"
# [7] "Saturday"  "Sunday"    "Thursday"  "Friday"    "Saturday"
table(dayNames[pmatch(vec2New, dayNames, duplicates.ok = TRUE)])
#    Friday  Saturday    Sunday  Thursday Wednesday
#         3         4         2         1         1
A.K.
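Bill's `dayNames` is not shown in this thread, so the definition below is an assumption. This sketch combines the lowercasing idea with `pmatch()`: matching lowercased copies removes the need for `capitalize()`, and because `pmatch()` only matches from the start of each day name, "tu" cannot match inside "saturday". A prefix that fits more than one day, such as "s" or "t", comes back as `NA`.

```r
# Hypothetical reference vector: Bill's dayNames is not shown in this
# thread, so this definition is an assumption
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")
# Mixed-case pieces, as produced by strsplit() on dat2$Ans above
vec2 <- c("friday", "wednesday", "Friday", "Saturday", "sunday",
          "saturday", "sat", "Sun", "th", "F", "Sa")
# pmatch() is case-sensitive, so match lowercased copies; it only matches
# from the start of each day name, so "tu" cannot match inside "saturday"
hit <- pmatch(tolower(vec2), tolower(dayNames), duplicates.ok = TRUE)
full <- dayNames[hit]
# Fixed factor levels keep days nobody named (count 0) in the table
counts <- table(factor(full, levels = dayNames))
counts
# A prefix shared by two days is ambiguous, so pmatch() returns NA
amb <- pmatch(c("s", "t"), tolower(dayNames), duplicates.ok = TRUE)
amb
```

For these responses the counts are Sunday 2, Monday 0, Tuesday 0, Wednesday 1, Thursday 1, Friday 3, Saturday 4.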
----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Monday, June 17, 2013 4:12 PM
Subject: RE: [R] help with text patterns in strings
Arun, thanks. Your script achieves the goal I stated, but now I'm tweaking
it as I see possible obstacles with my real data.
I anticipate that the responses, since they are handwritten, will be a mixture
of upper- and lowercase text, so I decided to prevent issues by using the
tolower() function.
It did not work as I intended when editing your script (see below).
How do I use tolower() so that it saves the modification of my variable in
the data frame?
Do I have to rename the original data frame (create a new object) in order
to save my changes?
dat1 <- data.frame(Ans = c("Friday", "Wednesday", "Friday,Saturday,Sunday",
                           "Saturday", "Sat,Sun", "Th,F,Sa"),
                   stringsAsFactors = FALSE)
dat1
#                      Ans
# 1                 Friday
# 2              Wednesday
# 3 Friday,Saturday,Sunday
# 4               Saturday
# 5                Sat,Sun
# 6                Th,F,Sa
tolower(dat1$Ans)
dat1
# the output I want:
#                      Ans
# 1                 friday
# 2              wednesday
# 3 friday,saturday,sunday
# 4               saturday
# 5                sat,sun
# 6                th,f,sa
# but the real R output is not all lowercase
vec1 <- c("su", "m", "tu", "w", "th", "f", "sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))
vec2
# the output I want:
# [1] "friday"    "wednesday" "friday"    "saturday"  "sunday"    "saturday"
# [7] "sat"       "sun"       "th"        "f"         "sa"
# but the real R output is not all lowercase
sapply(vec1, function(x) length(vec2[grep(x, vec2)]))
# su  m tu  w th  f sa
#  2  0  0  1  1  3  4
-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: Monday, June 17, 2013 3:15 PM
To: Crombie, Burnette N
Cc: R help
Subject: Re: [R] help with text patterns in strings
Hi,
Maybe this helps:
dat1 <- data.frame(Ans = c("Friday", "Wednesday", "Friday,Saturday,Sunday",
                           "Saturday", "Sat,Sun", "Th,F,Sa"),
                   stringsAsFactors = FALSE)
dat1
#                      Ans
# 1                 Friday
# 2              Wednesday
# 3 Friday,Saturday,Sunday
# 4               Saturday
# 5                Sat,Sun
# 6                Th,F,Sa
vec1 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
vec2 <- unlist(strsplit(dat1$Ans, ","))
vec2
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"
# [7] "Sat"       "Sun"       "Th"        "F"         "Sa"
sapply(vec1, function(x) length(vec2[grep(x, vec2)]))
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4
A.K.
Hi Burnette,
As this is a continuation of the earlier thread, you could post it to the
same thread with a cc to r-help.
Try this (it uses vec2New and vec3 from your script):
res1 <- sapply(vec3, function(x) length(vec2New[grep(x, vec2New)]))
dat1 <- data.frame(res1, Name = names(vec3))
dat1$Name <- factor(dat1$Name, levels = c("early", "mid", "late", "wknd"))
with(dat1, tapply(res1, list(Name), FUN = sum))
# early   mid  late  wknd
#     0     1     4     6
# or
sapply(split(res1, names(vec3)), sum)
# early  late   mid  wknd
#     0     4     1     6
A.K.
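Instead of counting per abbreviation and then summing by range, each response can also be mapped straight to its range and tabulated. A sketch, assuming the range names Burnette gives `vec3` in the script below and the `vec2New` values that script prints (both typed in here so the block runs on its own); `startsWith()` needs R 3.3.0 or later, and the names `idx`, `rng`, and `tab` are illustrative.

```r
# Day abbreviations with range names, as assigned to vec3 in the script below
vec3 <- c(wknd = "Su", early = "M", early = "Tu", mid = "W",
          late = "Th", late = "F", wknd = "Sa")
# Split, capitalized responses (the vec2New values printed in the script)
vec2New <- c("Friday", "Wednesday", "Friday", "Saturday", "Sunday",
             "Saturday", "Sat", "Sun", "Th", "F", "Sa")
# For each response, the index of the first abbreviation it starts with
# (anchored at the start, so "Tu" cannot match inside "Saturday")
idx <- sapply(vec2New, function(r) which(startsWith(r, vec3))[1])
# Look up the range name; fixed levels keep zero-count ranges in the table
rng <- factor(names(vec3)[idx], levels = c("early", "mid", "late", "wknd"))
tab <- table(rng)
tab
```

Declaring the factor levels keeps `early` in the table with a count of 0; a response that starts with none of the abbreviations becomes `NA` and is left out of the table.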
----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Wednesday, June 19, 2013 3:55 PM
Subject: RE: [R] help with text patterns in strings
Arun, let me know if I should post this email separately, but it involves the
script from our previous conversation. I've been messing around as I think
of potential scenarios with my data and am unclear how I can recount vec3 after
assigning range names to the different days of the week. For this example, I
want my output to go from:
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4
to:
# early  mid  late  wknd
#     0    1     4     6
Thanks for your help throughout, but, again, let me know if I should start a new
thread.
Burnette
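The script below loads gdata and Hmisc for just two steps: renaming one column and capitalizing first letters. A base-R sketch of both steps follows. Because TextStringMatch.csv is not available, `dat3` is typed in from the printout in the script, so its exact contents (including `respondent.ID` running from 1 to 6) are an assumption.

```r
# Hypothetical stand-in for the CSV, typed in from the dat3 printout
dat3 <- data.frame(respondent.ID = 1:6,
                   response = c("Friday", "Wednesday", "Friday, saturday,Sunday",
                                "Saturday", "Sat, sun", "Th,F, Sa"),
                   stringsAsFactors = FALSE)
# Base-R rename: overwrite the matching element of names()
names(dat3)[names(dat3) == "response"] <- "Ans"
# Remove embedded spaces, then split multiple answers apart
vec2 <- unlist(strsplit(gsub(" ", "", dat3$Ans), ","))
# Uppercase the first character of every piece and keep the rest as-is
vec2New <- paste0(toupper(substr(vec2, 1, 1)), substring(vec2, 2))
vec2New
```

For these responses the result matches the capitalize() output printed in the script.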
##########################################
# Begin script
##########################################
dat3 <- read.csv("~/Rburnette/TextStringMatch.csv", stringsAsFactors = FALSE)
dat3
# respondent.ID                 response
# 1                               Friday
# 2                            Wednesday
# 3              Friday, saturday,Sunday
# 4                             Saturday
# 5                             Sat, sun
# 6                             Th,F, Sa
# Rename the variable "response" to "Ans" to fit the script that's already
# been written.
# fix(dat3) can be used to do this manually, but then you need to keep
# "dat3" as the data frame, not "dat3edit".
# If you're not familiar with it, fix() opens a spreadsheet-like window in
# which the data can be edited and a column's character vs. numeric type can
# be changed.
# The data set being "fixed" is saved automatically upon closing, but I think
# only within the current R session.
# I think you need to save the result of fix() as a new object to keep the
# changes outside the R session? (need to test this)
##########################################
library(gdata)
dat3edit <- rename.vars(dat3,from="response", to="Ans")
dat3edit
# respondent.ID                      Ans
# 1                               Friday
# 2                            Wednesday
# 3              Friday, saturday,Sunday
# 4                             Saturday
# 5                             Sat, sun
# 6                             Th,F, Sa
# get rid of the spaces embedded in text strings
##########################################
dat3edit$Ans2 <- gsub(" ", "", dat3edit$Ans)
dat3edit$Ans2
# [1] "Friday"                 "Wednesday"              "Friday,saturday,Sunday" "Saturday"
# [5] "Sat,sun"                "Th,F,Sa"
# Split up multiple responses within an observation so they can be counted
# separately
##########################################
vec2 <- unlist(strsplit(dat3edit$Ans2, ","))
vec2
# [1] "Friday"    "Wednesday" "Friday"    "saturday"  "Sunday"    "Saturday"  "Sat"       "sun"
# [9] "Th"        "F"         "Sa"
# Consistently format all (split-up) responses to start with a capital letter
# for more accurate matching to a "universal" response code created in the
# next step
##########################################
library(Hmisc)
vec2New<-capitalize(vec2)
vec2New
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday"  "Sat"       "Sun"
# [9] "Th"        "F"         "Sa"
# Match the capitalized data to a "universal" response code of choice
##########################################
vec3 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
sapply(vec3, function(x) length(vec2New[grep(x, vec2New)]))
# Su  M Tu  W Th  F Sa
#  2  0  0  1  1  3  4
# Assign range names to vec3
##########################################
names(vec3) <- c("wknd", "early", "early", "mid", "late", "late", "wknd")
vec3
#  wknd early early   mid  late  late  wknd
#  "Su"   "M"  "Tu"   "W"  "Th"   "F"  "Sa"