I have a string variable (V1) in a data frame structured as follows:

V1  V2
A   5
a1  1
a2  1
a3  1
a4  1
a5  1
B   4
b1  1
b2  1
b3  1
b4  1

I want the following:

V1  V2  V3
a1  1   A
a2  1   A
a3  1   A
a4  1   A
a5  1   A
b1  1   B
b2  1   B
b3  1   B
b4  1   B

I am not sure how to go about making this transformation besides writing a long vector that contains each of the categorical string names (these are state names, so it would be a really long vector). Any help would be greatly appreciated.

Thanks,

Nicholas Pretnar
Mizzou Economics Grad Assistant
npretnar at gmail.com
I'm not sure what's so complicated about that (am I missing something?). You can search using grep, and replace using gsub, so

    tmpDF <- read.table(text="V1 V2
    A 5
    a1 1
    a2 1
    a3 1
    a4 1
    a5 1
    B 4
    b1 1
    b2 1
    b3 1
    b4 1",
    header=TRUE)
    # keep only the rows whose V1 contains a digit (drops the "A" and "B" header rows)
    tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
    # strip the digits and upper-case what is left to get the group label
    data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))

Seems to do the trick.

Best,
Ista

On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npretnar at gmail.com> wrote:
> I am not sure how to go about making this transformation besides writing
> a long vector that contains each of the categorical string names (these
> are state names, so it would be a really long vector).
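A variation on the same idea, in case the group label cannot be derived from the row names themselves: a minimal sketch that carries each header row's label down to the rows beneath it. It assumes, as in the toy data, that header rows are exactly the rows whose V1 contains no digit and that the first row is a header. The names is_header, group and result are made up for illustration.

```r
# Same toy data as in the reply above, read as plain character strings.
tmpDF <- read.table(text = "V1 V2
A 5
a1 1
a2 1
a3 1
a4 1
a5 1
B 4
b1 1
b2 1
b3 1
b4 1", header = TRUE, stringsAsFactors = FALSE)

# Assumption: a header row is any row whose V1 has no digit in it.
is_header <- !grepl("[0-9]", tmpDF$V1)

# cumsum() numbers the blocks 1, 1, ..., 2, 2, ...; indexing the header
# labels by that block number repeats each label over its own block.
group <- tmpDF$V1[is_header][cumsum(is_header)]

# Attach the label as V3, then drop the header rows themselves.
result <- data.frame(tmpDF, V3 = group, stringsAsFactors = FALSE)[!is_header, ]
print(result)
```

Unlike the gsub() version, this does not depend on "a1" looking like "A"; it only depends on being able to tell header rows from detail rows.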
Sorry. Bad example on my part. Try this. V1 is ...

V1
alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome

And I want:

V1        V2
alabama   bates
alabama   tuscaloosa
alabama   smith
arkansas  fayette
arkansas  little rock
alaska    juneau
alaska    nome

This is more representative of the problem, extended to all 50 states.

- Nick

On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
> I'm not sure what's so complicated about that (am I missing
> something?). You can search using grep, and replace using gsub
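For this second layout, one possible approach is sketched below. It is only a sketch under some assumptions: the state rows are spelled exactly like the lower-cased entries of R's built-in state.name vector, the first row is a state, and no non-state row happens to share its name with a state (a county called "washington", for example, would be misread as a state header and would need separate handling). The names is_state, state and result are made up for illustration.

```r
# The example column from the message above, one value per row.
V1 <- c("alabama", "bates", "tuscaloosa", "smith",
        "arkansas", "fayette", "little rock",
        "alaska", "juneau", "nome")

# state.name is a built-in vector of the 50 US state names (datasets
# package); lower-case it so it matches the spelling used in V1.
is_state <- V1 %in% tolower(state.name)

# Assumption: the column starts with a state, so cumsum(is_state) is
# never 0 and every row falls inside some state's block.
stopifnot(is_state[1])

# Block number per row, used to repeat each state name over its block.
state <- V1[is_state][cumsum(is_state)]

# Keep only the non-state rows, paired with the state they sit under.
result <- data.frame(V1 = state[!is_state],
                     V2 = V1[!is_state],
                     stringsAsFactors = FALSE)
print(result)
```

This avoids typing out the long vector of state names by hand, since state.name already provides it.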