I have a string variable (V1) in a data frame structured as follows:

V1 V2
A 5
a1 1
a2 1
a3 1
a4 1
a5 1
B 4
b1 1
b2 1
b3 1
b4 1

I want the following:

V1 V2 V3
a1 1 A
a2 1 A
a3 1 A
a4 1 A
a5 1 A
b1 1 B
b2 1 B
b3 1 B
b4 1 B

I am not sure how to go about making this transformation besides writing a long vector that contains each of the categorical string names (these are state names, so it would be a really long vector). Any help would be greatly appreciated.

Thanks,

Nicholas Pretnar
Mizzou Economics Grad Assistant
npretnar at gmail.com

I'm not sure what's so complicated about that (am I missing
something?). You can search using grepl, and replace using gsub, so
tmpDF <- read.table(text="V1 V2
A 5
a1 1
a2 1
a3 1
a4 1
a5 1
B 4
b1 1
b2 1
b3 1
b4 1",
header=TRUE)
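# Keep only the rows whose V1 contains a digit (drops the header rows "A" and "B")
tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
# Strip the digits from V1 and upper-case what is left to recover the group label
data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))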
Seems to do the trick.
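
If the group label cannot be derived from the row's own name, a variant that may help is to carry each header row's label down with cumsum. This is only a sketch: it assumes the header rows are exactly the rows whose V1 contains no digit, and that the first row is a header. The names is_header, group_id, labels and result are made up for illustration.

```r
# Sketch: carry each header row's label down to the rows beneath it.
# Assumption: header rows are exactly the rows whose V1 contains no digit,
# and the first row of the data frame is a header row.
tmpDF <- data.frame(
  V1 = c("A", "a1", "a2", "a3", "a4", "a5", "B", "b1", "b2", "b3", "b4"),
  V2 = c(5, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

is_header <- !grepl("[0-9]", tmpDF$V1)       # TRUE for the "A" and "B" rows
group_id  <- cumsum(is_header)               # 1 for the A block, 2 for the B block
labels    <- tmpDF$V1[is_header][group_id]   # header label repeated for every row

# Drop the header rows and attach the carried-down label as V3
result <- data.frame(tmpDF[!is_header, ], V3 = labels[!is_header],
                     stringsAsFactors = FALSE)
rownames(result) <- NULL
result
```

The only part that depends on the data is how is_header is computed; the cumsum step stays the same for any rule that flags the header rows.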
Best,
Ista
On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npretnar at gmail.com> wrote:
> I have a string variable (V1) in a data frame structured as follows:
> [...]

Sorry. Bad example on my part. Try this. V1 is ...

V1
alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome

And I want:

V1 V2
alabama bates
alabama tuscaloosa
alabama smith
arkansas fayette
arkansas little rock
alaska juneau
alaska nome

This is more representative of the problem, extended to all 50 states.

- Nick

On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
> I'm not sure what's so complicated about that (am I missing
> something?). You can search using grep, and replace using gsub, so
> [...]
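
For the state/county layout above, one possible approach avoids typing out the state names by using R's built-in state.name vector (from the datasets package) to flag the state rows, then carrying each state down with cumsum. This is a sketch under two assumptions not confirmed in the thread: every state row matches tolower(state.name) exactly, and no county shares its name with a state. The names is_state, state and result are made up for illustration.

```r
# Sketch for the state/county layout, using R's built-in state.name vector.
# Assumption: every state row matches tolower(state.name) exactly, and no
# county has the same name as a state (such a county would be misread as a state).
V1 <- c("alabama", "bates", "tuscaloosa", "smith",
        "arkansas", "fayette", "little rock",
        "alaska", "juneau", "nome")

is_state <- V1 %in% tolower(state.name)     # TRUE on the state rows
state    <- V1[is_state][cumsum(is_state)]  # state label carried down to each row

# Keep only the county rows, pairing each with the state above it
result <- data.frame(V1 = state[!is_state], V2 = V1[!is_state],
                     stringsAsFactors = FALSE)
result
```

If some counties do share a name with a state, the is_state rule would need another signal, such as a count column like V2 in the first example.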