thr3ads.net - R help - [R] Separating a Complicated String Vector [Jan 2015]


npretnar

2015-Jan-04 05:20 UTC

[R] Separating a Complicated String Vector

Sorry. Bad example on my part. Try this. V1 is ...

V1
alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome

And I want:

V1        V2
alabama   bates
alabama   tuscaloosa
alabama   smith
arkansas  fayette
arkansas  little rock
alaska    juneau
alaska    nome

This is more representative of the real problem, which extends to all 50 states.

- Nick


On Jan 3, 2015, at 9:22 PM, Ista Zahn wrote:
> I'm not sure what's so complicated about that (am I missing
> something?). You can search using grep, and replace using gsub, so
> 
> tmpDF <- read.table(text="V1      V2
> A       5
> a1      1
> a2      1
> a3      1
> a4      1
> a5      1
> B       4
> b1      1
> b2      1
> b3      1
> b4      1",
>                    header=TRUE)
> tmpDF <- tmpDF[grepl("[0-9]", tmpDF$V1), ]
> data.frame(tmpDF, V3 = toupper(gsub("[0-9]", "", tmpDF$V1)))
> 
> Seems to do the trick.
> 
> Best,
> Ista
> 
> On Sat, Jan 3, 2015 at 9:41 PM, npretnar <npretnar at gmail.com> wrote:
>> I have a string variable (V1) in a data frame structured as follows:
>> 
>> V1      V2
>> A       5
>> a1      1
>> a2      1
>> a3      1
>> a4      1
>> a5      1
>> B       4
>> b1      1
>> b2      1
>> b3      1
>> b4      1
>> 
>> I want the following:
>> 
>> V1      V2      V3
>> a1      1       A
>> a2      1       A
>> a3      1       A
>> a4      1       A
>> a5      1       A
>> b1      1       B
>> b2      1       B
>> b3      1       B
>> b4      1       B
>> 
>> I am not sure how to go about making this transformation besides
>> writing a long vector that contains each of the categorical string names (these
>> are state names, so it would be a really long vector). Any help would be greatly
>> appreciated.
>> 
>> Thanks,
>> 
>> Nicholas Pretnar
>> Mizzou Economics Grad Assistant
>> npretnar at gmail.com
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
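Ista's approach relies on each member name encoding its category: keep the rows whose V1 contains a digit, then strip the digits and upper-case what remains. A rough Python equivalent, offered only as a sketch; it assumes the rows are held as (V1, V2) tuples, and `label_by_digits` is a hypothetical helper name:

```python
import re

# Hypothetical sample mirroring the first example in the thread:
# "A"/"B" are category header rows, "a1".."b4" are member rows.
rows = [("A", 5), ("a1", 1), ("a2", 1), ("a3", 1), ("a4", 1), ("a5", 1),
        ("B", 4), ("b1", 1), ("b2", 1), ("b3", 1), ("b4", 1)]


def label_by_digits(rows):
    """Keep rows whose V1 contains a digit and derive V3 from V1."""
    out = []
    for v1, v2 in rows:
        if re.search(r"[0-9]", v1):                # like grepl("[0-9]", V1)
            v3 = re.sub(r"[0-9]", "", v1).upper()  # like toupper(gsub("[0-9]", "", V1))
            out.append((v1, v2, v3))
    return out


for row in label_by_digits(rows):
    print(*row)
```

As Nick's follow-up makes clear, city names do not carry their state this way, which is why the later replies look state names up instead.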

David Winsemius

2015-Jan-04 07:47 UTC


On Jan 3, 2015, at 9:20 PM, npretnar wrote:
> Sorry. Bad example on my part. Try this. V1 is ...
> 
> V1
> alabama
> bates
> tuscaloosa
> smith
> arkansas
> fayette
> little rock
> alaska
> juneau
> nome
> 
> And I want:
> 
> V1        V2
> alabama   bates
> alabama   tuscaloosa
> alabama   smith
> arkansas  fayette
> arkansas  little rock
> alaska    juneau
> alaska    nome

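# TRUE for rows whose V1 matches any state name (lower-cased, joined with "|")
dat$is_state <- grepl(tolower(paste(state.name, collapse="|")), dat$V1)

# Running count of state rows so far, i.e. which state block each row is in
# (equivalent to cumsum(dat$is_state) when dat has default row names)
dat$thisstate <- cumsum(rownames(dat) %in% which(dat$is_state) )
# Non-state rows get the state name of their block; they become the V2 column
dat2 <- data.frame(V1 = dat$V1[dat$is_state][dat$thisstate[!dat$is_state] ] ,
                   V2 = dat$V1[ !dat$is_state] )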

> dat2
        V1         V2
1  alabama      bates
2  alabama tuscaloosa
3  alabama      smith
4 arkansas    fayette
5 arkansas     little
6 arkansas       rock
7   alaska     juneau
8   alaska       nome

-- 
David.

David Winsemius
Alameda, CA, USA
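David's answer relies on a running count: `cumsum()` over the is-state flags gives every row the number of the state block it belongs to, and that number then indexes into the vector of state names. A Python sketch of the same idea, assuming a lower-case set `STATES` that stands in for R's `state.name` (only the three sample states are listed) and a hypothetical helper `pair_by_cumsum`:

```python
from itertools import accumulate

# Hypothetical stand-in for R's state.name: only the sample states, lower case.
STATES = {"alabama", "arkansas", "alaska"}

v1 = ["alabama", "bates", "tuscaloosa", "smith", "arkansas", "fayette",
      "little rock", "alaska", "juneau", "nome"]


def pair_by_cumsum(values, states):
    """Pair each non-state entry with the most recent state entry."""
    is_state = [v.lower() in states for v in values]
    # Running count of states seen so far (like R's cumsum(is_state)):
    # 1 throughout alabama's block, 2 throughout arkansas's, and so on.
    group = list(accumulate(int(s) for s in is_state))
    state_names = [v for v, s in zip(values, is_state) if s]
    pairs = []
    for v, s, g in zip(values, is_state, group):
        if not s:
            if g == 0:
                raise ValueError(f"{v!r} appears before any state name")
            # g is 1-based, so the matching state is state_names[g - 1]
            pairs.append((state_names[g - 1], v))
    return pairs


for state, city in pair_by_cumsum(v1, STATES):
    print(state, city, sep="\t")
```

David's printed output splits "little rock" into two rows, which suggests his `dat` was read with whitespace as the field separator; keeping one entry per line (as `v1` does here) keeps multi-word city names intact.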

John Posner

2015-Jan-04 15:43 UTC


I'm coming to R from Python, so I coded a Python3 solution:

#####################
# One entry per line: splitlines() keeps "little rock" as a single entry
# (a bare split() would break it on the space)
data = """alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
""".splitlines()

state_list = ["alabama", "arkansas", "alaska"]   # etc.

return_list = []
for word in data:
    if word in state_list:
        current_state = word   # remember the most recent state
    else:
        return_list.append([current_state, word])

print(return_list)
#####################

... and then translated it to R:

#####################
data = "alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
"

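# One entry per line (the trailing newline does not produce an empty entry)
data = strsplit(data, split="\n")[[1]]

states = vector()
cities = vector()

for (word in data) {
  if (word %in% tolower(state.name)) {
    current_state = word   # remember the most recent state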
  } else {
    states = c(states, current_state)
    cities = c(cities, word)
  }
}

print(data.frame(V1=states, V2=cities))
#####################

-John
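John's loop carries the most recent state forward. The same pairing can also be written with `itertools.groupby`, which splits the lines into alternating runs of state and non-state entries; this is a swapped-in variant, not John's code. It assumes a lower-case `STATES` set (only the sample states) and a hypothetical helper `pair_by_runs`, and it raises a clear error, rather than a `NameError`, when the data begin with a non-state line:

```python
from itertools import groupby

# Hypothetical subset standing in for all 50 state names (lower case).
STATES = {"alabama", "arkansas", "alaska"}

data = """alabama
bates
tuscaloosa
smith
arkansas
fayette
little rock
alaska
juneau
nome
""".splitlines()   # one entry per line, so "little rock" stays whole


def pair_by_runs(lines, states):
    """Group consecutive lines into state / non-state runs and pair them up."""
    pairs = []
    current_state = None
    for is_state, run in groupby(lines, key=lambda w: w.lower() in states):
        run = list(run)
        if is_state:
            # If several states are adjacent, the earlier ones have no cities
            current_state = run[-1]
        elif current_state is None:
            raise ValueError(f"{run[0]!r} appears before any state name")
        else:
            pairs.extend((current_state, city) for city in run)
    return pairs


print(pair_by_runs(data, STATES))
```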




William Dunlap

2015-Jan-05 02:21 UTC


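f <- function (x) {
    # TRUE where the entry is a state name (case-insensitive)
    isState <- is.element(tolower(x), tolower(state.name))
    w <- which(isState)
    # Repeat each state's position once per entry before the next state
    # (gaps between successive state positions, minus 1)
    data.frame(State = x[rep(w, diff(c(w, length(x) + 1)) - 1L)],
        City = x[!isState])
}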

E.g.,
V1 <- c("alabama", "bates", "tuscaloosa",
        "smith", "arkansas", "fayette",
        "little rock", "alaska", "juneau",
        "nome")
> f(V1)
     State        City
1  alabama       bates
2  alabama  tuscaloosa
3  alabama       smith
4 arkansas     fayette
5 arkansas little rock
6   alaska      juneau
7   alaska        nome



Bill Dunlap
TIBCO Software
wdunlap tibco.com
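In Bill's function, `diff(c(w, length(x) + 1)) - 1L` counts how many entries follow each state before the next state (or the end), and `rep()` repeats each state's position that many times. A Python sketch of the same run-length idea, assuming a lower-case `STATES` set in place of `state.name` (only the sample states) and a hypothetical helper `pair_by_gaps`; positions are 0-based here:

```python
# Hypothetical stand-in for R's state.name: only the sample states, lower case.
STATES = {"alabama", "arkansas", "alaska"}


def pair_by_gaps(x, states):
    """Repeat each state once per entry that follows it, as rep()/diff() does."""
    is_state = [v.lower() in states for v in x]
    w = [i for i, s in enumerate(is_state) if s]   # 0-based positions of states
    nxt = w[1:] + [len(x)]                         # next state's position, or one past the end
    counts = [b - a - 1 for a, b in zip(w, nxt)]   # number of cities after each state
    # Equivalent of x[rep(w, counts)]: each state repeated counts times
    state_col = [x[i] for i, k in zip(w, counts) for _ in range(k)]
    city_col = [v for v, s in zip(x, is_state) if not s]  # like x[!isState]
    if len(state_col) != len(city_col):
        # Only happens when non-state entries come before the first state
        raise ValueError("some entries appear before the first state name")
    return list(zip(state_col, city_col))


V1 = ["alabama", "bates", "tuscaloosa", "smith", "arkansas", "fayette",
      "little rock", "alaska", "juneau", "nome"]

for state, city in pair_by_gaps(V1, STATES):
    print(state, city, sep="\t")
```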

