thr3ads.net - R help - [R] year extraction over a list [Jul 2012]

jimi adams

2012-Jul-31 15:33 UTC

[R] year extraction over a list

Hello,
I have a data frame, one element in that data frame is a LIST, with each element
being a character string. I am trying to extract the first year listed in each
of those character strings. The character elements are typically csv, but the
position of the year can vary (think citations with varying citation standards).
I.e.,

foo$a
[[1]]
[1] text, text, 2001, text
[2] text, 2000, text
[3] 1999, text, text, text

I'm trying to figure out how to create a new list such that each element is
that year, i.e., the result on the above would be:
foo$year
[[1]]
[1] 2001
[2] 2000 
[3] 1999

For some reason i'm not figuring out how to properly get lapply and strsplit
(or other alternatives) to play nicely together. Any help greatly appreciated.

thanks,
jimi


jimi adams
Assistant Professor
Department of Sociology
American University
e: jadams at american.edu
w: jimiadams.com

Rui Barradas

2012-Jul-31 16:14 UTC


[R] year extraction over a list

Hello,

Try the following.


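x <- c("text, text, 2001, text", "text, 2000, text",
       "1999, text, text, text")

extract.year <- function(x, n = 4){
     # Build a pattern that captures a run of n digits
     pattern <- paste(".*([[:digit:]]{", n, "}).*", sep="")
     # Replace each whole string with the captured digits
     as.integer(sub(pattern, "\\1", x))
}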

extract.year(x)

The argument 'n' is the number of digits of year. Then use the function 
as you want, within lapply, for instance, or directly as in

extract.year(foo$a)

Hope this helps,

Rui Barradas
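
A minimal sketch of the lapply route mentioned above, assuming the column is a list whose elements are character vectors. The data frame `foo` built here is a hypothetical stand-in for the poster's data, not the real thing. Going through lapply matters because sub() coerces a list argument with as.character(), which deparses each element into one string instead of treating the citations separately.

```r
# Function as posted in the reply above
extract.year <- function(x, n = 4) {
  pattern <- paste(".*([[:digit:]]{", n, "}).*", sep = "")
  as.integer(sub(pattern, "\\1", x))
}

# Hypothetical stand-in for the poster's data: a one-row data frame whose
# column `a` is a list holding one character vector of citations.
foo <- data.frame(id = 1)
foo$a <- list(c("text, text, 2001, text",
                "text, 2000, text",
                "1999, text, text, text"))

# Apply the function to each list element; the result is again a list
# with one integer vector per row.
foo$year <- lapply(foo$a, extract.year)
foo$year[[1]]
# 2001 2000 1999
```

Each element of foo$year lines up with the matching element of foo$a, which is the shape asked for in the original question.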


arun

2012-Jul-31 18:01 UTC


[R] year extraction over a list

Hello,

Try this:

list1 <- list(
  "text, text, 2001, text",
  "text, 2000, text",
  "1999, text, text, text")
# Remove every non-digit character from each string
l1 <- lapply(list1, function(x) gsub("\\D", "", x))
l1
[[1]]
[1] "2001"

[[2]]
[1] "2000"

[[3]]
[1] "1999"


unlist(l1)
[1] "2001" "2000" "1999"


A.K.
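
One caveat with stripping non-digits: it keeps every digit in the string, so it only yields a year when the year is the only number present. The sketch below uses made-up strings to show the effect, and then one possible alternative with regexpr() and regmatches(), which return only the first run of four digits. The variable names are illustrative.

```r
# Made-up input: the first string contains a second number
x <- c("text, text, 2001, text, 1234", "text, 2000, text")

# Stripping non-digits glues all digits together
all_digits <- gsub("\\D", "", x)
all_digits
# "20011234" "2000"

# regexpr() locates the first match only; regmatches() extracts it
first_run <- regmatches(x, regexpr("[0-9]{4}", x))
first_run
# "2001" "2000"
```

Note that regmatches() silently drops strings with no match, so the result can be shorter than the input.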








Rui Barradas

2012-Jul-31 18:59 UTC


[R] year extraction over a list

Hello,

Ok, try this, then.
(I've renamed the function and included some numbers in the text strings.)

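x <- c("text, text, 2001, text, 1234", "text, 2000, 1234, text",
       "1999, text, 1234, text, text")

extract.first.year <- function(x, n = 4){
     # \\D* can only skip non-digits, so the capture is the first run of n digits
     pattern <- paste("\\D*(\\d{", n, "}).*$", sep="")
     # Replace the matched text with the captured digits
     as.integer(sub(pattern, "\\1", x))
}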

extract.first.year(x)

Also, thanks to arun for reminding me of \\d and its negation, \\D.
They make the code easier to read than the equivalent written with
[[:digit:]], which I systematically use, and its negation [^[:digit:]]:
pattern <- paste("[^[:digit:]]*([[:digit:]]{", n, "}).*$", sep="")

Rui Barradas
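
To make the difference between the two patterns concrete, the sketch below runs both on the example strings from this reply. A leading ".*" is greedy and consumes as much as it can, so the group ends up on the last run of four digits, while "\\D*" stops at the first digit. The helper `first_year` is a hypothetical alternative, not part of the thread: it assumes a year is a run of exactly n digits with no digit on either side, and it returns NA where nothing matches.

```r
x <- c("text, text, 2001, text, 1234",
       "text, 2000, 1234, text",
       "1999, text, 1234, text, text")

# Greedy ".*" in front: the capture lands on the last 4-digit run
last_match <- as.integer(sub(".*(\\d{4}).*", "\\1", x))
last_match
# 1234 1234 1234

# "\\D*" in front: the capture lands on the first 4-digit run
first_match <- as.integer(sub("\\D*(\\d{4}).*$", "\\1", x))
first_match
# 2001 2000 1999

# Hypothetical helper: locate the first standalone run of n digits
# and keep the output aligned with the input.
first_year <- function(x, n = 4) {
  pattern <- sprintf("(?<![0-9])[0-9]{%d}(?![0-9])", n)
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m > -1L
  out <- rep(NA_integer_, length(x))
  out[hit] <- as.integer(regmatches(x, m))
  out
}

first_year(c("vol 12, 2001, p. 1234", "no year here", "1999, text"))
# 2001 NA 1999
```

The helper is worth considering when a shorter number can precede the year, as in "vol 12, 2001": there the sub() approach matches only from the comma onward and leaves the unmatched prefix in the result, so the output is no longer a clean year.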

On 31-07-2012 19:36, jimi adams wrote:
> Thanks. Apparently my question wasn't quite specific enough, and my
> experience with regular expressions is extremely limited.
> This is doing *exactly* as expected for what i described, but i left out a
> few details. The most important is that often #'s also appear in some of
> what i labeled as "text" below.
>
> This appears to be returning the LAST 4-digit number it finds in each of
> the items over which it runs. How can i make it the first instead? That should
> do the trick to make this exactly what i actually need, not just what i asked
> about.
>
> again, thanks!
> jimi
