thr3ads.net - R help - [R] regexpr with accents [Aug 2012]


Luca Meyer

2012-Aug-06 05:55 UTC

[R] regexpr with accents

Hello,

I have built a syntax to find out whether a given substring is included in a larger
string. It works like this:

d1$V1[regexpr("some text = 9",d1$V2)>0] <- 9

This works fine as long as "some text" contains only standard ASCII characters.
However, it does not work when accents are included, as in the following:

d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9

I have tried substituting "è" with several wildcards, but it did not
work. Can anyone suggest how to have the syntax parse the string while ignoring the
accent?

Thank you in advance,

Luca
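A minimal sketch of one way to do what is asked here, assuming the accents that occur are among the few listed in the helper: map accented vowels to plain ones with chartr() on both the data and the pattern, then match the pattern literally. The helper strip_accents() and its character list are hypothetical. grepl() is used because it returns the same logical vector as regexpr(...) > 0, and the accented characters are written as \u escapes so the example does not depend on how the script file itself is encoded. iconv(x, to = "ASCII//TRANSLIT") is a broader alternative, but its output depends on the platform's iconv implementation.

```r
# Hypothetical helper: replace a few accented vowels (an assumed, non-exhaustive
# list) with their plain counterparts. \u escapes keep the script encoding-neutral.
strip_accents <- function(x) {
  accented   <- "\u00e0\u00e1\u00e8\u00e9\u00ec\u00ed\u00f2\u00f3\u00f9\u00fa"
  unaccented <- "aaeeiioouu"
  chartr(accented, unaccented, x)
}

d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some t\u00e8xt = 9",
                        "some other text = 9"),
                 stringsAsFactors = FALSE)

# Strip accents from both sides, then match the pattern as a literal string.
hit <- grepl(strip_accents("some text = 9"), strip_accents(d1$V2), fixed = TRUE)
d1$V1[hit] <- 9
d1$V1   # 9 9 3
```

stringsAsFactors = FALSE is set explicitly because the default changed between R versions (it was TRUE in R 2.15).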

Rui Barradas

2012-Aug-06 06:22 UTC


Hello,

It works for me:

d1 <- data.frame(V1 = 1:3,
     V2 = c("some text = 9", "some tèxt = 9", "some other text = 9"))

regexpr("some text = 9", d1$V2)
[1]  1 -1 -1
attr(,"match.length")
[1] 13 -1 -1
regexpr("some tèxt = 9", d1$V2)
[1] -1  1 -1
attr(,"match.length")
[1] -1 13 -1
d1$V1[regexpr("some text = 9",d1$V2) > 0] <- 9
d1$V1[regexpr("some tèxt = 9",d1$V2) > 0] <- 9
d1
   V1                  V2
1  9       some text = 9
2  9       some tèxt = 9
3  3 some other text = 9
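The output above shows what regexpr() returns: the start position of the first match, or -1 when there is no match, with the match length stored in the "match.length" attribute. A small sketch of the same check, assuming the accented letter is è (written as the escape \u00e8), spells this out and confirms that the "> 0" test used in this thread gives the same result as grepl():

```r
x <- c("some text = 9", "some t\u00e8xt = 9", "some other text = 9")

pos <- regexpr("some t\u00e8xt = 9", x)
as.vector(pos)              # -1  1 -1 : start position, or -1 if no match
attr(pos, "match.length")   # -1 13 -1 : length of the match in characters

# regexpr(...) > 0 and grepl(...) give the same logical vector.
hits <- as.vector(pos) > 0
stopifnot(identical(hits, grepl("some t\u00e8xt = 9", x)))
hits                        # FALSE  TRUE FALSE
```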

What do you mean by "it did not work"? What were the contents of
'd1'?

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base

loaded via a namespace (and not attached):
[1] fortunes_1.5-0

Hope this helps,

Rui Barradas
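The sessionInfo() above shows a Windows-1252 (Latin-1 based) locale. A sketch of why encodings can matter for this kind of matching: the same visible è is one byte in Latin-1 but two bytes in UTF-8. R translates between declared encodings when it compares strings, so a mismatch is more likely to come from text whose encoding is undeclared or wrongly declared, for example a file read without the right fileEncoding (read.table, read.csv) or encoding (readLines, source) argument. This is a general check, not the cause of the problem in this thread.

```r
# The same character "è" in two encodings.
e_utf8   <- "\u00e8"                                       # UTF-8 string
e_latin1 <- iconv(e_utf8, from = "UTF-8", to = "latin1")   # Latin-1 string

charToRaw(e_utf8)     # c3 a8 : two bytes
charToRaw(e_latin1)   # e8    : one byte

# Both strings carry a declared encoding, so R compares characters,
# not bytes, and treats them as equal.
e_utf8 == e_latin1                         # TRUE
grepl(e_utf8, e_latin1, fixed = TRUE)      # TRUE
```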

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

arun

2012-Aug-06 06:58 UTC


Hi,

It works for me. I am using R 2.15 on Ubuntu 12.04.

d1 <- data.frame(V1 = 1:5, V2 = c("some text = 9", "some t?xt=9", "s?me t?xt=9",
  "s?me text=9", "some t?xt=9"))
d1
#  V1            V2
#1  1 some text = 9
#2  2   some t?xt=9
#3  3   s?me t?xt=9
#4  4   s?me text=9
#5  5   some t?xt=9

d1$V1[regexpr("some t?xt=9",d1$V2)>0]<-9
d1$V1[regexpr("s?me text=9",d1$V2)>0] <-9
d1$V1[regexpr("some t?xt=9",d1$V2)>0] <-9
d1$V1[regexpr("s?me t?xt=9",d1$V2)>0] <-9
d1$V1[regexpr("some text = 9",d1$V2)>0] <-9

d1
#  V1            V2
#1  9 some text = 9
#2  9   some t?xt=9
#3  9   s?me t?xt=9
#4  9   s?me text=9
#5  9   some t?xt=9

A.K.
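The example above needs one assignment per spelling. A sketch that covers several spellings with a single pattern, using bracket expressions that accept either the plain vowel or an accented one. The archive lost the exact accents used in the example above, so the variants below (ò, ó, è, é) are assumptions.

```r
# Bracket expressions accept the plain vowel or an assumed set of accented ones.
o_class <- "[o\u00f2\u00f3]"   # o, o-grave, o-acute
e_class <- "[e\u00e8\u00e9]"   # e, e-grave, e-acute
pattern <- paste0("s", o_class, "me t", e_class, "xt=9")

d1 <- data.frame(V1 = 1:6,
                 V2 = c("some text=9", "some t\u00e8xt=9", "s\u00f2me t\u00e8xt=9",
                        "s\u00f3me text=9", "some t\u00e9xt=9", "some other text=9"),
                 stringsAsFactors = FALSE)

# A single assignment replaces the five separate regexpr() calls.
d1$V1[grepl(pattern, d1$V2)] <- 9
d1$V1   # 9 9 9 9 9 6
```

Listing the variants explicitly keeps the match strict: "some other text=9" is still left alone.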





Luca Meyer

2012-Aug-06 12:25 UTC


An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20120806/5e8ac959/attachment.pl>

arun

2012-Aug-06 13:01 UTC


Hi,

Here, the strings within the quotes are read literally. So you may
have to use the symbol itself (è) instead of the "friendly" (&egrave) or "numeric" (&#232)
code from the link, or else convert those codes.

d1 <- data.frame(V1 = 1:4,
    V2 = c("some text = 9", "some t&egravext = 9",
           "some tèxt = 9", "some t&#232xt = 9"))

d1$V1[regexpr("some t&egravext = 9",d1$V2)>0] <- 9
d1$V1[regexpr("some t&#232xt = 9",d1$V2)>0] <- 9
d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9

d1
  V1                  V2
1  1       some text = 9
2  9 some t&egravext = 9
3  9       some tèxt = 9
4  9   some t&#232xt = 9

A.K.
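A sketch of the "convert those" option mentioned above: decode the HTML character references into the character itself before matching, so that one pattern written with the symbol covers all three spellings. decode_entities() is a hypothetical helper that handles numeric references and only a tiny, assumed table of named ones; it also accepts the forms without a trailing semicolon that appear in this thread.

```r
# Hypothetical helper: turn "&egrave" / "&#232" (with or without ";")
# into the character itself. The named-entity table is an assumed subset.
decode_entities <- function(x) {
  named <- c(egrave = "\u00e8", eacute = "\u00e9", agrave = "\u00e0")
  for (nm in names(named)) {
    x <- gsub(paste0("&", nm, ";?"), named[[nm]], x)
  }
  # Numeric references: replace each "&#NNN" with the code point NNN.
  m <- gregexpr("&#[0-9]+;?", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(ent) {
    vapply(as.integer(gsub("[^0-9]", "", ent)), intToUtf8, character(1))
  })
  x
}

d1 <- data.frame(V1 = 1:4,
                 V2 = c("some text = 9", "some t&egravext = 9",
                        "some t\u00e8xt = 9", "some t&#232xt = 9"),
                 stringsAsFactors = FALSE)

# After decoding, one literal pattern matches all three spellings of the word.
d1$V2 <- decode_entities(d1$V2)
d1$V1[grepl("some t\u00e8xt = 9", d1$V2, fixed = TRUE)] <- 9
d1$V1   # 1 9 9 9
```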


----- Original Message -----
From: Luca Meyer <lucam1968 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Monday, August 6, 2012 8:25 AM
Subject: [R]  regexpr with accents

Sorry, but my previous email did not go through properly. Instead of the ? you
should read an &egrave or &#232, according to
http://www.lookuptables.com/.

So there are extended ASCII characters I need to deal with.

I have tried

d1$V1[regexpr("some t&egravext = 9",d1$V2)>0] <- 9
and 

d1$V1[regexpr("some t&#232xt = 9",d1$V2)>0] <- 9

without success...

Thanks,
Luca





Luca Meyer

2012-Aug-06 13:16 UTC


Thanks Arun,

It works all right. I just found out that my problem was not with the accents but
with the spelling of "some text".

Kind regards,

Luca
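Since the actual problem was a spelling mismatch, here is a sketch of one way to spot such near misses, assuming the expected string is known: utils::adist() gives the edit distance between the target and each candidate, and utf8ToInt() lists code points so that look-alike characters can be told apart. The cut-off of 2 edits is an arbitrary choice for illustration.

```r
target <- "some text = 9"
x <- c("some text = 9", "some txet = 9", "some text=9", "unrelated")

# Generalized Levenshtein distance from the target to each candidate.
d <- as.vector(utils::adist(target, x))
d[1:3]                 # 0 2 2
# Near misses: not identical, but only a couple of edits away.
x[d > 0 & d <= 2]      # "some txet = 9" "some text=9"

# Code points expose characters that look alike but differ.
utf8ToInt("t\u00e8xt")  # 116 232 120 116
```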

