Dear All,

I have been trying to scan data from PDF files and use R to separate the
fields. An example will make it clear. I have a line that reads

"Intrepid (D) 15,977 11,956 45,143 39,014"

where what is in the parentheses is either a "D" for domestic or "I" for
import.

I want to strsplit the line on the full string "(D)" or "(I)", but
unfortunately R treats this as "D" or "I" in the strsplit command. My
question is: is there any way that I can tell R that this "(" is a
literal character?

P.S. Even grep() has the same problem. I know this is not a problem as
such, since "(", ")", "[", etc. are predefined in R, but why are the
quotation marks for strings not doing their job here?

Thanks,
Jean Eid

These functions use regular expressions. Try e.g.:

strsplit("Intrepid (D) 15,977 11,956 45,143 39,014", "\\(D\\)")

Uwe Ligges

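Applied to the whole task in the original question, the escaped pattern can be widened to cover both flags. This is only a minimal sketch: it assumes the flag is always a single D or I in parentheses and that the numbers use commas as thousands separators, and the variable names (parts, name, flag, values) are made up for illustration.

```r
line <- "Intrepid (D) 15,977 11,956 45,143 39,014"

# Split on "(D)" or "(I)" plus any surrounding spaces; the parentheses are escaped.
parts <- strsplit(line, " *\\([DI]\\) *")[[1]]
name <- parts[1]

# Pull out the flag itself with a capture group.
flag <- sub(".*\\(([DI])\\).*", "\\1", line)

# Drop the thousands separators and convert the remaining fields to numbers.
values <- as.numeric(gsub(",", "", strsplit(parts[2], " +")[[1]]))
```

Here name holds the text before the flag, flag holds the letter, and values holds the four numeric fields.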
You have to escape such "special" characters with a backslash (written "\\" in an R string), e.g.
strsplit("abc(d)(e)", split="\\(")
or put them in brackets (matches sets of characters)
strsplit("abc(d)(e)", split="[(]")
if you think that is more readable. Similar for gsub(), grep(),
regexpr() and friends.
Henrik Bengtsson
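
On the original question of why the quotation marks do not help: the quotes only delimit the R string, and the characters inside are still handed to the regular expression engine, where "(" opens a group. The short sketch below (the variable names are illustrative) checks that "\\(" is a two-character string and that the two suggested patterns behave the same.

```r
x <- "abc(d)(e)"

# The R literal "\\(" holds two characters: a backslash and "(".
n_chars <- nchar("\\(")

# Both patterns match a literal "(" and give the same pieces.
escaped   <- strsplit(x, split = "\\(")[[1]]
bracketed <- strsplit(x, split = "[(]")[[1]]
same <- identical(escaped, bracketed)

# The same escaping works in the other regex functions.
found   <- grepl("\\(d\\)", x)
cleaned <- gsub("[()]", "", x)
```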
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jean Eid
> Sent: 15 November 2003 15:26
> To: r-help at stat.math.ethz.ch
> Subject: [R] recognizing "(" as a character

On Sat, 15 Nov 2003, Henrik Bengtsson wrote:
> You have to escape (="\\") such "special" characters, e.g.
>
> strsplit("abc(d)(e)", split="\\(")
>
> or put them in brackets (matches sets of characters)
>
> strsplit("abc(d)(e)", split="[(]")
>
> if you think that is more readable. Similar for gsub(), grep(),
> regexpr() and friends.

For all except (unfortunately) strsplit() there is now a fixed=TRUE
argument to make this unnecessary.

-thomas

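A small sketch of what fixed=TRUE changes, using the line from the original question. It assumes a reasonably recent R (grepl() is used for brevity), and the variable names are illustrative. Without fixed=TRUE the pattern "(D)" is a regular expression group that matches the bare letter; with it, the three characters are matched literally.

```r
line <- "Intrepid (D) 15,977 11,956 45,143 39,014"

# As a regular expression, "(D)" is a group matching just "D" (position 11).
regex_pos <- as.integer(regexpr("(D)", line))

# With fixed = TRUE the parentheses are ordinary characters (position 10).
fixed_pos <- regexpr("(D)", line, fixed = TRUE)
has_flag  <- grepl("(D)", line, fixed = TRUE)

# Literal replacement; no escaping is needed.
stripped <- gsub("(D)", "", line, fixed = TRUE)
```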
On Sat, 15 Nov 2003, Thomas Lumley wrote:
> For all except (unfortunately) strsplit() there is now a fixed=TRUE
> argument to make this unnecessary.

As from 1.8.1 there will be a help page explaining what exactly a regexp
is. You can use basic regexps here:

> strsplit("abc(d)(e)", split="(", extended=FALSE)[[1]]
[1] "abc" "d)" "e)"

and those brought up on the original pre-POSIX Unix grep may be more
comfortable with that. But it sounds like people would like a fixed=TRUE
option too.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
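
For readers on a current R installation: strsplit() has since gained the fixed argument discussed in this thread, and as far as I know the extended argument was later removed, so the extended=FALSE call may fail there with an unused-argument error. A minimal sketch of the literal split, assuming such a later R version (variable names are illustrative):

```r
x <- "abc(d)(e)"

# Split on a literal "(" without any escaping.
pieces <- strsplit(x, split = "(", fixed = TRUE)[[1]]

# The same idea on the line from the original question.
line <- "Intrepid (D) 15,977 11,956 45,143 39,014"
fields <- strsplit(line, split = " (D) ", fixed = TRUE)[[1]]
```

Note that a fixed split handles only one literal string at a time, so covering both "(D)" and "(I)" in one call still needs a regular expression.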