Dear All,

I have been trying to scan data from PDF files and use R to separate the fields. An example will make this clear. I have a line that reads

"Intrepid (D) 15,977 11,956 45,143 39,014"

where the letter in parentheses is either "D" for domestic or "I" for import. I want to strsplit() the line on the full string "(D)" or "(I)", but unfortunately R treats this as just "D" or "I" in the strsplit() call. My question: is there any way to tell R that "(" is a literal character?

P.S. grep() has the same problem. I understand that "(", ")", "[", etc. have a predefined meaning in R, but why are the quotation marks for strings not doing their job here?

Thanks,
Jean Eid
Jean Eid wrote:
> I want to try to strsplit the line according to the full character "(D)"
> or "(I)" but unfortunately R recognizes this as "D", or "I" in the
> strsplit command.

These functions use regular expressions, in which "(" and ")" are special characters. Try e.g.:

strsplit("Intrepid (D) 15,977 11,956 45,143 39,014", "\\(D\\)")

Uwe Ligges
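Note on the reply above: the quotation marks do their job (the pattern reaches strsplit() as an ordinary string), but strsplit() then interprets that string as a regular expression, where unescaped parentheses only group. A minimal sketch of how the suggestion might be extended to cover both "(D)" and "(I)" and to pull out the fields; the variable names (line, parts, model, flag, values) are illustrative and not from the thread:

```r
# Sample line from the original question.
line <- "Intrepid (D) 15,977 11,956 45,143 39,014"

# "\\(" in R source is the regex \( , i.e. a literal "(".
# [DI] matches either flag, so one pattern covers both cases.
# The surrounding " *" also swallows the spaces around the flag.
parts <- strsplit(line, " *\\([DI]\\) *")[[1]]
model <- parts[1]

# Recover the flag itself with sub() and a capture group.
flag <- sub(".*\\(([DI])\\).*", "\\1", line)

# Split the numeric fields on spaces and drop the thousands separators.
values <- as.numeric(gsub(",", "", strsplit(parts[2], " +")[[1]]))

print(model)
print(flag)
print(values)
```

Splitting on the flag and extracting it separately keeps the information that strsplit() would otherwise discard.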
You have to escape such "special" characters with "\\", e.g.

strsplit("abc(d)(e)", split="\\(")

or put them in brackets (which match a set of characters)

strsplit("abc(d)(e)", split="[(]")

if you think that is more readable. The same applies to gsub(), grep(), regexpr() and friends.

Henrik Bengtsson

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jean Eid
> Sent: 15 November 2003 15:26
> To: r-help at stat.math.ethz.ch
> Subject: [R] recognizing "(" as a character
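Note on the reply above: a short sketch comparing the two spellings and showing the same escaping in the other regex functions. It assumes a reasonably recent R (grepl() is used for brevity and was not mentioned in the thread); the variable names are illustrative.

```r
x <- "abc(d)(e)"

# Escaping and a one-character bracket class give the same split.
a <- strsplit(x, split = "\\(")[[1]]
b <- strsplit(x, split = "[(]")[[1]]
print(a)
print(identical(a, b))

# The same escaping rules apply to the other regex functions.
has_paren <- grepl("\\(", c("abc(d)", "abc"))  # does each string contain "("?
no_parens <- gsub("[()]", "", x)               # remove every "(" and ")"
first_pos <- as.integer(regexpr("\\(", x))     # position of the first "("

print(has_paren)
print(no_parens)
print(first_pos)
```

The bracket form has the small advantage that it needs no backslashes, so there is no doubling to get wrong inside an R string.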
On Sat, 15 Nov 2003, Henrik Bengtsson wrote:
> You have to escape (="\\") such "special" characters, e.g.
>
> strsplit("abc(d)(e)", split="\\(")
>
> or put them in brackets (matches sets of characters)
>
> strsplit("abc(d)(e)", split="[(]")
>
> if you think that is more readable. Similar for gsub(), grep(),
> regexpr() and friends.

For all except (unfortunately) strsplit() there is now a fixed=TRUE argument to make this unnecessary.

-thomas
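Note on the reply above: a minimal sketch of what fixed=TRUE changes for grep(), gsub() and regexpr(). The first string is the start of the line from the original question; the second ("plain D") is invented here purely to show the difference, and grepl() is used for brevity although it does not appear in the thread.

```r
x <- c("Intrepid (D) 15,977", "plain D")

# As a regex, "(D)" is a group containing D, so it matches any "D".
regex_hits <- grepl("(D)", x)

# With fixed = TRUE the pattern is a literal string: "(", "D", ")".
fixed_hits <- grepl("(D)", x, fixed = TRUE)

# Literal replacement without any escaping.
relabelled <- gsub("(D)", "domestic", x[1], fixed = TRUE)

# Position of the first literal "(" in the first string.
paren_pos <- as.integer(regexpr("(", x[1], fixed = TRUE))

print(regex_hits)
print(fixed_hits)
print(relabelled)
print(paren_pos)
```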
On Sat, 15 Nov 2003, Thomas Lumley wrote:
> For all except (unfortunately) strsplit() there is now a fixed=TRUE
> argument to make this unnecessary.

As from 1.8.1 there will be a help page explaining what exactly a regexp is. You can use basic regexps here:

> strsplit("abc(d)(e)", split="(", extended=FALSE)
[[1]]
[1] "abc" "d)"  "e)"

and those brought up on the original pre-POSIX Unix grep may be more comfortable with that. But it sounds like people would like a fixed=TRUE option too.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
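Note for readers on current R: strsplit() has since gained the fixed argument discussed in this thread, and, as far as I know, the extended argument used in the message above no longer exists, so that call is likely to fail with an "unused argument" error in a modern installation. A minimal sketch assuming a recent R version (trimws() in particular requires R 3.2.0 or later); the variable names are illustrative:

```r
# Literal split on "(", giving the same pieces as the basic-regexp call above.
pieces <- strsplit("abc(d)(e)", split = "(", fixed = TRUE)[[1]]

# Applied to the line from the original question.
line <- "Intrepid (D) 15,977 11,956 45,143 39,014"
parts <- strsplit(line, "(D)", fixed = TRUE)[[1]]

# The spaces around "(D)" stay attached to the pieces, so trim them.
parts <- trimws(parts)

print(pieces)
print(parts)
```

A fixed pattern can only match one literal string, so handling both "(D)" and "(I)" in a single call still needs a regular expression such as "\\([DI]\\)".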