Tom D. Harray
2011-Oct-05 02:52 UTC
[R] fgrep with caret (^) meta-character in system() call
Hi there,

I would like to use my Linux system's fgrep to search for a text pattern in a file. Calling system() with

system("fgrep \"SearchPattern\" /path/to/the/textFile.txt")

works in general, but I need to match the search pattern only at the beginning of a line.

The corresponding shell command

fgrep "^SearchPattern" /path/to/the/textFile.txt
       |
       |___ here's my problem

does exactly what I want. I tried various combinations of ", \", and \^, but failed to make system() work.

How can I call the working shell command, including the caret meta-character, with system()?

Thanks and regards,

dirk

P.S.: Actually I have to search for about 5,000 patterns, stored in an R list, in a text file with about 30,000,000 lines. Each pattern appears in one or more lines of the text file. Only those lines in which a pattern occurs at the beginning of the line have to be extracted.

Example with matching line 1, non-matching line 2, and non-matching line 3 (line 3 contains aaa, but not at the beginning of the line):

SearchPattern = "^aaa"

Text file:
aaaooooooooooo
bbbiiiiiiiiiii
aacttttttttaaa

Going line by line through the file in R is too slow, and I cannot program it in C or C++. Hence I use the fgrep command. I would appreciate it if anyone has a fast alternative that works with R on Linux and Windows systems.
man awk?

I've used awk for similar tasks (if I am reading the post correctly.) Google-Fu should turn up some useful examples. Also awk should be on your linux installation in some form or another.

Regards,
Ken Hutchison
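The reply gives no awk code, so the following is only a minimal sketch of how the suggestion might look, not the method Ken had in mind. It assumes the patterns are plain strings (no leading caret) written one per line to a file; the names patterns.txt and textFile.txt and the sample pattern "bbx" are hypothetical and created here just for the demonstration. Each pattern is stored as a hash key, and every text line is tested only against prefixes of the pattern lengths that actually occur, so the cost per line does not grow with the number of patterns.

```shell
#!/bin/sh
# Hypothetical file names and sample data, created only for this demonstration.
workdir=$(mktemp -d)
printf '%s\n' 'aaa' 'bbx' > "$workdir/patterns.txt"
printf '%s\n' 'aaaooooooooooo' 'bbbiiiiiiiiiii' 'aacttttttttaaa' > "$workdir/textFile.txt"

# While reading the first file (NR == FNR), store each non-empty pattern as a
# hash key and remember its length. For the second file, print a line as soon
# as one of its prefixes (of a remembered length) is a stored pattern.
# Note: the NR == FNR idiom assumes the pattern file is not empty.
result=$(awk '
    NR == FNR {
        if (length($0) > 0) {
            pat[$0] = 1
            len[length($0)] = 1
        }
        next
    }
    {
        for (n in len) {
            if (substr($0, 1, n) in pat) {
                print
                next
            }
        }
    }
' "$workdir/patterns.txt" "$workdir/textFile.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

With the three example lines from the original post, only the line starting with aaa is printed. From R, a script like this could be run with system() or system2() once the patterns have been written out with writeLines(). On Windows, awk would have to be installed separately, so this does not by itself solve the cross-platform part of the question.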