similar to: Good Package(s) for String and URL processing?

Displaying 20 results from an estimated 10000 matches similar to: "Good Package(s) for String and URL processing?"

2008 Apr 09
11
Number of words in a string
Hi R, A quick question: How do we find the number of words in a string? Example: C="Have a nice day" And the number of words should be 4. Is there any built-in function for this? Thanks, Shubha
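One possible approach, offered as a sketch and not as the list's actual answer: split the string on runs of whitespace and count the pieces. The helper name `count_words` is hypothetical; `strsplit`, `trimws`, and `lengths` are base R functions (the last two assume R 3.2.0 or later).

```r
# Hypothetical helper: count whitespace-separated words in each string.
count_words <- function(x) {
  # Trim leading/trailing whitespace so no empty first token is produced,
  # then split on one or more whitespace characters and count the pieces.
  lengths(strsplit(trimws(x), "\\s+"))
}

C <- "Have a nice day"
count_words(C)  # 4
```

This treats any run of whitespace as a single separator; punctuation attached to a word is counted as part of that word.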
2009 May 11
3
Searching within a ch. string
Hi all, is there any function to find some words in a character string? For example, suppose the string is "gdfsa-sdhchc-88"; now I want to find whether this string contains "sdhch". Is there any R function to do that? Regards,
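A minimal sketch of how such a check might look in base R, using the string from the question. `grepl` and `regexpr` are base functions; `fixed = TRUE` is used here on the assumption that the search term is a literal string and not a regular expression.

```r
s <- "gdfsa-sdhchc-88"

# fixed = TRUE treats the pattern as a literal string, not a regular expression.
found <- grepl("sdhch", s, fixed = TRUE)
found  # TRUE

# regexpr() reports where the first match starts (-1 if there is none).
pos <- as.integer(regexpr("sdhch", s, fixed = TRUE))
pos  # 7
```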
2006 Jul 23
3
RfW 2.3.1: regular expressions to detect pairs of identical word-final character sequences
Dear all I use R for Windows 2.3.1 on a fully updated Windows XP Home SP2 machine and I have two related regular expression problems. platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor
2010 Jul 15
2
Search and extract string function
Hi all, I'm trying to write a function that will search and extract from a long character string, but with a twist: I want to use the characters before and the characters after what I want to extract as reference points. For example, say I'm working with data entries that looks like this: Drink=Coffee:Location=Office:Time=Morning:Market=Flat
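As a sketch of the idea described above (using the text before and after the target as reference points), one option is a regular expression with a lazy capture group. The helper name `extract_between` and the choice of delimiters in the usage line are assumptions for illustration; the excerpt is truncated, so the poster's exact requirements are not known.

```r
entry <- "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"

# Hypothetical helper: return the text found between `before` and `after`.
# Both arguments are interpreted as regular expressions, so special
# characters in them would need escaping.
extract_between <- function(x, before, after) {
  pattern <- paste0(".*", before, "(.*?)", after, ".*")
  # "\\1" keeps only the lazily captured group between the two markers.
  sub(pattern, "\\1", x, perl = TRUE)
}

extract_between(entry, "Location=", ":Time")  # "Office"
```

Note that `sub` returns the input unchanged when the pattern does not match, so a caller may want to test with `grepl` first.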
2008 Aug 01
2
Extract Element of String with R's Regex
Hi, I have this string, from which I want to extract some of its elements: > x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35" yielding this array [1] "211952_at" "RANBP5" "2" In Perl we would do it this way: __BEGIN__ my @needed =(); my $str = "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669
2009 Nov 03
2
R 2.10.0: Error in gsub/calloc
I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this is a Mac-specific problem. I have a very large (158,908 possible sentences, ca. 58 MB) plain text document d which I am trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am encountering the following error: Error in base::gsub(pattern, rs, x, ...) : Calloc could not allocate (-1398215180 of
2010 Mar 31
3
regular expression help to extract specific strings from text
Dear all, Let's say I have the following: > x <- c("Eve: Going to try something new today...", "Adam: Hey @Eve, how are you finding R? #rstats", "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(", "Adam: @Eve I'm sure they'll sort it out :)", "blahblah") > x [1]
2009 Nov 03
3
reading tokens
Greetings, I am not familiar with processing text in R. Can someone tell me how to read each line of words as separate elements in a list? For example, I would like to turn: word1 word2 word3 word2 word4 into a list of length two with three character elements in the first list and two elements in the second. I know that this should be easy, but I am a little confused by the text functions. Thanks in
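A small sketch of one way this might be done in base R. The two-element vector `txt` is a made-up stand-in for the file, assuming the line break falls after `word3` (which matches the three-and-two result described above); for a real file, `readLines()` would supply the same kind of vector.

```r
# Made-up input standing in for a two-line text file (assumed line break
# after "word3"); readLines() on a real file returns a vector like this.
txt <- c("word1 word2 word3", "word2 word4")

# strsplit() returns a list with one character vector per input line.
tokens <- strsplit(trimws(txt), "\\s+")

length(tokens)   # 2
lengths(tokens)  # 3 2
```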
2009 Apr 10
3
Determine the Length of the Longest Word in a String
Hi Everyone, I'm new to programming R and have accomplished my goal, but feel that there is probably a more efficient way of coding this. I'd appreciate any guidance that a more advanced programmer can provide. My goal -- I would like to find the length of the longest word in a string containing many words separated by spaces. How I did it -- I was able to find the length of the
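The poster's own code is cut off in this excerpt, so the following is only a sketch of one compact base R formulation, not a reconstruction of it. The example string is made up for illustration.

```r
# Made-up example string; words are assumed to be separated by single spaces.
s <- "the quick brown fox"

# Split on spaces, measure each word with nchar(), and take the maximum.
longest <- max(nchar(strsplit(s, " ", fixed = TRUE)[[1]]))
longest  # 5
```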
2008 Sep 14
5
string functions
Hello, I am trying to locate all the string commands in the base version of R, but can't seem to find an area that describes them. I need to do some serious parsing of text data to create my dataset. Is there a summary link to all the character operators and string manipulations that would help in parsing text?
2007 Sep 25
5
extracting data using strings as delimiters
Dear List, I have an ascii text file with data I'd like to extract. Example: Year Built: 1873 Gross Building Area: 578 sq ft Total Rooms: 6 Living Area: 578 sq ft There is a lot of data I'd like to ignore in each record, so I'm hoping there is a way to use strings as delimiters to get the data I want (e.g. tell R to take data between "Built:" and "Gross" -
2010 Apr 23
2
Problem with parsing a dataset - help earnestly sought
Dear fellow R-help members, I hope to seek your advice on how to parse/manage a dataset with hundreds of columns. Two examples of these columns, 'cancer.problems', and 'neuro.problems' are depicted below. Essentially, I need to parse this into a useful dataset, and unfortunately, I am not familiar with perl or any such language. data <- data.frame(id=c(1:10))
2010 Jul 08
2
strsplit("dia ma", "\\b") splits characterwise
\b is word boundary. But, unexpectedly, strsplit("dia ma", "\\b") splits character by character. > strsplit("dia ma", "\\b") [[1]] [1] "d" "i" "a" " " "m" "a" > strsplit("dia ma", "\\b", perl=TRUE) [[1]] [1] "d" "i" "a" " "
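The character-by-character result appears to stem from how `strsplit` handles zero-length matches such as `\b`; that explanation is my reading and is not stated in the excerpt. If the goal is simply to obtain the words, a sketch that sidesteps the issue is to match the words directly, or to split on whitespace:

```r
x <- "dia ma"

# Extract runs of word characters instead of splitting at boundaries.
words_matched <- regmatches(x, gregexpr("\\w+", x))[[1]]
words_matched  # "dia" "ma"

# Alternatively, split on whitespace, which is a non-empty match.
words_split <- strsplit(x, "\\s+")[[1]]
words_split    # "dia" "ma"
```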
2009 Dec 28
3
apply loop - using/providing a data frame to loop over
Hi, I want to extract individual names from a single string that contains all names. My problem is not the extraction itself, but the looping over the extraction start and end points, which I try to realize with apply. #Say, I have a string with names. authors=c("Schleyer T, Spallek H, Butler BS, Subramanian S, Weiss D, Poythress ML, Rattanathikun P, Mueller G") #Since I only want the
2012 Jan 07
3
Getting a list of unique gene names from a list with semi-colons
Hello, I have one column in my dataframe that has gene names of interest. Unfortunately, due to the fact that some probes lie between two genes or two transcripts of a gene, it looks something like this - FAM81A LOC283050;LOC283050;LOC283050;ZMIZ1 PINK1;PINK1 MRPL12;MRPL12 C1orf114 MMS19;UBTD1 I would like to know how to get a list with all the names with no semi-colons and removing the
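A sketch of one way to get the distinct names, assuming the column is a character vector with one entry per whitespace-separated item shown above, and assuming the truncated request is for duplicates to be dropped. The vector name `genes` is hypothetical.

```r
# Assumed layout: one (possibly semicolon-joined) entry per row.
genes <- c("FAM81A", "LOC283050;LOC283050;LOC283050;ZMIZ1", "PINK1;PINK1",
           "MRPL12;MRPL12", "C1orf114", "MMS19;UBTD1")

# Split every entry on ";", flatten to one vector, and keep distinct names
# in order of first appearance.
gene_names <- unique(unlist(strsplit(genes, ";", fixed = TRUE)))
gene_names
# "FAM81A" "LOC283050" "ZMIZ1" "PINK1" "MRPL12" "C1orf114" "MMS19" "UBTD1"
```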
2009 Aug 04
4
regex question
Hi, I am getting stuck over an apparently simple problem in the use of regular expressions: to collect together the first letters of the words from the Perl motto, "There is more than one way to do it", in the following form: TIMTOWTDI. I tried the following code: ##### A regex problem with the Perl motto astr<-"There is more than one way to do it" b1<-grep("\\<",
2008 Nov 02
5
R newbie: how to replace string/regular expression
Hello, I am an R newbie and would like to know the correct and efficient method for doing string replacement. I have a large data set in which I want to convert values carrying the suffixes "M", "b", and "K" (currency in millions, billions, and thousands) to millions. That is, replace 209.7B with (209.7 * 10e6), 100.00K with (100.00 *1/100), etc. d <- c("120.0M", "11.01m",
2010 May 05
1
extracting a matched string using regexpr
Given a piece of text, I want to be able to extract a matched regular expression from it. This apparently works, but is pretty ugly: # some html test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>" # a pattern to extract 5 digits > pattern<-"[0-9]{5}" #
2010 Jun 03
5
string handling
I have a data.frame as the following: var1 var2 9G/G09 abd89C/T90 10A/T9 32C/C 90G/G A/A . . . . . . 10T/C 00G/G90 What I want is to get the letters which are on the left and right of '/'. for example, for "9G/G09", I only want "G", "G", and for "abd89C/T90", I only want "C" and
2011 Feb 01
3
R string help
Dear R guru: If I got a variable aaa<- "up.6.11(16)" how can I extract 16 out of the bracket? I could use substr, e.g. substr(aaa, start=1, stop=2) [1] "up" But it needs start and stop, what if my start or stop is not fixed, I just want the number inside the bracket, how can I achieve this? Many thanks yan
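A sketch of one option that does not depend on fixed start and stop positions: capture the digits between the parentheses with a regular expression. It assumes the string contains a single parenthesised number, as in the example; the variable name `inside` is mine.

```r
aaa <- "up.6.11(16)"

# Match the whole string, capturing only the digits inside the parentheses,
# and replace the match with that captured group.
inside <- sub(".*\\(([0-9]+)\\).*", "\\1", aaa)
inside              # "16"
as.numeric(inside)  # 16
```

If the pattern does not match, `sub` returns the original string, so `as.numeric` would then yield `NA` with a warning.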