thr3ads.net - similar to: "Good Package(s) for String and URL processing?"

Displaying 20 results from an estimated 10000 matches similar to: "Good Package(s) for String and URL processing?"

2008 Apr 09

Number of words in a string

Hi R, A quick question: How do we find the number of words in a string? Example: C="Have a nice day" And the number of words should be 4. any built in function or?... Thanks, Shubha
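One possible base R approach to the question above is to split on whitespace and count the pieces. This is only a sketch: the helper name `count_words` is hypothetical, and it assumes that "word" means any run of non-whitespace characters.

```r
# Hypothetical helper: count whitespace-separated words in each string.
count_words <- function(x) {
  # trimws() removes leading/trailing whitespace so it cannot create empty tokens
  tokens <- strsplit(trimws(x), "[[:space:]]+")
  # lengths() returns the number of tokens for each input string
  lengths(tokens)
}

s <- "Have a nice day"
n_words <- count_words(s)
print(n_words)
```

Punctuation stays attached to its word under this definition, so a different pattern would be needed if punctuation should be treated as a separator.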

Searching within a ch. string

2009 May 11

Hi all, is there any function to find some words in a character-string? For example suppose the string is : "gdfsa-sdhchc-88", now I want to find whether this string contains "sdhch". Is there any R function to do that? Regards,
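A minimal sketch of a substring test for the example in this question, using base R's `grepl()`. The variable names are illustrative, and `fixed = TRUE` is an assumption that the search term should be matched literally rather than as a regular expression.

```r
# Test whether a string contains a literal substring.
haystack <- "gdfsa-sdhchc-88"
needle <- "sdhch"

# fixed = TRUE treats the needle as plain text, not as a regex
found <- grepl(needle, haystack, fixed = TRUE)
print(found)

# regexpr() reports the 1-based start position of the first match (-1 if none)
position <- regexpr(needle, haystack, fixed = TRUE)
print(as.integer(position))
```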

RfW 2.3.1: regular expressions to detect pairs of identical word-final character sequences

2006 Jul 23

Dear all I use R for Windows 2.3.1 on a fully updated Windows XP Home SP2 machine and I have two related regular expression problems. platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor

Search and extract string function

2010 Jul 15

Hi all, I'm trying to write a function that will search and extract from a long character string, but with a twist: I want to use the characters before and the characters after what I want to extract as reference points. For example, say I'm working with data entries that looks like this: Drink=Coffee:Location=Office:Time=Morning:Market=Flat

Extract Element of String with R's Regex

2008 Aug 01

Hi, I have this string, from which I want to extract some of its elements: > x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35" yielding this array [1] "211952_at" "RANBP5" "2" In Perl we would do it this way: __BEGIN__ my @needed =(); my $str = "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669

R 2.10.0: Error in gsub/calloc

2009 Nov 03

I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this is a Mac-specific problem. I have a very large (158,908 possible sentences, ca. 58 MB) plain text document d which I am trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am encountering the following error: Error in base::gsub(pattern, rs, x, ...) : Calloc could not allocate (-1398215180 of

regular expression help to extract specific strings from text

2010 Mar 31

Dear all, Lets say I have the following: > x <- c("Eve: Going to try something new today...", "Adam: Hey @Eve, how are you finding R? #rstats", "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(", "Adam: @Eve I'm sure they'll sort it out :)", "blahblah") > x [1]

reading tokens

2009 Nov 03

Greetings, I am not familiar with processing text in R. Can someone tell me how to read each line of words as separate elements in a list? FE, I would like to turn: word1 word2 word3 word2 word4 into a list of length two with three character elements in the first list and two elements in the second. I know that this should be easy, but I am a little confused by the text functions. Thanks in
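The two-line example in this question can be sketched in base R as follows. The input is supplied inline here as an assumption; with a real file, `readLines()` on the file path would produce the same kind of character vector.

```r
# Each element stands for one line of the input file (assumed layout).
lines <- c("word1 word2 word3", "word2 word4")

# strsplit() returns a list with one character vector of tokens per line
tokens <- strsplit(lines, "[[:space:]]+")

print(length(tokens))   # number of lines
print(lengths(tokens))  # number of tokens on each line
```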

Determine the Length of the Longest Word in a String

2009 Apr 10

Hi Everyone, I'm new to programming R and have accomplished my goal, but feel that there is probably a more efficient way of coding this. I'd appreciate any guidance that a more advanced programmer can provide. My goal -- I would like to find the length of the longest word in a string containing many words separated by spaces. How I did it -- I was able to find the length of the
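A compact way to express the goal described here, sketched under the assumption that words are separated by one or more spaces. The function name `longest_word_length` and the sample sentence are hypothetical.

```r
# Hypothetical helper: length of the longest space-separated word in a string.
longest_word_length <- function(s) {
  words <- strsplit(s, " +")[[1]]  # split on runs of spaces
  max(nchar(words))                # character count of the longest word
}

sentence <- "I would like to find the longest word"
len <- longest_word_length(sentence)
print(len)
```

For an empty input string `max()` has nothing to compare and warns, so that case would need separate handling.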

string functions

2008 Sep 14

Hello, trying to locate all the string commands in the base version of R, can't seem to find an area that describes them. I am in need to do some serious parsing of text data to create my dataset. Is there a summary link to all the character operators? string manipulations that would help in parsing text.

extracting data using strings as delimiters

2007 Sep 25

Dear List, I have an ascii text file with data I'd like to extract. Example: Year Built: 1873 Gross Building Area: 578 sq ft Total Rooms: 6 Living Area: 578 sq ft There is a lot of data I'd like to ignore in each record, so I'm hoping there is a way to use strings as delimiters to get the data I want (e.g. tell R to take data between "Built:" and "Gross" -

Problem with parsing a dataset - help earnestly sought

2010 Apr 23

Dear fellow R-help members, I hope to seek your advice on how to parse/manage a dataset with hundreds of columns. Two examples of these columns, 'cancer.problems', and 'neuro.problems' are depicted below. Essentially, I need to parse this into a useful dataset, and unfortunately, I am not familiar with perl or any such language. data <- data.frame(id=c(1:10))

strsplit("dia ma", "\\b") splits characterwise

2010 Jul 08

\b is word boundary. But, unexpectedly, strsplit("dia ma", "\\b") splits character by character. > strsplit("dia ma", "\\b") [[1]] [1] "d" "i" "a" " " "m" "a" > strsplit("dia ma", "\\b", perl=TRUE) [[1]] [1] "d" "i" "a" " "
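If the aim is to obtain the words themselves, one alternative that avoids splitting on a zero-length match is to extract the word tokens directly. This is a sketch of that workaround, not an explanation of the behavior reported above; the character class used to define a "word" is an assumption.

```r
x <- "dia ma"

# gregexpr() finds every run of alphanumeric characters;
# regmatches() pulls out the matched text
words <- regmatches(x, gregexpr("[[:alnum:]]+", x))[[1]]
print(words)
```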

apply loop - using/providing a data frame to loop over

2009 Dec 28

Hi, I want to extract individual names from a single string that contains all names. My problem is not the extraction itself, but the looping over the extraction start and end points, which I try to realize with apply. #Say, I have a string with names. authors=c("Schleyer T, Spallek H, Butler BS, Subramanian S, Weiss D, Poythress ML, Rattanathikun P, Mueller G") #Since I only want the

Getting a list of unique gene names from a list with semi-colons

2012 Jan 07

Hello, I have one column in my dataframe that has gene names of interest. Unfortunately, due to the fact that some probes lie between two genes or two transcripts of a gene, it looks something like this - FAM81A LOC283050;LOC283050;LOC283050;ZMIZ1 PINK1;PINK1 MRPL12;MRPL12 C1orf114 MMS19;UBTD1 I would like to know how to get a list with all the names with no semi-colons and removing the
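A sketch of one way to get the unique names in base R. It assumes the sample above represents six separate column entries, which is a guess at the layout lost in this excerpt, and the vector name `genes` is hypothetical.

```r
# Assumed column contents: one element per row of the data frame column.
genes <- c("FAM81A",
           "LOC283050;LOC283050;LOC283050;ZMIZ1",
           "PINK1;PINK1",
           "MRPL12;MRPL12",
           "C1orf114",
           "MMS19;UBTD1")

# Split each entry on ";", flatten the list, and drop duplicates
unique_genes <- unique(unlist(strsplit(genes, ";", fixed = TRUE)))
print(unique_genes)
```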

regex question

2009 Aug 04

Hi, I am getting stuck over an apparently simple problem in the use of regular expressions: to collect together the first letters of the words from the Perl motto, "There is more than one way to do it", in the following form: TIMTOWTDI. I tried the following code: ##### A regex problem with the Perl motto astr<-"There is more than one way to do it" b1<-grep("\\<",
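The acronym can also be built without a regular expression. The sketch below reuses the `astr` variable from the question and assumes words are separated by single spaces; the other names are illustrative.

```r
astr <- "There is more than one way to do it"

# Split into words, take the first character of each, and join them
words <- strsplit(astr, " ", fixed = TRUE)[[1]]
initials <- substr(words, 1, 1)
acronym <- toupper(paste(initials, collapse = ""))
print(acronym)
```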

R newbie: how to replace string/regular expression

2008 Nov 02

Hello; I am an R newbie and would like to know the correct and efficient method for doing string replacement. I have a large data set, where I want to replace character "M", "b", and "K" (currency in Million, Billion and K) to millions. That is 209.7B with (209.7 * 10e6) and 100.00K with (100.00 *1/100) and etc.. d <- c("120.0M", "11.01m",

extracting a matched string using regexpr

2010 May 05

Given a text like I want to be able to extract a matched regular expression from a piece of text. this apparently works, but is pretty ugly # some html test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>" # a pattern to extract 5 digits > pattern<-"[0-9]{5}" #
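A shorter route to the matched text, sketched with base R's `regmatches()` on the `test` string and `pattern` quoted in the question. Only the first match is returned here; `gregexpr()` would be the assumption to change if every match were wanted.

```r
# HTML fragment and pattern taken from the question
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
pattern <- "[0-9]{5}"

# regexpr() locates the first match; regmatches() returns the matched text
first_match <- regmatches(test, regexpr(pattern, test))
print(first_match)
```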

string handling

2010 Jun 03

I have a data.frame as the following: var1 var2 9G/G09 abd89C/T90 10A/T9 32C/C 90G/G A/A . . . . . . 10T/C 00G/G90 What I want is to get the letters which are on the left and right of '/'. for example, for "9G/G09", I only want "G", "G", and for "abd89C/T90", I only want "C" and

R string help

2011 Feb 01

Dear R guru: If I got a variable aaa<- "up.6.11(16)" how can I extract 16 out of the bracket? I could use substr, e.g. substr(aaa, start=1, stop=2) [1] "up" But it needs start and stop, what if my start or stop is not fixed, I just want the number inside the bracket, how can I achieve this? Many thanks yan
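One way to pull out the bracketed number without fixed start and stop positions is a capture group with `sub()`. This sketch assumes there is a single pair of parentheses containing only digits.

```r
aaa <- "up.6.11(16)"

# Capture the digits between "(" and ")" and replace the whole string with them
inside <- sub(".*\\(([0-9]+)\\).*", "\\1", aaa)
value <- as.numeric(inside)
print(value)
```

If the pattern does not match, `sub()` returns the input unchanged, so the result should be checked before it is converted to a number.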
