thr3ads.net - similar to: "Extracting matched expressions"

Displaying 20 results from an estimated 20000 matches similar to: "Extracting matched expressions"

2008 Jun 14

strsplit, keeping delimiters

Hi all, Does anyone have a version of strsplit that keeps the delimiter it splits on? E.g. from x <- "A: 123 B: 456 C: 678" I'd like to get c("A:", "123 ", "B: ", "456 ", "C: ", 678), but strsplit(x, "[A-Z]+:") gives me c("", " 123 ", " 456 ", " 678"). Any ideas? Thanks,

2010 May 05

extracting a matched string using regexpr

I want to be able to extract a matched regular expression from a piece of text. This apparently works, but is pretty ugly: # some html test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>" # a pattern to extract 5 digits > pattern<-"[0-9]{5}" #

2010 Jun 03

string handling

I have a data.frame like the following: var1 var2 9G/G09 abd89C/T90 10A/T9 32C/C 90G/G A/A . . . . . . 10T/C 00G/G90 What I want is to get the letters on the left and right of '/'. For example, for "9G/G09", I only want "G", "G", and for "abd89C/T90", I only want "C" and

2010 Mar 31

regular expression help to extract specific strings from text

Dear all, Let's say I have the following: > x <- c("Eve: Going to try something new today...", "Adam: Hey @Eve, how are you finding R? #rstats", "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(", "Adam: @Eve I'm sure they'll sort it out :)", "blahblah") > x [1]

2007 Dec 11

book on regular expressions

Hello, Could someone recommend a good book on regular expressions with a focus on applications/use as they might relate to R? I remember there was a mention of such a reference book recently, but I could not locate that message in the archive. Thanks. -Christos Christos Hatzis, Ph.D. Nuvera Biosciences, Inc. 400 West Cummings Park Suite 5350 Woburn, MA 01801 Tel: 781-938-3830

2009 Sep 16

How to extract a specific substring from a string (regular expressions) ? See details inside

Hi all, I have thousands of strings like these: "1159_1; YP_177963; PPE FAMILY PROTEIN" "1100_13; SECRETED L-ALANINE DEHYDROGENASE ALD CAA15575" "1141_24; gi;2894249;emb;CAA17111.1; PROBABLE ISOCITRATE DEHYDROGENASE" and various others. I'd like to extract the code for the protein (in this example: YP_177963, CAA15575, CAA17111). I

2009 Nov 13

Escaping regular expressions

Hi all, Is there a method for escaping strings to be used in regular expressions? I.e. if I have a user-supplied string that I'd like to use as a fixed component, is there a method that will turn (e.g.) ".$^" into "\\.\\$\\^"? Thanks, Hadley -- http://had.co.nz/

2010 Jun 28

Identify and extract a whole word of variable length using regular expressions

Hi everybody, I'm quite weak with regular expressions, and I need some help... I have strings of the type >a [1,] "ppe46 Rv3018c MT3098/MT3101 MTV012.32c" [2,] "ppe16 Rv1135c MT1168" [3,] "ppe21 Rv1548c MT1599 MTCY48.17" [4,] "ppe12 Rv0755c MT0779" [5,] "PE_PGRS51 Rv3367"

2010 May 18

Function that is giving me a headache- any help appreciated (automatic read )

Note: the whole function is below. I am sure I am doing something silly. When I use it like USGS(input="precipitation") it chokes on the precip.1 <- subset(DF, precipitation!="NA") b <- ddply(precip.1$precipitation, .(precip.1$gauge_name), cumsum) DF.precip <- precip.1 DF.precip$precipitation <- b$.data part, but runs fine outside of the function: days=7

2009 Oct 26

regular expressions

Dear list, I have the following text to parse (originating from readLines as some lines have unequal size), st = c("START text1 1 text2 2.3", "whatever intermediate text", "START text1 23.4 text2 3.1415") from which I'd like to extract the lines starting with "START", and group the subsequent fields in a data.frame in this format: text1 text2

2009 Jul 08

R regular expression to extract words with the query string.

Hi, Is there a way in R to get the string which matches the expression, where the expression is a substring of the parent string? Let's say I have $i <- "transcript:ENST0000112334 pid:ENSP000012345" What I need is the string "pid:ENSP000012345" from $i using the query "ENSP". Appreciate your comments. Praveen Surendran School of Medicine and

2008 Nov 02

R newbie: how to replace string/regular expression

Hello; I am an R newbie and would like to know the correct and efficient method for doing string replacement. I have a large data set, where I want to replace the characters "M", "b", and "K" (currency in Million, Billion and K) with millions. That is 209.7B with (209.7 * 10e6) and 100.00K with (100.00 *1/100) and etc.. d <- c("120.0M", "11.01m",

2008 Aug 12

perl expression question

I have a string such as fileName<-"Agg.20.20.20-all-01". All I want to do is pull the "20.20.20" and the "all" as strings. Obviously, they aren't always those values. The "20.20.20" can be "30.30.30" but it's always after the . which is next to the second g in Agg and it's always the same length. The all might not always be

2008 May 13

Regular Expressions

Hi R, Again stuck with regular expressions... Suppose S=c("World_is_beautiful", "one_two_three_four","My_book"). I need to extract the last but one element of the strings. So, my output should look like: Ans=c("is","three","My"). gsub() can do this... but I'm wondering how to write the regular expression....

2008 Aug 06

matching problem

I have a matching problem that I can't solve. mystring = "xxx{XX}yy{YYY}zzz{Z}" where "x","X","y","Y","z","Z" basically can be anything, letters, digits etc. I'm only interested in the content within each "{}". I am close but not really there yet. library(gsubfn) strapply(mystring,"\\{[^\\}]+",, perl=F)

2010 Mar 15

tcltk and R

I have had some comments on sqldf regarding its dependence on tcltk such as the second last sentence on this blog post: http://translate.google.com/translate?hl=en&sl=zh-CN&u=http://www.wentrue.net/blog/%3Fp%3D453&prev=http://blogsearch.google.com/blogsearch%3Fhl%3Den%26ie%3DUTF-8%26q%3Dsqldf%26lr%3D%26sa%3DN%26start%3D10 sqldf does not directly use tcltk but it does use strapply in

2011 Oct 18

readRDS and saveRDS

Hi all, Is there any chance that readRDS and saveRDS might one day become read.rds and write.rds? That would make them more consistent with the other reading and writing functions. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/

2010 Oct 13

Regular expression to find value between brackets

Hi, this should be an easy one, but I can't figure it out. I have a vector of tests, with their units between brackets (if they have units), e.g. tests <- c("pH", "Assay (%)", "Impurity A(%)", "content (mg/ml)") Now I would like to have a function where I use a test as input, and which returns the units like: f <- function (x) sub("\\)",

2010 Jul 02

Good Package(s) for String and URL processing?

Are there packages that allow improved string and URL processing? E.g. extract parts of URLs such as sub-domains, top-level domain, protocols (e.g. https, http, ftp), file type based on endings, check if a URL is valid or not, etc... I am currently only using split and paste. Are there better and more efficient ways to handle strings, e.g. finding sub-strings or doing pattern matching? What

2008 Jul 04

Read in a file - produce independent vectors

Is there a way of reading in a file so that each line becomes a vector? For example: meals.txt breakfast bacon eggs sausage lunch sandwich apple marsbar crisps dinner chicken rice custard pie I want to read in this file and end up with 3 different vectors, one called breakfast which contains "bacon", "eggs", "sausage" One called
