Displaying 20 results from an estimated 10000 matches similar to: "Good Package(s) for String and URL processing?"
2008 Apr 09
11
Number of words in a string
Hi R,
A quick question: How do we find the number of words in a string?
Example:
C="Have a nice day"
The number of words should be 4. Is there a built-in function for this?
Thanks, Shubha
Shubha Karanth | Amba Research
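A minimal base-R sketch, assuming a "word" is any run of non-blank characters: trim the string, split it on runs of whitespace and count the non-empty pieces. `count_words` is a hypothetical helper name.

```r
# Hypothetical helper: count whitespace-delimited tokens in one string.
count_words <- function(s) {
  # Drop leading/trailing blanks so they do not produce empty tokens
  words <- strsplit(trimws(s), "[[:space:]]+")[[1]]
  # Count only non-empty pieces (guards against an empty input string)
  sum(nzchar(words))
}

C <- "Have a nice day"
count_words(C)  # 4
```

For a whole character vector `x`, `lengths(strsplit(trimws(x), "[[:space:]]+"))` returns one count per element (`lengths()` needs R 3.2.0 or later).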
2009 May 11
3
Searching within a ch. string
Hi all, is there any function to find a word in a character string? For
example, suppose the string is "gdfsa-sdhchc-88", and I want to find
whether this string contains "sdhch". Is there an R function to do that?
Regards,
--
View this message in context: http://www.nabble.com/Searching-within-a-ch.-string-tp23484010p23484010.html
Sent from the R help mailing list
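In base R, `grepl()` answers the yes/no question and `regexpr()` gives the position of the match. A minimal sketch; `fixed = TRUE` is used because the search text is a literal string rather than a regular expression.

```r
x <- "gdfsa-sdhchc-88"

# fixed = TRUE matches "sdhch" as a literal substring, not as a regex
grepl("sdhch", x, fixed = TRUE)   # TRUE

# regexpr() returns the start position of the first match (-1 if none)
pos <- regexpr("sdhch", x, fixed = TRUE)
as.integer(pos)                   # 7
```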
2006 Jul 23
3
RfW 2.3.1: regular expressions to detect pairs of identical word-final character sequences
Dear all
I use R for Windows 2.3.1 on a fully updated Windows XP Home SP2 machine and I have two related regular expression problems.
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor
2010 Jul 15
2
Search and extract string function
Hi all,
I'm trying to write a function that will search and extract from a long
character string, but with a twist: I want to use the characters before and
the characters after what I want to extract as reference points. For
example, say I'm working with data entries that look like this:
Drink=Coffee:Location=Office:Time=Morning:Market=Flat
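One way to use the surrounding characters as reference points is a Perl-style regular expression with a lazy capture group between the two markers. A minimal sketch, assuming the markers contain no regex metacharacters; `extract_between` is a hypothetical helper, not an existing R function.

```r
# Hypothetical helper: return the text between the first 'before' marker and
# the next 'after' marker; NA where the pattern is absent.
# Assumes 'before' and 'after' contain no regex metacharacters.
extract_between <- function(x, before, after) {
  # Lazy (.*?) groups stop at the nearest 'after' marker (PCRE syntax)
  pattern <- paste0("^.*?", before, "(.*?)", after, ".*$")
  hit <- grepl(pattern, x, perl = TRUE)
  out <- sub(pattern, "\\1", x, perl = TRUE)
  out[!hit] <- NA_character_
  out
}

entry <- "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"
extract_between(entry, "Location=", ":")  # "Office"
extract_between(entry, "Time=", ":")      # "Morning"
```

Since `sub()` and `grepl()` are vectorised, `x` can be a whole column of such entries. For this particular key=value layout, splitting on ":" and then on "=" with `strsplit()` would be another option.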
2008 Aug 01
2
Extract Element of String with R's Regex
Hi,
I have this string, from which I want to extract some of its elements:
> x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"
yielding this vector:
[1] "211952_at" "RANBP5" "2"
In Perl we would do it this way:
__BEGIN__
my @needed =();
my $str = "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669
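Two base-R sketches that reproduce the requested output, assuming the fields are separated by single spaces and always appear in the same order: pick fields by position after `strsplit()`, or capture them with `regexec()`/`regmatches()`, which is closer to the Perl approach.

```r
x <- "Best-K Gene 11340 211952_at RANBP5 Noc= 3 - 2 LL= -963.669 -965.35"

# Option 1: split on single spaces and pick fields by position
fields <- strsplit(x, " ")[[1]]
fields[c(4, 5, 9)]   # "211952_at" "RANBP5" "2"

# Option 2: capture the fields with a regex (POSIX extended syntax)
pattern <- "^[^ ]+ [^ ]+ [^ ]+ ([^ ]+) ([^ ]+) Noc= [^ ]+ - ([^ ]+)"
m <- regmatches(x, regexec(pattern, x))[[1]]
m[-1]                # drop the full match, keep the three groups
```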
2009 Nov 03
2
R 2.10.0: Error in gsub/calloc
I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this
is a Mac-specific problem.
I have a very large plain text document d (158,908 possible sentences, ca. 58 MB)
which I am trying to tokenize with t <- strapply(d, "\\w+", perl = T). I am
encountering the following error:
Error in base::gsub(pattern, rs, x, ...) :
Calloc could not allocate (-1398215180 of
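One possible workaround, not verified against this exact error: the negative size in the message looks like an integer overflow from handing a single 58 MB string to one `gsub()` call, so tokenizing line by line keeps each call small. A base-R sketch, where `lines` stands in for `readLines()` on the document and `[[:alnum:]_]` approximates `\\w`:

```r
# Stand-in for lines <- readLines("document.txt") (file name hypothetical)
lines <- c("The cat sat.", "On the mat, again!")

# gregexpr() finds every token in each line; regmatches() extracts them
tokens <- regmatches(lines, gregexpr("[[:alnum:]_]+", lines))

# Flatten the per-line results into a single character vector
words <- unlist(tokens)
words
```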
2010 Mar 31
3
regular expression help to extract specific strings from text
Dear all,
Let's say I have the following:
> x <- c("Eve: Going to try something new today...", "Adam: Hey @Eve, how are you finding R? #rstats", "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(", "Adam: @Eve I'm sure they'll sort it out :)", "blahblah")
> x
[1]
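The post is truncated before the goal is stated. Assuming the aim is to pull out the @mentions and #hashtags, a base-R sketch with `gregexpr()` and `regmatches()` returns one character vector per tweet (empty when nothing matches):

```r
x <- c("Eve: Going to try something new today...",
       "Adam: Hey @Eve, how are you finding R? #rstats",
       "Eve: @Adam, It's awesome, so much better at statistics that #Excel ever was! @Cain & @Able disagree though :(",
       "Adam: @Eve I'm sure they'll sort it out :)",
       "blahblah")

# "@" or "#" followed by letters, digits or underscores
mentions <- regmatches(x, gregexpr("@[[:alnum:]_]+", x))
hashtags <- regmatches(x, gregexpr("#[[:alnum:]_]+", x))

mentions[[3]]   # "@Adam" "@Cain" "@Able"
hashtags[[2]]   # "#rstats"
```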
2009 Nov 03
3
reading tokens
Greetings,
I am not familiar with processing text in R. Can someone tell me how to
read each line of words as separate elements in a list?
For example, I would like to turn:
word1 word2 word3
word2 word4
into a list of length two with three character elements in the first list
and two elements in the second. I know that this should be easy, but I am a
little confused by the text functions.
Thanks in
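`strsplit()` already returns a list with one element per input line, so a minimal sketch is a single call; here `txt` stands in for the lines returned by `readLines()` on the (hypothetical) input file:

```r
# Stand-in for txt <- readLines("words.txt") (file name hypothetical)
txt <- c("word1 word2 word3", "word2 word4")

# strsplit() returns a list with one character vector per line
tokens <- strsplit(txt, "[[:space:]]+")
length(tokens)    # 2
lengths(tokens)   # 3 2
```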
2009 Apr 10
3
Determine the Length of the Longest Word in a String
Hi Everyone,
I'm new to programming in R and have accomplished my goal, but I feel that there
is probably a more efficient way of coding this. I'd appreciate any
guidance that a more advanced programmer can provide.
My goal --
I would like to find the length of the longest word in a string containing
many words separated by spaces.
How I did it --
I was able to find the length of the
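A compact base-R version, assuming words are separated by one or more spaces: split, take `nchar()` of every piece and keep the maximum. `longest_word_length` is a hypothetical name.

```r
# Hypothetical helper: length of the longest space-separated word
longest_word_length <- function(s) {
  words <- strsplit(s, " +")[[1]]
  max(nchar(words))
}

longest_word_length("the quick brown fox jumped")  # 6
```

For a vector of strings, `sapply(strsplit(x, " +"), function(w) max(nchar(w)))` gives one value per element.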
2008 Sep 14
5
string functions
Hello, I am trying to locate all the string commands in the base version of
R, but can't seem to find an area that describes them. I need to do
some serious parsing of text data to create my dataset. Is there a
summary link to all the character operators and string manipulations that
would help in parsing text?
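A short, non-exhaustive tour of base-R string functions that come up when parsing text. In an R session, `?regex` documents the regular-expression syntax they share and `help(package = "base")` lists all base functions.

```r
s <- "  Parsing Text in R  "
txt <- trimws(s)                     # "Parsing Text in R" (trim blanks)
nchar(txt)                           # 17 (number of characters)
toupper(txt)                         # "PARSING TEXT IN R"
substr(txt, 1, 7)                    # "Parsing" (extract by position)
strsplit(txt, " ")[[1]]              # "Parsing" "Text" "in" "R"
grepl("Text", txt)                   # TRUE (is the pattern present?)
sub("Text", "Data", txt)             # "Parsing Data in R" (first match only)
gsub("[aeiou]", "", txt)             # "Prsng Txt n R" (all matches)
paste("id", 1:2, sep = "_")          # "id_1" "id_2"
sprintf("%s has %d chars", txt, nchar(txt))
```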
2007 Sep 25
5
extracting data using strings as delimiters
Dear List,
I have an ascii text file with data I'd like to extract. Example:
Year Built: 1873 Gross Building Area: 578 sq ft
Total Rooms: 6 Living Area: 578 sq ft
There is a lot of data I'd like to ignore in each record, so I'm
hoping there is a way to use strings as delimiters to get the data I
want (e.g. tell R to take data between "Built:" and "Gross" -
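A base-R sketch using Perl-style lookbehind and lookahead, so that "Built:" and "Gross" act as delimiters without being included in the result. It assumes each field label appears at most once per line; note that `regmatches()` with `regexpr()` silently drops lines that have no match.

```r
recs <- c("Year Built: 1873 Gross Building Area: 578 sq ft",
          "Total Rooms: 6 Living Area: 578 sq ft")

# Text after "Built:" and before "Gross" (PCRE lookbehind/lookahead)
year <- regmatches(recs, regexpr("(?<=Built:).*?(?=Gross)", recs, perl = TRUE))
trimws(year)                # "1873"

# Digits after "Rooms:", skipping any spaces
rooms <- regmatches(recs, regexpr("(?<=Rooms:) *[0-9]+", recs, perl = TRUE))
as.integer(trimws(rooms))   # 6
```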
2010 Apr 23
2
Problem with parsing a dataset - help earnestly sought
Dear fellow R-help members,
I would like your advice on how to parse/manage a dataset with hundreds of
columns. Two examples of these columns, 'cancer.problems', and
'neuro.problems' are depicted below. Essentially, I need to parse this into
a useful dataset, and unfortunately, I am not familiar with perl or any such
language.
data <- data.frame(id=c(1:10))
2010 Jul 08
2
strsplit("dia ma", "\\b") splits characterwise
\b matches a word boundary.
But, unexpectedly, strsplit("dia ma", "\\b") splits character by character.
> strsplit("dia ma", "\\b")
[[1]]
[1] "d" "i" "a" " " "m" "a"
> strsplit("dia ma", "\\b", perl=TRUE)
[[1]]
[1] "d" "i" "a" " "
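The likely cause is that `\\b` matches an empty string, and zero-length matches interact badly with the split-and-remove loop that `?strsplit` describes. A workaround that avoids splitting altogether is to match the pieces themselves: runs of alphanumeric characters or runs of everything else.

```r
x <- "dia ma"

# Match runs of alphanumerics OR runs of non-alphanumerics, so the
# pieces between word boundaries come back intact
pieces <- regmatches(x, gregexpr("[[:alnum:]]+|[^[:alnum:]]+", x))[[1]]
pieces                                 # "dia" " " "ma"

# Keep only the words
pieces[grepl("[[:alnum:]]", pieces)]   # "dia" "ma"
```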
2009 Dec 28
3
apply loop - using/providing a data frame to loop over
Hi,
I want to extract individual names from a single string that contains all
names. My problem is not the extraction itself, but the looping over the
extraction start and end points, which I try to realize with apply.
#Say, I have a string with names.
authors=c("Schleyer T, Spallek H, Butler BS, Subramanian S, Weiss D, Poythress ML, Rattanathikun P, Mueller G")
#Since I only want the
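Because `strsplit()` is vectorised, no `apply()` loop over start and end points should be needed. A minimal sketch, assuming the names are separated by commas and each name has the form "Surname Initials"; `author_names` is a hypothetical variable name.

```r
authors <- "Schleyer T, Spallek H, Butler BS, Subramanian S, Weiss D, Poythress ML, Rattanathikun P, Mueller G"

# Split on a comma plus any following whitespace (including a line break)
author_names <- strsplit(authors, ",[[:space:]]*")[[1]]
length(author_names)   # 8

# Surnames only: drop the trailing block of capital initials
sub(" [A-Z]+$", "", author_names)
```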
2012 Jan 07
3
Getting a list of unique gene names from a list with semi-colons
Hello,
I have one column in my dataframe that has gene names of interest.
Unfortunately, due to the fact that some probes lie between two genes or
two transcripts of a gene, it looks something like this:
FAM81A
LOC283050;LOC283050;LOC283050;ZMIZ1
PINK1;PINK1
MRPL12;MRPL12
C1orf114
MMS19;UBTD1
I would like to know how to get a list with all the names with no
semi-colons and removing the
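A minimal sketch, assuming the values shown above are separate cells of that column (for a data frame column stored as a factor, wrap it in `as.character()` first): split every cell on ";", flatten with `unlist()` and keep the first occurrence of each name with `unique()`.

```r
genes <- c("FAM81A", "LOC283050;LOC283050;LOC283050;ZMIZ1", "PINK1;PINK1",
           "MRPL12;MRPL12", "C1orf114", "MMS19;UBTD1")

# Split every cell on ";", flatten to one vector, then drop repeats
gene_list <- unique(unlist(strsplit(genes, ";", fixed = TRUE)))
gene_list
```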
2009 Aug 04
4
regex question
Hi,
I am getting stuck over an apparently simple problem in the use of regular expressions:
To collect together the first letters of the words from the Perl motto, "There is more than one way to do it", in the following form: TIMTOWTDI.
I tried the following code:
##### A regex problem with the Perl motto
astr<-"There is more than one way to do it"
b1<-grep("\\<",
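Two base-R ways to build the acronym: the first splits on spaces and takes `substr(..., 1, 1)` of each word; the second uses a PCRE `gsub()` that keeps only the first word character of each word.

```r
astr <- "There is more than one way to do it"

# Split into words, take the first letter of each, upper-case, glue together
words <- strsplit(astr, " ")[[1]]
paste(toupper(substr(words, 1, 1)), collapse = "")            # "TIMTOWTDI"

# Regex alternative: replace each word (plus trailing space) by its first letter
toupper(gsub("\\b(\\w)\\w*\\s*", "\\1", astr, perl = TRUE))   # "TIMTOWTDI"
```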
2008 Nov 02
5
R newbie: how to replace string/regular expression
Hello;
I am an R newbie and would like to know the correct and efficient method for
doing string replacement.
I have a large data set in which I want to replace the suffixes "M", "b",
and "K" (currency in million, billion and thousand) with values in millions. That is,
replace 209.7B with (209.7 * 10e6) and 100.00K with (100.00 *1/100),
etc.
d <- c("120.0M", "11.01m",
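A minimal sketch, assuming the goal is to convert every value to millions. Read that way, a "B" value must be multiplied by 1e3 and a "K" value by 1e-3 (the post's 10e6 equals 10^7, and 1/100 is not the thousand-to-million factor). The entries "209.7B" and "100.00K" are taken from the prose above, and `to_millions` is a hypothetical helper.

```r
d <- c("120.0M", "11.01m", "209.7B", "100.00K")

# Hypothetical helper: convert "<number><suffix>" strings to millions.
# Suffixes are case-insensitive; an unknown suffix yields NA.
to_millions <- function(x) {
  suffix <- toupper(substring(x, nchar(x)))         # last character
  value  <- as.numeric(substr(x, 1, nchar(x) - 1))  # everything before it
  factor <- c(K = 1e-3, M = 1, B = 1e3)[suffix]
  unname(value * factor)
}

to_millions(d)   # values: 120, 11.01, 209700, 0.1
```

The lookup vector keeps the conversion factors in one place, so adding another suffix (for example "T" for trillion) is a one-entry change.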
2010 May 05
1
extracting a matched string using regexpr
I want to be able to extract a matched regular expression from a piece of
text. Given a text like the one below, this apparently works, but is pretty ugly:
# some html
test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
# a pattern to extract 5 digits
> pattern<-"[0-9]{5}"
#
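`regmatches()` extracts the text that `regexpr()` or `gregexpr()` located, which avoids manual `substr()` arithmetic on the match positions. A sketch on the post's `test` string:

```r
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"

# regexpr() locates the first match; regmatches() pulls out the matched text
first <- regmatches(test, regexpr("[0-9]{5}", test))
first   # "88958"

# gregexpr() returns every non-overlapping match instead
all_runs <- regmatches(test, gregexpr("[0-9]+", test))[[1]]
all_runs
```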
2010 Jun 03
5
string handling
I have a data.frame like the following:
var1 var2
9G/G09 abd89C/T90
10A/T9 32C/C
90G/G A/A
. .
. .
. .
10T/C 00G/G90
What I want is to get the letters which are on the left and right of '/'.
For example, for "9G/G09", I only want "G", "G", and for "abd89C/T90", I
only want "C" and
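A base-R sketch with `sub()` and greedy patterns, assuming each value contains exactly one "/" with a single letter directly on each side (values without "/" would be returned unchanged). The data frame below contains only the rows shown in the post, and the helper names are hypothetical.

```r
df <- data.frame(var1 = c("9G/G09", "10A/T9", "90G/G", "10T/C"),
                 var2 = c("abd89C/T90", "32C/C", "A/A", "00G/G90"),
                 stringsAsFactors = FALSE)

# Letter immediately left / right of the "/" (assumes one "/" per value)
left_of_slash  <- function(x) sub(".*([A-Za-z])/.*", "\\1", x)
right_of_slash <- function(x) sub(".*/([A-Za-z]).*", "\\1", x)

data.frame(var1_left = left_of_slash(df$var1), var1_right = right_of_slash(df$var1),
           var2_left = left_of_slash(df$var2), var2_right = right_of_slash(df$var2))
```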
2011 Feb 01
3
R string help
Dear R guru:
If I have a variable
aaa<- "up.6.11(16)"
how can I extract the 16 from inside the brackets?
I could use substr, e.g.
substr(aaa, start=1, stop=2)
[1] "up"
But it needs start and stop positions. What if they are not fixed? I
just want the number inside the brackets; how can I achieve this?
Many thanks
yan
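A capture group around the digits removes the need for fixed start and stop positions. A minimal sketch with `sub()`, plus a PCRE alternative using lookbehind/lookahead; both assume a single parenthesised number in the string.

```r
aaa <- "up.6.11(16)"

# Capture the digits between "(" and ")"; the parentheses are escaped
inside <- sub(".*\\(([0-9]+)\\).*", "\\1", aaa)
as.numeric(inside)   # 16

# Alternative: regmatches() with a PCRE lookbehind/lookahead
alt <- regmatches(aaa, regexpr("(?<=\\()[0-9]+(?=\\))", aaa, perl = TRUE))
alt                  # "16"
```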