Mark Kimpel
2009-Aug-28 01:03 UTC
[R] problems with strsplit using a split of ' \\\ ' : a regex problem
I have a vector of gene symbols, some of which have multiple aliases. In the
case of an alias, they are separated by ' \\\ '.
Here is a real world example, which would represent one element of my
vector:
Eif4g2 /// Eif4g2-ps1 /// LOC678831
What I would like to do is input the vector into a function and output a
vector with just the first alias of each element (or, if there are no
aliases, just the one symbol).
So I wrote a simple little function to do this:
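get.first.id.func <- function(vec, splitter){
  # split each element on the separator (interpreted as a regular expression)
  vec.lst <- strsplit(vec, splitter)
  # keep only the first piece of each split element
  first.func <- function(vec1){vec1[1]}
  vec.out <- sapply(vec.lst, first.func)
  vec.out
}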
For a trivial example, this works:
> a <- c("a_b", "c_d")
> get.first.id.func(a, "_")
[1] "a" "c"
I am running into problems, however, with the real world split of ' \\\ '.
I'm not even able to construct a sample vector of my own! Here is what I
get:
> a <- c('a \\\ b', 'a \\\ b')
> a
[1] "a \\ b" "a \\ b"
> a <- c('a \\\\ b', 'a \\\\ b')
> a
[1] "a \\\\ b" "a \\\\ b"
I KNOW this is related to R's peculiarities with \ escapes, but I don't have
the expertise to know how to get around it.
I would be very interested to learn:
1. how to construct a vector such that a == c('a \\\ b', 'a \\\ b')
2. how to properly input my split into my function so that I get the split
desired.
Thanks regex experts!
Mark
------------------------------------------------------------
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************
Henrique Dallazuanna
2009-Aug-28 01:15 UTC
[R] problems with strsplit using a split of ' \\\ ' : a regex problem
You need an escape before each backslash, so three literal backslashes are typed as six:
a <- c('a \\\\\\ b', 'a \\\\\\ b')
cat(a, "\n")
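To check what a string really contains, it may help to compare how R prints it with the raw characters. This is only a sketch: print() re-escapes every backslash on display, while writeLines() and nchar() report the actual content. In R 4.0.0 and later, a raw string such as r"(a \\\ b)" should give the same value without any doubling.

```r
# Six backslashes in the source become three backslash characters in the string.
a <- c("a \\\\\\ b", "a \\\\\\ b")

print(a)        # print() shows each backslash escaped, so six are displayed
writeLines(a)   # writeLines() shows the raw characters: a \\\ b
nchar(a)        # 7 per element: "a", space, three backslashes, space, "b"
```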
You can write the split in this form:
strsplit(a, " .*\\.* ")
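Note that the pattern above appears to match any run of characters between two spaces, so on an element with three aliases it would also swallow the middle one. A possibly simpler alternative, sketched here under the assumption that the separator is always the same literal text, is to pass fixed = TRUE so that strsplit() does no regex interpretation at all. The name get_first_id is hypothetical; it is just a variant of the function from the original post. The example covers both the backslash separator from the question and the forward-slash separator from the real-world example.

```r
# Return the first alias from each element, treating the separator as literal text.
get_first_id <- function(vec, splitter) {
  pieces <- strsplit(vec, splitter, fixed = TRUE)
  # vapply() guarantees a character vector with one value per input element
  vapply(pieces, function(p) p[1], character(1))
}

# Separator of three literal backslashes (typed as six in R source).
backslash_ids <- c("a \\\\\\ b", "c \\\\\\ d")
get_first_id(backslash_ids, " \\\\\\ ")

# Separator of three forward slashes needs no escaping at all.
slash_ids <- c("Eif4g2 /// Eif4g2-ps1 /// LOC678831", "Actb")
get_first_id(slash_ids, " /// ")
```

Without fixed = TRUE the splitter is a regular expression, where each literal backslash must itself be escaped, so the same three-backslash separator would need twelve backslashes in the R source.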
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O