Mark Kimpel
2009-Aug-28 01:03 UTC
[R] problems with strsplit using a split of ' \\\ ' : a regex problem
I have a vector of gene symbols, some of which have multiple aliases. In the case of an alias, they are separated by ' \\\ '. Here is a real world example, which would represent one element of my vector:

Eif4g2 /// Eif4g2-ps1 /// LOC678831

What I would like to do is input the vector into a function and output a vector with just the first alias of each element (or, if there are no aliases, just the one symbol).

So I wrote a simple little function to do this:

get.first.id.func <- function(vec, splitter){
  vec.lst <- strsplit(vec, splitter)
  first.func <- function(vec1){vec1[1]}
  vec.out <- sapply(vec.lst, first.func)
  vec.out
}

For a trivial example, this works:

> a <- c("a_b", "c_d")
> get.first.id.func(a, "_")
[1] "a" "c"

I am running into problems, however, with the real world split of ' \\\ '. I'm not even able to construct a sample vector of my own! Here is what I get:

> a <- c('a \\\ b', 'a \\\ b')
> a
[1] "a \\ b" "a \\ b"
> a <- c('a \\\\ b', 'a \\\\ b')
> a
[1] "a \\\\ b" "a \\\\ b"

I KNOW this is related to R's peculiarities with \ escapes, but I don't have the expertise to know how to get around it.

I would be very interested to learn:
1. how to construct a vector such that a == c('a \\\ b', 'a \\\ b')
2. how to properly input my split into my function so that I get the split desired.

Thanks regex experts!
Mark

Mark W. Kimpel MD
Neuroinformatics, Dept. of Psychiatry
Indiana University School of Medicine
Henrique Dallazuanna
2009-Aug-28 01:15 UTC
[R] problems with strsplit using a split of ' \\\ ' : a regex problem
You need an escape before each backslash:

a <- c('a \\\\\\ b', 'a \\\\\\ b')
cat(a, "\n")

You can then write the split in this form:

strsplit(a, " .*\\.* ")

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
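For readers who want a stricter split than the loose pattern above, here is a minimal sketch that matches the separator literally with fixed = TRUE. The helper name first_alias and the sample vector are hypothetical and not from the thread. As far as I can tell, a greedy pattern such as " .*\\.* " can also swallow the middle aliases when an element has more than two, which does not matter when only the first alias is wanted but may surprise otherwise.

```r
# Each literal backslash must be written as \\ in R source,
# so three literal backslashes take six characters inside the quotes.
a <- c("a \\\\\\ b", "Eif4g2 \\\\\\ Eif4g2-ps1 \\\\\\ LOC678831")

nchar(a[1])    # counts the real characters: a, space, 3 backslashes, space, b
writeLines(a)  # displays the actual characters; print() shows them re-escaped

# Hypothetical helper: return the first alias of each element.
# fixed = TRUE makes strsplit() treat the splitter as a literal string,
# so no regex escaping is needed on top of the R string escaping.
first_alias <- function(vec, splitter) {
  parts <- strsplit(vec, splitter, fixed = TRUE)
  vapply(parts, function(p) p[1], character(1))
}

first_alias(a, " \\\\\\ ")
first_alias("Eif4g2 /// Eif4g2-ps1 /// LOC678831", " /// ")
```

If the data really use forward slashes, as in the Eif4g2 example from the original post, no escaping is involved at all and the second call is the relevant one. Without fixed = TRUE the splitter is read as a regular expression, where each literal backslash has to be escaped once more.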