Daren Tan
2008-Jul-31 11:11 UTC
[R] Identifying common prefixes from a vector of words, and delete those prefixes
For example, c("dog.is.an.animal", "cat.is.an.animal",
"rat.is.an.animal"). How can I identify the common prefix is
".is.an.animal" and delete it to give c("dog",
"cat", "rat") ?
Thanks
Richard.Cotton at hsl.gov.uk
2008-Jul-31 13:21 UTC
[R] Identifying common prefixes from a vector of words, and delete those prefixes
> For example, c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal").
> How can I identify the common prefix is ".is.an.animal"
> and delete it to give c("dog", "cat", "rat") ?

foo <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")
sub(".is.an.animal", "", foo)

Being pedantic, ".is.an.animal" is a suffix not a prefix since it comes afterwards.

Regards,
Richie.

Mathematical Sciences Unit
HSL
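One caveat on the sub() call above: by default the pattern is a regular expression, so each "." matches any character, not just a literal dot. A minimal sketch of two stricter variants follows; the "tricky" input is made up purely for illustration.

```r
foo <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")

# Literal match: the dots are plain characters, not regex wildcards.
literal <- sub(".is.an.animal", "", foo, fixed = TRUE)

# Regex match with escaped dots, anchored to the end of the string.
anchored <- sub("\\.is\\.an\\.animal$", "", foo)

# Made-up input showing the difference: unescaped dots match any character.
tricky <- "axis-an-animal"
as_regex   <- sub(".is.an.animal", "", tricky)                # "a"
as_literal <- sub(".is.an.animal", "", tricky, fixed = TRUE)  # left unchanged
```

The anchored form also guarantees that only a trailing occurrence is removed, which is what "suffix" implies.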
John Kane
2008-Jul-31 16:48 UTC
[R] Identifying common prefixes from a vector of words, and delete those prefixes
There MUST be a better way but this will work.
x <- c("dog.is.an.animal", "cat.is.an.animal",
"rat.is.an.animal")
bb <- strsplit(x, "\\.")
myfun <- function(m) m[1]
animals <- unlist(lapply(bb, myfun))
animals
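For what it's worth, the same first-token idea can probably be written more compactly. This is only a sketch and, like the code above, it assumes the part to keep is everything before the first dot; it does not actually detect the common suffix.

```r
x <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")

# Split on literal dots and take the first piece of each element.
animals <- vapply(strsplit(x, ".", fixed = TRUE), `[`, character(1), 1)

# Equivalent regex form: delete everything from the first dot onwards.
animals2 <- sub("\\..*$", "", x)
```

vapply() is used instead of unlist(lapply()) so the result is guaranteed to be a character vector with one entry per input.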
Christos Hatzis
2008-Jul-31 17:16 UTC
[R] Identifying common prefixes from a vector of words, and delete those prefixes
A more general solution:
strip.fun <- function(x, split=".") {
    xx <- strsplit(x, split, fixed=TRUE)
    txx <- table(unlist(xx))
    nxx <- names(txx)[txx > 1]
    setdiff(unlist(xx), nxx)
}
> x <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")
> strip.fun(x)
[1] "dog" "cat" "rat"
> y <- c("my_cat_pet", "my_dog_pet", "my_rat_pet")
> strip.fun(y, "_")
[1] "cat" "dog" "rat"
-Christos
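Note that strip.fun returns the set of tokens that occur only once across the whole vector, so the result need not line up with the input: if "dog.is.an.animal" appeared twice, "dog" would be dropped as well. A possible per-element variant is sketched below; strip_shared is a hypothetical name, and it assumes "common" means "present in every element".

```r
# Hypothetical helper: remove tokens that occur in every element,
# returning one result per input element.
strip_shared <- function(x, split = ".") {
  tokens <- strsplit(x, split, fixed = TRUE)
  shared <- Reduce(intersect, tokens)  # tokens found in all elements
  vapply(tokens,
         function(tk) paste(tk[!tk %in% shared], collapse = split),
         character(1))
}

x2 <- c("dog.is.an.animal", "cat.is.an.animal", "dog.is.an.animal")
res_dot <- strip_shared(x2)

y <- c("my_cat_pet", "my_dog_pet", "my_rat_pet")
res_us <- strip_shared(y, "_")
```

With x2, strip.fun would return only "cat", whereas strip_shared keeps the length and order of the input.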
Martin Morgan
2008-Aug-02 02:42 UTC
[R] Identifying common prefixes from a vector of words, and delete those prefixes
Daren Tan <daren76 at hotmail.com> writes:

> For example, c("dog.is.an.animal", "cat.is.an.animal",
> "rat.is.an.animal"). How can I identify the common prefix is
> ".is.an.animal" and delete it to give c("dog", "cat", "rat") ?

The 'Rlibstree' package from omegahat is quite fun for this sort of thing:

> install.packages('Rlibstree', repos="http://www.omegahat.org/R")
[snip]
> library(Rlibstree)
> rstrings <- function(string) {
+     lapply(lapply(strsplit(string, ""), rev),
+            paste, collapse="")
+ }
> pets <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")
> commonSfx <- rstrings(getCommonPrefix(rstrings(pets)))
> commonSfx[[1]]
[1] ".is.an.animal"
> sub(commonSfx, "", pets)
[1] "dog" "cat" "rat"

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
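In case Rlibstree is not installable, the reverse-and-compare idea can be approximated in base R. The following is only a sketch under that assumption: common_suffix and strip_common_suffix are hypothetical names, not part of any package, and the suffix is removed with substr() rather than sub() so that dots in the suffix are never treated as regex wildcards.

```r
# Hypothetical helper: longest suffix shared by every element of x.
common_suffix <- function(x) {
  chars <- lapply(strsplit(x, ""), rev)  # characters of each string, last first
  n <- min(lengths(chars))               # the suffix cannot exceed the shortest string
  k <- 0
  while (k < n) {
    kth <- vapply(chars, `[`, character(1), k + 1)  # (k+1)-th character from the end
    if (length(unique(kth)) > 1) break              # strings disagree here: stop
    k <- k + 1
  }
  paste(rev(chars[[1]][seq_len(k)]), collapse = "")
}

# Hypothetical helper: drop the shared suffix by position, not by pattern.
strip_common_suffix <- function(x) {
  sfx <- common_suffix(x)
  substr(x, 1, nchar(x) - nchar(sfx))
}

pets <- c("dog.is.an.animal", "cat.is.an.animal", "rat.is.an.animal")
sfx <- common_suffix(pets)
stripped <- strip_common_suffix(pets)
```

This compares characters one position at a time from the end, which is fine for short vectors like the one here; a suffix tree as in Rlibstree is the more scalable tool.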