thr3ads.net - R help - [R] testing whether two character vectors contain (the same) items in the same order [Aug 2015]


Bert Gunter

2015-Aug-06 22:59 UTC

[R] testing whether two character vectors contain (the same) items in the same order

Boris:

You may be right, but it seems like ESP to me, given the OP's
non-description of "likelihood of coming from the same noisy process". My
response would be: seek local statistical help, as your replies indicate a
good deal of statistical confusion.

Cheers,
Bert



On Thursday, August 6, 2015, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> You are looking for what is known as the "Cayley distance" between vectors
> - an edit distance that allows only transpositions. RSeek mentions
> PerMallows (
> https://cran.r-project.org/web/packages/PerMallows/PerMallows.pdf) and
> Rankcluster (
> https://cran.r-project.org/web/packages/Rankcluster/Rankcluster.pdf) as
> packages that support work with Cayley distances. It seems to me that
> distCayley() in Rankcluster does what you want. From the examples:
>
> x=1:5
> y=c(2,3,1,4,5)
> distCayley(x,y)
> 8
>
>
> Cheers,
> Boris
>
>
>
>
>
> On Aug 6, 2015, at 9:51 AM, Federico Calboli <federico.calboli at helsinki.fi> wrote:
>
> >>
> >> On 6 Aug 2015, at 15:40, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>
> >> Define "goodness of match". For exact matches, see ?"==", all.equal, etc.
> >
> > Fair point.  I would define it as a number that tells me how likely it
> > is that the same (noisy) process produced both lists.
> >
> > BW
> >
> > F
> >
> >
> >
> >
> >>
> >> Bert
> >>
> >> On Thursday, August 6, 2015, Federico Calboli <federico.calboli at helsinki.fi> wrote:
> >> Hi All,
> >>
> >> let's assume I have a vector of letters drawn only once from the alphabet:
> >>
> >> x = sample(letters, 15, replace = FALSE)
> >> x
> >> [1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
> >>
> >> y = x[c(1:7, 9:8, 10:12, 14, 15, 13)]
> >>
> >> I would now like to test how good a match y is for x.  Obviously I can
> >> transform the letters into numbers and use a rank test, but I was left
> >> wondering whether this is the only solution and whether there are more
> >> appropriate solutions that are already implemented in R (I am not going to
> >> reinvent the wheel if I can avoid it).
> >>
> >> BW
> >>
> >> F
> >>
> >>
> >> --
> >> Federico Calboli
> >> Ecological Genetics Research Unit
> >> Department of Biosciences
> >> PO Box 65 (Biocenter 3, Viikinkaari 1)
> >> FIN-00014 University of Helsinki
> >> Finland
> >>
> >> federico.calboli at helsinki.fi
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >> --
> >> Bert Gunter
> >>
> >> "Data is not information. Information is not knowledge. And knowledge
> >> is certainly not wisdom."
> >>   -- Clifford Stoll
> >>

-- 
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
   -- Clifford Stoll
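
For readers who want to see what the Cayley distance mentioned by Boris measures, here is a minimal base-R sketch. It relies on the standard result that the minimum number of transpositions needed to turn one permutation into another equals the vector length minus the number of cycles in the permutation that maps one onto the other. The function name cayley_distance is made up for this illustration and belongs to no package, and the sketch assumes both vectors hold the same items with no duplicates. The x used here is the sample printed in the original question.

```r
# Hypothetical helper (not from Rankcluster or PerMallows):
# Cayley distance = length - number of cycles of the permutation mapping x to y.
cayley_distance <- function(x, y) {
  stopifnot(length(x) == length(y), !anyDuplicated(x), setequal(x, y))
  perm <- match(y, x)              # position in x of each element of y
  seen <- logical(length(perm))
  cycles <- 0L
  for (start in seq_along(perm)) {
    if (!seen[start]) {
      cycles <- cycles + 1L        # found a new cycle; walk it to the end
      i <- start
      while (!seen[i]) {
        seen[i] <- TRUE
        i <- perm[i]
      }
    }
  }
  length(perm) - cycles
}

# The vectors from the original question
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

cayley_distance(x, y)  # 3: one swap for positions 8-9, two for positions 13-15
```

The result is a raw count of swaps, not a probability, so it only partly addresses the "how likely" wording in the question.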


Federico Calboli

2015-Aug-07 07:22 UTC

[R] testing whether two character vectors contain (the same) items in the same order

> On 7 Aug 2015, at 01:59, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> Boris:
>
> You may be right, but it seems like ESP to me, given the OP's
> non-description of "likelihood of coming from the same noisy process". My
> response would be: seek local statistical help, as your replies indicate a
> good deal of statistical confusion.
>
> Cheers,
> Bert

Bert,

as this is R-help and not Cross Validated, I am looking for a precanned function
that would test whether the order of characters in two character vectors comes
from the same (noisy) process.  I would thus expect you to say something along
the lines of:

function X uses method Y to do something like that
function W uses method Z to do something like that
...

look into those, figure out exactly what you are testing and use the most
appropriate function.

The whys and wherefores are for me to deal with, I just want to know whether
someone has built a function that does, or seems to do, what I asked for.  As I
said, this is R-help, and I seek help for R use.

I do concede that my original question might have left many wondering, but I
guess my reply to Boris would have cleared any doubts.  I am therefore puzzled
by the great deal of confusion on your part in understanding the purpose of my
question and, in general, of this list.

Best wishes

F


--
Federico Calboli
Ecological Genetics Research Unit
Department of Biosciences
PO Box 65 (Biocenter 3, Viikinkaari 1)
FIN-00014 University of Helsinki
Finland

federico.calboli at helsinki.fi
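
The rank-test route mentioned in the original question (turn the letters into numbers, then use a rank test) can be sketched with base R alone. This is only one possible reading of the request: match() gives the position in x of each element of y, and cor.test() with method = "kendall" tests the null hypothesis of no association between the two orderings, which is not quite the same thing as "produced by the same noisy process". The variable names are made up for the illustration, and x is the sample printed in the original question.

```r
# The vectors from the original question
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Position in x of every element of y (the "letters to numbers" step)
pos_in_x <- match(y, x)

# Kendall rank correlation between the order in y and the order in x
kendall <- cor.test(seq_along(y), pos_in_x, method = "kendall")
tau     <- unname(kendall$estimate)   # close to 1 when the orders nearly agree
p_value <- kendall$p.value            # small when the orders are associated

# Number of pairs that appear in opposite order in x and y
# (the Kendall tau distance, i.e. the number of adjacent swaps needed)
inversions <- sum(combn(pos_in_x, 2, function(p) p[1] > p[2]))

tau
p_value
inversions
```

With 15 items there are 105 pairs, so tau is (concordant - discordant) / 105; the inversion count is the discordant part.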

David Winsemius

2015-Aug-07 20:48 UTC

[R] testing whether two character vectors contain (the same) items in the same order

On Aug 7, 2015, at 12:22 AM, Federico Calboli wrote:
>
>> On 7 Aug 2015, at 01:59, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> Boris:
>>
>> You may be right, but it seems like ESP to me, given the OP's
>> non-description of "likelihood of coming from the same noisy process". My
>> response would be: seek local statistical help, as your replies indicate a
>> good deal of statistical confusion.
>>
>> Cheers,
>> Bert
>
> Bert,
>
> as this is R-help and not Cross Validated, I am looking for a precanned
> function that would test whether the order of characters in two character
> vectors comes from the same (noisy) process.  I would thus expect you to say
> something along the lines of:
>
> function X uses method Y to do something like that
> function W uses method Z to do something like that
> ...
>
> look into those, figure out exactly what you are testing and use the most
> appropriate function.
>
> The whys and wherefores are for me to deal with, I just want to know
> whether someone has built a function that does, or seems to do, what I asked
> for.  As I said, this is R-help, and I seek help for R use.

findFn("levenshtein")  # findFn() is from the sos package
found 57 matches;  retrieving 3 pages
2 3 
Downloaded 44 links in 17 packages.

# x and y are character vectors, so collapse them directly
stringdist::stringdist(paste0(x, collapse = ""), paste0(y, collapse = ""))
[1] 3

-- 
HTH;
David.
David Winsemius
Alameda, CA, USA
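
If the stringdist package is not installed, a similar edit-distance comparison can be sketched with adist() from the utils package that ships with R. Note the difference in method: adist() computes a plain (generalized) Levenshtein distance, which has no transposition operation, so a swap of two neighbouring letters costs two edits rather than one. The variable names are made up for the illustration, and x is the sample printed in the original question.

```r
# The vectors from the original question
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Collapse each vector into a single string, one character per item.
# This only works because every item is a single character.
x_string <- paste(x, collapse = "")
y_string <- paste(y, collapse = "")

# adist() returns a matrix of distances; drop() reduces the 1 x 1 result
levenshtein <- drop(utils::adist(x_string, y_string))
levenshtein
```

For items longer than one character, the collapse step would need a mapping of each distinct item to a unique single character first.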
